
(Via Piled Higher and Deeper)
{ 0 comments }
A funny thing happened in the Bulgarian national lottery this week: the same numbers were drawn as last week.
The BBC and the AP both report the odds at 1 in 4 million; ABC Australia calls it 1 in 14 million. People are demanding that the Bulgarian lottery perform an investigation because no one can believe the result. Now, it’s not every day this sort of probability question comes up in the news, so let’s take a second to walk through the problem.
The Bulgarian lottery format consists of 6 numbers drawn from a collection of 42, without replacement. Let’s start by considering the probability of observing any single combination of balls. You might begin like this: The first ball could be any of 42. Once it is chosen, the next ball could be any of the remaining 41. After that, the third ball has 40 possibilities, etc. The number of possible outcomes is therefore

$$42 \times 41 \times 40 \times 39 \times 38 \times 37 = 3{,}776{,}965{,}920.$$

However, this would consider 1 – 2 – 3 – 4 – 5 – 6 to be different from 6 – 5 – 4 – 3 – 2 – 1, which is wrong because order does not matter in the Bulgarian lottery. To figure out the correct number of combinations, we instead need to use the choose function, or the binomial coefficient.
The choose function

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is pronounced “n choose k” and yields the number of ways that samples of size k can be chosen from a population of size n, if order does not matter. This is exactly what we are looking for – how many combinations of 6 balls can be formed from a group of 42, irrespective of order? The answer is

$$\binom{42}{6} = \frac{42!}{6!\,36!} = 5{,}245{,}786.$$

So the chance of seeing any specific outcome in the lottery (or, put another way, the chance of winning the lottery) is 1 in 5.2 million.
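For anyone who wants to check the counting, here is a minimal Python sketch. It assumes only the 6-from-42 format described above; the variable names are mine. It counts the ordered draws, divides out the orderings of six balls, and compares the result with the built-in choose function.

```python
import math

BALLS = 42  # numbers in the drum
PICKS = 6   # numbers drawn each week

# Ordered draws: 42 * 41 * 40 * 39 * 38 * 37
ordered = math.perm(BALLS, PICKS)

# Each set of 6 balls can come out in 6! different orders
orderings = math.factorial(PICKS)

# Unordered combinations: "42 choose 6"
combinations = math.comb(BALLS, PICKS)

# Dividing the ordered count by the orderings gives the same number
assert ordered // orderings == combinations

print(f"ordered draws: {ordered:,}")
print(f"combinations:  {combinations:,}")
```

`math.perm` and `math.comb` need Python 3.8 or later.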
What’s the probability of seeing the same outcome in two consecutive weeks? One’s first impulse might be to say it’s 1 in 5.2 million squared, or nearly 1 in 28 trillion. But, unsurprisingly, our first instinct is wrong. The chance of seeing a specific combination of balls (such as 1 – 2 – 3 – 4 – 5 – 6) in two consecutive weeks is indeed 1 in 28 trillion. However, the chance of seeing any combination at all repeat in consecutive draws is… 1 in 5.2 million.
How so? Start with what we know: for a given outcome, the chance is 1 in 28 trillion of seeing it twice in a row. But there are 5.2 million possible outcomes, any of which could have a double header. Thus, the math for any outcome repeating is the 1 in 28 trillion chance of a repeat times the 5.2 million different outcomes, for a final likelihood of 1 in 5.2 million.
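The same argument can be written out with exact fractions, which sidesteps any rounding of "5.2 million" and "28 trillion". This is only a sketch of the arithmetic in the paragraph above; the names are illustrative.

```python
from fractions import Fraction
import math

# Number of equally likely outcomes in a 6-from-42 lottery
N = math.comb(42, 6)

# Chance that one named combination comes up two weeks running
p_specific_twice = Fraction(1, N) ** 2

# Any of the N combinations could be the one that repeats, and
# those N events cannot overlap, so their probabilities add
p_any_repeat = N * p_specific_twice

print(p_specific_twice)  # 1 over N squared
print(p_any_repeat)      # reduces to 1 over N
```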
This holds true for any population with n choices – and the odds are always one factor of n shorter than what the human brain naively believes. Consider a coin flip. It has two outcomes, heads and tails (n = 2). After f flips, there are $n^f$ possible outcomes, and exactly n of those outcomes exhibit the same result in every flip. This is because the number of outcomes increases geometrically but the number of repeated items can never exceed the number of initial states. So, after two flips there are four outcomes (HH, TT, HT, TH) with two repeated results and after three flips there are eight outcomes (HHH, TTT, HHT, HTT, HTH, THT, THH, TTH) and still just two repeats. We may conclude that after any number of flips, the probability that every flip shows the same result is

$$\frac{n}{n^f} = \frac{1}{n^{f-1}}.$$

After two coin flips, the probability is 2 in 4 or 50%; after three flips it is 2 in 8 or 25%. These simple cases extend nicely to the case where n = 5.2 million and f = 2, where it should be 5.2 million in 5.2 million squared. That simplifies back to 1 in 5.2 million.
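If you would rather count than trust the formula, a brute-force enumeration works for small cases. The function below is a hypothetical helper of my own, not anything standard: it lists every sequence of f trials with n equally likely results each and counts the sequences in which all results match.

```python
from fractions import Fraction
from itertools import product


def all_same_probability(n, f):
    """Exact chance that f independent trials with n equally likely
    results each all show the same result, found by enumeration."""
    outcomes = list(product(range(n), repeat=f))  # n**f sequences
    repeats = sum(1 for seq in outcomes if len(set(seq)) == 1)
    return Fraction(repeats, len(outcomes))


# Coin flips: n = 2
print(all_same_probability(2, 2))  # two flips
print(all_same_probability(2, 3))  # three flips

# A six-sided die rolled three times, compared with 1 / n**(f - 1)
assert all_same_probability(6, 3) == Fraction(1, 6 ** 2)
```

Enumeration is hopeless for the real lottery, where two weeks of draws already give roughly 28 trillion pairs, but it confirms the pattern the formula extends.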
I prefer to think of the problem like this: “Given a draw in the first week, what’s the probability of seeing that draw again?” In other words, conditional on the first week’s number, what’s the probability that the second week’s number is the same as the first? Since all combinations are equally likely, the first week’s numbers have only a 1 in 5.2 million chance of being drawn the second week as well. Some people may not like this logic because they feel it ignores the 5.2 million outcomes in the first week, but they actually are accounted for. By conditioning on the first week, we no longer need to consider all of its possibilities.
Let’s take it one step further. These calculations give the probability of seeing consecutive draws in a two-week span, but ignore the fact that this lottery is played every single week. In a year, that’s 51 chances at getting a consecutive draw – surely that improves the odds of observing this result! The probability of not drawing consecutive outcomes in any two weeks is

$$\frac{5{,}245{,}786 - 1}{5{,}245{,}786} \approx 0.99999981.$$

It is expressed as the number of non-consecutive outcomes divided by the total number of outcomes. Unsurprisingly, the odds are 5.2 million-less-one to 1 against a repeat. Over the 51 pairs of consecutive weeks in a year, the probability of not seeing any consecutive outcomes is that number to the 51st power, or 0.9999903. Therefore, the probability of at least one repeat during the year is the complementary probability, about 1 in 103,000, or just under 1 in 100 thousand. Take that out over a number of years and the odds of observing a repeat continue to increase. Remember that even the most unlikely event has a fairly high chance of being observed if the trial is run many times. If you asked the question, “What’s the probability of seeing consecutive draws at any point in the history of the Bulgarian lottery?” you might find the answer surprisingly high. It’s just the probability of this specific week in September 2009 being the repeat which is so low.
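Here is a rough sketch of that yearly calculation, assuming one draw per week and treating each pair of consecutive weeks as an independent 1-in-5,245,786 shot. The function name and the 20-year horizon are my own illustrative choices, not figures from the lottery.

```python
import math

# Equally likely outcomes in a 6-from-42 lottery
N = math.comb(42, 6)


def p_repeat_within(pairs, n_outcomes=N):
    """Chance of at least one back-to-back repeat across `pairs`
    consecutive-week pairs: 1 - (1 - 1/n)**pairs.

    log1p/expm1 keep precision when 1/n is this small."""
    return -math.expm1(pairs * math.log1p(-1.0 / n_outcomes))


one_year = p_repeat_within(51)          # 52 draws give 51 pairs
twenty_years = p_repeat_within(51 * 20)  # hypothetical longer horizon

print(f"one year:     about 1 in {1 / one_year:,.0f}")
print(f"twenty years: about 1 in {1 / twenty_years:,.0f}")
```

The longer the horizon, the shorter the odds, which is the point of the paragraph above.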
The most interesting thing to me is that no one won the lottery the first week but a record 18 people won the second week – it would appear that playing the previous week’s numbers is a strategy people follow. Since the numbers are independent, there’s nothing smart or foolish about this from a probabilistic standpoint, though if you employed a little psychology you would stay away from “popular” numbers in order to avoid sharing the pot if you won.
Anyway, I digress. I put the probability of this observed outcome at 1 in 5.2 million. I’m curious to know how the other news agencies calculated their odds; perhaps there’s some twist in the lottery I’m unaware of?
{ 4 comments }
My Google Reader was filled with a lot of headlines on Google’s new Fast Flip service this morning, but none of them amused me quite as much as Silicon Alley Insider’s confused monologue:
7:45 a.m.: Google’s Fast Flip Makes Reading Print Publications Online Easier
8:40 a.m.: Google FastFlip Is A Gigantic Step Backwards
10:29 a.m.: Google FastFlip Is Latest Attack On Amazon Kindle
12:25 p.m.: Google FastFlip Is Especially Useless On The iPhone
(Yes, SAI decided that “Fast Flip” wasn’t Web 2.0-ish enough and removed the offending space themselves.)
The most recent two were written by Dan Frommer; the previous two are by two other authors. The 8:40 piece is actually cross-posted from another blog. Wouldn’t it be nice if media publications had cohesive narratives?
The truth is, I’m most inclined to agree with the 8:40 post. I don’t think this release is a particularly big deal; it certainly doesn’t deserve the attention being lavished on it. At its core, it is simply a very fast way of serving up low-bandwidth screenshots of websites. It fits somewhere below Google News and Google Reader. It has some good search and browsing features but I find it a bit too dumbed-down and the previews take up too much space – I feel like I’m missing content because my screen is filled with pictures rather than information.
I know, the point of the service is to present news in a style more akin to scanning a physical page for interesting items – but in such a case, I’m searching within a single graphical style, not between styles as I must on Fast Flip (I sound like I’m teaching ANOVA!). This is a major difference. If I’m scanning a page of the NYT, or browsing a magazine, I’m looking for headlines that interest me; I have already internalized the design/layout of that publication and essentially ignore it (even if it’s what attracted me to that particular paper in the first place). Imagine if the NYT had a vastly different design on every page – it would be a nightmare trying to browse quickly for interesting content, since with each page turn I’d have to internalize a new layout.
Yet that is the metaphor Fast Flip currently embraces. I’d much rather see them reformat the text of every source onto a single template, and present the result as a giant uniform news service. This wouldn’t be so far off from the “expanded” view of my Google Reader. Also, it would let me truly browse for content rather than design. And let’s face it – once I move to any sort of aggregation, I’m effectively giving up on design anyway, since I’ve moved away from the host site entirely.
{ 0 comments }
Bubble 2.0 datapoint of the day: StreamBase has announced that their CEP (complex event processing) software for algorithmic trading now supports Twitter.
One CIO admits in an otherwise Hallelujah-esque article that “traders he has spoken to haven’t yet jumped onto the Twitter bandwagon.” But here’s the clincher (emphasis mine):
A key benefit of Twitter is that it forces everyone to a 140-character limit for single tweets, therefore providing traders with a fast snippet of information rather than them having to sift through pages and pages of research, he points out.
“There’s a thousand web sites out there driving you to research,” [the CIO] says. “We’re not going to spend thousands looking at that information. Blogs might not have enough credibility to put dollars behind them. But a tweet is different. The value of Twitter is that if we can accept it’s a rumor mill, that will tell us where the fear driving the herd is going.”
Investment professionals don’t have time to spend on “research reports” and “data”, and bloggers can’t be trusted. But 10-word anonymous outbursts from across the globe — how could that go wrong?
On top of all this, I’m left wondering what information could possibly come from Twitter that needs to be analyzed in real time. I will buy that real sentiment can be extracted from the service – but over a period measured in hours and days, not nanoseconds. The example actually being used is that when a bomb goes off, Twitter reports it first because “no one cares about the spelling.” But financial news typically isn’t like a bomb, unexpected but tangibly observable; instead, it’s disseminated via press release or announcement, and covered by (embargoed) streaming news services well before Twitter picks it up.
But I’m getting way ahead of myself, since all StreamBase’s software does is scan Twitter – it doesn’t perform any sort of semantic analysis at all.
Thus far, Twitter’s biggest contribution to the financial world (other than Libor tweets) is StockTwits, a site which has succeeded in moving the mindless drivel of the Google Finance boards into Real Time. Is StreamBase’s development going to be the straw that forces every desk to mandate Twitter clients for traders?
I’m not holding my breath.
{ 0 comments }
The WSJ is reporting today on Nassim Taleb and Mark Spitznagel’s new hyperinflation fund.
It’s basically the same story they reported two weeks ago when this news broke.
And it’s not a hyperinflation fund, anyway.
{ 0 comments }
Here is the front page of MarketWatch around noon today:
Click the image to zoom in on the two most popular stories of the day…
{ 0 comments }
Today Nigerian police arrested a goat on suspicion of armed robbery.
Yes.
It seems that two men were caught trying to steal a Mazda 323. One of them, clearly the brains of the group, ran away; the other used black magic to turn himself into a goat. Naturally, the police arrested the goat.
In fairness, the BBC presents a version of the story which is somewhat less sensational. In fact, the police appear considerably more adept:
“But of course goats can’t commit crime.”
But of course.
{ 0 comments }