September 17, 2009

A funny thing happened in the Bulgarian national lottery this week: the same numbers were drawn as last week.

The BBC and the AP both report the odds at 1 in 4 million; ABC Australia calls it 1 in 14 million. People are demanding that the Bulgarian lottery perform an investigation because no one can believe the result. It's not every day this sort of probability question comes up in the news, so let's take a second to walk through the problem.

The Bulgarian lottery format consists of 6 numbers drawn from a collection of 42, without replacement.  Let's start by considering the probability of observing any single combination of balls. You might begin like this: The first ball could be any of 42. Once it is chosen, the next ball could be any of the remaining 41. After that, the third ball has 40 possibilities, etc. The number of possible outcomes is therefore

$42\times41\times40\times39\times38\times37=3{,}776{,}965{,}920\approx3.8 \textrm{ billion.}$

However, this would consider 1 - 2 - 3 - 4 - 5 - 6 to be different from 6 - 5 - 4 - 3 - 2 - 1, which is wrong because order does not matter in the Bulgarian lottery. To figure out the correct number of combinations, we instead need to use the choose function, or the binomial coefficient.

The choose function $\binom{n}{k}$ is pronounced as "n choose k" and yields the number of ways that samples of size k can be chosen from a population of size n, if order does not matter. This is exactly what we are looking for - how many combinations of 6 balls can be formed from a group of 42, irrespective of order? The answer is

$\binom{42}{6}=\frac{42!}{6!\,36!}=5{,}245{,}786\approx5.2 \textrm{ million.}$

So the chance of seeing any specific outcome in the lottery (or, put another way, the chance of winning the lottery) is 1 in 5.2 million.
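Both counts are easy to check by machine. Here is a minimal sketch using `math.perm` and `math.comb` from Python's standard library (Python 3.8 or later); the constant and variable names are my own, chosen for illustration.

```python
import math

BALLS = 42   # numbers in the drum
PICKS = 6    # numbers drawn each week

# Ordered draws: 42 * 41 * 40 * 39 * 38 * 37
ordered = math.perm(BALLS, PICKS)

# Unordered draws: "42 choose 6"
unordered = math.comb(BALLS, PICKS)

# Every unordered combination can be arranged in 6! = 720 orders,
# which is exactly the factor separating the two counts.
assert ordered == unordered * math.factorial(PICKS)

print(f"ordered:   {ordered:,}")
print(f"unordered: {unordered:,}")
```

The assertion spells out why the first count was too large: it counted each winning ticket 720 times, once per ordering.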

What's the probability of seeing the same outcome in two consecutive weeks? One's first impulse might be to say it's 1 in 5.2 million squared, or nearly 1 in 28 trillion. But that first instinct is wrong. The chance of seeing a specific combination of balls (such as 1 - 2 - 3 - 4 - 5 - 6) in two consecutive weeks is indeed 1 in 28 trillion. However, the chance of seeing any combination repeat in consecutive draws is... 1 in 5.2 million.

How so? Start with what we know: for a given outcome, the chance is 1 in 28 trillion of seeing it twice in a row. But there are 5.2 million possible outcomes, any of which could have a double header. Because those double headers are mutually exclusive, their probabilities add: the chance of any outcome repeating is the 1 in 28 trillion chance of a specific repeat times the 5.2 million different outcomes, for a final likelihood of 1 in 5.2 million.
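To see the cancellation without any rounding, here is a small sketch with exact fractions. It assumes, as the argument above does, that every combination is equally likely and that the two weeks are independent; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N = comb(42, 6)  # number of equally likely combinations

# One named combination (say 1-2-3-4-5-6) two weeks running
p_specific_twice = Fraction(1, N) ** 2

# Any combination two weeks running: add up N mutually exclusive cases
p_any_repeat = N * p_specific_twice

assert p_any_repeat == Fraction(1, N)

print(f"specific repeat: 1 in {p_specific_twice.denominator:,}")
print(f"any repeat:      1 in {p_any_repeat.denominator:,}")
```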

This holds true for any population with n choices - the odds against are always one factor of n shorter than what the human brain naively believes. Consider a coin flip. It has two outcomes, heads and tails (n = 2). After f flips, there are $n^f$ possible outcomes, and exactly n of those outcomes exhibit the same result in every flip. This is because the number of outcomes increases geometrically but the number of repeated items can never exceed the number of initial states. So, after two flips there are four outcomes (HH, TT, HT, TH) with two repeated results and after three flips there are eight outcomes (HHH, TTT, HHT, HTT, HTH, THT, THH, TTH) and still just two repeats. We may conclude that after any number of flips, the probability of seeing a repeated outcome is

$\frac{n}{n^f}=\frac{1}{n^{f-1}}.$

After two coin flips, the probability is 2 in 4 or 50%; after three flips it is 2 in 8 or 25%. These simple cases extend nicely to the case where n = 5.2 million and f = 2, where it should be 5.2 million in 5.2 million squared. That simplifies back to 1 in 5.2 million.
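The coin-flip counts are small enough to enumerate outright. The sketch below lists every sequence of f flips and counts the ones where all flips agree; `all_same_probability` is a helper name I made up for this post.

```python
from fractions import Fraction
from itertools import product

def all_same_probability(states, flips):
    """Probability that `flips` independent, uniform draws all land on the same state."""
    sequences = list(product(states, repeat=flips))        # n**f equally likely sequences
    uniform = [s for s in sequences if len(set(s)) == 1]   # one per state, so n of them
    return Fraction(len(uniform), len(sequences))

for f in (2, 3, 4):
    p = all_same_probability("HT", f)
    assert p == Fraction(1, 2 ** (f - 1))   # n / n**f with n = 2
    print(f"{f} flips: {p}")
```

The same function works for a six-sided die or any other small set of states; enumerating 5.2 million squared pairs this way is not practical, which is why the formula is handy.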

I prefer to think of the problem like this: "Given a draw in the first week, what's the probability of seeing that draw again?" In other words, conditional on the first week's number, what's the probability that the second week's number is the same as the first? Since all combinations are equally likely, the first week's numbers have only a 1 in 5.2 million chance of being drawn the second week as well. Some people may not like this logic because they feel it ignores the 5.2 million outcomes in the first week, but they actually are accounted for. By conditioning on the first week, we no longer need to consider all of its possibilities.
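If the conditioning argument feels like a sleight of hand, a toy lottery makes it concrete. This sketch assumes a hypothetical game of 2 balls drawn from 7 (21 combinations), small enough to enumerate every pair of weeks, and computes the repeat probability both ways.

```python
from fractions import Fraction
from itertools import combinations

def repeat_probability(balls, picks):
    """Return (joint, conditional) probabilities of a back-to-back repeat."""
    draws = list(combinations(range(1, balls + 1), picks))

    # Joint view: count matching (week 1, week 2) pairs among all pairs
    matching = sum(1 for w1 in draws for w2 in draws if w1 == w2)
    joint = Fraction(matching, len(draws) ** 2)

    # Conditional view: fix week 1, then count week-2 draws that match it.
    # By symmetry the answer is the same whichever week-1 draw we fix.
    first = draws[0]
    conditional = Fraction(sum(1 for w2 in draws if w2 == first), len(draws))

    return joint, conditional

joint, conditional = repeat_probability(balls=7, picks=2)
assert joint == conditional
print(joint, conditional)
```

Both views land on 1 in 21 for the toy game, just as both land on 1 in 5.2 million for the real one.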

Let's take it one step further. These calculations give the probability of seeing consecutive draws in a two-week span, but ignore the fact that this lottery is played every single week. In a year of 52 weekly draws, that's 51 chances at getting a consecutive draw - surely that improves the odds of observing this result! The probability of not drawing consecutive outcomes in any two weeks is

$\frac{5{,}245{,}786-1}{5{,}245{,}786}=1-\frac{1}{5{,}245{,}786}\approx0.99999981.$

It is expressed as the number of non-consecutive outcomes divided by the total number of outcomes. Unsurprisingly, it is 5.2 million-less-one to 1 against. Over the 51 week-to-week pairs in a year, the probability of not seeing any consecutive outcomes is that number to the 51st power, or 0.9999903. Therefore, the probability of at least one repeat during the year is the complementary probability, or roughly 1 in 103 thousand. Take that out over a number of years and the odds of observing repeats continue to increase. Remember that even the most unlikely event has a fairly high chance of being observed if the experiment is run many times. If you asked the question, "What's the probability of seeing consecutive draws at any point in the history of the Bulgarian lottery?" you might find the answer surprisingly high. It's just the probability of this specific week in September 2009 being the repeat which is so low.
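Here is a rough sketch of that calculation, extended to longer spans. It assumes one independent, uniform draw per week and 52 draws per year, so a span of y years has 52y - 1 week-to-week pairs; the spans in the loop are arbitrary examples, not the actual age of the Bulgarian lottery. `log1p` and `expm1` are used only to keep the floating-point arithmetic accurate for such tiny probabilities.

```python
from math import comb, expm1, log1p

N = comb(42, 6)   # equally likely combinations
p_pair = 1 / N    # chance that a given week repeats the week before

def p_at_least_one_repeat(pairs):
    """1 - (1 - p_pair)**pairs, computed stably for very small p_pair."""
    return -expm1(pairs * log1p(-p_pair))

# One year: 52 draws, 51 consecutive pairs
one_year = p_at_least_one_repeat(51)
print(f"1 year:   1 in {1 / one_year:,.0f}")

# Longer spans (hypothetical lengths, for illustration only)
for years in (10, 25, 50):
    p = p_at_least_one_repeat(52 * years - 1)
    print(f"{years} years: 1 in {1 / p:,.0f}")
```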

The most interesting thing to me is that no one won the lottery the first week but a record 18 people won the second week - it would appear that playing the previous week's numbers is a strategy people follow. Since the numbers are independent, there's nothing smart or foolish about this from a probabilistic standpoint, though if you employed a little psychology you would stay away from "popular" numbers in order to avoid sharing the pot if you won.

Anyway, I digress. I put the probability of this observed outcome at 1 in 5.2 million. I'm curious to know how the other news agencies calculated their odds; perhaps there's some twist in the lottery I'm unaware of?