Puzzling the Dow

August 14, 2011

After the severe market volatility this past week, a close friend forwarded me the following email that pointed out a curious pattern in the index changes:

Every day this week, the sum of the digits of the Dow's change have added up to 26:
Monday: -634.76 = 26
Tuesday: +429.92 = 26
Wednesday: -519.83 = 26
The odds of this must be millions to one!

Needless to say: challenge accepted. I love this sort of problem, because the actual odds are often much better than they appear.

How do we go about getting an answer? First, it depends on how many discrete changes we consider. For example, suppose we examine every change with three digits before the decimal point and two after (000.00 to 999.99), in either direction. There are 200,000 such possibilities, and exactly 10,560 of them have digits adding up to 26. That's a frequency of 5.28%, which implies a 0.015% likelihood (0.0528 cubed) over three consecutive days. That is hardly millions to one; it's more like 1 in 7,000.
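A quick way to double-check the 10,560 figure is brute force. The sketch below is in Python, like the script at the end of this post. It works in integer cents (so 634.76 becomes 63476) to sidestep floating-point formatting. Because leading zeros don't change a digit sum, counting 0 through 99,999 covers every magnitude from 000.00 to 999.99.

```python
def digit_sum(n):
    """Sum of the decimal digits of an integer, ignoring its sign."""
    return sum(int(ch) for ch in str(abs(n)))

# Every magnitude from 000.00 to 999.99, expressed in cents (0..99,999)
magnitudes = range(100_000)
hits_per_sign = sum(1 for cents in magnitudes if digit_sum(cents) == 26)

# Up moves and down moves are counted separately, as in the text
total = 2 * len(magnitudes)
hits = 2 * hits_per_sign
one_day = hits / total
three_days = one_day ** 3  # three independent days

print(hits, total)           # 10560 200000
print(f"{one_day:.2%}")      # 5.28%
print(f"1 in {1 / three_days:,.0f}")
```

Working in integers keeps the count exact. The script later in the post converts floats to strings, which works but is slower and more delicate.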

But we're talking about the Dow here, so looking at all five-digit changes isn't quite right, and neither is treating them all as equally likely. I need to make a couple of simplifying assumptions now, or we'll be here all day. My code is attached below, though, so feel free to plug in your own values later.

First, we need an appropriate set of changes to consider. Let's say the Dow is (was!) at 11,500 and has 40% annual volatility (I'm roughly estimating the vol from the VIX). Assuming 250 trading days per year, the daily volatility in points is therefore 40% * 11,500 / sqrt(250), or about 291. Let's treat any change within 5 standard deviations as eligible (this assumption won't really matter, as you'll see). That gives us a range of roughly -1455.00 to +1455.00, containing about 291,000 discrete possibilities.
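Here is the arithmetic for the range as a small Python sketch. The variable names are illustrative. The 250 trading days per year is the same convention used in the formula above.

```python
import math

# Assumptions from the text (change these to try your own values)
dow_level = 11_500
annual_vol = 0.40     # rough figure estimated from the VIX
trading_days = 250    # trading days per year
n_sigma = 5           # half-width of the range, in standard deviations
tick = 0.01           # smallest possible change, in points

# Scale annual volatility down to one day, expressed in index points
daily_vol_pts = annual_vol * dow_level / math.sqrt(trading_days)

# Symmetric range of eligible changes and the number of ticks it spans
half_width = n_sigma * daily_vol_pts
n_changes = round(2 * half_width / tick)

print(f"daily vol: {daily_vol_pts:.2f} points")  # daily vol: 290.93 points
print(f"range: +/-{half_width:.2f}, {n_changes:,} changes")
```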

Now that we have a range of values, we must keep in mind that they are not all equally likely. Larger changes are less probable than small ones. To keep things simple, let's place a normal distribution over the range, centered on zero, with a standard deviation equal to the daily volatility of 291. The probability of drawing any point x is therefore P(x) - P(x - 0.01), where P() is the CDF of our normal distribution. This is a discrete approximation to the normal PDF, binned at the hundredths place (which we assume is the smallest possible change in the Dow).
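Here is a minimal sketch of the binned probabilities using SciPy's `norm.cdf`, again indexing changes in integer cents. Every one-cent bin in the range is included, so the probabilities should sum to very nearly 1. The small remainder lies beyond 5 standard deviations. The helper `prob_of` is hypothetical and exists only for illustration.

```python
import math

import numpy as np
from scipy.stats import norm

# Assumptions from the text: Dow at 11,500, 40% annual vol, 250 trading days
daily_vol_pts = 0.40 * 11_500 / math.sqrt(250)
max_cents = round(5 * daily_vol_pts * 100)  # 5 standard deviations, in cents

# Every eligible change in integer cents, converted back to index points
cents = np.arange(-max_cents, max_cents + 1)
x = cents / 100

# P(x) - P(x - 0.01): probability mass of the one-cent bin ending at x
probs = (norm.cdf(x, loc=0, scale=daily_vol_pts)
         - norm.cdf(x - 0.01, loc=0, scale=daily_vol_pts))

def prob_of(change_pts):
    """Hypothetical helper: binned probability of one change, e.g. 600.00."""
    return probs[round(change_pts * 100) + max_cents]

print(probs.sum())  # just under 1
print(prob_of(0.0), prob_of(600.0))
```

Evaluating the CDF on the whole array at once avoids a Python loop over roughly 291,000 values.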

OK, we're ready for some calculations. First, there are 14,622 changes in our range whose digits add up to 26. Given that we have 291,000 possibilities, that's a frequency of about 5.02%. Now, how likely are we to actually see one of those changes? To answer that, we sum their probabilities under our normal distribution. The probability-weighted frequency is 3.83%. Why has the frequency gone down? Digits add up to 26 more easily in larger numbers (they have more digits), but changes of those magnitudes are less likely. On balance, the likelihood of seeing a day that adds up to 26 goes down. This is also why the number of standard deviations we include doesn't matter, as long as it is large enough: beyond a certain point (I generally use 5 as a rule of thumb), the remaining probability is negligible.
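The same counts can be reproduced without string manipulation. The sketch below is an alternative to the script at the end of the post, not a copy of it. It computes digit sums arithmetically on a NumPy array of integer cents. It covers essentially the same range the script uses, from -5 standard deviations up to but not including +5. It then compares the raw and probability-weighted frequencies and cubes the latter for three independent days. The name `digit_sums` is hypothetical.

```python
import math

import numpy as np
from scipy.stats import norm

def digit_sums(cents):
    """Hypothetical vectorized digit sum of an integer array, ignoring sign."""
    n = np.abs(cents)
    total = np.zeros_like(n)
    while np.any(n):  # peel off one decimal digit per pass
        total += n % 10
        n //= 10
    return total

daily_vol_pts = 0.40 * 11_500 / math.sqrt(250)
max_cents = round(5 * daily_vol_pts * 100)  # 5 standard deviations, in cents

# From -max_cents up to (but excluding) +max_cents, in one-cent steps
cents = np.arange(-max_cents, max_cents)
x = cents / 100
probs = (norm.cdf(x, scale=daily_vol_pts)
         - norm.cdf(x - 0.01, scale=daily_vol_pts))

is_26 = digit_sums(cents) == 26
raw_freq = is_26.mean()             # every change equally likely
weighted_freq = probs[is_26].sum()  # weighted by the normal distribution
three_days = weighted_freq ** 3     # three independent days

print(is_26.sum(), f"{raw_freq:.2%}", f"{weighted_freq:.2%}")
print(f"1 in {1 / three_days:,.0f}")
```

The denominator here is the exact number of one-cent steps rather than a rounded 291,000, so the raw frequency may differ from the text in the last decimal place.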

So for a given day, with the Dow at 11,500 and 40% volatility, the likelihood of seeing a change whose digits add up to 26 is 3.83%. On three consecutive days, the probability is 0.0056%, or about 1 in 18,000. It's not quite millions to one, but it is interesting nonetheless.

Update: After Pat's comment below, I quickly ran the probability of seeing the same digit sum three days in a row, whatever that sum is, not just 26. It comes out to a very surprising 2.5%! I've updated the code to include this calculation as well.
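The script below does not actually show this extra calculation, so here is one natural way to frame it. This sketch is not necessarily the calculation behind the 2.5% figure. It groups the day-level probabilities by digit sum s to get p_s, then adds up p_s cubed, treating the three days as independent. It uses the same assumptions as above. `numpy.bincount` with `weights` does the grouping in a single call.

```python
import math

import numpy as np
from scipy.stats import norm

def digit_sums(cents):
    """Hypothetical vectorized digit sum of an integer array, ignoring sign."""
    n = np.abs(cents)
    total = np.zeros_like(n)
    while np.any(n):
        total += n % 10
        n //= 10
    return total

daily_vol_pts = 0.40 * 11_500 / math.sqrt(250)
max_cents = round(5 * daily_vol_pts * 100)
cents = np.arange(-max_cents, max_cents)
x = cents / 100
probs = (norm.cdf(x, scale=daily_vol_pts)
         - norm.cdf(x - 0.01, scale=daily_vol_pts))

# p_by_sum[s]: probability that one day's change has digit sum s
p_by_sum = np.bincount(digit_sums(cents), weights=probs)

# Chance that three independent days share a digit sum, whatever it is
any_sum_three_days = np.sum(p_by_sum ** 3)
print(f"P(same digit sum on 3 days) = {any_sum_three_days:.4%}")
```

Under this framing, the result can never exceed the square of the single most likely sum's probability. That bound is a useful sanity check on whatever definition you choose.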

Below is a brief Python script that performs all of these calculations. The `digit_sum` function can be dramatically sped up with the @memoize decorator, but you'll only see a benefit if you plan on running it more than once. The decorator is commented out here so that the script can be executed as-is, but feel free to uncomment it for interactive use. Note that I put the memoize class at the end for clarity. For the decorator to work, you need to declare that class before the memoized function.

```
import numpy as np
import scipy.stats as stats

# uncomment the decorator in interactive mode,
# but first declare the memoize class below

# @memoize
def digit_sum(n):
    # returns the sum of a number's digits,
    # ignoring negative signs and decimal points
    return sum(map(int, str(np.abs(n)).replace('.', '')))

def discrete_prob(x, mean=0, std=1, diff=.01):
    # computes the total probability of the interval(s) [x - diff, x]
    # under a normal distribution; x may be a scalar or an array
    prob = stats.norm.cdf(x, loc=mean, scale=std)
    prob -= stats.norm.cdf(x - diff, loc=mean, scale=std)
    return prob.sum()

# assumptions: Dow at 11,500 with 40% annual vol,
# looking for days that add up to 26 with 2 decimal places
volatility_in_pts = 0.40 * 11500 / np.sqrt(250)
critical_value = 26
decimals = 2

# consider 5 standard deviations for completeness
low_change = -1 * round(5 * volatility_in_pts, decimals)
high_change = round(5 * volatility_in_pts, decimals)
# have to round all_changes for numerical issues
all_changes = np.around(np.arange(
    low_change, high_change, 10 ** -decimals), decimals)

# find changes whose digits add up to the critical value (26)
possible_changes = np.array([c for c in all_changes
                             if digit_sum(c) == critical_value])

# compute probability of seeing any one of those changes
one_day_prob = discrete_prob(possible_changes,
                             mean=0, std=volatility_in_pts)

# compute probability of three consecutive days
three_day_prob = one_day_prob ** 3
print(three_day_prob)

# disregard this class unless you are running in
# interactive mode and want to use the @memoize decorator
class memoize:
    def __init__(self, function):
        self.function = function
        self.memoized = {}

    def __call__(self, *args):
        # return the cached result if these args have been seen before
        try:
            return self.memoized[args]
        except KeyError:
            self.memoized[args] = self.function(*args)
            return self.memoized[args]
```
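Python's standard library also provides `functools.lru_cache`. It caches results keyed on the function's arguments, much like the memoize class above. It is applied like any other decorator, so there is no class to declare first. A minimal sketch, reusing the script's `digit_sum` (arguments must be hashable, which Python floats and NumPy scalars are):

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=None)  # unbounded cache keyed on the argument
def digit_sum(n):
    # sum of a number's digits, ignoring the sign and decimal point
    return sum(map(int, str(np.abs(n)).replace('.', '')))

print(digit_sum(-634.76), digit_sum(429.92), digit_sum(-519.83))  # 26 26 26
print(digit_sum.cache_info())
```

Repeated calls with the same value are then served from the cache, and `cache_info()` reports the hit and miss counts.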