November 28, 2009 in Data
Nate Silver writes about the dropping cost of air fares – yes, you read that correctly – over at Five Thirty Eight. His writing, as always, is excellent – I only want to point out a chart he uses and how it can be dangerous to draw conclusions at a glance (or, if you prefer, how similar charts can be used to mislead people).
Here’s the chart in question, showing the cumulative percent change in inflation-adjusted air fares since 1995:

At a glance, the chart is convincing: fares are off about 15% since 1995. But how meaningful is that number?
The chart exhibits a very noisy pattern. Just a year ago, Nate could have written an article about fares being unchanged over more than a decade, and he could have noted a steady rise in price following 9/11! It should be clear that the point in time at which the measurement is made is extremely important.
Additionally, the reference or base year matters a lot as well, from a perceptual standpoint. If the y-axis were zeroed on 1996 or 2004, a very different chart would result. Sure, the shape would be the same, but the present chart is almost entirely in negative territory; a different base year would put more points in positive territory. This makes me wonder if 1995 wasn’t just another spike like 1996, 2001, 2006 and 2008. I believe the dataset only goes back to 1995, so this is far from an accusation of cherry-picking data, but it’s possible that a 1994 base would tell a very different story, with the whole curve sitting either higher or lower.
Finally, people frequently make the mistake with charts like these of observing the gap area (the grey vertical bars) and attributing meaning to it across its entire length. In this case, that means looking at the two lines and making a statement like “the top 25 airports continued to outpace the rest of the airports in the last decade.” In reality, however, the two groups are almost exactly the same from 2003 to 2009. There is a one-time structural break following 9/11 and lasting about a year or two, during which time the top 25 markets experienced greater price drops than the rest. After that, the price changes are in lockstep. If both time series were zeroed on 2003, the lines would move in tandem following that date. I see this mistake frequently in interpreting the difference between two stocks – a divergence in prices, no matter how stable, always seems to imply a persistent difference even if the split was a one-time event.
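To make the base-year point concrete, here is a minimal sketch in Python of re-zeroing two series on 2003. The numbers are invented for illustration and are not the actual fare data; the function names `cumulative_pct` and `rebase` are my own. The two hypothetical groups share identical yearly changes except for an extra drop in 2001 and 2002.

```python
def cumulative_pct(yearly_changes):
    """Turn yearly % changes into cumulative % change versus the first year."""
    level, out = 1.0, [0.0]
    for change in yearly_changes:
        level *= 1.0 + change / 100.0
        out.append((level - 1.0) * 100.0)
    return out


def rebase(cum_pct_change, base_index):
    """Re-express cumulative % changes relative to a different base period."""
    base_level = 1.0 + cum_pct_change[base_index] / 100.0
    return [((1.0 + v / 100.0) / base_level - 1.0) * 100.0 for v in cum_pct_change]


years = list(range(1995, 2010))

# Hypothetical yearly % changes for 1996..2009 (NOT the real fare data).
rest_changes = [3, -2, -1, 1, 2, -4, -3, 0, -2, 1, 2, -1, 4, -9]
# Same changes, except a one-time extra drop in 2001 and 2002.
top25_changes = list(rest_changes)
top25_changes[years.index(2001) - 1] -= 5
top25_changes[years.index(2002) - 1] -= 3

rest = cumulative_pct(rest_changes)
top25 = cumulative_pct(top25_changes)

i2003 = years.index(2003)
gap_1995_base = [t - r for t, r in zip(top25, rest)]
gap_2003_base = [t - r for t, r in zip(rebase(top25, i2003), rebase(rest, i2003))]

for year, g95, g03 in zip(years[i2003:], gap_1995_base[i2003:], gap_2003_base[i2003:]):
    print(f"{year}: gap on 1995 base = {g95:6.2f} pts, gap on 2003 base = {g03:6.2f} pts")
```

On the 1995 base the gap persists through 2009, even though nothing has differed between the two groups since 2003; on the 2003 base it disappears.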
My thoughts here have absolutely nothing to do with Nate’s post – please read it as I haven’t covered his reasoning at all – I merely want to take advantage of his graph to demonstrate these potential pitfalls. How’s that for some Saturday afternoon reading material?
Google Insights recently rolled out a new feature: 12-month search forecasts. The forecast comes from a relatively simple decomposition of the search volume into trend, seasonal and residual components. The model’s out-of-sample performance is tested on the most recent 12-month period: the model is fit without those months, and a forecast is only shown if its prediction for them proves accurate. Here’s what it looks like when searching for “Google” (blue), “summer” (yellow), and “weather” (red):

The “Google” volume shows a clear macro trend with little seasonal impact; “summer” is of course the opposite. “Weather” proved too unpredictable and no forecast was generated.
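For a feel of how such a decompose-then-validate scheme might work, here is a rough sketch in Python. It is not Google's method: I am assuming a straight-line trend, a per-calendar-month average for the seasonal component, and a 10% mean absolute percentage error cutoff on the held-out year. All names, the cutoff and both series are made up for illustration.

```python
import math
import random


def fit_line(y):
    """Ordinary least squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def fit_trend_seasonal(y, period=12):
    """Split y into a linear trend plus one average offset per calendar month."""
    intercept, slope = fit_line(y)
    detrended = [v - (intercept + slope * t) for t, v in enumerate(y)]
    seasonal = []
    for month in range(period):
        vals = detrended[month::period]
        seasonal.append(sum(vals) / len(vals))
    return intercept, slope, seasonal


def forecast(model, start, horizon, period=12):
    """Extend trend + seasonal pattern; the residual is assumed to average zero."""
    intercept, slope, seasonal = model
    return [intercept + slope * t + seasonal[t % period]
            for t in range(start, start + horizon)]


def holdout_mape(y, horizon=12, period=12):
    """Fit without the last `horizon` points, then score the forecast on them."""
    train, test = y[:-horizon], y[-horizon:]
    model = fit_trend_seasonal(train, period)
    pred = forecast(model, len(train), horizon, period)
    return sum(abs(p - a) / abs(a) for p, a in zip(pred, test)) / horizon


rng = random.Random(0)
MAX_MAPE = 0.10  # hypothetical acceptance cutoff, not taken from the paper

# Synthetic six-year monthly series: one strongly seasonal, one mostly noise.
seasonal_series = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 2)
                   for t in range(72)]
noisy_series = [100 + rng.gauss(0, 30) for _ in range(72)]

for name, series in [("seasonal", seasonal_series), ("noisy", noisy_series)]:
    mape = holdout_mape(series)
    verdict = "forecast shown" if mape < MAX_MAPE else "no forecast"
    print(f"{name}: holdout MAPE = {mape:.1%} -> {verdict}")
```

The noisy series plays the role of “weather”: no stable trend or seasonal pattern survives the holdout test, so no forecast is produced.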
A complete description of the methodology is available in this paper by Google (pdf link).
Click here for a live view of these trends.
And speaking of forecasts, I’m reminded today of one of my favorite forecasting errors: the echo. This morning, the manufacturing survey missed the forecasted amount, and many pundits commented that it contributed heavily to the market’s fall.
Here is a plot of the manufacturing survey level as reported each month in red (prior to any revisions, though there haven’t been any substantial revisions) and the forecasted level in green:

You can plainly see that the forecast in each month is more or less the reported level from the month before! This is what I call an “echo” forecast. Note that the echo is more pronounced since 2008. Prior to 2008, the forecast is a near-perfect echo which has been vertically scaled (more on that in a minute).
Recall that one of the hallmarks of a simple random walk is that its expected value at each step is the value of the previous step:
$E[X_t \mid X_{t-1}] = X_{t-1}$. Bearing that in mind, the forecasted values of the manufacturing survey are a random walk with respect to the survey itself!
Again, it blows my mind that these forecasts are taken seriously. I can do just as well as this “informed forecast” by using the previous month’s survey value as my naive forecast! As a rule, if the second derivative is negative, then the forecast will be too high. If the second derivative is positive, the forecast will be too low. In a sense, therefore, the data is self-perpetuating as long as the forecast is taken seriously, since good news will look better (beats estimates!) and bad news will look worse (misses estimates!). The fact that the echo is more pronounced since 2007 means that the forecast became more random right when (people thought) it mattered most.
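The "I can do just as well" claim is easy to check once you have both series side by side. Here is a minimal sketch in Python using made-up numbers, not the actual survey or consensus data: the survey is simulated as a random walk, and the hypothetical "consensus" is last month's value plus a little noise.

```python
import random


def mean_abs_error(forecasts, actuals):
    """Average absolute miss of a forecast series against what was reported."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


rng = random.Random(42)

# Synthetic monthly survey levels (NOT the real manufacturing survey).
survey = [50.0]
for _ in range(120):
    survey.append(survey[-1] + rng.gauss(0.0, 2.0))

actual = survey[1:]   # value reported in month t
naive = survey[:-1]   # naive forecast for month t: the value from month t-1

# Hypothetical "consensus" that merely echoes last month with a small tweak.
echo = [prev + rng.gauss(0.0, 0.3) for prev in naive]

mae_naive = mean_abs_error(naive, actual)
mae_echo = mean_abs_error(echo, actual)
print(f"naive (lag-1) MAE: {mae_naive:.2f}")
print(f"echo forecast MAE: {mae_echo:.2f}")
```

If a published forecast cannot clearly beat the lag-1 benchmark on a comparison like this, it carries little information beyond last month's print.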
To be fair, the model isn’t a pure echo (or it wasn’t before 2007). Instead of taking the previous value as the forecast, it appears that an adjustment was made first. Wherever there was a large surprise (i.e. a reported value well under or over the forecast), the new forecast was adjusted in the opposite direction of the surprise. That big dip in 2005 should have been the next forecasted value, but because it was such an outlier, surveyed economists bet on mean reversion and adjusted their forecast upwards. Through the entirety of 2006-7, the forecast is equal to the previous value less about 50% of the surprise. You can quickly build a regression model that includes the surprise as a variable to test this if you think I’m reading too much of my opinion into this analysis. To me, this suggests that in 2008, economists lost faith in their ability to forecast this indicator to the point that they stopped adjusting their naive guesses – a terrifying prospect for anyone following their opinions.
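For anyone who wants to run that regression, here is one way it could be set up, sketched in Python with the standard library only. The specification is my assumption: forecast(t) = a + b * reported(t-1) + c * surprise(t-1), where surprise(t-1) = reported(t-1) - forecast(t-1). The data below is synthetic, built with b = 1 and c = -0.5 so that we can see whether the regression recovers them; with real data you would substitute the reported and forecasted series.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(1)
n = 200

# Synthetic survey: mean-reverting around 50 (NOT the real data).
reported = [50.0]
for _ in range(n - 1):
    reported.append(50.0 + 0.9 * (reported[-1] - 50.0) + rng.gauss(0.0, 2.0))

# Synthetic forecast: previous value less 50% of the previous surprise, plus noise.
forecast = [50.0]
for t in range(1, n):
    surprise_prev = reported[t - 1] - forecast[t - 1]
    forecast.append(reported[t - 1] - 0.5 * surprise_prev + rng.gauss(0.0, 0.1))

# Regressors: constant, previous reported value, previous surprise.
rows = [[1.0, reported[t - 1], reported[t - 1] - forecast[t - 1]] for t in range(1, n)]
y = [forecast[t] for t in range(1, n)]

intercept, b_prev, c_surprise = ols(rows, y)
print(f"intercept = {intercept:.3f}, previous value = {b_prev:.3f}, surprise = {c_surprise:.3f}")
```

A coefficient near 1 on the previous value and near -0.5 on the surprise would support the "echo with mean-reversion adjustment" reading; a surprise coefficient near 0 would indicate a pure echo.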
Now for a more nerdy sidenote: one must be extremely careful of the echo problem when running time series regressions, because without careful controls the analysis will return misleading significance. It’s easy to see why if you consider the expectation equation I printed earlier – if the best guess of a random walk’s current value is its previous value, then a statistical model which simply uses the previous value will seem to give extremely good results when in fact it gives no extra information. The R² in particular will be extremely high. I just generated 100 points from a Gaussian random walk and regressed them naively on their lags, and came up with an R² of 98%. Of course, time series data can’t simply be regressed in the first place, but let this be an illustrative lesson.
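Here is a small Python sketch of that experiment, for anyone who wants to try it. The seed and helper name are arbitrary choices of mine, so the exact R² will differ from the figure above. For contrast, it also regresses the month-to-month changes on their own lags, which is where the lack of real predictive power shows up.

```python
import random


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


rng = random.Random(7)

# 100 points from a Gaussian random walk.
walk = [0.0]
for _ in range(99):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

# Naive regression: today's level on yesterday's level.
r2_levels = r_squared(walk[:-1], walk[1:])

# Same idea on the changes: today's step on yesterday's step.
steps = [b - a for a, b in zip(walk, walk[1:])]
r2_steps = r_squared(steps[:-1], steps[1:])

print(f"R^2, levels on lagged levels: {r2_levels:.2f}")
print(f"R^2, steps on lagged steps:   {r2_steps:.2f}")
```

The levels regression looks impressive only because consecutive levels share almost all of their history; the steps are independent by construction, so their lag explains essentially nothing.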