Random forecasts (with echoes!)

June 15, 2009

And speaking of forecasts, I'm reminded today of one of my favorite forecasting errors: the echo. This morning, the manufacturing survey missed the forecasted amount, and many pundits commented that it contributed heavily to the market's fall.

Here is a plot of the manufacturing survey level as reported each month in red (prior to any revisions, though there haven't been any substantial revisions) and the forecasted level in green:

You can plainly see that the forecast in each month is more or less the reported level from the month before! This is what I call an "echo" forecast. Note that the echo is more pronounced since 2008. Prior to 2008, the forecast is a near-perfect echo which has been vertically scaled (more on that in a minute).

Recall that one of the hallmarks of a simple random walk is that its expected value at each step, given the previous step, is the value of the previous step: $E(X_t \mid X_{t-1}) = X_{t-1}$. Bearing that in mind, the forecasters are in effect treating the manufacturing survey as a random walk: their forecast is the naive random-walk forecast of the survey itself!
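For reference, here is a short sketch of where that expectation comes from. It assumes the textbook definition of a random walk, with steps $\varepsilon_t$ that have zero mean and are independent of the past ($\varepsilon_t$ is notation introduced here for illustration):

$$X_t = X_{t-1} + \varepsilon_t, \quad E(\varepsilon_t) = 0 \;\Longrightarrow\; E(X_t \mid X_{t-1}) = X_{t-1} + E(\varepsilon_t) = X_{t-1}.$$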

Again, it blows my mind that these forecasts are taken seriously. I can do just as well as this "informed forecast" by using the previous month's survey value as my naive forecast! As a rule, if the second derivative is negative, then the forecast will be too high. If the second derivative is positive, the forecast will be too low. In a sense, therefore, the data is self-perpetuating as long as the forecast is taken seriously, since good news will look better (beats estimates!) and bad news will look worse (misses estimates!). The fact that the echo is more pronounced since 2007 means that the forecast became more random right when (people thought) it mattered most.
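A minimal sketch of the naive forecast, for anyone who wants to see the mechanics. The survey values below are made up for illustration (they are not the real series), and the function names are hypothetical. With a pure echo, the "surprise" is simply the month-over-month change in the survey, so a falling reading always misses and a rising one always beats.

```python
import numpy as np

def echo_forecast(levels):
    """Naive 'echo' forecast: each month's forecast is last month's reported level."""
    levels = np.asarray(levels, dtype=float)
    return levels[:-1]  # forecasts for months 1..n-1

def forecast_errors(levels):
    """Reported level minus echo forecast, for months 1..n-1."""
    levels = np.asarray(levels, dtype=float)
    return levels[1:] - echo_forecast(levels)

# Made-up survey levels, for illustration only (not real data).
survey = [10.0, 12.0, 9.0, 4.0, 6.0]
errors = forecast_errors(survey)

# Negative error: the forecast was too high (a "miss").
# Positive error: the forecast was too low (a "beat").
for month, err in enumerate(errors, start=1):
    label = "beat" if err > 0 else "miss"
    print(f"month {month}: surprise {err:+.1f} ({label})")
```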

To be fair, the model isn't a pure echo (or it wasn't before 2007). Instead of taking the previous value as the forecast, it appears that an adjustment was made first. Wherever there was a large surprise (i.e., a reported value well under or over the forecast), the new forecast was adjusted in the opposite direction of the surprise. That big dip in 2005 should have been the next forecasted value, but because it was such an outlier, surveyed economists bet on mean reversion and adjusted their forecast upwards. Through the entirety of 2006-7, the forecast is equal to the previous value less about 50% of the surprise. You can quickly build a regression model that includes the surprise as a variable to test this if you think I'm reading too much of my own opinion into this analysis. To me, this suggests that in 2008, economists lost faith in their ability to forecast this indicator to the point that they stopped adjusting their naive guesses, a terrifying prospect for anyone following their opinions.
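One way such a regression could be set up is sketched below. Everything here is an assumption for illustration: the data are synthetic (a random-walk "survey" plus a "consensus" built from the hypothesized rule with a little noise), the 0.5 adjustment is hard-coded to mimic the eyeballed figure, and `fit_echo_model` is a hypothetical helper. The regression is forecast on the previous reported level and the previous surprise. With real data, you would substitute the actual survey and forecast series for `x` and `f`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic "survey" levels: a random walk standing in for the real series.
x = np.cumsum(rng.normal(0.0, 5.0, n))

# Synthetic "consensus forecast" built from the hypothesized rule:
# previous level, minus half of the previous surprise, plus small noise.
f = np.empty(n)
f[0] = x[0]
for t in range(1, n):
    prev_surprise = x[t - 1] - f[t - 1]
    f[t] = x[t - 1] - 0.5 * prev_surprise + rng.normal(0.0, 0.5)

def fit_echo_model(levels, forecasts):
    """OLS fit of forecast_t = a + b * level_{t-1} + c * surprise_{t-1}."""
    levels = np.asarray(levels, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    prev_level = levels[:-1]
    prev_surprise = levels[:-1] - forecasts[:-1]
    design = np.column_stack([np.ones(len(prev_level)), prev_level, prev_surprise])
    coef, *_ = np.linalg.lstsq(design, forecasts[1:], rcond=None)
    return coef  # (intercept a, echo weight b, surprise weight c)

a, b, c = fit_echo_model(x, f)
print(f"intercept={a:.3f}, echo weight={b:.3f}, surprise weight={c:.3f}")
```

A pure echo would show an echo weight near 1 and a surprise weight near 0. The adjusted echo described above would show a surprise weight near -0.5.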

Now for a more nerdy side note: one must be extremely careful of the echo problem when running time series regressions, because without careful controls the analysis will return misleading significance. It's easy to see why if you consider the expectation equation I printed earlier: if the best guess of a random walk's current value is its previous value, then a statistical model which simply uses the previous value will seem to give extremely good results when in fact it gives no extra information. The $R^2$ in particular will be extremely high. I just generated 100 points from a Gaussian random walk and regressed them naively on their lags, and came up with an $R^2$ of 98%. Of course, time series data can't simply be regressed in the first place, but let this be an illustrative lesson.
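A rough way to reproduce that experiment is sketched below. The seed, the unit step variance, and the helper name `lag_regression_r2` are all assumptions, so the exact $R^2$ will differ from the 98% quoted above. For contrast, the same regression is run on the steps themselves, where the lag should explain almost nothing.

```python
import numpy as np

def lag_regression_r2(series):
    """R^2 from an OLS regression of series[t] on series[t-1], with intercept."""
    series = np.asarray(series, dtype=float)
    lagged, current = series[:-1], series[1:]
    slope, intercept = np.polyfit(lagged, current, 1)
    residuals = current - (slope * lagged + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((current - current.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)        # arbitrary seed, for repeatability
steps = rng.normal(0.0, 1.0, 100)      # i.i.d. Gaussian increments
walk = np.cumsum(steps)                # 100-point Gaussian random walk

r2_levels = lag_regression_r2(walk)    # levels on lagged levels
r2_steps = lag_regression_r2(steps)    # increments on lagged increments

print(f"R^2, levels on lagged levels: {r2_levels:.3f}")
print(f"R^2, steps on lagged steps:   {r2_steps:.3f}")
```

The gap between the two numbers is the point: the high $R^2$ on levels comes from the walk carrying its own history forward, not from any ability to predict the next move.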
