Overcharting: airfare edition

November 28, 2009 in Data

Nate Silver writes about the dropping cost of air fares - yes, you read that correctly - over at Five Thirty Eight. His writing, as always, is excellent - I only want to point out a chart he uses and how it can be dangerous to draw conclusions at a glance (or, if you prefer, how similar charts can be used to mislead people).

Here's the chart in question, showing the cumulative percent change in inflation-adjusted air fares since 1995:

At a glance, the chart is convincing: fares are off about 15% since 1995. But how meaningful is that number?

The chart exhibits a very noisy pattern. Just a year ago, Nate could have written an article about fares being unchanged over more than a decade, and he could have noted a steady rise in price following 9/11! It should be clear that the point in time at which the measurement is made is extremely important.

The reference, or base, year also matters a lot from a perceptual standpoint. If the y-axis were zeroed on 1996 or 2004, a very different chart would result. Sure, the shape would be the same, but the present chart is almost entirely in negative territory; a different base year would put more points in positive territory. This makes me wonder if 1995 wasn't just another spike like 1996, 2001, 2006 and 2008. I believe the dataset only goes back to 1995, so this is far from an accusation of cherrypicking data, but it's possible that a 1994 base would tell a very different story, with the whole series sitting either higher or lower.
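To make the base-year point concrete, here is a minimal sketch of rebasing a cumulative percent-change series. The numbers are invented for illustration and are not the data behind Nate's chart; the function name `rebase` and the variable names are my own.

```python
def rebase(pct_change_by_year, new_base_year):
    """Re-express cumulative percent changes relative to a different base year."""
    # Turn each cumulative percent change into an index level (old base = 1.0).
    levels = {year: 1.0 + pct / 100.0 for year, pct in pct_change_by_year.items()}
    base_level = levels[new_base_year]
    # Divide by the level in the new base year, then convert back to percent.
    return {year: (level / base_level - 1.0) * 100.0 for year, level in levels.items()}


# Hypothetical values only (NOT the real fare data), expressed relative to 1995.
fares_vs_1995 = {
    1995: 0.0, 1996: 4.0, 1998: -3.0, 2001: 2.0,
    2004: -12.0, 2008: -1.0, 2009: -15.0,
}

# Same series, same shape, but zeroed on 2004 instead of 1995.
fares_vs_2004 = rebase(fares_vs_1995, 2004)

# Count how many points sit below the zero line under each base year.
below_zero_1995 = sum(1 for v in fares_vs_1995.values() if v < 0)
below_zero_2004 = sum(1 for v in fares_vs_2004.values() if v < 0)
print(below_zero_1995, below_zero_2004)
```

With these made-up values, most points fall below zero on the 1995 base and most fall above it on the 2004 base, even though the year-to-year movements are identical.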

Finally, people frequently make the mistake with charts like these of observing the gap area (the grey vertical bars) and attributing meaning to it across its entire length. In this case, that means looking at the two lines and making a statement like "the top 25 airports continued to outpace the rest of the airports in the last decade." In reality, however, the two groups are almost exactly the same from 2003 to 2009. There is a one-time structural break following 9/11 and lasting about a year or two, during which time the top 25 markets experienced greater price drops than the rest. After that, the price changes are in lockstep. If both time series were zeroed on 2003, the lines would move in tandem following that date. I see this mistake frequently in interpreting the difference between two stocks - a divergence in prices, no matter how stable, always seems to imply a persistent difference even if the split was a one-time event.
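The one-time-break point can be sketched the same way. Below, two hypothetical series share identical year-over-year changes except for two years of divergence; the growth rates, the year 2000 starting point, and the names `cumulative_pct` and `rebase` are all assumptions made up for this illustration, not values read off the chart.

```python
def cumulative_pct(yearly_growth, start_year):
    """Build cumulative percent change from year-over-year growth rates."""
    level = 1.0
    series = {start_year: 0.0}
    for offset, growth in enumerate(yearly_growth, start=1):
        level *= 1.0 + growth
        series[start_year + offset] = (level - 1.0) * 100.0
    return series


def rebase(pct_change_by_year, new_base_year):
    """Re-express cumulative percent changes relative to a different base year."""
    levels = {year: 1.0 + pct / 100.0 for year, pct in pct_change_by_year.items()}
    base_level = levels[new_base_year]
    return {year: (level / base_level - 1.0) * 100.0 for year, level in levels.items()}


# Hypothetical growth rates for 2001..2006 (NOT real data). The two groups
# differ only in 2002 and 2003; from 2004 onward they move in lockstep.
top25 = cumulative_pct([-0.02, -0.10, -0.05, 0.03, 0.02, -0.04], start_year=2000)
rest = cumulative_pct([-0.02, -0.03, -0.01, 0.03, 0.02, -0.04], start_year=2000)

# Gap in percentage points on the original base: it persists long after the break.
gap_original = {year: rest[year] - top25[year] for year in top25}

# Gap after zeroing both series on 2003, the end of the break: it vanishes.
top25_2003 = rebase(top25, 2003)
rest_2003 = rebase(rest, 2003)
gap_rebased = {year: rest_2003[year] - top25_2003[year] for year in top25}

for year in (2004, 2005, 2006):
    print(year, round(gap_original[year], 2), round(gap_rebased[year], 2))
```

The gap on the original base stays wide in every later year, which is exactly what tempts the eye, while the rebased gap is zero up to floating-point rounding because nothing has differed between the two groups since the break.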

My thoughts here have absolutely nothing to do with the argument in Nate's post, which I haven't covered at all, so please go read it. I merely want to take advantage of his graph to demonstrate these potential pitfalls. How's that for some Saturday afternoon reading material?
