Very amusing… and true:

I especially love “The HDR Hole.” Presumably the y-axis is measured in percent of personal potential… there must be all sorts of Bayesian self-reflection stuff going on there.
(Via DataViz)
{ 0 comments }
Alex Lundry, Vice President and Director of Research at the consulting firm Target Point, has published a talk called Chart Wars that is simply brilliant: an excellent five-minute overview of how easy it is to manipulate infographics and which tricks to be wary of. His specific focus is a chart (covered on TGR previously) whose designs – and it went through many iterations – were politically motivated. While there is no doubt about which charts are clearer, his implicit question – which charts are right? – resonates philosophically.
Here’s the video of his talk:
(Via Information Aesthetics)
{ 0 comments }
Nate Silver writes about the dropping cost of air fares – yes, you read that correctly – over at Five Thirty Eight. His writing, as always, is excellent – I only want to point out a chart he uses and how it can be dangerous to draw conclusions at a glance (or, if you prefer, how similar charts can be used to mislead people).
Here’s the chart in question, showing the cumulative percent change in inflation-adjusted air fares since 1995:

At a glance, the chart is convincing: fares are off about 15% since 1995. But how meaningful is that number?
The chart exhibits a very noisy pattern. Just a year ago, Nate could have written an article about fares being unchanged over more than a decade, and he could have noted a steady rise in price following 9/11! It should be clear that the point in time at which the measurement is made is extremely important.
The reference or base year matters a lot as well, from a perceptual standpoint. If the y-axis were zeroed on 1996 or 2004, a very different chart would result. Sure, the shape would be the same, but the present chart is almost entirely in negative territory; a different base year would put more points in positive territory. This makes me wonder whether 1995 was just another spike like 1996, 2001, 2006 and 2008. I believe the dataset only goes back to 1995, so this is far from an accusation of cherry-picking data, but it’s possible that a 1994 base would tell a very different story, with the overall change looking either larger or smaller.
Finally, people frequently make the mistake with charts like these of observing the gap area (the grey vertical bars) and attributing meaning to it across its entire length. In this case, that means looking at the two lines and making a statement like “the top 25 airports continued to outpace the rest of the airports in the last decade.” In reality, however, the two groups are almost exactly the same from 2003 to 2009. There is a one-time structural break following 9/11 and lasting about a year or two, during which time the top 25 markets experienced greater price drops than the rest. After that, the price changes are in lockstep. If both time series were zeroed on 2003, the lines would move in tandem following that date. I see this mistake frequently in interpreting the difference between two stocks – a divergence in prices, no matter how stable, always seems to imply a persistent difference even if the split was a one-time event.
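To make the base-year point concrete, here is a minimal Python sketch of re-zeroing two series on a different year. The numbers are invented for illustration and are not the data behind Nate’s chart, and the `rebase` helper is my own hypothetical name. The sketch assumes the chart plots cumulative percent change in a price level, so rebasing means dividing each level by the level in the new base year.

```python
def rebase(series, base_year):
    """Re-express cumulative % changes relative to a different base year."""
    base_level = 1 + series[base_year] / 100  # price level in the new base year
    return {
        year: ((1 + pct / 100) / base_level - 1) * 100
        for year, pct in series.items()
    }

# Invented cumulative % changes since 1995 (illustrative only, not the chart's data)
top25 = {1995: 0.0, 2001: -2.0, 2003: -20.0, 2006: -12.0, 2009: -20.0}
rest = {1995: 0.0, 2001: -1.0, 2003: -10.0, 2006: -1.0, 2009: -10.0}

top25_2003 = rebase(top25, 2003)
rest_2003 = rebase(rest, 2003)

# Compare the gap between the two groups under each base year
for year in (2003, 2006, 2009):
    gap_1995 = top25[year] - rest[year]
    gap_2003 = top25_2003[year] - rest_2003[year]
    print(year, round(gap_1995, 1), round(gap_2003, 1))
```

In this toy data the two groups sit roughly ten points apart on the 1995 base, yet the gap vanishes once both are zeroed on 2003: the entire difference comes from the one-time break before that date.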
My thoughts here have absolutely nothing to do with Nate’s post – please read it as I haven’t covered his reasoning at all – I merely want to take advantage of his graph to demonstrate these potential pitfalls. How’s that for some Saturday afternoon reading material?
{ 0 comments }
I’ve previously covered the danger of attributing meaning to a forecast which is obviously based on little or no information. In that case, it was the manufacturing survey, which one might dismiss as a more obscure measure. Recently, however, Ken Houghton has written a pair of posts on inflation forecasts that bring me back to that argument.
In his first, he presents a study that seems to show that, indeed, inflation expectations tend to assume that the future will look just like the present:
Again, this does not surprise me, as the expected future value of a random walk is its present value. In the second post, the time series of inflation vs expectations is presented:
With the additional dimension of time, I can see a simple heuristic for inflation expectations: consumers think that inflation will stay at roughly the same level that it is on any given day, with some slight reversion to the Fed target, unless inflation is currently below the target, in which case they think it will rapidly bounce back to – or above – that level.
You can see the inflationary spikes in 2006 echoed in the 2007 forecast; the sharp 2008 increase and subsequent fall are mirrored in the 2009 forecast for the time they remain above the target, at which point they halt their slide.
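The heuristic can be written down as a toy rule. This is only a sketch of my reading of the charts, not a fitted model: the function name, the 2% target and the 10% reversion weight are all assumptions chosen for illustration.

```python
def expected_inflation(current, target=2.0, reversion=0.1):
    """Toy consumer forecast (all parameter values are assumed, not estimated).

    Above the target: expect roughly today's level, nudged slightly toward the target.
    Below the target: expect a quick bounce back to the target.
    """
    if current < target:
        return target
    return current - reversion * (current - target)

# Hypothetical current inflation readings, in percent
for current in (5.0, 3.0, 0.5, -1.0):
    print(current, "->", round(expected_inflation(current), 2))
```

The rule never forecasts below the target, which is the “halted slide” visible in the 2009 forecast.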
These charts tell me two things. First, that consumers have very little insight into future inflation levels, to the point that they are unwilling to even choose a simple number like 3% and prefer instead to say that the future level will be similar to today’s. Second, that consumers have blind faith in the Fed’s ability to keep inflation at or above its target level – even in the face of evidence against that power.
{ 0 comments }
The always-excellent How I Met Your Mother addresses a major social problem:
(via FlowingData)
{ 0 comments }
Finally, a radial visualization which serves a purpose rather than just looking cool. Getting Genetics Done has a tutorial on using clustering functions in R. In it, they show how this analysis:
is much better represented like this:
There’s nothing wrong with making a chart which looks good – in fact it’s encouraged – so long as the visual niceties enhance the message of the graphic. Radial graphics are all the rage these days, but they rarely help with information communication (and in many cases they detract!). It’s nice to see a truly constructive application of the technique.
(via Revolutions)
{ 0 comments }
Datavisualization.ch has a helpful step-by-step on how to turn this (from a Mashable post):

into this:

Of course, the motivation is worth more than the mechanics.
{ 0 comments }
I spoke too soon – another post from ReadWriteWeb manages to frustrate yet again. In an article claiming that teenage use of Twitter is on the rise, they present this chart:

Let’s do what RWW did not and actually think about what this graph is showing. For each age group, their use of Twitter is plotted over time, relative to their use of the internet as a whole. In other words, this is a visualization of the relative composition of the Twitterverse. If all age groups used Twitter similarly to their overall internet consumption, then all the lines would be at 100.
I do find it amusing that RWW has an almost clichéd “statistics can be misleading” section in its article, which fails to note the single most important caveat (unsurprisingly, given their misinterpretation of the chart): an increased share for any one age group must be offset by a decreased share for another. So the rise in the “12-24” line is equally and exactly offset by declines in the adult groups. Kind of a different headline, isn’t it: “Adults Abandon Twitter!” And yet, it’s based on the exact same information.
At this time we should note that just two days ago, the Times ran an article called “Who’s Driving Twitter’s Popularity? Not Teens.”
The key here is that we don’t know whether teens are using Twitter more or adults are using it less. All we know is that if you look at the Twitter userbase, teenagers form a greater percent of the community than they used to – even though the absolute number of teenage Twitterers could be static or even dropping (if adult use was falling off at a greater rate).
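A quick sketch shows how this can happen. I am assuming the chart’s index is a group’s share of Twitter users divided by its share of all internet users, times 100; that is my reading of the graph, not a documented formula. The user counts are invented and `composition_index` is a hypothetical helper.

```python
def composition_index(twitter_users, internet_users):
    """Share of Twitter users over share of internet users, times 100 (assumed definition)."""
    total_twitter = sum(twitter_users.values())
    total_internet = sum(internet_users.values())
    return {
        group: 100 * (twitter_users[group] / total_twitter)
        / (internet_users[group] / total_internet)
        for group in twitter_users
    }

# Invented user counts in millions (illustrative only)
internet = {"teens": 20, "adults": 80}
twitter_before = {"teens": 10, "adults": 90}
twitter_after = {"teens": 10, "adults": 40}  # same number of teens, fewer adults

before = composition_index(twitter_before, internet)
after = composition_index(twitter_after, internet)
print("teens:", round(before["teens"], 1), "->", round(after["teens"], 1))
print("adults:", round(before["adults"], 1), "->", round(after["adults"], 1))
```

The teen index doubles in this example even though not a single additional teenager signed up.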
What’s much more interesting is that for the first time, teens are using Twitter disproportionately – they are a larger demographic of the Twitterverse than the internet generally. But again this gives us no context, and that fact could arise from their increased participation or adult accounts going stagnant.
It’s interesting and informative to note that young people are a steadily growing percentage of the Twitterverse. It is a mistake to make assumptions about their number from the graph, however.
I fully expect an article from RWW examining the “massive rise” in “2-11” Tweets – who are these tweeting toddlers? What do they tweet about? And most importantly, how can your marketing strategy take advantage of this trend?
Update: I am not surprised to learn that this graph comes from Silicon Alley Insider’s Chart of the Day column. I cringe at the thought of that site’s influence.
{ 1 comment }
A post on Junk Charts sent me reading about Stevens’ power law, which supplies a quantification of a problem I’ve discussed before: the danger of representing single-dimensional data with two-dimensional graphics.
Stevens’ law describes how perceived magnitude scales with a stimulus’s actual intensity: perception grows as the intensity raised to a characteristic exponent. For example, the exponent for “visual length” is 1, meaning that humans accurately gauge the true difference between lines of various lengths. However, the exponent for “visual area” is just 0.7, meaning we systematically underestimate differences in area: a shape with twice the area looks only about 1.6 times as large.
This follows from the arguments laid out previously – area increases with the square of the one dimensional metric; therefore, as we look to that single measurement’s representation in a two-dimensional graph (say, the radius of a circle), we fail to account for the compounding effect of squaring it as it grows. This leads to an underperception of relative differences in area. Using a single-dimensional metric, like pure length in a bar chart, is much more appealing because our perception of variation will scale linearly with the actual measurements.
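For a rough feel of the size of the effect, here is a small sketch that treats the 0.7 figure as the exponent in Stevens’ power law (perceived magnitude proportional to the stimulus raised to that exponent). The function name is my own, and the exponents are simply the two values quoted above.

```python
def perceived_ratio(actual_ratio, exponent):
    """Perceived ratio between two stimuli under Stevens' power law."""
    return actual_ratio ** exponent

LENGTH_EXPONENT = 1.0  # visual length, as quoted above
AREA_EXPONENT = 0.7    # visual area, as quoted above

# A bar twice as long is perceived as twice as long
print("bar, 2x length:", perceived_ratio(2, LENGTH_EXPONENT))

# A circle whose radius doubles has four times the area,
# but the perceived increase is well short of 4x
print("circle, 2x radius:", round(perceived_ratio(2 ** 2, AREA_EXPONENT), 2))
```

Encoding a value as a circle’s radius therefore misleads twice over: the area overstates the underlying number, and the eye then discounts that area by an amount the reader cannot easily undo.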
{ 0 comments }
House Minority Leader Boehner recently released this “infographic” (I use the term loosely) in order to demonstrate his frustration with the House Democrats’ health proposal:
The chart really is a nightmare: the colors, layout, and hidden connections combine into an impossible-to-read image, which is exactly what Rep. Boehner wants.
Recently, Robert Palmer, a graphic designer in California, took it upon himself to untangle this mess. Here is his version of the chart (click to zoom):
Now, neither chart makes a strong case for or against the policy itself; both attempt merely to show all the affected parties. But the fact that Robert Palmer was able to lay out an extraordinarily clear picture of all participants demonstrates that Rep. Boehner’s chart was intentionally obfuscated in order to mislead and confuse. The only other explanations are that whoever put it together a) didn’t understand the layout or b) didn’t understand how to present it. Ignorance, in this case, is not bliss.
When we are handed data or statistics, we have an enormous power to construct convincing arguments and clear presentations of otherwise complicated ideas. To abuse those tools (and the public’s faith in those tools) by using them to construct a bad analysis is a poor policy choice – not only is it easily falsifiable, but it erodes the ability to effectively communicate at all.
Lies, damn lies and statistics… the two charts above claim to show the exact same situation. Undoubtedly, there are many more graphics that could be constructed – are any of them actually “right”? Hard to say, but I feel that the first chart is “wrong” without question because it breaks every rule of effective design. The tax may well be a bureaucratic nightmare, as Rep. Boehner claims. And Palmer’s chart does not show a lack of bureaucracy; it merely lays out the connections clearly. But by constructing a graphic which willfully corrupts its own message, Rep. Boehner undermines his argument: if his chart shows a tangled mess but Palmer can untangle it, then the public will conclude that Boehner was wrong. He would have done better to have shown Palmer’s chart in the first place and claimed that there are too many connections on it – that way any rebuttal would live only in the realm of opinion, not demonstrable fact.
{ 0 comments }