# Overcharting

June 2, 2009 in Data

An article in Friday's NYT called "Let the Kid Be" was accompanied by this graphic:

I don't like this presentation because I think it is misleading. But first, a little history:

You may remember that last year a JP Morgan chart circulated which showed the deterioration of banks' market values. The old and new market values were represented by shaded circles. Unfortunately, JP Morgan chose to scale each circle's radius, rather than its area, in proportion to the market value, which made the decline appear much more dramatic than it actually was (since area is proportional to the square of the radius). The firm caught a lot of flak for this mistake and reissued the chart with the circles properly scaled.
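To put a number on that distortion, here is a minimal sketch. The function name and the 50% decline are hypothetical, chosen only for illustration; they are not figures from the JP Morgan chart.

```python
def apparent_area_ratio(value_ratio: float, scale_by: str) -> float:
    """Ratio of new circle area to old circle area for a given ratio of values.

    scale_by="radius": the radius is drawn proportional to the value (the mistake).
    scale_by="area":   the area is drawn proportional to the value (the fix).
    """
    if scale_by == "radius":
        # Area grows with the square of the radius, so the change is exaggerated.
        return value_ratio ** 2
    if scale_by == "area":
        return value_ratio
    raise ValueError("scale_by must be 'radius' or 'area'")


# Hypothetical example: a bank's market value falls to half its old level.
halved = 0.5
print(apparent_area_ratio(halved, "radius"))  # 0.25: the circle looks like a 75% drop
print(apparent_area_ratio(halved, "area"))    # 0.5: the circle matches the real drop
```

With radius scaling, any decline is squared on the page, which is why the original chart looked so much worse than the underlying numbers.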

The NYT's graphic artists faced a similar problem. And here's the truth - the NYT chart is technically correct. The area of each inside box is in the proper proportion to the area of its grey outer box. But if I showed you the red box in the upper left, with no label, and asked you what percentage of the grey box's area it represented, would you say 58%? I would have said somewhere between 80% and 90%, which badly overestimates the proportion.

The truth is, I feel that both the JP Morgan and NYT charts are very wrong. Each chart is attempting to communicate a one-dimensional value (meaning it is just that: a single number). Why, then, are they using two-dimensional diagrams? I call it "over-charting" and I disagree with it. If any of the measured variables changed, these charts would change in a non-obvious (meaning non-linear) way, because they spread the change in that variable across two different dimensions. For example, if the number of people in the NYT's "Less" category went from 4% to 8%, then the area of the square would double. But it would do so by increasing each of its two dimensions - its length and width - by a factor of the square root of two. Hardly intuitive.
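The arithmetic behind the squares can be sketched in a few lines. This assumes each inner box is a square nested in a square outer box, as the graphic appears to be; `side_fraction` is a hypothetical helper name.

```python
import math


def side_fraction(area_fraction: float) -> float:
    """Side length of an inner square, as a fraction of the outer square's side,
    when the inner square covers `area_fraction` of the outer square's area."""
    return math.sqrt(area_fraction)


# The 58% box has sides about 76% as long as the outer box's sides.
print(f"{side_fraction(0.58):.3f}")

# Doubling the "Less" category from 4% to 8% stretches each side by sqrt(2), not 2.
growth = side_fraction(0.08) / side_fraction(0.04)
print(f"{growth:.3f}")
```

A side length of roughly 76% goes some way toward explaining why a box covering 58% of the area can look much larger than that.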

Both charts should display their values using one-dimensional formats like line charts, bar charts, or pie charts. "Wait a second!" you cry, "Bar charts and pie charts are two-dimensional, as they represent values with areas!" That's true, but they only contain quantitative data along one of their dimensions. The other dimension is purely for differentiating between qualitative variables. This means a change in the values will be reflected completely and linearly in only one dimension, not two. If we really want to start considering pie charts as two-dimensional, then the NYT graphic above is surely three-dimensional: it uses two dimensions (height and width) to display percentages, and a third dimension (space) to differentiate among the three categories.

So, glossing over the dimensionality needed to differentiate between qualitative variables, a one-dimensional chart is one that only reflects quantitative changes in one dimension. An optional second dimension is purely for visual purposes, and may be disregarded as long as it is constant across all variables. Consider this representation of the NYT data:

Very simple and easy to understand, isn't it? Note that each of the bars is indeed a two-dimensional object. But they only contain quantitative information in one dimension: height. If more people were in the "Less" category, its bar would grow taller; if the number of "More" respondents decreased, its bar would get shorter. No bar ever gets wider, and so the graph updates in only one dimension.

A pie chart is very similar in construction:

Again, the values are represented by areas which are visually two-dimensional. But unlike the JP Morgan chart's circles, they only change in one dimension: angle. Critically, a change in angle changes the area shown in the chart linearly. An area of 50% is represented by an angle of 180 degrees. Want to show just a 25% area? Cut the angle in half to 90 degrees. Simple. By contrast, as with the NYT example, the JP Morgan chart would have had to shrink its radius by a factor of the square root of 2 in order to reflect a halving in market value.
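The contrast between the two encodings is easy to check numerically. This is only an illustrative sketch; both function names are made up for the example.

```python
import math


def pie_angle_degrees(fraction: float) -> float:
    """Angle of a pie slice covering `fraction` of the whole: linear in the value."""
    return 360.0 * fraction


def radius_factor(area_ratio: float) -> float:
    """Factor by which a circle's radius must change so that its area changes
    by `area_ratio`: a square root, so not linear in the value."""
    return math.sqrt(area_ratio)


print(pie_angle_degrees(0.50))  # 180.0
print(pie_angle_degrees(0.25))  # 90.0: half the value, half the angle

# Halving a properly scaled circle's area shrinks its radius by only about 29%.
print(f"{radius_factor(0.5):.3f}")
```

Halving the value halves the pie slice's angle exactly, while the circle's radius moves by a less obvious factor of one over the square root of 2.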

By now it should be clear why the NYT chart is misleading - it distributes its values over more dimensions than necessary. It's a simple heuristic, and an important one in accurately conveying data: only use as many dimensions as you need. There's actually something called the Curse of Dimensionality, and while it doesn't have much to do with this example, its message rings clear: don't use more dimensions than you have to. Even if it looks cool.