# Critiquing the Crimson

June 9, 2009 in Data

The Harvard Crimson has published its annual senior survey, which is making headlines in part because very few seniors are going into finance. Selected results were presented in an interesting visualization (the image below links to a full-size PDF):

Now that my brother has graduated after successfully steering the Crimson's business operations to one of their best years in memory (in the most challenging environment, no less!), I am led to abandon one of my favorite (pseudo-nostalgic) pastimes: Crimson-bashing. In parting, however, I have a few data-related observations:

The map in the upper left suffers from a common problem I discussed recently, in which one-dimensional information is distorted by a two-dimensional representation. In this case, just as with the critiqued JP Morgan piece, the circles are scaled by their radii rather than their areas (i.e. each radius should have been proportional to the square root of its value, not to the value itself). To see this illustrated, look at the "Northeast" circle and the "Boston" circle. Boston's figure is less than three times that of the rest of the Northeast, but many more than three Northeast circles would fit inside the Boston one.
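To make the distortion concrete, here is a minimal Python sketch comparing the two scaling rules. The values `small` and `large` are hypothetical stand-ins (one region with three times the value of another), not numbers from the Crimson survey, and `radius_for_value` is an illustrative helper rather than anything the Crimson's designers used.

```python
import math

def radius_for_value(value, reference_value, reference_radius=1.0):
    """Return the radius that makes a circle's AREA proportional to value.

    The reference circle (reference_value, reference_radius) fixes the scale.
    """
    return reference_radius * math.sqrt(value / reference_value)

# Hypothetical data: the large region has 3x the value of the small one.
small, large = 10.0, 30.0

# Radius-proportional scaling (the flawed rule): radius grows 3x,
# so the area grows by 3 squared.
radius_ratio_linear = large / small
area_ratio_linear = radius_ratio_linear ** 2

# Area-proportional scaling: radius grows by sqrt(3),
# so the area grows by exactly the ratio of the values.
radius_ratio_sqrt = radius_for_value(large, small)
area_ratio_sqrt = radius_ratio_sqrt ** 2

print(f"radius-scaled area ratio: {area_ratio_linear:.1f}")
print(f"area-scaled area ratio:   {area_ratio_sqrt:.1f}")
```

Under the radius rule, a value three times larger gets a circle with nine times the ink, which is why so many Northeast circles appear to fit inside Boston's.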

Because the circles are laid out on a map, the choice of a two-dimensional representation is more appropriate than if they were in isolation (in which case a simple pie chart would have sufficed). However, a better choice might have been height bars or any one-dimensional metric, as the circles cause confusion.

The line chart in the lower left shows the percentage of students going off to work, graduate school, and finance/consulting as a subset of work. The labeling of the latter category is unclear: it is noted as "Finance and consulting (of work)." This presumably means that the 20% figure in this category is really 20% of the 60% of working students. In this case, the chart's scale is misleading: for two of the lines it is the percent of the student population, but for the third it is the percent of working students.

A scale should never have a double interpretation. Either two scales should be used, or a data series should be rescaled. For example, the finance category could have been rescaled to a percent of the entire student body, and the "work" line could have been reduced by the same amount and retitled "non-finance work". If the author wanted, there could still be a "total work" line. With this arrangement, every line uses the same scale and the confusion is avoided.
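The rescaling described above is simple arithmetic, sketched here in Python. The inputs are the rough 60% and 20% figures mentioned earlier, used only for illustration, and `rescale_to_population` is a hypothetical helper name.

```python
def rescale_to_population(work_pct, finance_of_work_pct):
    """Put finance and non-finance work on the whole-population scale.

    work_pct            -- percent of all students who are going to work
    finance_of_work_pct -- percent of WORKING students in finance/consulting
    Returns (finance_pct, non_finance_pct), both as percent of all students.
    """
    finance_pct = work_pct * finance_of_work_pct / 100.0
    non_finance_pct = work_pct - finance_pct
    return finance_pct, non_finance_pct

# Illustrative inputs: roughly 60% working, 20% of those in finance.
finance, non_finance = rescale_to_population(60.0, 20.0)

print(f"finance/consulting: {finance:.0f}% of all students")
print(f"non-finance work:   {non_finance:.0f}% of all students")
```

With these inputs, finance comes out to 12% of the class and non-finance work to 48%, and the two sum back to the 60% "total work" line, so all three series can share one axis.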

I like the chart in the bottom center very much. I haven't seen an illustration like that before. The colors are a little confusing and I had to study it for a few minutes before I fully understood it, but once interpreted it provides a very interesting perspective.

The histogram in the upper right has its axis backward. The GPA numbers start at 4.0 and decrease to 2.9 as the axis extends. Presumably this was to orient the skew of the distribution with that of the chart below, or possibly to place the finite upper limit (4.0) at the origin. The latter is hardly a sufficient reason; the former could be a viable justification IF it were disclosed and made sense (which, in this case, it does not). A very odd design choice.

The third and fourth histograms seem out of place given that the rest of the visualizations are all about post-school plans. Could it be the designers ran out of space? Surely there was more that could have been done with the data! Then again, I guess I should never underestimate the power of a graph labeled "Number of sexual partners" to draw the reader's eye, despite how questionable an inclusion it may be.
