May 24, 2010

The NYT is running a great article about the influx of data in today's world. The prime argument borrows from a quote often attributed to Einstein: "Not everything that can be counted counts, and not everything that counts can be counted."

I think this speaks volumes and should be heeded by the sites that persist in churning out infographics that do little to educate or illustrate anything, except maybe how easy it is to draw monochromatic pie charts. A notable (and humorous) exception may be seen here.

One of the article's most salient points is that it is not enough to take raw data, run it through a battery of statistical tests, and publish the results. (And yes, pie charts count as a statistical test.) The data must be understood and interpreted - statisticians use a first round of exploratory analysis to illuminate the nature of the data before testing any hypotheses. After all, how can you answer a question without truly understanding what it is? Remember that any statistical test involves a null hypothesis and an alternative - without understanding exactly what the data represent, it is impossible to express those options properly.
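That workflow - explore first, state the hypotheses explicitly, then test - can be sketched in a few lines of Python. This is a minimal illustration, not anyone's published method: the sample numbers are made up, and the one-sample z-test (via the standard library's NormalDist) stands in for whatever test actually suits the data.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample (illustrative numbers only, not real data).
sample = [102, 98, 110, 95, 104, 99, 101, 107, 96, 103]

# Step 1: explore the data before stating any hypothesis.
summary = {
    "n": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),
    "min": min(sample),
    "max": max(sample),
}

# Step 2: only now state the null hypothesis (H0: mean = 100)
# and its alternative (H1: mean != 100), and compute a z statistic.
h0_mean = 100
z = (summary["mean"] - h0_mean) / (summary["stdev"] / summary["n"] ** 0.5)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

The point of the exploratory step is that the choice of test and the wording of H0 and H1 should follow from what the summaries reveal about the data, not the other way around.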

But the statistician's work is not done once the data is understood and the tests are performed - the results of those tests must be interpreted as well. "Lies, damn lies and statistics" isn't just an adage - it's truth! Show me a result from a dataset, and I'll show you a convincing way to present an alternative conclusion. It is only by ensuring the integrity of the data and the tests, by knowing exactly what questions are being asked and the manner in which they will be answered, that we can have confidence in our results.
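A tiny, hypothetical example of how the same dataset can support two different headlines: a skewed set of salaries summarized by its mean tells one story, and by its median quite another. The numbers below are invented purely for illustration.

```python
import statistics

# Hypothetical salaries in $1000s: nine modest values and one outlier.
salaries = [40, 42, 45, 47, 48, 50, 51, 53, 55, 400]

mean_salary = statistics.mean(salaries)      # pulled far upward by the outlier
median_salary = statistics.median(salaries)  # robust to the outlier
```

"Typical pay is over $83k" and "typical pay is $49k" both describe this one dataset; the honest report states which summary was used and why.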

I think it's wonderful that the tools of statistics have become democratized. But we need to make sure that statistical thinking is as widely disseminated as the math itself. Tools aren't much use without the knowledge to wield them. I can hold a hammer and screwdriver, sure, but I'm no master carpenter. Until we can be confident that our statistics come from statisticians, it will remain necessary to question all analyses. Even as I write that, I'm well aware that in that scenario we'd still need a healthy dose of skepticism, just the same. Who better to disguise meaning than the master statisticians themselves?