The language of statistics

June 24, 2010 in Data

Joseph Rickert has written a piece calling R "the language of statistics," which I feel is a deserved title. As he puts it:

I don’t just mean that R “is spoken” by many or even most statisticians. R’s superiority for statistics is deeper than that. R is a language with syntax and structure that have been explicitly designed to formulate expressions about statistical objects. At this time, it may be le premier langue for statistical thinking that enables the formulation of ideas, and notions about statistical models and data that are difficult to express succinctly in other languages including mathematical notation.

Unfortunately, the rest of the article is hard to follow unless (or, in my case, even if) the reader is well-versed in R. The examples are too complicated or too specialized to really demonstrate the language's power and scope. That's not to take away from the message, but I worry that it will be lost on its target audience.

Having defended the choice of R to colleagues in both academic and professional roles (and personal ones too, in a few especially nerdy cases), I've found that the entire pitch rests on the richness of the syntax "clicking" in someone's head. Otherwise, the whole idea is shot down by a few "Why wouldn't I just use program X?" rejoinders. Ultimately, I've found that the multi-disciplinary aspects of R are what seal the deal: there's usually no need to recast data types or write new procedures, because it's almost guaranteed that someone out there has already done that work for you (in a succinct and compatible way, to boot!).

R should take a cue from Apple's ads: Do you need to run summary statistics, followed by a linear regression, followed by a cluster analysis, followed by a genetic exploration... and then a series of publication-quality graphs? There's a package for that.
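To make that chain of steps concrete, here is a minimal sketch of what it might look like in practice. It is only an illustration under a few assumptions of mine: it uses the `mtcars` dataset that ships with R, the choice of columns and of three clusters is arbitrary, and the genetic step is left out because it would depend on a contributed package. Everything shown comes with a default R installation.

```r
# Illustrative sketch only: dataset, columns, and cluster count are arbitrary choices.
data(mtcars)
vars <- mtcars[, c("mpg", "wt", "hp")]

# 1. Summary statistics for a few columns
print(summary(vars))

# 2. Linear regression: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# 3. Cluster analysis: k-means on the standardized columns
set.seed(1)  # k-means starts from random centers
clusters <- kmeans(scale(vars), centers = 3)

# 4. Graph: scatter plot colored by cluster, with the fitted line on top
plot(mpg ~ wt, data = mtcars, col = clusters$cluster, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

The same data frame flows through every step without being converted, which is the part of the pitch that tends to "click."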
