Posts about the analysis and presentation of:


Exploring Python

December 27, 2012

GHL has written two very nice Python-related posts. The first is for highly-technical readers who want to use the new Blaze package ("NumPy 2.0") so badly they'll do so even if it doesn't work yet -- and yes, I'm in that camp! Here are instructions for building Blaze. Note that due to its rapid development it's […]

0 comments Read the whole post →

The Signal and the Noise: errata

December 13, 2012

Nate Silver's new book, The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't, is, on the whole, an excellent overview of statistical thinking. I think most of my readers would enjoy it. However, it is plagued by some bizarre mistakes that left me unable to completely trust that every detail is correct. […]

53 comments Read the whole post →

Deep learning goes mainstream

November 24, 2012

Another day, another surprise from the New York Times! This time it's a front page article on "deep-learning," an integral part of my own work and something that defies many attempts at simple explanation. Sadly, that's also true of the Times article, which never actually explains what deep learning is! Indeed, the reader is left to wonder […]

4 comments Read the whole post →

To surprise of pundits...

November 7, 2012

Thanks, xkcd: As of this writing, the only thing that's 'razor-thin' or 'too close to call' is the gap between the consensus poll forecast and the result.

2 comments Read the whole post →

Politics & Statistics

November 4, 2012

I'm a big fan of Nate Silver -- he consistently demonstrates that he is one of the best and brightest statisticians around. I like to say that statisticians (and risk managers) are professional skeptics; our job is to let data speak for itself, not to speak on its behalf. Nate Silver does that better than […]

0 comments Read the whole post →

Missing pieces

August 1, 2012

Two important things to keep in mind about Mountain Lion, entirely reblogged from Adam Laiacano because his pictures are worth exactly two thousand of my words: One saving grace -- at least on my machine -- a Java runtime was automatically installed while I was installing Python. Why not just preinstall it in the first place? […]

0 comments Read the whole post →

Compiling SciPy on Mountain Lion

July 27, 2012

Update 8/1: The fix I described below has just been added to the development branch. Mountain Lion users can install the development branch with: pip install -e git+ (note that this requires a Fortran compiler; see here for more detail). I've been updating my post on installing Python/NumPy/SciPy/IPython on Lion to work with Mountain Lion. For […]

11 comments Read the whole post →

Charting iPhone growth

June 30, 2012

To mark the iPhone's 5 year anniversary, comScore has released a chart of mobile use by iPhone type: The most striking thing about the chart, to me, is how steady the width of the bands remains through time. There is a perception that a small but dedicated group of iPhone owners upgrade their hardware with […]

0 comments Read the whole post →

Turing's Cathedral

June 27, 2012

I've just finished Turing's Cathedral, a wonderful new book by George Dyson about John von Neumann's team at Princeton that built one of the first computers. In the title chapter, there are a few excellent quotes: "I asked [Turing] under what circumstances he would say that a machine is conscious," Jack Good recalled in 1956. "He […]

0 comments Read the whole post →

Autoencoders go mainstream

June 27, 2012

My inbox has been buzzing with links to an interesting new research paper from a team at Google led by Andrew Ng (of Stanford AI fame) and Jeff Dean. However, I'm receiving far more links to an NYT piece covering the research. It's great that the work is getting mainstream coverage, but somewhat unfortunate because […]

4 comments Read the whole post →

But what does "exponentially higher" actually mean?

February 10, 2012

An NYT article about a text-message-based ad that aired during the Super Bowl talks about the high follow-through rate that the ad earned for its creator, the NFL. In fact, the ad did so well that one executive described it like this: While Mr. Berman [general manager of NFL Digital Media] declined to say exactly […]

1 comment Read the whole post →

Data does not make decisions

February 7, 2012

Darren Herman gets it: This is important. Data alone does not make decisions. An organization built for the next century is one who has to be able to wonk through large datasets, find insights and action them. Just having data alone is not a winning proposition. It’s the application of data, the extrapolation, and understanding that […]

0 comments Read the whole post →

I can chartjunk and so can you!

February 6, 2012

Here's a brilliant post by Andrew Gelman, highlighting a tutorial that will actually destroy information in 25 steps, allowing you (yes, you!) to create this anti-masterpiece: People who treat chartjunk infographics as real data visualizations should be redirected to a GeoCities archive every time they access Wikipedia.

0 comments Read the whole post →

Facebook au lait

February 5, 2012

The NYT's Bits section, which up until now I thought was doing a wonderful job of evolving technology reporting to a higher, "post-blog" level, has left me stunned with a bizarre editorial in which the author requests compensation for his contribution to Facebook's success. Is it just a tongue-in-cheek opinion designed to attract eyeballs and -- yes […]

5 comments Read the whole post →

High tech's hottest calling

January 26, 2012

The NYT's Bits blog has a new post on "high tech’s hottest calling:" statistical analysis. The article isn't just about the jobs market, focusing as well on students' increased demand for statistics classes at top universities. The opening anecdote will be familiar to anyone in the field: “Most of my life I went to parties […]

0 comments Read the whole post →

"Big Data" is meaningless

January 20, 2012

Roger Ehrenberg gets it: Every so often a term becomes so beloved by media that it moves from “instructive” to “hackneyed” to “worthless,” and Big Data is one of those terms.... Every business generates data, but it is a far smaller number that view data as a strategic asset that is actively managed for the benefit […]

0 comments Read the whole post →

Stanley Kubrick: data scientist?

January 18, 2012

Here's a fascinating essay by Mike Kaplan, who oversaw marketing for the movies 2001 and A Clockwork Orange, which explains how Stanley Kubrick became one of the first commercial data scientists. In 1971, as Kaplan and Kubrick were trying to determine which theaters should show the new movie, they realized that Variety published box office totals for individual cinemas in […]

0 comments Read the whole post →

...and not a drop of value

January 5, 2012

Bryce Roberts gets it: Here’s the thing. Data, big, medium or small, has no value in and of itself. The value of data is unlocked through context and presentation.

0 comments Read the whole post →

Stat is magic

October 13, 2011

I really love the latest post on Lessons from my Twenties, called Stat Is Magic. Sometimes, things are better left as magic.

0 comments Read the whole post →

Quick sepia images in WordPress

September 30, 2011

The other day, I was unexpectedly asked, "What's the easiest way to make a sepia-toned image in WordPress?" The questioner has a blog with an "antique" theme, and wanted to use the sepia images inline. However, the blog is quite image-heavy and she (understandably) didn't want to dive into Photoshop for every single post. She […]

0 comments Read the whole post →

Eloquent JavaScript: an interactive programming tutorial

September 30, 2011

Via my friend Will Gaybrick (@gaybrick), I discovered an excellent programming tutorial called Eloquent JavaScript. Not only is it extremely well-written, clear and friendly, but it features a completely interactive console allowing readers to run and experiment with every single example. You'll never have to struggle to decipher what a piece of code is doing […]

4 comments Read the whole post →

Unknown unknowns

September 19, 2011

After observing a pair of poorly-rebadged cars, a series of thoughts about Rumsfeld's "known knowns," "known unknowns," and "unknown unknowns."

0 comments Read the whole post →

"Highly skilled, nerdy-cool"

September 15, 2011

More good news for data scientists, this time from Fortune: The unemployment rate in the U.S. continues to be abysmal (9.1% in July), but the tech world has spawned a new kind of highly skilled, nerdy-cool job that companies are scrambling to fill: data scientist.

0 comments Read the whole post →

Syncing settings across computers

September 15, 2011

Using Dropbox and shell scripts to automatically sync settings and configurations.

8 comments Read the whole post →
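As a rough illustration of the idea behind "Syncing settings across computers": one common approach is to keep the real settings files in a synced folder and symlink them into the home directory on each machine. The sketch below is a Python stand-in for that idea, not the shell scripts from the post itself. The function name `link_settings`, the folder layout, and the file names are all hypothetical.

```python
from pathlib import Path


def link_settings(synced_dir: Path, home: Path, names: list[str]) -> list[Path]:
    """Symlink each named settings file from synced_dir into home.

    Hypothetical helper: files missing from synced_dir are skipped, and
    anything already present in home is left untouched.
    """
    linked = []
    for name in names:
        source = synced_dir / name
        target = home / name
        if not source.exists():
            continue  # nothing to link for this name
        if target.exists() or target.is_symlink():
            continue  # never overwrite a local file or an existing link
        target.symlink_to(source)
        linked.append(target)
    return linked


# Example call (paths are assumptions, adjust to your own setup):
# link_settings(Path.home() / "Dropbox" / "settings", Path.home(), [".vimrc", ".bashrc"])
```

Skipping existing files is a deliberately cautious choice: on a machine that already has local settings, nothing is replaced until those files are moved aside by hand.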

"The application of data is what is fascinating"

September 15, 2011

My friend Darren Herman recently tweeted a statement I couldn't agree more with (I'm linking to his blog post rather than the tweet itself; as we all know, attempting to take advantage of Twitter's disastrous data model is like trying to catch water in a sieve): “The data itself isn’t overly interesting. The application of data is what […]

0 comments Read the whole post →

Google Correlate

September 6, 2011

For some time, we ran a popular series on TGR called "Trends" -- you can see 'em all right here. We used Google Trends and Google Insight to uncover interesting behavioral relationships. Now Google has gone and stolen our thunder, releasing Google Correlate to the world. Google Correlate lets you directly compare the search histories […]

0 comments Read the whole post →

Bayes, prior to reading

August 16, 2011

I may have to go pick up this book, which was reviewed in the NYT last week, if only because it opens with a favorite quote from Keynes. Titled The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (Wow, titles are getting […]

0 comments Read the whole post →

Puzzling the Dow

August 14, 2011

What is the probability that the digits of the Dow's change would add up to 26 on three consecutive days?

3 comments Read the whole post →
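For a feel of the scale involved in the "Puzzling the Dow" question, here is a minimal sketch under two loudly stated assumptions that are not taken from the post: the daily change is quoted to two decimal places and is equally likely to be any value from 0.00 to 999.99 points, and the three days are independent. Real index changes are not uniform, so treat the result as a back-of-the-envelope figure only.

```python
from fractions import Fraction


def digit_sum(change_in_cents: int) -> int:
    """Sum the decimal digits of a point change given in hundredths of a point."""
    return sum(int(digit) for digit in str(abs(change_in_cents)))


def prob_digit_sum(target: int, max_cents: int) -> Fraction:
    """Fraction of changes in 0..max_cents whose digits sum to target.

    Assumes every change in that range is equally likely.
    """
    hits = sum(1 for cents in range(max_cents + 1) if digit_sum(cents) == target)
    return Fraction(hits, max_cents + 1)


# Assumption: changes from 0.00 to 999.99 points, all equally likely.
p_one_day = prob_digit_sum(26, 99_999)
# Assumption: the three days are independent of one another.
p_three_days = p_one_day ** 3

print(f"one day: {float(p_one_day):.4f}, three in a row: {float(p_three_days):.6f}")
```

Using `Fraction` keeps the one-day probability exact, so cubing it introduces no rounding until the final print.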

Data science in the mainstream

August 14, 2011

AOL Jobs has posted an article titled "Data Scientist: The Hottest Job You Haven't Heard Of" -- except, of course, that you have. But you TGR readers would make up a very small fraction of AOL's traffic (trust me -- it doesn't take a data scientist to figure that one out), so let's take this […]

2 comments Read the whole post →

Installing Python, virtualenv, NumPy, SciPy, matplotlib and IPython on Lion or Mountain Lion

August 12, 2011

A guide to installing Python, virtualenv, NumPy, SciPy, matplotlib and IPython on Mac OS X 10.7 Lion.

181 comments Read the whole post →