Posts about the analysis and presentation of:

Data

High tech's hottest calling

January 26, 2012

The NYT's Bits blog has a new post on "high tech’s hottest calling:" statistical analysis. The article isn't just about the jobs market, focusing as well on students' increased demand for statistics classes at top universities. The opening anecdote will be familiar to anyone in the field: “Most of my life I went to parties [...]

0 comments Read the whole post →

"Big Data" is meaningless

January 20, 2012

Roger Ehrenberg gets it: Every so often a term becomes so beloved by media that it moves from “instructive” to “hackneyed” to “worthless,” and Big Data is one of those terms.... Every business generates data, but it is a far smaller number that view data as a strategic asset that is actively managed for the benefit [...]

0 comments Read the whole post →

Stanley Kubrick: data scientist?

January 18, 2012

Here's a fascinating essay by Mike Kaplan, who oversaw marketing for the movies 2001 and A Clockwork Orange, explaining how Stanley Kubrick became one of the first commercial data scientists. In 1971, as Kaplan and Kubrick were trying to determine which theaters should show the new movie, they realized that Variety published box office totals for individual cinemas in [...]

0 comments Read the whole post →

Call 'em like you see 'em

January 5, 2012

Anderson Cooper, on air, referring to CNN's nonsensical "Social Media Screen": The social media screen, again with the social media screen. My Lord. This is the third hit, I still don't understand what the hell this thing shows! It's a shame that it takes a 3am broadcast for someone to let the emperor know his clothes are missing. [...]

0 comments Read the whole post →

...and not a drop of value

January 5, 2012

Bryce Roberts gets it: Here’s the thing. Data, big, medium or small, has no value in and of itself. The value of data is unlocked through context and presentation.

0 comments Read the whole post →

Stat is magic

October 13, 2011

I really love the latest post on Lessons from my Twenties, called Stat Is Magic. Sometimes, things are better left as magic.

0 comments Read the whole post →

Quick sepia images in WordPress

September 30, 2011

The other day, I was unexpectedly asked, "What's the easiest way to make a sepia-toned image in WordPress?" The questioner has a blog with an "antique" theme, and wanted to use the sepia images inline. However, the blog is quite image-heavy and she (understandably) didn't want to dive into Photoshop for every single post. She [...]

0 comments Read the whole post →

Eloquent JavaScript: an interactive programming tutorial

September 30, 2011

Via my friend Will Gaybrick (@gaybrick), I discovered an excellent programming tutorial called Eloquent JavaScript. Not only is it extremely well-written, clear and friendly, but it features a completely interactive console allowing readers to run and experiment with every single example. You'll never have to struggle to decipher what a piece of code is doing [...]

3 comments Read the whole post →

Unknown unknowns

September 19, 2011

After observing a pair of poorly rebadged cars, a series of thoughts about Rumsfeld's "known knowns," "known unknowns," and "unknown unknowns."

0 comments Read the whole post →

"Highly skilled, nerdy-cool"

September 15, 2011

More good news for data scientists, this time from Fortune: The unemployment rate in the U.S. continues to be abysmal (9.1% in July), but the tech world has spawned a new kind of highly skilled, nerdy-cool job that companies are scrambling to fill: data scientist.

0 comments Read the whole post →

Syncing settings across computers

September 15, 2011

Using Dropbox and shell scripts to automatically sync settings and configurations.

8 comments Read the whole post →

"The application of data is what is fascinating"

September 15, 2011

My friend Darren Herman recently tweeted a statement I couldn't agree more with (I'm linking to his blog post rather than the tweet itself; as we all know, attempting to take advantage of Twitter's disastrous data model is like trying to catch water in a sieve): “The data itself isn’t overly interesting. The application of data is what [...]

0 comments Read the whole post →

Google Correlate

September 6, 2011

For some time, we ran a popular series on TGR called "Trends" -- you can see 'em all right here. We used Google Trends and Google Insights for Search to uncover interesting behavioral relationships. Now Google has gone and stolen our thunder, releasing Google Correlate to the world. Google Correlate lets you directly compare the search histories [...]

0 comments Read the whole post →

Bayes, prior to reading

August 16, 2011

I may have to go pick up this book, which was reviewed in the NYT last week, if only because it opens with a favorite quote from Keynes. Titled The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (Wow, titles are getting [...]

0 comments Read the whole post →

Puzzling the Dow

August 14, 2011

What is the probability that the digits of the Dow's change would add up to 26 on three consecutive days?

3 comments Read the whole post →

Data science in the mainstream

August 14, 2011

AOL Jobs has posted an article titled "Data Scientist: The Hottest Job You Haven't Heard Of" -- except, of course, that you have. But you TGR readers would make up a very small fraction of AOL's traffic (trust me -- it doesn't take a data scientist to figure that one out), so let's take this [...]

2 comments Read the whole post →

Installing Python, virtualenv, NumPy, SciPy, matplotlib and IPython on Lion

August 12, 2011

A guide to installing Python, virtualenv, NumPy, SciPy, matplotlib and IPython on Mac OS X 10.7 Lion.

55 comments Read the whole post →

And eat it, too

July 28, 2011

Mark at Epic Graphic presents a metaphor for the data/knowledge process: While I love the idea, I think it's missing the most important thing -- the recipe! I'm most interested in how we get from data (raw ingredients) to information (consumable product). Do we follow a specific process - taken straight from a cookbook, for example? [...]

0 comments Read the whole post →

Trials of the early adopter

July 25, 2011

Update: this post is now completely obsolete. I've posted a much more comprehensive guide to installing Python, NumPy, SciPy, matplotlib and IPython on Lion here. This post is meant as a public service announcement for an extremely small audience. If you don't think this is directed at you, then it almost certainly isn't. I'm happy [...]

0 comments Read the whole post →

QOTD: hindsight edition

July 10, 2011

Kaiser Fung writes on uncertainty and thinking probabilistically about events that have already transpired. The full post is worth a read, but this line sticks out for me: The fact that you won the lottery does not change the fact that economically, it was silly to play the lottery in the first place. This fallacy pops [...]

0 comments Read the whole post →

Data science vs business intelligence

June 30, 2011

Steve Miller has written a nice two-part piece on data science for Information Management. Part 1 overviews the topic, including links to many pieces that have been profiled on TGR. Part 2 is a more direct comparison of data science and "business intelligence," a somewhat lackluster (but growing) field of data analytics. One quote stood [...]

0 comments Read the whole post →

Information economics

April 25, 2011

An excellent article in the NYT suggests that "information economics" is starting to have a demonstrably positive impact on businesses that harness data well. The most important observation, in my opinion, comes a bit earlier in the article: In a modern economy, information should be the prime asset — the raw material of new products [...]

5 comments Read the whole post →

Holographic GapMinding

November 30, 2010

Hans Rosling -- whose lectures are always fascinating -- is hosting a new documentary for the BBC called "The Joy of Stats." A 5-minute clip has been released on YouTube showing a faux-holographic version of Hans' GapMinder visualization package. The graphic overlay is very well done and lets Hans describe the data in an [...]

0 comments Read the whole post →

Visualizing politics through time

November 22, 2010

We love choropleths here at TGR, and here's a really great set -- David Sparks has mapped US presidential voting patterns through time to create an excellent visualization of ebbing (and sometimes volatile) political attitudes: Best of all, he did it with R. Please see David's website for more details. Some of his other projects [...]

0 comments Read the whole post →

UCF cheating scandal

November 18, 2010

A major cheating scandal at UCF was discovered - and resolved - through a relatively simple statistical analysis of midterm results. The team was able to identify students who cheated on their midterm exam with high confidence. Professor Richard Quinn's announcement of those findings was captured in this video of his lecture: (Via The Daily [...]

0 comments Read the whole post →

Tiny, Large, Very, Nice, Dumbest.

November 12, 2010

Here's a great analysis from Ben Blatt of the Harvard Sports Analysis Collective. He looked at three well-known sports writers -- Bill Simmons, Rick Reilly and Jason Whitlock -- and performed a lexical analysis to create a statistical representation of their writing styles. What can you do with that analysis? Well, you can see what [...]

0 comments Read the whole post →

Modeling how cats drink

November 11, 2010

I thought this was fascinating -- scientists have modeled how cats drink. Naturally, once you have a model, you want to see how well it fits the data. For example, is there an optimal lapping speed? After calculation of things like the Froude number and the aspect ratio, they were able to figure out how [...]

1 comment Read the whole post →

Google Refine

November 11, 2010

Google has launched a new open-source project called Refine (formerly Metaweb's Freebase Gridworks) which allows users to easily clean up and transform large datasets. There is nothing more painful than cleaning data at the command line - I'd even go so far as to say it's impossible to do a good job. Sorry, R. Excel [...]

0 comments Read the whole post →

The turkey statistician

November 10, 2010

C links to a Thanksgiving-appropriate essay by Nassim Taleb in which he presents a story about a turkey statistician. For 100 days, the turkey is fed and cared for by humans. He arrives at the statistically-significant conclusion that humans genuinely care about his well-being. On the 101st day, the turkey is slaughtered. Most interesting [...]

0 comments Read the whole post →

Chicken soup for the global economy

November 8, 2010

Just replace "technology" with "stress": (via Dilbert)

1 comment Read the whole post →