Posts about the analysis and presentation of data

Google Refine

November 11, 2010

Google has launched a new open-source project called Refine (formerly Metaweb's Freebase Gridworks) which allows users to easily clean up and transform large datasets. There is nothing more painful than cleaning data at the command line - I'd even go so far as to say it's impossible to do a good job. Sorry, R. Excel [...]
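
To give a rough idea of what this kind of cleanup involves, here is a minimal Python sketch of key-collision clustering. Messy spellings of the same value are reduced to a normalized "fingerprint" key, and values that share a key are grouped for review. It is loosely modeled on the fingerprint method in Refine's clustering feature, but it is a simplified approximation written for illustration, not Refine's own code; the function names and sample values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical normalizing key: values that differ only in case,
    accents, punctuation, extra whitespace, or word order get the same key."""
    # Decompose accented characters and drop the combining marks (é -> e)
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Trim, lowercase, and strip punctuation
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    # Sort the unique tokens so "York, New" and "New York" collide
    return " ".join(sorted(set(text.split())))


def cluster(values: list[str]) -> dict[str, list[str]]:
    """Group values by fingerprint, keeping only keys with 2+ spellings."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


print(cluster(["New York", "new york ", "York, New", "Boston"]))
# {'new york': ['New York', 'new york ', 'York, New']}
```

A person still decides which spelling wins; the clustering step only surfaces likely duplicates so they can be merged deliberately rather than blindly.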

The turkey statistician

November 10, 2010

C links to a Thanksgiving-appropriate essay by Nassim Taleb in which he tells a story about a turkey statistician. For 100 days, the turkey is fed and cared for by humans. He arrives at the statistically significant conclusion that humans genuinely care about his well-being. On the 101st day, the turkey is slaughtered. Most interesting [...]
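
The turkey's mistake is easy to reproduce with a textbook estimator. The sketch below is an illustration of naive induction, not something taken from Taleb's essay: it applies Laplace's rule of succession, which estimates the chance of success on the next trial as (s + 1) / (n + 2) after s successes in n trials. After 100 days of being fed, the turkey's estimate for day 101 is 101/102, roughly 99%.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate of P(next trial succeeds) given past results."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# The turkey has been fed on every one of the 100 days it has observed.
p_fed_tomorrow = rule_of_succession(100, 100)
print(p_fed_tomorrow)                   # 101/102
print(f"{float(p_fed_tomorrow):.1%}")  # 99.0%
```

The estimate only measures consistency with the past; it gives no weight to an event that has never appeared in the data, which is exactly the event that matters to the turkey.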

Chicken soup for the global economy

November 8, 2010

Just replace "technology" with "stress": (via Dilbert)

Misreading misleading charts: rally edition

November 4, 2010

From the "question everything" file, an excellent example of why you should never trust anything without verifying it yourself: a number of conservative mailing lists are forwarding the following image comparing the size of Glenn Beck's rally to Jon Stewart's: Note the circles at the bottom, which purport to show the areas involved. The first [...]
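
Circles are a common trap because a circle's area grows with the square of its radius (A = πr²). The sketch below uses hypothetical values, not the actual rally estimates, to show how far apart the two encodings drift: if one value is twice the other, an honest chart grows the radius by only √2, while a chart that scales the radius directly makes the larger circle look four times as big.

```python
import math


def radius_ratio_area_encoding(small: float, large: float) -> float:
    """Radius ratio when circle AREA is proportional to the value (honest)."""
    # area = pi * r**2, so r is proportional to sqrt(value)
    return math.sqrt(large / small)


def area_ratio_radius_encoding(small: float, large: float) -> float:
    """Apparent area ratio when circle RADIUS is proportional to the value."""
    # Doubling the radius quadruples the area the reader actually sees
    return (large / small) ** 2


# Hypothetical crowd sizes: one rally twice as large as the other
print(radius_ratio_area_encoding(1, 2))  # 1.4142135623730951
print(area_ratio_radius_encoding(1, 2))  # 4.0
```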

Election night technology, redux

November 3, 2010

Two years ago, I wrote about how impressed I was by CNN's use of technology in their election night broadcast. They employed iPhone-inspired multitouch screens to access and browse data visually (the iPad, sadly, was but a dream at that point). The screens could display charts, sorted data, maps, results... anything the anchors required. One [...]

Breaking up is hard to do (especially on Christmas)

November 2, 2010

David McCandless's TED talk on data visualization is excellent -- you can catch it here -- and Mathias Mikkelsen has highlighted a single analysis that investigates when people are most likely to break up (according to Facebook) (Update: original here): What makes the chart so appealing is how easy it is to understand, despite the [...]

The data supply chain

October 27, 2010

Pete Warden has written a post on extracting value from data. Early on, he compares the data itself to raw minerals - it's difficult to sell it at a premium because the eventual buyer will have to invest time and money extracting value from the commodity. Now, data may not be commoditized (yet) but I [...]

World Statistics Day

October 21, 2010

World Statistics Day was yesterday, October 20th. Here's how the United States marked the occasion: In order to celebrate WSD, U.S. associations and federal statistical agencies will conduct a breakfast briefing and open house on Capitol Hill to celebrate the contributions of statistics toward informing public policy and improving human welfare. Party hats ON, people! If [...]

The data science Venn diagram

October 14, 2010

Here's an infographic we can get behind -- Drew Conway's data science Venn diagram: Please take the time to read Drew's post on the subject (and his other ones) - they are excellent as always.

Tower graphics

October 14, 2010

Max Gadney writes on the rise of "tower graphics" - those giant infographics, popping up all over the net, that require endless scrolling to follow their narratives. He notes: Every time I try to hate these, I imagine people who are just interested in the facts finding them easy to use. (albeit hard to search [...]

Statistical literacy

October 13, 2010

Wired has put together a list of 7 essential skills you didn't learn in college but will need to navigate the 21st century. Skill number 1: statistical literacy. (Skill number 7 is domestic tech -- could that be the new home ec?) (via Kottke)

Google's programming initiative

July 12, 2010

Google has introduced software that allows non-programmers to create relatively simple Android applications. The program wraps pre-written pieces of code in bite-sized visual representations that can be linked together to create complex behaviors. The software can tap many areas of the Android API, including hardware functions like the accelerometer, and can autonomously respond to [...]

Risk & risk management

June 30, 2010

An overview of financial risk and the risk management process.

A bit more on speech

June 26, 2010

As if responding to my thoughts on communicating with machines, Isaac Asimov's classic novel Second Foundation provides the following: Speech, originally, was the device whereby Man learned, imperfectly, to transmit the thoughts and emotions of his mind. By setting up arbitrary sounds and combinations of sounds to represent certain mental nuances, he developed a method [...]

Speech recognition (is more prevalent than you think)

June 25, 2010

The NYT has published the second article in their "Smarter Than You Think" series on artificial intelligence (TGR covered the first here and again here). This time, the focus is on speech recognition and natural language processing. A couple of passages really stood out to me in this more abbreviated overview of the technology: Computers with [...]

The language of statistics

June 24, 2010

Joseph Rickert has written a piece calling R "the language of statistics," a title I think it deserves. As he puts it: I don’t just mean that R “is spoken” by many or even most statisticians. R’s superiority for statistics is deeper than that. R is a language with syntax and structure that have [...]

Holding a mirror to artificial intelligence

June 23, 2010

Assessing the human computer.

Irrational exuberance, indeed

June 23, 2010

Here's an amusing chart showing the percentage of stocks that sell-side analysts have rated "sell", on average: There are a million junk-chart bloggers who will tell you how much is wrong with this graph (myself included) - starting with the left-hand scale, which should go up to 10% rather than 100%. But in a rare [...]

Twitter's firehose problem

June 22, 2010

Esquire confirms what we already knew: Twitter is a waste of time. The information "firehose" has more in common with the Deepwater site, spewing redundant and useless information at a constant pace. In that regard, truth be told, it's not much different from any other communications service - except that alternatives have either explicit or implicit filtering [...]

Elementary, my dear Watson

June 17, 2010

In a pleasant surprise, the NYT Magazine has published an excellent article on artificial intelligence. What's more, it appears to be the first in a series. The article is well-written and accessible; it doesn't delve into any of the math, just the inspirations for and results of the AI procedures. It really speaks to the [...]

Sweating the small stuff

June 9, 2010

An excellent (and humorous) TED talk by Rory Sutherland on the importance of detail and clarity:

Off the grid 2: here there be tourists

June 9, 2010

Eric Fischer has updated the Geotaggers' World Atlas (previously covered on TGR here) by overlaying an analysis of photographers on the geo-located pictures. The result is even more stunning, capturing the different behaviors of locals (blue) and tourists (red): He drew conclusions by examining other photos by the same photographer. If they had taken photos [...]

What is data science?

June 3, 2010

In the latest in a series of articles on the topic, Mike Loukides of O'Reilly Radar asks, "What is data science?": We've all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O'Reilly said that "data is the next Intel Inside." But what does that statement [...]

A Data Visualization Manifesto

May 31, 2010

Words of wisdom from Andrew Gelman: What harm is done, if any, by having ambiguous labels, uninformative orderings of variables, inconsistent scaling of axes, and all the rest? From a psychological or graphical perception perspective, maybe these create no problem at all. Perhaps such glitches (from my perspective) are either irrelevant to the general message [...]

Off the grid: NYC photoplot

May 25, 2010

Eric Fischer posts the Geotaggers' World Atlas - a collection of urban networks revealed by the location of pictures taken along their routes. The geographic data comes from Flickr and was clustered and plotted to reveal various city grids. A fairly straightforward mashup of data and geography coupled with a clean visualization... I love this [...]

Beware statisticians bearing gifts

May 24, 2010

The NYT is running a great article about the influx of data in today's world. The main argument borrows from the quote often attributed to Einstein, "Not everything that can be counted counts, and not everything that counts can be counted." I think this speaks volumes and should be heeded by the sites that persist in churning out infographics [...]

Where do R commands come from?

May 13, 2010

Ever wondered why R commands have those funny and sometimes confusing abbreviations? I admit I always found "c" (which [c]ombines elements) confusing... especially when I was starting out and would use "c" as the name of test variables. In the spirit of upholding my end of TGR's bargain (in which I provide items of nerdy interest and you [...]

Precision Information Environments

May 11, 2010

The last time I posted a video for all the futurists out there, we'd never even heard of an "iPad." It's amazing how that device has made clips like these seem so much closer to reality. This one is based on research from the Pacific Northwest National Laboratory on a class of emergency management interfaces called PIEs: Precision Information Environments. [...]

Data, data, everywhere

May 7, 2010

Doug Glanville on baseball scouting, but he could have been writing about any modern data-driven industry: But when all is said and done, if you don’t have instincts for what is happening, a perpetual stream of information just becomes a time-stealing vortex, and useless at best — even though you may know a lot more [...]

The revolution will be translated

March 9, 2010

From an NYT article on Google's translation services, this excerpt sums up the most critical transition in machine learning that has happened thus far: Creating a translation machine has long been seen as one of the toughest challenges in artificial intelligence. For decades, computer scientists tried using a rules-based approach — teaching the computer the [...]
