The data supply chain

October 27, 2010 in Data

Pete Warden has written a post on extracting value from data. Early on, he compares the data itself to raw minerals: it's difficult to sell at a premium because the eventual buyer will have to invest time and money extracting value from the commodity. Now, data may not be commoditized (yet), but I really like the metaphor and wish Pete had carried it further. In my mind, it all comes down to refining data. Pete outlines a couple of the steps in that process, which I expand upon here:

  1. Raw data
  2. Sorted data
  3. Simple charts
  4. Aggregation/Reports
  5. Correlations
  6. Recommendations
  7. Insight
  8. Generation/Participation
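
To make the early stages concrete, here is a rough Python sketch that walks a tiny dataset through steps 1, 2, 4 and 5. The records, field layout and the `pearson` helper are all invented for illustration; they don't come from Pete's post or from any particular product.

```python
from collections import defaultdict
from math import sqrt

# Step 1: raw data. Made-up (region, ad_spend, sales) records.
raw = [
    ("west", 30, 310),
    ("east", 10, 120),
    ("west", 20, 200),
    ("east", 40, 400),
]

# Step 2: sorted data. Order by region, then by ad spend.
sorted_rows = sorted(raw, key=lambda row: (row[0], row[1]))

# Step 4: aggregation/report. Total sales per region.
totals = defaultdict(int)
for region, _, sales in sorted_rows:
    totals[region] += sales


# Step 5: correlation. Pearson's r, written out by hand (hypothetical helper).
def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson([row[1] for row in raw], [row[2] for row in raw])

print(sorted_rows)
print(dict(totals))
print(round(r, 3))
```

Each step throws away some detail and adds some interpretation, which is roughly why the later stages are harder to automate and easier to charge for.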

Companies in this space are rapidly making their way through this supply chain. Sites like InfoChimps and Freebase have taken the first steps in raw data collection -- they are the mining companies.

Next, companies like Tableau (and a host of other "BI" solutions) and do-it-yourself tools like Excel take the data and churn out various summary views and tables -- these are the refineries and distributors.

Finally, enter the statisticians. Mostly branded as "consultants" or specialized divisions of larger firms (think: Amazon, IBM, Netflix, Facebook, actuaries), this is where the real value is extracted as the data is cut and polished (or run through an engine, or baked into bread; roll your own metaphor <here>). No two datasets are alike, and the tools and techniques applied to one may fail to glean insight from another.

I think there's another level starting to emerge here, one that goes a step beyond those statistical teams. In a somewhat contrived extension of my materials metaphor, this is most akin to the speculators in financial markets who deal in commodity derivatives. I'm talking about companies that successfully make a living interpreting data. These are the machine learners, the AIs. They take data and extrapolate it into new data, which is fed back into the system and used to prolong the life of the routine. This is still largely the realm of academics and of divisions even more specialized than the statisticians. But I believe it's maturing much faster than you'd think.
