Steve Miller has written a nice two-part piece on data science for Information Management. Part 1 gives an overview of the topic, including links to many pieces that have been profiled on TGR. Part 2 is a more direct comparison of data science and "business intelligence," a somewhat lackluster (but growing) field of data analytics.
One quote stood out to me:
Although there are many very large data warehouses in the BI world, data science seems obsessed with handling “big data – when the size of the data itself becomes part of the problem.”
I actually dislike the popular equivalence of "big data" and "data science". While massive volumes of data -- both in observations (rows) and number of variables (columns) -- certainly necessitated the development of quantitative and infrastructural tools that are central to "data science", the field is by no means limited to large datasets. A good data scientist should be able to find insight in any type of data, big or small. This Kaggle contest speaks to the dangers of overfitting, a problem which doesn't go away just because the number of observations gets higher. I'm all for the "big data" movement, but "data science" is a larger field than just working with massive datasets.
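To make the overfitting point concrete, here is a minimal sketch using NumPy. Everything in it is my own illustration rather than anything from Miller's piece or the Kaggle contest: the data is pure synthetic noise, and the helper names (`r_squared`, `noise_fit`) and the row/column counts are assumptions chosen for the demo. The idea is that when the number of variables grows along with the number of observations, an ordinary least-squares fit can still look good in-sample and fall apart on fresh data.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def noise_fit(n_rows, n_cols, seed=0):
    """Fit least squares to pure noise (hypothetical data, no real signal).

    Returns (in-sample R^2, out-of-sample R^2).
    """
    rng = np.random.default_rng(seed)
    X_train = rng.normal(size=(n_rows, n_cols))
    y_train = rng.normal(size=n_rows)  # target is unrelated to X
    X_test = rng.normal(size=(n_rows, n_cols))
    y_test = rng.normal(size=n_rows)

    # Ordinary least squares via NumPy's solver.
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return (
        r_squared(y_train, X_train @ coef),
        r_squared(y_test, X_test @ coef),
    )


# A "small" and a "bigger" dataset with the same columns-to-rows ratio.
for n_rows, n_cols in [(50, 40), (1000, 800)]:
    train_r2, test_r2 = noise_fit(n_rows, n_cols)
    print(f"{n_rows} rows x {n_cols} cols: "
          f"train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```

In both cases the model "explains" much of the training data even though there is nothing to explain, and it does worse than simply guessing the mean on the held-out rows. Adding rows only helps if the number of columns doesn't grow with them.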
Miller offers this contrastive chart:
To me, BI is "diet data science". BI is not interested in modeling the processes that generate the observed data; it is interested in correlating the observations themselves. One of the best BI tools that I've used is Tableau, a sort of pivot table on steroids which makes it very easy to graph and view the relationships of various variables. But it doesn't offer much for extrapolating new meaning from the data, or applying insights to new data. BI is what data science would be if there were no latent processes, and showing that "these two things move up together" was a sufficient characterization.
I think Miller's chart does capture the chief differences between the two sciences (except for the very last point) but, again, I see no reason for BI to persist as DS methods become more commonplace and accessible. DS is not different from BI; it is just better. Every linear regression and correlation that produces "exact" BI results (as opposed to the assumption that DS gives only "approximate" answers) is in fact a key tool in the data scientist's belt.
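As a rough illustration of that last point, here is a small sketch in Python with NumPy. The data is synthetic and the generating process (a line with slope 3 and intercept 5, plus noise) is an assumption I made up for the example. It computes the BI-style summary, a correlation coefficient, and then fits a simple linear model to the same observations. The fitted slope is just the correlation rescaled by the two standard deviations, so the "exact" BI number is already inside the model; the model simply goes one step further and can be applied to new data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observations: y depends linearly on x, plus noise.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=200)

# BI-style summary: "these two things move up together".
r = np.corrcoef(x, y)[0, 1]

# Model of the generating process: least-squares line.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, deg=1)

# The regression slope is the correlation rescaled by the spreads.
slope_from_r = r * y.std() / x.std()

# Unlike the correlation alone, the fitted line applies to unseen inputs.
x_new = np.array([12.0, 15.0])
y_pred = slope * x_new + intercept

print(f"correlation r        = {r:.3f}")
print(f"fitted slope         = {slope:.3f}")
print(f"slope derived from r = {slope_from_r:.3f}")
print(f"predictions at {x_new.tolist()} = {np.round(y_pred, 2).tolist()}")
```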