Data science in the mainstream

August 14, 2011 in Data

AOL Jobs has posted an article titled "Data Scientist: The Hottest Job You Haven't Heard Of" -- except, of course, that you have. But you TGR readers make up a very small fraction of AOL's traffic (trust me -- it doesn't take a data scientist to figure that one out), so let's take this opportunity to welcome data science to the mainstream, because that's really what this article represents to me.

I guess we shouldn't be too surprised, though, since the field has matured considerably in the last year or two: it has gained recognizable leaders, conferences with large turnouts, excellent non-technical books, and so on. And yet, with this one article -- which doesn't even contain any new information -- I suddenly feel like my kid has gone off to college. The field is certainly not out of our hands, but it is now in the hands of many, many others as well. And that is a wonderful thing.

To be honest, I am apprehensive about one thing. As any field becomes more mainstream, imitators and knock-offs follow. In a field like this one, where so much of the work is exploratory and experimental, it can be extremely difficult to tell genuine work from imitation. Furthermore, trying to correct someone else's poor work can appear vindictive or cruel. If someone makes a simple error, like running a linear regression on logarithmic data, who is going to step up and correct it? Is it even possible to have a set of standards for projects that can't be defined a priori?

It will be crucial for the data science community to work together to educate not only its own members but the public as well. Until now, it has been safe to assume that people in this community share a certain level of knowledge, because without it they simply wouldn't have chosen to participate. The first challenge we face is expanding that circle and welcoming new people who are driven more by interest than by experience. The second challenge is effectively explaining and demonstrating this field to the public, both as a matter of good conduct and as a defensive measure against the cons that will inevitably follow.

The article does make an interesting and somewhat related point:

Since data scientists spend a significant amount of time using computer programs and algorithms, it may seem logical that a computer science degree would be preferable for these professionals. However, many argue that a degree in physics makes more sense. Loukides writes that physicists not only have mathematical and computing skills but also an ability to see the "big picture."

I agree, although I'm not sure why "physics" gets top billing here. Nothing against physicists; I just think there are many fields that fit the bill just as well. The underlying point, that being able to program isn't enough, is true. Data science is about exploration, communication and, above all, a deep understanding of the topic at hand. Creativity plays a surprisingly large role, as we know. And let's not forget artistic ability (or at least a good sense of design)! The end product needs to be presented cleanly and intelligibly -- you'd be shocked how many times I've been handed raw R output as a final draft. Presentation is not just for show -- it is the show.

Moreover:

Daniel I. Shostak, President of Strategic Affairs Forecasting, has been tracking changes in the field of analytics for several years and says that those interested in working as a data scientist need more than just computer skills. "[They] need to demonstrate very good communication skills because many folks are very skeptical about the value of data driven analysis."

Absolutely true -- but let's stress that the value of communication skills lies not just in defending one's worth, but in explaining results. The most brilliant data scientist in the world might never get hired if he can't stand up in a room full of people and explain what he did without a word of math or programming jargon. I was once in a meeting with a guy who, when asked how he came up with his results, actually said, "It's simple," took a breath, and launched into a highly technical description of the architecture he had designed for the analysis. Don't be that guy. If a client doesn't feel he can have a conversation with you over coffee, he certainly isn't going to take your work without a grain of salt.

People distrust what they don't understand. The data science community's top priority should therefore be education. It is very possible to miss this opportunity and slam many doors in the process. However, I have complete faith that this will not happen. I admire the work this community has done and am proud to have contributed to it. These are excellent, brilliant individuals who are now at the forefront of a rapidly growing movement. They understand the importance of getting the word out and easing newcomers into the fold. I'm making a note here: huge success.

Data science, welcome to the mainstream. Be careful out there.

Update: Pat makes an excellent point in the comments that I didn't mention "curiosity" as a necessary trait for data scientists -- what an oversight! It should be at the very top of the list.
