The million dollar question

August 20, 2009 in Data, Internet

Straight from GigaOm, emphasis mine:

Despite all the hype and excitement around the real-time web, access to real-time information online is hardly a new phenomenon. That fact stuck with me after talking to Chris Cox, Facebook’s product director, last week at the social networking company’s headquarters. As he noted, “Real time has been around since [the launch of] Technorati,” referring to the blog search engine founded by Dave Sifry in 2002 that aggregates hot stories from across the web. Yet seven years later, we still haven’t figured out how to handle the inundation of real-time information.

At the risk of redundancy, real-time search isn't the next challenge; data organization is. The upwelling of "real time" sites (the new social media, if you will) has resulted in a new form of data - non-contextual, rapid-fire and fleeting. The banner, therefore, has become "we need a faster search engine" or "real-time search" - but that's not it at all. What we need is a different search engine, not a faster one.

Right now, our best search engines work by determining relevance or authority based on the number of people who indicate preference for an item, usually by linking to it. This has the obvious drawback of being dependent on links, which means such an engine cannot rank an item before it receives its first link. Thus, this paradigm doesn't work for "real time" data.

On the other end of the spectrum, sites like Twitter determine relevance purely by time - if your tweet was published more recently, it is more important (and positioned more prominently in results) than other tweets. But this is a broken paradigm as well - it can't possibly deliver the most relevant information unless, by construction, each newly published piece of information is more relevant to me than everything published before it!

Where does that leave us? Mostly at the whim of a few intrepid machine learning entrepreneurs. A hybrid approach is necessary at the least; a brand new approach is preferable.
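To make the "hybrid" idea a little more concrete, here is a minimal sketch in Python, assuming a made-up scoring rule: a link-based authority term multiplied by a freshness term that halves every `half_life_seconds`. The `Item` fields, the `hybrid_score` and `rank` names, the one-hour half-life and the log weighting are all my own assumptions for illustration, not how any real search engine or Twitter actually ranks anything.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    inbound_links: int   # stand-in for link-based authority (assumed field)
    age_seconds: float   # time since publication (assumed field)


def hybrid_score(item, half_life_seconds=3600.0, link_weight=1.0):
    """Blend authority and recency into one number (illustrative only)."""
    # log1p dampens huge link counts and gives 0 for an item with no links,
    # so a brand-new, unlinked item still gets a non-zero score below.
    authority = link_weight * math.log1p(item.inbound_links)
    # Exponential decay: the freshness term halves every half_life_seconds.
    freshness = 0.5 ** (item.age_seconds / half_life_seconds)
    return (1.0 + authority) * freshness


def rank(items, **kwargs):
    """Return items ordered from highest to lowest hybrid score."""
    return sorted(items, key=lambda it: hybrid_score(it, **kwargs), reverse=True)


if __name__ == "__main__":
    items = [
        Item("one-minute-old tweet, no links", 0, 60),
        Item("ten-minute-old story, 50 links", 50, 600),
        Item("day-old post, 1000 links", 1000, 86400),
    ]
    for it in rank(items):
        print(f"{hybrid_score(it):.6f}  {it.title}")
```

With these toy numbers, a pure link ranking would put the day-old post first and a pure time ranking would put the unlinked tweet first; the blended score puts the ten-minute-old story with 50 links on top. Whether that is the "right" answer is exactly the open question.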

And this question is worth well more than a million dollars.
