June 24, 2009 in Data, Internet

Google Reader has become an inextricable part of my daily life. It's the only way I can keep up with the amount of reading I do each day, but as much as I love the service, there are a few features I wish it had.

Here's my wishlist for Google Reader:

Intelligent favorites: Right now, I have a "favorites" folder containing the feeds I designate as (drumroll) my favorites, and Reader loads it at startup. Determining my favorite feeds automatically should be a straightforward exercise for a Bayesian filter (the same sort of mechanism that decides whether an email is spam). It could even be time-sensitive, so that a feed I stopped reading a few months ago would drop out.
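To make the idea a bit more concrete, here is a rough Python sketch. It swaps in something simpler than a spam-style classifier: a Beta-Bernoulli estimate of how often I open posts from each feed, with every read or skip discounted exponentially by age so that old habits fade. The event format, the 30-day half-life, the uniform prior and the 0.6 cutoff are all hypothetical choices of mine, not anything Reader actually exposes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical log entry; Reader doesn't expose reading history in this form.
    feed: str
    age_days: float  # how long ago the post appeared
    read: bool       # whether I opened it


def favorite_feeds(events, half_life_days=30.0, prior_read=1.0,
                   prior_skip=1.0, threshold=0.6):
    """Return feeds whose time-decayed read rate clears the threshold, best first."""
    reads = defaultdict(float)
    skips = defaultdict(float)
    for e in events:
        # Exponential decay: an event half_life_days old counts half as much.
        weight = 0.5 ** (e.age_days / half_life_days)
        if e.read:
            reads[e.feed] += weight
        else:
            skips[e.feed] += weight

    scores = {}
    for feed in set(reads) | set(skips):
        # Posterior mean of a Beta(prior_read, prior_skip) prior
        # updated with the decayed read/skip counts.
        scores[feed] = (reads[feed] + prior_read) / (
            reads[feed] + skips[feed] + prior_read + prior_skip)
    return sorted((f for f, s in scores.items() if s >= threshold),
                  key=lambda f: -scores[f])


events = (
    [Event("mets-blog", 1, True)] * 5          # read every day lately
    + [Event("old-favorite", 365, True)] * 5   # loved it a year ago...
    + [Event("old-favorite", 1, False)] * 3    # ...skipping it now
)
print(favorite_feeds(events))  # ['mets-blog']
```

A feed with no recent activity drifts back toward the prior of 0.5, which falls below the cutoff: exactly the "stopped reading it months ago" case.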

Intelligent presentation: Right now, Reader has a sort setting called "auto" which moves items from feeds that post infrequently to the top of the list. This is a nice start, but a few extra steps are needed before I make it my default sort. For one, the algorithm boosts posts from a little too far back and ranks them a little too high. That works for some folders but not for others, depending on how often their feeds publish. At a minimum, I wish I could adjust the settings. Relatedly, perhaps a different sort order is not the best way to present this information at all: an alternative would be fading out "less interesting" posts, so that the posts I'm more likely to want to read are the ones I'm more likely to see as I browse.
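Here is a sketch of what adjustable settings might look like. The scoring formula is only my guess at the spirit of "auto" (rarity of the source times freshness of the post), not Google's actual algorithm, and `rarity_weight`, `half_life_hours` and `min_opacity` are hypothetical knobs. Rather than hiding low scorers, it returns an opacity for each post so less interesting items can simply be faded.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    feed_posts_per_day: float  # how prolific the source feed is
    age_hours: float           # time since publication


def auto_score(post, rarity_weight=1.0, half_life_hours=48.0):
    # Hypothetical score, not Google's algorithm.
    # Rarity: posts from quieter feeds get a bigger boost; rarity_weight sets how much.
    rarity = (1.0 / max(post.feed_posts_per_day, 0.01)) ** rarity_weight
    # Freshness: halves every half_life_hours, so old items from rare feeds sink.
    freshness = 0.5 ** (post.age_hours / half_life_hours)
    return rarity * freshness


def present(posts, min_opacity=0.3, **params):
    """Return (title, opacity) pairs, best first; low scorers are faded, not hidden."""
    if not posts:
        return []
    scored = sorted(((auto_score(p, **params), p) for p in posts),
                    key=lambda pair: pair[0], reverse=True)
    top = scored[0][0] or 1.0  # guard against every score underflowing to zero
    return [(p.title, max(min_opacity, s / top)) for s, p in scored]


posts = [
    Post("Rare feed, a week old", 0.1, 168),
    Post("Firehose feed, fresh", 20, 1),
    Post("Daily feed, fresh", 1, 2),
]
print([t for t, _ in present(posts)])
# ['Daily feed, fresh', 'Rare feed, a week old', 'Firehose feed, fresh']
print([t for t, _ in present(posts, half_life_hours=240)])
# ['Rare feed, a week old', 'Daily feed, fresh', 'Firehose feed, fresh']
```

Stretching `half_life_hours` from 48 to 240 is enough to let a week-old post from a rarely updated feed jump back to the top, which is roughly the behavior I'd like to be able to dial up or down per folder.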

Intelligent relevance: Related to the sorting method, frequency of posting is not necessarily how I judge relevance. I would use a Bayesian filter here as well, to guess more intelligently which items I find interesting. But that's not the only way: one of the benefits of a central aggregation system is that Google knows how many other people have read, starred, shared or commented on (via Reader) each post I subscribe to. Surely this should be indicative of relevance, to the extent that my behavior tends to mirror that of other readers.
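A spam-filter-style approach to relevance might look something like the following: a multinomial naive Bayes classifier with add-one smoothing, trained on the titles of posts I did and didn't find interesting, which returns log-odds. The `relevance` function then adds a dampened "how many others read this" term; that blend and its `social_weight` are hypothetical, just one plausible way to combine the two signals.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class InterestFilter:
    """Multinomial naive Bayes over words: a spam filter, but for 'interesting'."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, interesting):
        self.word_counts[interesting].update(tokenize(text))
        self.doc_counts[interesting] += 1

    def log_odds(self, text):
        """log P(interesting | text) - log P(not | text); positive means 'show me'."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        v = max(len(vocab), 1)
        totals = {c: sum(self.word_counts[c].values()) for c in (True, False)}
        # Class prior, with add-one smoothing on document counts.
        score = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for w in tokenize(text):
            # Laplace-smoothed word likelihoods for each class.
            p_yes = (self.word_counts[True][w] + 1) / (totals[True] + v)
            p_no = (self.word_counts[False][w] + 1) / (totals[False] + v)
            score += math.log(p_yes / p_no)
        return score


def relevance(model, text, reads_by_others=0, social_weight=0.5):
    # Hypothetical blend: personal log-odds plus a dampened crowd signal.
    return model.log_odds(text) + social_weight * math.log1p(reads_by_others)


model = InterestFilter()
model.train("Mets win behind David Wright", True)
model.train("Wright homers as Mets beat Phillies", True)
model.train("Quarterly earnings beat estimates", False)
model.train("Bond yields fall on earnings", False)
print(model.log_odds("Mets rally late") > 0)  # True
```

The logarithm on the crowd count is deliberate: the jump from 0 to 50 readers should matter more than the jump from 5,000 to 5,050.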

Saved searches: Let me save searches in dynamic folders, the way Outlook does. This way, I could create a dynamic "Mets" folder that would pick up even a post from a finance blog, as long as it mentioned David Wright.
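A saved search is really just a stored query that gets re-run whenever the folder is opened. Here's a minimal sketch, assuming a simple "any of these terms" match on whole words across every subscribed feed; the `SavedSearch` class and the sample posts are illustrative, not Reader's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    feed: str
    title: str
    body: str


@dataclass
class SavedSearch:
    """Hypothetical 'dynamic folder': only the query is stored; contents are recomputed."""
    name: str
    any_terms: list

    def matches(self, post):
        text = f"{post.title} {post.body}"
        # Whole-word, case-insensitive match so "Mets" doesn't hit "helmets".
        return any(re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE)
                   for t in self.any_terms)

    def contents(self, all_posts):
        return [p for p in all_posts if self.matches(p)]


posts = [
    Post("finance-blog", "Stadium financing questions", "Even David Wright can't fix the bond market."),
    Post("mets-blog", "Mets drop opener", "A tough loss at home."),
    Post("gear-blog", "New batting helmets", "Safer designs this season."),
]
mets = SavedSearch("Mets", ["Mets", "David Wright"])
print([p.feed for p in mets.contents(posts)])  # ['finance-blog', 'mets-blog']
```

Because nothing but the query is saved, a new post that mentions David Wright shows up in the folder the next time it's opened, whichever feed it came from.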

Better searches: And speaking of search, this is a Google product, so why is search so limited? Let me restrict my search by author, title, or content, and let me sort by relevance instead of recency. Time isn't necessarily the principal component of my search.
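Fielded search with a relevance sort doesn't take much. In this sketch, "relevance" is nothing fancier than counting query-term occurrences in the chosen field(s); that scoring and the `search` signature are my own assumptions, a stand-in for whatever ranking Google would actually use.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    title: str
    content: str
    published: int  # e.g. a Unix timestamp


def search(posts, query, field=None, sort="relevance"):
    """Restrict to one field ('author', 'title', 'content') or search all three."""
    terms = re.findall(r"\w+", query.lower())
    fields = [field] if field else ["author", "title", "content"]

    def hits(post):
        # Naive relevance: how many times the query terms occur in the chosen fields.
        words = re.findall(r"\w+", " ".join(getattr(post, f) for f in fields).lower())
        return sum(words.count(t) for t in terms)

    results = [p for p in posts if hits(p) > 0]
    if sort == "relevance":
        return sorted(results, key=hits, reverse=True)
    return sorted(results, key=lambda p: p.published, reverse=True)  # recency


posts = [
    Post("Jane Doe", "Mets notebook", "Wright, Wright and more Wright.", 100),
    Post("Tom Wright", "Market wrap", "Stocks closed higher.", 200),
]
print([p.author for p in search(posts, "wright")])                  # ['Jane Doe', 'Tom Wright']
print([p.author for p in search(posts, "wright", field="author")])  # ['Tom Wright']
print([p.author for p in search(posts, "wright", sort="recency")])  # ['Tom Wright', 'Jane Doe']
```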

Grouping posts: Frequently, the same story is reported by several sources. A quick semantic analysis should be able to identify these posts and group them together, preferably with one of my preferred feeds as the top item. Google News already does this. It is a little different from what Gmail does, however: Gmail's conversation tracking links emails that are explicitly related, whereas this would need to infer similarity from content.
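As a rough illustration of grouping, this sketch represents each post as a bag of words, compares it with existing groups by cosine similarity, and starts a new group when nothing is similar enough. It's a greedy single pass rather than real semantic analysis, and the 0.5 threshold and sample posts are hypothetical. Within each group, posts from preferred feeds sort to the top.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def group_stories(posts, preferred_feeds, threshold=0.5):
    """posts: list of (feed, text). Returns groups with preferred feeds listed first."""
    groups = []  # each group is a list of (feed, text, vector)
    for feed, text in posts:
        vec = bag_of_words(text)
        for group in groups:
            # Greedy: compare against the post that started the group.
            if cosine(vec, group[0][2]) >= threshold:
                group.append((feed, text, vec))
                break
        else:
            groups.append([(feed, text, vec)])
    # Stable sort: posts from preferred feeds move to the top of each group.
    return [sorted(((f, t) for f, t, _ in g), key=lambda ft: ft[0] not in preferred_feeds)
            for g in groups]


posts = [
    ("Engadget", "Apple announces new iPhone 3GS with faster processor"),
    ("Gizmodo", "New iPhone 3GS from Apple has a faster processor"),
    ("ESPN", "Mets beat Yankees as David Wright homers"),
]
groups = group_stories(posts, preferred_feeds={"Gizmodo"})
print(len(groups), groups[0][0][0])  # 2 Gizmodo
```

In practice, common words like "the" and "a" would need stripping (or down-weighting, as TF-IDF does), or they would inflate every similarity score.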

Filtering: What if I want to get Engadget's feed, except for posts that dare mention Apple? Let me set up filters to customize my feeds. Allowing me to save advanced searches would accomplish the same goal, since search recognizes operators like + and - and I can restrict my search to a specific feed. In the meantime, services like Feed Rinse and Yahoo Pipes are my options.
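Until then, here's roughly what a saved filter with + and - operators could do. The parsing rules (bare words count as required, matching is on whole words) and the `(feed, text)` post format are assumptions for illustration, not Reader's actual search syntax.

```python
import re


def parse_filter(expr):
    """'+iphone -apple' -> (required, excluded). Bare words count as required."""
    required, excluded = [], []
    for tok in expr.split():
        if tok.startswith("-"):
            excluded.append(tok[1:].lower())
        else:
            required.append(tok.lstrip("+").lower())
    return required, excluded


def keep(text, expr):
    words = set(re.findall(r"\w+", text.lower()))
    required, excluded = parse_filter(expr)
    return all(w in words for w in required) and not any(w in words for w in excluded)


def filter_feed(posts, feed, expr):
    # Restrict to one feed, then apply the +/- filter to each post's text.
    return [text for f, text in posts if f == feed and keep(text, expr)]


posts = [
    ("Engadget", "Apple unveils iPhone 3GS"),
    ("Engadget", "Palm Pre review"),
    ("Gizmodo", "Apple rumors roundup"),
]
print(filter_feed(posts, "Engadget", "-apple"))  # ['Palm Pre review']
```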

And that's all I've got off the top of my head. Basically, it's a mix of applied machine learning and explicit parameter definitions aimed at making Google Reader something more than just a chronological list of syndicated news. In particular, the trouble with Reader's current auto sort is that when I turn it on, I get the feeling that something is not quite right. A good behind-the-scenes relevance engine will feel "right" because it aligns with what I want to see. Unfortunately, one can't always depend on users to articulate what they want, which is why explicitly defined filtering systems often fail (or are at least suboptimal). Bayesian filters and the like have the advantage of learning from behavior; their development and implementation are nothing new, and I think few areas are begging to be addressed this way as much as Google Reader.

What would you change?

