Google has launched a new open-source project called Refine (formerly Metaweb's Freebase Gridworks), which allows users to easily clean up and transform large datasets.
There is nothing more painful than cleaning data at the command line - I'd even go so far as to say it's impossible to do a good job. Sorry, R. Excel is really the only tool I know which makes those operations tolerable, and even there it can be painful (after all, Excel is a tool for cutting data, not cleansing it). Google Docs just can't handle large enough files to be worthwhile.
Enter Refine, which, the more I think about it, ought to be incorporated directly into Google Docs. The tool makes cleaning data look easy, and I'm absolutely incapable of describing it in text, so I'm going to let Google do it for me: