Zipf's law is another mathematical phenomenon not entirely unrelated to Benford's law (in fact, some think that Benford's law is a special case of Zipf's). (As an aside, it's funny how after you discuss something, it seems to pop up everywhere; Kahneman and Tversky would have a lot to say about that, I'm sure.) Zipf's law describes datasets in which an item's frequency is inversely proportional to its rank. For example, if the most frequently observed element occurs X times, the second-most frequent occurs about X/2 times, the third about X/3 times, and so on.
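To make the X, X/2, X/3 pattern concrete, here is a minimal Python sketch of the idealized counts. The function name `zipf_counts` and the starting count of 1200 are made up for illustration; real data only approximates these values.

```python
def zipf_counts(top_count, n_ranks):
    """Idealized Zipf counts: the item at rank r occurs top_count / r times."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]

# Hypothetical example: the most common item occurs 1200 times.
counts = zipf_counts(1200, 4)
print(counts)  # [1200.0, 600.0, 400.0, 300.0]
```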
Zipf, a linguist at Harvard, first noticed the pattern that would bear his name while studying word frequencies in texts. For a time, it was thought that Zipf's observation was a mathematical representation of some underlying language process. However, if text is randomly generated by picking uniformly from 26 letters and a space, the "words" in the resulting corpus also exhibit Zipf's law, suggesting it is more of a statistical artifact than evidence of a deeper semantic process.
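The random-text experiment is easy to try. The sketch below is my own rough version, not anyone's published procedure: the corpus size, the seed, and the name `random_word_counts` are arbitrary choices. It types a million characters uniformly from 26 letters and a space, splits on spaces, and ranks the resulting "words" by frequency.

```python
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus a space

def random_word_counts(n_chars, seed=0):
    """Count the 'words' in uniformly random text over ALPHABET."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    text = "".join(rng.choices(ALPHABET, k=n_chars))
    return Counter(text.split())  # split() drops empty strings between spaces

counts = random_word_counts(1_000_000)
ranked = counts.most_common()  # (word, count) pairs, most frequent first

# Print the count at a few ranks to see how quickly frequency falls off.
for rank in (1, 10, 100, 1000):
    word, count = ranked[rank - 1]
    print(rank, repr(word), count)
```

With equally likely letters, the top ranks are the single-letter words, then the two-letter words, and so on, so frequency drops with rank in steps rather than along a perfectly smooth curve. Plotting count against rank on log-log axes is the usual way to judge how Zipf-like the result looks.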
In today's The Wild Side blog of the NYT, guest blogger Steven Strogatz, a professor of applied math at Cornell, discusses Zipf's law in relation to city populations. Strogatz states that Zipf actually discovered his pattern while looking at urban populations, which surprised me since I was familiar with the word-frequency story. It turns out both are true: he presented the words and then the cities as two examples of the pattern. There is some doubt that city sizes following Zipf's law actually means anything, just as the pattern appears to be a statistical artifact in linguistics. Other statistical distributions also fit the data well, but Zipf's law is "nice" because of its utter simplicity.
The remainder of the post is on other urban mathematics, including evidence of economies of scale, before connecting such observations back to biology and cellular processes (the usual focus of the blog - gotta keep the readers satisfied!). Strogatz draws an interesting connection between the mechanism for transporting nutrients through a cell and the transportation infrastructure of a city, both of which exhibit similar economies of scale, with exponents close to 0.75. It turns out there is a mathematical justification for why that would be so.
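To get a feel for what an exponent near 0.75 implies, here is a small sketch. It assumes a generic power law in which some quantity grows as size raised to the 0.75 power; the function name `scaled_quantity` and the constant of 1.0 are hypothetical, and the exact quantities Strogatz has in mind are not modeled here.

```python
def scaled_quantity(size, exponent=0.75, constant=1.0):
    """Generic power law: quantity = constant * size ** exponent."""
    return constant * size ** exponent

# Doubling the size multiplies the quantity by 2 ** 0.75, which is less than 2.
ratio = scaled_quantity(2.0) / scaled_quantity(1.0)
print(round(ratio, 2))      # 1.68
print(round(ratio / 2, 2))  # 0.84, the amount needed per unit of size after doubling
```

Under this assumption, doubling the size requires only about 68% more of the quantity, so the per-unit requirement falls by roughly 16%. That shrinking per-unit cost is the economy of scale.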
As someone fascinated by urban development, I appreciated the essay, especially for the philosophical lens it borrows:
These numerical coincidences seem to be telling us something profound. It appears that Aristotle’s metaphor of a city as a living thing is more than merely poetic. There may be deep laws of collective organization at work here, the same laws for aggregates of people and cells.
The numerology above would seem totally fortuitous if we hadn’t viewed cities and organisms through the lens of mathematics. By abstracting away nearly all the details involved in powering a mouse or a city, math exposes their underlying unity. In that way (and with apologies to Picasso), math is the lie that makes us realize the truth.