December 20, 2009 in Math
The NYT recently ran an article on the math behind the controversial change to mammogram screening advice. Unsurprisingly, it is heavily centered on a Bayesian argument. Of course, the key point here is not that the statistics dictated the change, but that budgets and political agendas dictated an acceptable level, which the statistics subsequently informed:
Let’s suppose 100,000 screenings for this cancer are conducted. Of these, how many are positive? On average, 500 of these 100,000 people (0.5 percent of 100,000) will have cancer, and so, since 95 percent of these 500 people will test positive, we will have, on average, 475 positive tests (.95 x 500). Of the 99,500 people without cancer, 1 percent will test positive for a total of 995 false-positive tests (.01 x 99,500 = 995). Thus of the total of 1,470 positive tests (995 + 475 = 1,470), most of them (995) will be false positives, and so the probability of having this cancer given that you tested positive for it is only 475/1,470, or about 32 percent! This is to be contrasted with the probability that you will test positive given that you have the cancer, which by assumption is 95 percent.
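The quoted arithmetic is a direct application of Bayes’ theorem. Here is a short Python sketch that reproduces it; the function name `screening_counts` is my own, and the inputs are just the 0.5% prevalence, 95% sensitivity and 1% false-positive rate from the quote:

```python
def screening_counts(population, prevalence, sensitivity, false_positive_rate):
    """Expected true/false positives and P(cancer | positive) for a screened group."""
    sick = population * prevalence              # people who actually have cancer
    healthy = population - sick
    true_pos = sensitivity * sick                # sick people who test positive
    false_pos = false_positive_rate * healthy    # healthy people who test positive
    ppv = true_pos / (true_pos + false_pos)      # positive predictive value
    return true_pos, false_pos, ppv


tp, fp, ppv = screening_counts(100_000, 0.005, 0.95, 0.01)
print(f"{tp:.0f} true positives, {fp:.0f} false positives, P(cancer | +) = {ppv:.1%}")
# prints: 475 true positives, 995 false positives, P(cancer | +) = 32.3%
```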
Increasingly, I’ve noted in my discussions with statisticians and practitioners a reliance on Bayesian methods. Bayesian statistics treat a hypothesis as uncertain and express that uncertainty as a probability, which is literally updated as new information becomes available. Bayesian analyses also rely heavily on conditional probabilities: the probability of one event given that a related event has occurred. One of the biggest Bayesian proponents is Professor Andrew Gelman, who maintains an excellent blog and is involved with fivethirtyeight.com.
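To make “literally updated” concrete, here is a minimal Python sketch in which the posterior from one positive test becomes the prior for the next. It reuses the screening numbers quoted above and assumes, purely for illustration, that the two tests err independently of each other (real repeat tests often don’t). `update` is a hypothetical helper name:

```python
def update(prior, sensitivity, false_positive_rate):
    """One application of Bayes' theorem after a positive test result."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)  # P(+)
    return sensitivity * prior / evidence                              # P(cancer | +)


belief = 0.005  # prior: the 0.5% base rate from the screening example
for test_number in (1, 2):
    # Assumes the second test's errors are independent of the first's.
    belief = update(belief, sensitivity=0.95, false_positive_rate=0.01)
    print(f"after positive test {test_number}: {belief:.1%}")
# after positive test 1: 32.3%
# after positive test 2: 97.8%
```

The same function is applied twice; the only thing that changes is the prior it is fed, which is the whole idea of Bayesian updating.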
In some ways, Bayesian methods have become a bit fad-like and, as with many fads (I’m looking at you, VaR), there is a real risk that they will be applied blindly. Like anything else, it’s possible to do Bayesian statistics wrong – and even extremely wrong – but when wielded correctly, they make for an excellent investigative tool.
New Scientist has an article on the use – and misuse – of probability in criminal cases. Naturally, it focuses on Bayesian statistics. The key point the article makes is that while it’s important to consider the odds of something happening, it is just as critical to account for the odds of it happening by chance. That may seem contradictory (isn’t an event’s likelihood, by definition, the probability it happens by chance?), so let’s use a classic example, lifted from the article:
You have just tested positive for a disease that affects 1 in every 10,000 people. The test is 99% accurate. On the surface, that looks like a reliable diagnosis, and most people would say they are 99% confident that they do, in fact, have the disease. But consider the following: if all 10,000 people took the same test, then about 1 of them would yield a true positive and roughly 100 more (1% of the 9,999 healthy people) would yield false positives just by chance. Therefore, a positive result means only about a 1% chance of actually having the disease – not the 99% likelihood we naively assumed before!
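The count can be checked in a few lines of Python. This assumes “99% accurate” means the test catches 99% of real cases and also clears 99% of healthy people; the article may define accuracy differently:

```python
population = 10_000
prevalence = 1 / 10_000   # 1 in 10,000 people have the disease
accuracy = 0.99           # assumed: 99% sensitivity AND 99% specificity

sick = population * prevalence          # 1 person
healthy = population - sick             # 9,999 people
true_pos = accuracy * sick              # about 0.99 expected true positives
false_pos = (1 - accuracy) * healthy    # about 99.99 expected false positives
p_disease_given_pos = true_pos / (true_pos + false_pos)

print(f"{true_pos:.2f} true vs {false_pos:.2f} false positives "
      f"-> P(disease | +) = {p_disease_given_pos:.2%}")
# prints: 0.99 true vs 99.99 false positives -> P(disease | +) = 0.98%
```

The exact answer is just under 1%, which is what “about 1%” rounds to.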
How does that work – wasn’t there only a 1% chance of the test being wrong? Well, yes – but if you think about it, that 1% chance of error is much larger than the 0.01% chance of having the disease in the first place, and the test result must be placed in that context. For the more spatial readers, here is a picture from New Scientist:

The false positive problem is a classic textbook example of how Bayesian reasoning (that is, accounting for the ways in which chance can manifest itself) can overturn a seemingly obvious result. It’s an important consideration that is easy to overlook. And besides, it makes for interesting pop sci articles.
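The comparison between the 1% error rate and the 0.01% base rate can be made explicit with the odds form of Bayes’ theorem. Again this assumes “99% accurate” means 99% sensitivity and 99% specificity, with D standing for “has the disease” and + for a positive test:

```latex
\frac{P(D \mid +)}{P(\neg D \mid +)}
  = \underbrace{\frac{P(D)}{P(\neg D)}}_{\text{prior odds}}
    \times
    \underbrace{\frac{P(+ \mid D)}{P(+ \mid \neg D)}}_{\text{likelihood ratio}}
  = \frac{1}{9999} \times \frac{0.99}{0.01}
  = \frac{99}{9999}
  = \frac{1}{101},
\qquad
P(D \mid +) = \frac{1}{1 + 101} \approx 0.98\%
```

A 99-to-1 likelihood ratio sounds overwhelming, but it is multiplying prior odds of roughly 1 in 10,000, so the posterior odds are still about 1 to 101 against.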
Via economist Dan Ariely’s blog, this is what Isaac Asimov thought about perceiving the world through data. It is an implicitly Bayesian approach and brings to mind the famous Keynes quote about changing one’s mind. Asimov wrote:
“Don’t you believe in flying saucers, they ask me? Don’t you believe in telepathy? — in ancient astronauts? — in the Bermuda triangle? — in life after death?
No, I reply. No, no, no, no, and again no.
One person recently, goaded into desperation by the litany of unrelieved negation, burst out ‘Don’t you believe in anything?’
‘Yes’, I said. ‘I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I’ll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be.’
I had a great conversation last night which at one point veered into the pros and cons of various ratings systems. In particular, we discussed the “star+comment” system used by Yelp, in which a reviewer assigns between 1 and 5 stars and adds a text comment of arbitrary length.
Yelp does some clever things with its rankings rather than naively displaying restaurants with higher average ratings above those with lower ones. Most notably, I believe, it uses a Bayesian process to assess the reliability of the mean rating. Thus, a 4-star rating based on 100 reviews could be presented above a 5-star rating based on 5 reviews, since there is far more uncertainty about the veracity of the 5 stars. On top of this, it takes into account who left the reviews (presumably adjusting for the other reviews each person has given) as well as the content of the review comments.
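I don’t know what Yelp actually does, but one common Bayesian-flavored way to get this behavior is a “Bayesian average”: shrink each restaurant’s mean toward a prior mean, with the prior counting as a fixed number of phantom reviews. It is the posterior mean under a simple conjugate prior. The `prior_mean` of 3.0 and `prior_weight` of 10 below are made-up illustration values, not anything Yelp has published:

```python
def bayesian_average(mean_rating, num_reviews, prior_mean=3.0, prior_weight=10):
    """Shrink a restaurant's mean star rating toward a prior mean.

    prior_weight acts like that many phantom reviews at prior_mean, so a
    restaurant with only a handful of reviews stays close to the prior,
    while one with many reviews is dominated by its own data.
    (prior_mean and prior_weight are hypothetical defaults.)
    """
    total = prior_weight * prior_mean + num_reviews * mean_rating
    return total / (prior_weight + num_reviews)


print(f"{bayesian_average(4.0, 100):.2f}")  # 4 stars from 100 reviews -> 3.91
print(f"{bayesian_average(5.0, 5):.2f}")    # 5 stars from 5 reviews   -> 3.67
```

With these settings the well-reviewed 4-star restaurant ranks above the thinly reviewed 5-star one; a larger `prior_weight` demands more reviews before a restaurant can pull away from the prior.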
Here’s a feature I’d like to see: adjust the rating to account for how Yelp predicts I would rate that restaurant. Let’s say I’m looking at a certain restaurant, which has 4 stars. If in the past I tended to disagree with the people who have reviewed this restaurant, then perhaps it should be presented to me as a 3- or 2-star choice. Or perhaps I rate Italian restaurants very highly but hate sushi; even the highest-rated sushi place on Yelp should be shown with a low rating when I view it. Or perhaps I like small restaurants, or cheap restaurants – give those categories a ratings boost when I view them.
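Here is a minimal sketch of the “tended to disagree” part, using a simple offset idea: for each reviewer, measure how many stars I usually sit above or below them on restaurants we have both rated, shift their rating of the target by that amount, and average. Everything here (the data layout and the names `my_offset` and `personalized_rating`) is hypothetical, and it is a toy collaborative-filtering scheme, not anything Yelp has described:

```python
from statistics import mean
from typing import Dict, Optional

# Hypothetical data layout: user name -> {restaurant name: stars on a 1-5 scale}
Ratings = Dict[str, Dict[str, float]]


def my_offset(mine, theirs, exclude) -> Optional[float]:
    """Average of (my stars - their stars) over restaurants we have both
    rated, ignoring `exclude`; None if we share no history."""
    shared = (mine.keys() & theirs.keys()) - {exclude}
    if not shared:
        return None
    return mean(mine[r] - theirs[r] for r in shared)


def personalized_rating(target: str, me: str, ratings: Ratings) -> Optional[float]:
    """Predict my stars for `target`: shift each reviewer's stars by my usual
    offset from that reviewer, then average. Falls back to the plain average
    when I share no history with any of the reviewers."""
    mine = ratings.get(me, {})
    shifted, plain = [], []
    for user, theirs in ratings.items():
        if user == me or target not in theirs:
            continue
        plain.append(theirs[target])
        offset = my_offset(mine, theirs, exclude=target)
        if offset is not None:
            shifted.append(theirs[target] + offset)
    if not plain:
        return None  # nobody has reviewed this restaurant
    predicted = mean(shifted) if shifted else mean(plain)
    return min(5.0, max(1.0, predicted))  # stay on the 1-5 star scale


ratings = {
    "me":  {"Trattoria": 2, "Taqueria": 3},
    "ann": {"Trattoria": 4, "Taqueria": 5, "Sushi Bar": 4},
    "bob": {"Trattoria": 3, "Taqueria": 4, "Sushi Bar": 4},
}
print(personalized_rating("Sushi Bar", "me", ratings))  # 2.5, versus a public 4
```

Because each reviewer’s rating is shifted rather than replaced, a restaurant that every reviewer rates poorly still comes out poorly. The cuisine and price preferences could be handled in a similar spirit by computing offsets per category, which I’ve left out here.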
There are a few caveats to this process. First, it requires me to have a reliable ratings history; that is simply how Yelp would learn who I am. Second, the change doesn’t have to be dramatic – even a subtle shift in presented ratings could make a big difference to me. Finally, there are systemic effects at work. If a restaurant is dirty or its staff is rude, then everyone will feel that way, whether or not they’ve agreed in the past. These have to be accounted for.
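The first two caveats translate naturally into a damped blend: only trust the personal prediction in proportion to how much shared history backs it, and cap how far it can move the displayed number. The function name `presented_rating` and the thresholds `min_shared=5` and `max_weight=0.5` are hypothetical choices for illustration:

```python
def presented_rating(public, personal, shared_count, min_shared=5, max_weight=0.5):
    """Blend the public star rating with a personalized prediction.

    The personal weight grows with how many restaurants I have rated in
    common with the reviewers (a crude stand-in for a 'reliable ratings
    history') and is capped at max_weight, so the displayed rating shifts
    subtly rather than dramatically. Thresholds are hypothetical.
    """
    if personal is None or shared_count <= 0:
        return public  # no usable history: show the ordinary rating
    weight = max_weight * min(1.0, shared_count / min_shared)
    return (1 - weight) * public + weight * personal


print(presented_rating(4.0, 2.5, shared_count=10))  # 3.25
print(presented_rating(4.0, 2.5, shared_count=0))   # 4.0
```

Capping the weight is also a cheap hedge against the overfitting worry: a bad personal model can only drag the displayed rating part of the way.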
On the whole, this should be a relatively easy thing to implement for anyone with a reliable ratings history – and Yelp has plenty of those. For all I know, this would be a case of overfitting and have little real impact – but I think it’s intriguing enough to try.