Nate Silver's new book, The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't, is, on the whole, an excellent overview of statistical thinking. I think most of my readers would enjoy it.
However, it is plagued by some bizarre mistakes that left me unable to completely trust that every detail is correct. There's one in the very first chapter. Nate is describing the impact of bad assumptions on model output and uses the correlation of default risk in mortgages as an example. After a few paragraphs describing the issue, he uses a table to quantify the effect. The text is quite good; the table is quite bad. Specifically, the headings are wrong -- the titles of columns 3 and 4 should be reversed. Here's the page in question:
There isn't room for equivocation here; the table is simply wrong. The text describes a phenomenon, and the accompanying table -- if taken at face value -- describes exactly the opposite. Of all the people I asked, only one (GHL) noticed this error. Many of the others recalled the table, but admitted that they didn't bother to parse it, choosing instead to accept it -- and the numbers it contains -- as hard evidence that the preceding argument was correct. Thus, the table managed to serve its purpose in spite of its contents.
So what's the big deal? The headers are wrong, it's not the end of the world! But it is a very real problem, because the topic at hand is statistical literacy, professional skepticism, and the consequences of making assumptions. See: the fifth paragraph. By now, some readers will have verified that the text and the table are, in fact, at odds. But astute readers should be questioning me as well. After all, I've claimed that the error is in the table, but if the only evidence is a disagreement of words, couldn't it just as easily be in the text? I can't do any more than assure you it isn't (at least, not within the scope of this post), but I think that if it were, casual readers would consider this error as serious as I do. Regardless of where it appeared, the very presence of the mistake made it extremely difficult for me to proceed without questioning everything I read. What's the probability that such a large error isn't correlated with the presence of other smaller ones? The irony of this taking place in a book of this nature is more distressing than it is amusing.
As I said up front -- in almost all respects this is an excellent book. It's one of the better treatments I've seen of the proper "data science" mindset. But it isn't without problems. How these errors passed editorial review is beyond me -- particularly when every reviewer mentions the mortgage example! (Is that because it's in the first chapter? Is that because it's accompanied by one of the book's first tables? Whatever the reason, it shouldn't be wrong!) To date, the only other mention I've found of this issue is in an Amazon discussion forum, where it is also pointed out that the probabilities in the table seem off (20.4% is the probability of exactly 1 default [5*(.05*.95^4)], not the probability of at least 1 default [1-.95^5]).
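For anyone who wants to check the arithmetic in that forum comment, here is a minimal sketch. It assumes 5 mortgages that default independently, each with a 5% probability; the function names are mine, purely for illustration.

```python
from math import comb

def prob_exactly_k_defaults(n, p, k):
    """Binomial probability of exactly k defaults among n independent loans."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least_one_default(n, p):
    """Complement of the probability that none of the n independent loans default."""
    return 1 - (1 - p) ** n

n, p = 5, 0.05  # 5 mortgages, 5% default probability each (independence assumed)

exactly_one = prob_exactly_k_defaults(n, p, 1)   # 5 * (.05 * .95^4)
at_least_one = prob_at_least_one_default(n, p)   # 1 - .95^5

print(f"exactly one default:  {exactly_one:.1%}")   # 20.4%
print(f"at least one default: {at_least_one:.1%}")  # 22.6%
```

The gap between the two figures is the probability of two or more defaults, which "exactly 1" leaves out.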
p.s. Speaking of Amazon and subtle mistakes, why do they insist that the book is subtitled "Why Most Predictions Fail" when the correct wording -- as evidenced by the book's own cover -- is quite obviously "Why So Many Predictions Fail"?
p.p.s. While we're on the topic of default correlation, you would never find yourself in a situation where you can measure a default probability (here, 5%) and then overlay a correlation on it. Default probabilities found (or inferred) in the real world have correlation effects already baked into them. Interestingly, and somewhat counterintuitively, the point of a correlation model is to remove those effects, not add them. The toy example presented in the book is a fine teaching tool, but no one should ever think that numbers of this magnitude have any real-world application. Same goes for Nate's footnoted comment (not shown) that bizarrely equates his "risk multiple" with a CDO's leverage.
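To see why a toy setup like this produces such eye-catching numbers, here is a rough sketch of its two extremes. It assumes 5 mortgages with a 5% default probability each and compares the chance that all five default under full independence versus perfect correlation. The variable names and the choice of these two extremes are my own illustration, not a reproduction of the book's table.

```python
n, p = 5, 0.05  # 5 mortgages, 5% default probability each (toy assumption)

# Fully independent defaults: all five must default separately.
all_default_independent = p ** n

# Perfectly correlated defaults: the loans default together or not at all,
# so the chance that all five default is just the single-loan probability.
all_default_perfectly_correlated = p

# How much more likely a total wipeout is under perfect correlation.
ratio = all_default_perfectly_correlated / all_default_independent

print(f"independent:          1 in {1 / all_default_independent:,.0f}")
print(f"perfectly correlated: 1 in {1 / all_default_perfectly_correlated:,.0f}")
print(f"ratio:                {ratio:,.0f}x")
```

The size of that ratio comes entirely from holding the 5% fixed while swapping the correlation assumption, which is exactly the step that has no real-world counterpart.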