The Signal and the Noise: errata

December 13, 2012 in Data, Finance, Math, Risk

Nate Silver's new book, The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't, is, on the whole, an excellent overview of statistical thinking. I think most of my readers would enjoy it.

However, it is plagued by some bizarre mistakes that left me unable to completely trust that every detail is correct. There's one in the very first chapter. Nate is describing the impact of bad assumptions on model output and uses the correlation of default risk in mortgages as an example. After a few paragraphs describing the issue, he uses a table to quantify the effect. The text is quite good; the table is quite bad. Specifically, the headings are wrong -- the titles of columns 3 and 4 should be reversed. Here's the page in question:

There isn't room for equivocation here; the table is simply wrong. The text describes a phenomenon, and the accompanying table -- if taken at face value -- describes exactly the opposite. Of all the people I asked, only one (GHL) noticed this error. Many of the others recalled the table, but admitted that they didn't bother to parse it, choosing instead to accept it -- and the numbers it contains -- as hard evidence that the preceding argument was correct. Thus, the table managed to serve its purpose in spite of its contents.

So what's the big deal? The headers are wrong; it's not the end of the world! But it is a very real problem, because the topic at hand is statistical literacy, professional skepticism, and the consequences of making assumptions. See: the fifth paragraph. By now, some readers will have verified that the text and the table are, in fact, at odds. But astute readers should be questioning me as well. After all, I've claimed that the error is in the table, but if the only evidence is a disagreement of words, couldn't it just as easily be in the text? I can't do any more than assure you it isn't (at least, not within the scope of this post), but I think that if it were, casual readers would consider this error as serious as I do. Regardless of where it appeared, the very presence of the mistake made it extremely difficult for me to proceed without questioning everything I read. What's the probability that such a large error isn't correlated with the presence of other smaller ones? The irony of this taking place in a book of this nature is more distressing than it is amusing.

As I said up front -- in almost all respects this is an excellent book. It's one of the better treatments I've seen of the proper "data science" mindset. But it isn't without problems. How these errors passed editorial review is beyond me -- particularly when every reviewer mentions the mortgage example! (Is that because it's in the first chapter? Is that because it's accompanied by one of the book's first tables? Whatever the reason, it shouldn't be wrong!) To date, the only other mention I've found of this issue is in an Amazon discussion forum, where it is also pointed out that the probabilities in the table seem off (20.4% is the probability of exactly 1 default [5*(.05*.95^4)], not the probability of at least 1 default [1-.95^5]).
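For anyone who wants to check that arithmetic, here's a quick Python sketch, assuming the table's setup of five mortgages that each default independently with probability 5%:

```python
from math import comb

p = 0.05   # per-mortgage default probability in the book's example
n = 5      # mortgages in the pool

# Probability of exactly one default -- this is where 20.4% comes from
exactly_one = comb(n, 1) * p * (1 - p) ** (n - 1)

# Probability of at least one default -- what the table presumably intends
at_least_one = 1 - (1 - p) ** n

print(f"P(exactly 1 default)  = {exactly_one:.3%}")   # ~20.363%
print(f"P(at least 1 default) = {at_least_one:.3%}")  # ~22.622%
```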

p.s. Speaking of Amazon and subtle mistakes, why do they insist that the book is subtitled "Why Most Predictions Fail" when the correct wording -- as evidenced by the book's own cover -- is quite obviously "Why So Many Predictions Fail"?

p.p.s. While we're on the topic of default correlation, you would never find yourself in a situation where you can measure a default probability (here, 5%) and then overlay a correlation on it. Default probabilities found (or inferred) in the real world have correlation effects already baked into them. Interestingly, and somewhat counterintuitively, the point of a correlation model is to remove those effects, not add them. The toy example presented in the book is a fine teaching tool, but no one should ever think that numbers of this magnitude have any real-world application. The same goes for Nate's footnoted comment (not shown) that bizarrely equates his "risk multiple" with a CDO's leverage.
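To see what a correlation "overlay" of this kind actually does, here is a minimal simulation sketch -- my own toy one-factor Gaussian copula, not the book's model, and nothing like a real-world calibration. The per-mortgage default probability is pinned at 5% by construction; only the joint behavior of the pool changes as the correlation parameter rho is dialed up:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def simulate_pool(pd=0.05, n_mortgages=5, rho=0.3, n_sims=200_000):
    """Simulate correlated defaults with a toy one-factor Gaussian copula.

    Each mortgage defaults when its latent variable falls below the
    threshold implied by the marginal default probability `pd`, so the
    per-mortgage default rate stays at `pd` no matter what `rho` is.
    """
    threshold = norm.ppf(pd)
    z = rng.standard_normal((n_sims, 1))              # common (systemic) factor
    eps = rng.standard_normal((n_sims, n_mortgages))  # idiosyncratic noise
    latent = np.sqrt(rho) * z + np.sqrt(1.0 - rho) * eps
    return latent < threshold                         # boolean default matrix

for rho in (0.0, 0.3, 0.9):
    d = simulate_pool(rho=rho)
    n_defaults = d.sum(axis=1)
    print(f"rho={rho:.1f}  per-mortgage PD={d.mean():.3f}  "
          f"P(>=1 default)={(n_defaults >= 1).mean():.3f}  "
          f"P(all 5 default)={(n_defaults == 5).mean():.5f}")
```

With rho = 0 the pool behaves like the book's independent case; as rho approaches 1, the all-five-default scenario stops being a one-in-millions event and climbs toward the single-mortgage 5%.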

28 comments

Ed Whitney December 14, 2012 at 9:50 am

I noticed the error on page 28 that you noticed above, and looked on the publisher’s website for errata. This search has been unrewarding. You would think that today’s publishers would be able to post errata on their books' websites, would you not? What is their excuse for not doing so?

You mention “bizarre mistakes” as a plural but only one is cited. There is another on page 250 on the fifth line of the paragraph that begins “Meanwhile…” The text says that in the figure on page 251, about 90 percent of the false hypotheses are correctly rejected. But in that figure, there are true negative tests in 720 of the 900 false hypotheses; this is 80%, not 90%.
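In code, the check is a single division, using the counts from that figure:

```python
false_hypotheses = 900   # false hypotheses in the figure on page 251
true_negatives = 720     # of those, correctly rejected

print(f"{true_negatives / false_hypotheses:.0%}")  # 80%, not the 90% stated in the text
# A 90% rate would require 0.9 * 900 = 810 correct rejections.
```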

There may be other mistakes in the book; I am interested in others that I may have missed. I am on page 251 and will check back later. I appreciate your putting this on the Net; I googled “Nate Silver book errata” and found this right away.

Ed Whitney December 19, 2012 at 8:32 pm

Another error is on page 271, which shows the chess position after Kasparov’s third move, but only two moves have been made. The white queen bishop belongs on b2 for the diagram to match the text.

Jim Arleth December 23, 2012 at 12:52 am

There is an error in the chess position on page 271, but it is not the bishop on b2 mentioned above; that was white’s third move. For his second move, Kasparov played pawn to g3.
Again, very good book, but a few odd errors that I saw.

Jeffrey Berger December 30, 2012 at 6:17 pm

I had the same misgivings about the reliability of the book after stumbling on several problems:
1. On pages 246 and 251 are charts that are supposed to illustrate that a positive test result for a low-frequency condition can lead people to the wrong conclusion. The problem with the charts is that they are supposed to depict a population of 1,000 cases, but they have 1,250 cells. I went to the chart when I had trouble understanding the text. It didn’t help.
2. Page 257: labels on the pie charts are wrong. Should be Individual vs. Institutional. Instead, we have Institutional vs. Institutional.
3. Page 379: a misquotation – “At NASA, I finally realized that the definition of rocket science is using relatively simple _psychics_ to solve complex problems.” Really?
Looks like the book was rushed into publication to hit the holiday season.

Tim Smith December 30, 2012 at 11:27 pm

In figure 13-7, I was left wondering what the 2004 terror attack in Montreal that claimed 329 lives was! I would have expected to remember that. The Madrid (1985) label with 191 casualties appears to reference the 2004 Madrid train bombing; Montreal (2004) appears to reference the 1985 Air India bombing on the Montreal-London-Delhi route.

Haydee December 31, 2012 at 9:15 am

Thank you for posting. Like you, I kept going back to the text to see if I could make sense of the table.
Having taught statistics to high school juniors and seniors, I can assure you they learned enough to catch this error. As a matter of fact, the text and table would make a wonderful test question.
As others have commented, it's discouraging evidence of subpar editing and understanding of basic statistical terms.

I am reading the e-book. Wouldn’t the errors be easy to fix with ‘upgrades’?

Bad Math January 3, 2013 at 7:46 pm

Pg. 119

“– 5^5 = which should be 3215”

The context of the rest of the paragraph makes this error (5^5 = 3125 by my math) oddly amusing… Is he trying to make a point here, or is this just an odd typo?

ASM January 16, 2013 at 12:59 pm

At first I thought this was just an odd typo, since 5^6 = 15625 is correct. However, that’s just 3125 * 5, so it’s easy to catch that 5^5 ≠ 3215, or that 3215 * 5 ≠ 15625; either way, one number is obviously wrong. Oddly enough, this typo happens in the paragraphs where he is talking about typos causing problems, especially if you use output for your next input. This could be a huge coincidence, but it shows the exact type of scenario that we are being warned about.
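A quick way to confirm which number is the typo, as a short Python check:

```python
print(5**5)        # 3125 -- not the 3215 printed in the book
print(5**6)        # 15625 -- which the book gets right
print(3125 * 5)    # 15625 -- consistent with 5**6
print(3215 * 5)    # 16075 -- inconsistent, so 3215 must be the typo
```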

Ryan Parks January 4, 2013 at 11:11 am

The book has several mathematical errors. I’m not sure how many of them stem from Mr. Silver and how many stem from the editors. Below is a copy of the email I sent Nate on Nov 27th, 2012.

Your book is fantastic, but I think the probabilities in the table on page 27 are wrong. Take, for example, the Epsilon mortgage pool. You say the probability of a losing bet is 20.4pct, which you arrive at (I’m assuming) using Bernoulli’s formula. However, this is the probability of one and only one default. The Epsilon tranche loses money on one or more defaults, so shouldn’t it be the probability of 1 default plus the probability of 2 defaults, and so on all the way up to 5 defaults, which is equal to ~22.6pct? Said another way, the only way for Epsilon to NOT lose money is for none of the five mortgages to default. The probability of this happening is .95 to the 5th power, which is ~77.4%. 1 minus ~77.4pct is a ~22.6pct chance of loss in the Epsilon tranche. By the same logic, aren’t the probabilities for all the other pools except the “Alpha Pool” incorrect?

I’m not trying to be a jerk, I think your book is wonderful. By the way, I’m a UofC class of 2001.

Kenner Rawdon January 7, 2013 at 5:42 pm

The paragraph following Figure 12-12, page 407, makes no sense to me. What values of y and z does it imply such that the revised probability is 0.28?

Tom Graham January 26, 2013 at 1:05 pm

On page 132, Laplace’s Demon was snoozing: “demon-state” should be “demonstrate.”

larry hottle March 21, 2013 at 6:08 pm

Perhaps because he believes in the power of Bayes’ theorem so strongly, he simply believes in the crowd’s ability to find all these mistakes. (p. 237: 13%??)

Corwyn April 23, 2013 at 3:50 pm

“Regardless of where it appeared, the very presence of the mistake made it extremely difficult for me to proceed without questioning everything I read.”

Good. You should question everything you read. I am sure Silver would agree.

Ben April 26, 2013 at 12:24 pm

Thanks for posting this. A lot of these figure/table errors were quite distracting (I spent way too much time looking at the chess error) and will hopefully be corrected in a future edition. I would also add that in Fig. 11-4, a comparison of random-walk and actual stock market charts shows significantly more high frequency noise for the artificially generated plots, making it fairly easy to distinguish which were real and which were generated from Mr. Silver’s algorithm.

Paul June 2, 2013 at 4:24 pm

The tables on p. 430 and 431 correlating frequency and death tolls from terrorist attacks seem to be switched. As they appear, the September 11, 2001 attacks make the slope of the line somewhat less steep, which would indicate that high-fatality attacks should have been considered less of a problem after 9/11.

Paul June 2, 2013 at 5:09 pm

Damn, my bad, the tables are properly labeled. They are presented in a counter-intuitive order, but the tables themselves are fine.

Karl July 1, 2013 at 11:46 pm

I have just started reading it, hit that table, and spent longer on it than I should have because that mistake bugged me. I then had to go to the internet to check that I was right -- because it just didn’t make sense! I suspect the proofers accepted it as true without checking.

Andrés August 17, 2013 at 11:53 pm

I found a reference error in the Conclusion chapter, in reference #5. It says that the Internet was invented by Al Gore. This is incorrect; it should clarify that it was previously called ARPANET and was created by Leonard Kleinrock and Douglas Engelbart.

Killinchy August 18, 2013 at 3:39 pm

I think Nate had his tongue very firmly in his cheek. At least, I hope he did.

A P August 20, 2013 at 1:41 am

Regarding your post-script -

[ p.s. Speaking of Amazon and subtle mistakes, why do they insist that the book is subtitled "Why Most Predictions Fail" when the correct wording -- as evidenced by the book's own cover -- is quite obviously "Why So Many Predictions Fail"? ]

Why indeed? And why do you insist the book’s subtitle is “Why So Many Predictions Fail?” when the correct wording — as evidenced by the book’s own cover — is quite obviously “Why So Many Predictions Fail – but Some Don’t”?

Clem Dickey September 21, 2013 at 7:06 pm

Page 146 para 5: “moving east-southeast” should probably read “moving west-southwest”.

D.B.Cooper September 24, 2013 at 8:56 am

‘creating 4,000 possible sequences after the first full turn’ (page 269). Could someone explain that to me, please?

David Ernst September 27, 2013 at 10:40 pm

I’ve got an issue with figures 13-4 and 13-5. The first two points on 13-4 don’t seem to appear on 13-5. This might sound nit-picky, but it seems that including those first two points would have really interfered with the tidy near-linear graph on 13-5’s log-log scale. Furthermore, he explicitly writes, “it’s important to emphasize that I’ve done nothing to this data other than make it easier to visualize -- it’s still the same underlying information. But what had once seemed chaotic and random is now revealed to be rather orderly.” Why didn’t he just say “terror attacks [...] that killed at least 5 (or whatever) people” and leave those points off of 13-4? Instead, we can verify that the two graphs are really not quite the same data, and we have no way of knowing if he left anything else off. I really liked the book, but things like this seem so out of place.

Clem Dickey October 7, 2013 at 2:09 am

p 195, 4th line from bottom “flushed with feedback loops” should probably read “flush with feedback loops”

p 269 4th line “creating 4,000 possible sequences” should read “creating 400 possible sequences” (also noted by DB Cooper above)

p 335 5th line “FiveThrityEight”, a transposition.

p 386, footnote “we explore these very interesting ideas in chapter 5.” Since chapter 5 was a few chapters back, the verb should probably be past tense, “explored.”

Adam November 29, 2013 at 2:30 pm

On p. 422, “Secretary of State Condoleeza Rice had been warned in July 2001 about…” Rice, in 2001, was National Security Advisor, not Secretary of State.

Rod Bronson January 3, 2014 at 1:30 pm

Figure 11-11 is supposed to show a famous optical illusion. Look at it: which arrow is longer? They are the same length if you take off the reversed arrowheads on the bottom arrows. But the upper arrows, which face outwards, are supposed to look longer -- an optical illusion such that “There’s no way you can control yourself not to have that illusion…” (bottom of page 366). As someone said in one of the previous comments, if Nate made all these mistakes, you have to wonder if he is on the level, or if all this was just some kind of tongue-in-cheek exercise.

Max Fetter January 3, 2014 at 4:16 pm

It seems hard to believe he would make so many errors, but harder to believe he would allow them to be published and undermine his credibility. His reputation, and therefore career, is at stake if people don’t find him to be accurate.

Max Fetter January 3, 2014 at 3:46 pm

On pg 357, Figure 11-9 presents Individual and Institutional Investor Total Equity Holdings, US, in 1980 and 2007. “Institutional Investors” is the label on every segment of both pie charts. It’s not hard to see the error if you were reading the text just before the chart, but the light gray segments should be labeled “Individual Investors.”
