Exploring Python

December 27, 2012 in Data

GHL has written two very nice Python-related posts. The first is for highly-technical readers who want to use the new Blaze package ("NumPy 2.0") so badly they'll do so even if it doesn't work yet -- and yes, I'm in that camp! Here are instructions for building Blaze. Note that due to its rapid development it's possible that these instructions will not work in the near future.

The second is a hands-on exploration of Python immutability, a concept that deeply frustrated me when I first learned about it because sometimes it doesn't seem to have any internal consistency.
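
One illustration of that apparent inconsistency, offered as a minimal sketch rather than anything taken from GHL's post (the variable names are made up for the example): a tuple cannot be changed, yet a list stored inside it can, and `+=` on that list manages to both succeed and fail.

```python
# A tuple is immutable, but it can hold a reference to a mutable list.
record = ("scores", [1, 2])

# Mutating the list in place is allowed: the tuple still points at the same object.
record[1].append(3)

# Rebinding a slot of the tuple is not allowed.
try:
    record[0] = "grades"
    rebinding_failed = False
except TypeError:
    rebinding_failed = True

# The confusing case: += extends the list in place, then tries to store the
# result back into the tuple, which raises even though the list has changed.
try:
    record[1] += [4]
    augmented_failed = False
except TypeError:
    augmented_failed = True

print(record)
print(rebinding_failed, augmented_failed)
```

The tuple never changes which objects it refers to; only the list it refers to changes. Seen that way, the behavior is consistent, even if it doesn't feel like it at first.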


Nate Silver's new book, The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't, is, on the whole, an excellent overview of statistical thinking. I think most of my readers would enjoy it.

However, it is plagued by some bizarre mistakes that left me unable to completely trust that every detail is correct. There's one in the very first chapter. Nate is describing the impact of bad assumptions on model output and uses the correlation of default risk in mortgages as an example. After a few paragraphs describing the issue, he uses a table to quantify the effect. The text is quite good; the table is quite bad. Specifically, the headings are wrong -- the titles of columns 3 and 4 should be reversed. Here's the page in question:

There isn't room for equivocation here; the table is simply wrong. The text describes a phenomenon, and the accompanying table -- if taken at face value -- describes exactly the opposite. Of all the people I asked, only one (GHL) noticed this error. Many of the others recalled the table, but admitted that they didn't bother to parse it, choosing instead to accept it -- and the numbers it contains -- as hard evidence that the preceding argument was correct. Thus, the table managed to serve its purpose in spite of its contents.

So what's the big deal? The headers are wrong, it's not the end of the world! But it is a very real problem, because the topic at hand is statistical literacy, professional skepticism, and the consequences of making assumptions. See: the fifth paragraph. By now, some readers will have verified that the text and the table are, in fact, at odds. But astute readers should be questioning me as well. After all, I've claimed that the error is in the table, but if the only evidence is a disagreement of words, couldn't it just as easily be in the text? I can't do any more than assure you it isn't (at least, not within the scope of this post), but I think that if it were, casual readers would consider this error as serious as I do. Regardless of where it appeared, the very presence of the mistake made it extremely difficult for me to proceed without questioning everything I read. What's the probability that such a large error isn't correlated with the presence of other smaller ones? The irony of this taking place in a book of this nature is more distressing than it is amusing.

As I said up front -- in almost all respects this is an excellent book. It's one of the better treatments I've seen of the proper "data science" mindset. But it isn't without problems. How these errors passed editorial review is beyond me -- particularly when every reviewer mentions the mortgage example! (Is that because it's in the first chapter? Is that because it's accompanied by one of the book's first tables? Whatever the reason, it shouldn't be wrong!) To date, the only other mention I've found of this issue is in an Amazon discussion forum, where it is also pointed out that the probabilities in the table seem off (20.4% is the probability of exactly 1 default [5*(.05*.95^4)], not the probability of at least 1 default [1-.95^5]).
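
The arithmetic behind that forum comment is easy to check. The sketch below assumes five independent mortgages, each with a 5% default probability, which is what the two bracketed formulas imply; the function and variable names are mine.

```python
from math import comb

p_default = 0.05   # per-mortgage default probability from the book's example
n_mortgages = 5    # pool size implied by the formulas quoted above

def prob_exactly(k, n=n_mortgages, p=p_default):
    """Binomial probability of exactly k defaults among n independent mortgages."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exactly_one = prob_exactly(1)        # 5 * (.05 * .95^4)
at_least_one = 1 - prob_exactly(0)   # 1 - .95^5

print(f"exactly one default:  {exactly_one:.1%}")   # 20.4%
print(f"at least one default: {at_least_one:.1%}")  # 22.6%
```

The two figures differ by a couple of percentage points, so a 20.4% entry corresponds to "exactly one default" under these assumptions.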

p.s. Speaking of Amazon and subtle mistakes, why do they insist that the book is subtitled "Why Most Predictions Fail" when the correct wording -- as evidenced by the book's own cover -- is quite obviously "Why So Many Predictions Fail"?

p.p.s. While we're on the topic of default correlation, you would never find yourself in a situation where you can measure a default probability (here, 5%) and then overlay a correlation on it. Default probabilities found (or inferred) in the real world have correlation effects already baked into them. Interestingly, and somewhat counterintuitively, the point of a correlation model is to remove those effects, not add them. The toy example presented in the book is a fine teaching tool, but no one should ever think that numbers of this magnitude have any real-world application. Same goes for Nate's footnoted comment (not shown) that bizarrely equates his "risk multiple" with a CDO's leverage.


Another day, another surprise from the New York Times! This time it's a front page article on "deep-learning," an integral part of my own work and something that defies many attempts at simple explanation. Sadly, that's also true of the Times article, which never actually explains what deep learning is! Indeed, the reader is left to wonder if in this context "deep" refers to the nature of the philosophical problem that artificial intelligence presents.

The closest we get is this:

But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just “neural nets” for their resemblance to the neural connections in the brain.

The same sentence could have been written about the perceptron networks in the 1960's, "classic" neural networks in the 1980's, or spiking networks in the past decade. In fact, it was -- the article references the "AI winters" that followed the first two highly-publicized advances.

Ironically, "deep" learning has nothing at all to do with any similarity to neural structure, but rather with the structure of the AI model. The technique was popularized by an excellent piece of work by Geoffrey Hinton at the University of Toronto -- and Hinton is deservedly mentioned in the Times article. (Hinton is a star of this field; his work on training algorithms enabled the neural networks of the 80's in the first place!) Hinton was working with a class of stochastic models called Restricted Boltzmann Machines. Like many other networks, RBMs have two layers, one for input and one for processing (some other networks have a third layer for output). Scientists knew that adding more layers could improve their results -- and their models' perceived "intelligence" -- but for a variety of reasons those layers made teaching the models extremely difficult. Hinton reasoned that perhaps one could fully train the first two layers, then "lock" them, add a third layer, and begin training it. Once the third layer was trained, one would lock it as well and add a fourth layer as desired. By initializing all the layers of the network in this way, it became possible to use classical algorithms to train all the layers together in a "fine-tuning" procedure. The result was a model consisting of many layers -- a "deep" model -- in contrast to the simple or "shallow" models that were commonplace until that time. Hinton called this "greedy layer-wise training," referencing the surprising fact that each layer did its learning without knowing it would pass its knowledge on to another, and nonetheless all the layers came to form a cohesive representation of the data.
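
To make the train-lock-stack idea concrete, here is a rough sketch in plain Python. It is not Hinton's code and it cuts many corners: `TinyRBM` is a hypothetical toy class with no bias terms, trained with a single step of contrastive divergence, and the final fine-tuning pass over all layers is left out entirely. The data and layer sizes are invented for the example.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyRBM:
    """A deliberately small RBM (no biases), trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden):
        self.w = [[random.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]

    def hidden_probs(self, v):
        return [sigmoid(sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.w[0]))]

    def visible_probs(self, h):
        return [sigmoid(sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.w))]

    def train(self, data, epochs=20, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                # Sample binary hidden states, reconstruct, and re-infer.
                h_sample = [1.0 if random.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                # Move weights toward the data and away from the reconstruction.
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])

def greedy_pretrain(data, layer_sizes):
    """Train one layer at a time; each trained layer is frozen and feeds the next."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = TinyRBM(len(inputs[0]), n_hidden)
        rbm.train(inputs)        # train only the newest layer
        layers.append(rbm)       # "lock" it: it is not updated again in this loop
        inputs = [rbm.hidden_probs(v) for v in inputs]  # its output is the next layer's data
    return layers

toy_data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 5
stack = greedy_pretrain(toy_data, layer_sizes=[4, 3, 2])
print([(len(rbm.w), len(rbm.w[0])) for rbm in stack])
```

The point to notice is in `greedy_pretrain`: each layer only ever sees the output of the layer beneath it, and nothing flows back down. That is the "greedy" part.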

One can find parallels in earlier work by Yann LeCun of NYU (also quoted in the article), whose convolutional networks, which came to dominate computer vision in the 90's, were based on a similarly "deep" principle.

One could argue that deep learning mimics the brain's hierarchical structure -- and therefore does resemble neural connectivity. That's true, and plenty of doctorates have been earned by students able to demonstrate it, but one can create deep networks where no layer bears any resemblance to a neuronal process. Thus the whole model shares only its hierarchy with the brain; this is hardly sufficient to insist that the two are linked. A more reasonable answer is that hierarchical learning is necessary -- whether engineered or evolved -- because information exists within some ontological structure.

So, once again, hats off to the NYT for even mentioning this work. As with their last effort, I wish more emphasis were placed on explaining why it's important rather than on giving examples of what it's done, but perhaps this will inspire a few new bright minds to enter the field.


Thanks, xkcd:

As of this writing, the only thing that's 'razor-thin' or 'too close to call' is the gap between the consensus poll forecast and the result.


"iPad". Just "iPad".

November 5, 2012 in General

Having played with the new iPad mini, I am quite sure that it will become the dominant iPad product. It is just the right size and weight; it's relatively cheap; the bezel is unobtrusive; the screen seems sharp even next to a true retina display. In a perfect world, I think that the mini would be rebranded as just "iPad" and its larger brother would be the "iPad HD". Now, I think that's unlikely -- we have only to look at the iPod mini to see that a main product can be continuously marketed as an alternative. Perhaps that's an effective tactic -- people seem quite interested in anchoring studies these days. I am recommending the iPad mini over the iPad to almost everyone who asks -- it's hard to think of an application which *demands* the larger screen but not, say, a retina MacBook Pro. Sure, movies will look better on the big iPad -- hell, everything will look better with 4x as many pixels -- but the mini is the Goldilocks product that gets the value balance right.


Politics & Statistics

November 4, 2012 in Data,News,Politics

I'm a big fan of Nate Silver -- he consistently demonstrates that he is one of the best and brightest statisticians around. I like to say that statisticians (and risk managers) are professional skeptics; our job is to let data speak for itself, not to speak on its behalf. Nate Silver does that better than anyone. Through a combination of clear visuals, transparent methods and clear writing, he systematically works to remove biases of any form from his chosen medium of poll numbers.

Well, it turns out there are politics in everything -- including analyzing politics. A number of articles were written this week accusing Silver of not only failing to do his job but also obfuscating reality for his own agenda. Dean Chambers kicked it off with an oddly personal rant and others like Dylan Byers jumped on the bandwagon shortly thereafter. All of the authors have in common a poor grasp of statistics and an even worse understanding of its objective: to present otherwise-compromised or noisy data in as clean an environment as possible.

The articles were met, at first, with vitriol from the Silver defenders (it can't be ignored that the two sides in this drama fall along party lines). Fortunately, cooler heads have prevailed and a number of wonderfully clear pieces have been written defending Silver's methods and sometimes even his conclusions. In particular, I liked Scott Galupo's take, Ezra Klein's opinion and this post on Simply Statistics. If there's a silver lining (pun most certainly intended) in this outpouring of "Shut up, nerd!" comments (to borrow from Mr. Galupo), it's that it provides a teaching opportunity for something that really does seem hard to understand at first ("How can he lead by 0.5% and have a 75% chance of winning!?") but turns out to be quite simple and even familiar -- I think Silver's football score analogy is fantastic.
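
For anyone who wants to see that "0.5% lead, 75% chance" relationship in numbers, here is a back-of-the-envelope sketch. It is emphatically not Silver's model: it simply assumes the true margin is normally distributed around the polled lead, with a standard deviation that I've picked for illustration.

```python
from math import erf, sqrt

def win_probability(lead, uncertainty):
    """P(true margin > 0) if the margin is normal with mean `lead` and std dev `uncertainty`."""
    return 0.5 * (1 + erf(lead / (uncertainty * sqrt(2))))

# Same half-point lead, different amounts of uncertainty about it (in points).
for sd in (0.5, 0.75, 1.5, 3.0):
    print(f"lead 0.5 pts, uncertainty {sd} pts -> {win_probability(0.5, sd):.0%}")
```

Under that assumption, a half-point lead with about three-quarters of a point of uncertainty works out to roughly a 75% chance of winning. The size of the lead matters much less than how confident you are in it, which is exactly why averaging many polls changes the picture.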

So that's the background. Things got quite a bit more interesting today when Silver wrote the following on the NYT's 538 blog, in a post simply -- and provocatively? -- titled, "For Romney to Win, State Polls Must Be Statistically Biased":

Nevertheless, [wishful thinking is] potentially more intellectually coherent than the ones that propose that the leader in the race is “too close to call.” It isn’t. If the state polls are right, then Mr. Obama will win the Electoral College. If you can’t acknowledge that after a day when Mr. Obama leads 19 out of 20 swing-state polls, then you should abandon the pretense that your goal is to inform rather than entertain the public.

So it would appear that we've passed the point of defending statisticians, burst through the level of explaining statistics, and are now fully on the offensive! A lot of commentators are calling this a very aggressive shot. I still think Silver is playing defense -- but he's doing it with a crowd that will never see it that way. Silver's objective is to map early polling data to actual electoral outcomes. He has a model that performs that transformation. And for some time, his model has been saying that if the polling data is good, the outcome is far from a toss-up. That connection is key: if the polling data is good, the model's conclusions likely hold. Silver isn't claiming some crazy left-field insight based on proprietary information; he's simply saying that after considering all available polls, it appears much more likely that the President will be reelected than not. It's the same set of inputs and same set of possible outputs that every single pundit in the country is considering. The only difference is that Silver has tried to quantify the degree of uncertainty, and he's found that it's different than you might suppose.

This isn't magic; this is just a statistician explaining what the data says, assuming the relationships that data represented in the past will continue to hold at this time. I admire Silver for being brave enough to put his money where his mouth is even though he most certainly knows that all the probabilistic hedging in the world won't save his reputation if the incumbent doesn't win. The odds may be 3-to-1 against, but as soon as the probabilities collapse to a certainty, that will be forgotten. He'll either be a visionary or a false prophet. In a way that's sad, either way, because in so many cases the distribution of possible outcomes can be more interesting or valuable than the one that actually materializes, and I wish we could get people to focus more on that idea. But realistically, you can't ignore the outcome.

We talk about "garbage in, garbage out" all the time, and this is a fantastic example. Silver has pulled apart all the relationships that used to hold for electoral data, but if there is something wrong with his input data, all the modeling in the world won't save him. It could be something as simple as cell-phone users (and therefore younger people) being underrepresented in modern polls; it could be something as complex as outright fraud. It could be nothing. Ultimately, Silver's job is to let the data speak. On Tuesday, we'll find out how much of what it says is true.



Steve Jobs introducing the iPhone, January 9, 2007:

We've been innovating like crazy the last few years on this, and we've filed for over 200 patents for all the inventions in iPhone. And we intend to protect them.

And they have.

(Here are links to the title and post source videos)


Tolerance and respect

August 19, 2012 in Technology

I have been reviewing Microsoft's F# Component Design Guidelines. One section has a note to avoid using underscores in names, as it "clashes with .NET naming conventions." However, the authors found it necessary to add this caveat:

That said, some F# programmers use underscores heavily, partly for historical reasons, and tolerance and respect is important.

I never realized functional programmers were such champions of human rights.


Missing pieces

August 1, 2012 in Data,Technology

Two important things to keep in mind about Mountain Lion, entirely reblogged from Adam Laiacano because his pictures are worth exactly two thousand of my words:

One saving grace -- at least on my machine -- a Java runtime was automatically installed while I was installing Python. Why not just preinstall it in the first place?

(N.B.: If these images didn't immediately strike a chord with you, don't worry about them!)

(With thanks to Adam -- check out his full blog here!)


Who needs futures? Now you can arbitrage gas prices right at the pump:

(And if that's not enough, it costs a dollar more just down the block!)