Another day, another surprise from the New York Times! This time it's a front-page article on "deep learning," an integral part of my own work and something that defies many attempts at simple explanation. Sadly, the Times article doesn't manage one either: it never actually explains what deep learning is! Indeed, the reader is left to wonder if in this context "deep" refers to the nature of the philosophical problem that artificial intelligence presents.
The closest we get is this:
But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just “neural nets” for their resemblance to the neural connections in the brain.
The same sentence could have been written about the perceptron networks of the 1960s, the "classic" neural networks of the 1980s, or the spiking networks of the past decade. In fact, it was -- the article references the "AI winters" that followed the first two highly publicized advances.
Ironically, the "deep" in deep learning refers not to any similarity to neural structure but to the structure of the AI model itself. The technique was popularized by an excellent piece of work by Geoffrey Hinton at the University of Toronto -- and Hinton is deservedly mentioned in the Times article. (Hinton is a star of this field; his work on training algorithms enabled the neural networks of the 80s in the first place!) Hinton was working with a class of stochastic models called Restricted Boltzmann Machines. Like many other networks, RBMs have two layers, one for input and one for processing (some other networks have a third layer for output). Scientists knew that adding more layers could improve their results -- and their models' perceived "intelligence" -- but for a variety of reasons those extra layers made teaching the models extremely difficult. Hinton reasoned that perhaps one could fully train the first two layers, then "lock" them, add a third layer, and begin training it. Once the third layer was trained, one would lock it as well and add a fourth layer, and so on as desired. With all the layers of the network initialized in this way, it became possible to use classical algorithms to train all the layers together in a "fine-tuning" procedure. The result was a model consisting of many layers -- a "deep" model -- in contrast to the simple or "shallow" models that had been commonplace until that time. Hinton called this "greedy layer-wise training," referencing the surprising fact that each layer did its learning without knowing it would pass its knowledge on to another, and nonetheless all the layers together came to form a cohesive representation of the data.
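For the curious, the greedy layer-wise idea can be sketched in a few lines of plain Python. This is only a toy under stated assumptions, not Hinton's actual implementation: the names RBM, greedy_pretrain and layer_sizes are hypothetical, the learning rate and epoch count are arbitrary, and the training rule (one-step contrastive divergence) is my choice for illustration, since nothing above says which rule is used.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Toy restricted Boltzmann machine: one visible layer, one hidden layer."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # W[i][j] connects visible unit i to hidden unit j.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def train(self, data, epochs=50, lr=0.1):
        """One-step contrastive divergence (an assumed training rule)."""
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                # Sample binary hidden states, then reconstruct the input.
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                # Move toward the data statistics, away from the reconstruction's.
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_vis[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_hid[j] += lr * (h0[j] - h1[j])


def greedy_pretrain(data, layer_sizes, seed=0):
    """Train one layer at a time; each finished layer is left untouched afterwards."""
    rng = random.Random(seed)
    stack = []
    layer_input = data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        rbm.train(layer_input)   # only the newest layer learns
        stack.append(rbm)        # "locked": nothing below updates it again
        # The hidden activations become the "data" for the next layer up.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return stack, layer_input


# Two made-up binary patterns, repeated a few times.
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 4
stack, top_features = greedy_pretrain(data, layer_sizes=[3, 2])
print(len(stack), "layers trained; top-level feature size:", len(top_features[0]))
```

Note what is missing: the final "fine-tuning" pass that trains all the layers together. The sketch stops once every layer has been initialized, which is the part that was new.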
One can find parallels in earlier work by Yann LeCun of NYU (also quoted in the article), whose convolutional networks came to dominate computer vision in the 90s and were based on a similarly "deep" principle.
One could argue that deep learning mimics the brain's hierarchical structure -- and therefore does resemble neural connectivity. That's true, and plenty of doctorates have been earned by students able to demonstrate it, but one can create deep networks where no layer bears any resemblance to a neuronal process. Thus the whole model shares only its hierarchy with the brain; this is hardly sufficient to insist that the two are linked. A more reasonable answer is that hierarchical learning is necessary -- whether engineered or evolved -- because information exists within some ontological structure.
So, once again, hats off to the NYT for even mentioning this work. As with their last effort, I wish more emphasis were placed on explaining why it's important rather than on giving examples of what it's done, but perhaps this will inspire a few bright new minds to enter the field.