This is the conclusion of a two-part blog series on Artificial Intelligence and Deep Learning in Technology.
Read Part 1 here.
By far the most common neural networks used in deep learning today are convolutional, feed-forward networks. Networks of this kind have been shown to be “universal function approximators,” meaning that, in principle, they can learn to map any arbitrarily chosen input to any required output. Returning to our dog and cat example from Part 1, the input would be an image, and the output would be, say, ‘0’ for dog and ‘1’ for cat. Shown many examples, along with the correct answer for each, such a network can eventually learn which patterns of pixel colors in an image are more likely to correspond to a dog or to a cat. Neural networks do very well at this type of task, sometimes as well as or better than a person. This is quite an amazing and useful result.
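To make this concrete, here is a minimal sketch of what such a classifier can look like in code, written here in PyTorch. The layer sizes, the 64×64 image size, the random tensors standing in for real photos, and the hyperparameters are all illustrative assumptions rather than a recipe; the point is simply the shape of the pipeline: convolutional feature extraction, a small classifier head, and a training step that nudges the weights toward the labeled answers.

```python
# Minimal sketch of a convolutional feed-forward binary classifier (dog = 0, cat = 1).
# Layer sizes, image size (64x64 RGB), and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class DogCatNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1),  # single logit: > 0 leans "cat", < 0 leans "dog"
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One training step: show images with their labels and nudge the weights toward the correct answers.
model = DogCatNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 64, 64)            # stand-in for a batch of real photos
labels = torch.randint(0, 2, (8, 1)).float()  # 0 = dog, 1 = cat

logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```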
In this example, the network has not just memorized the correct answers for the training examples; it has learned the statistical regularities that allow it to distinguish dogs from cats in images it has never seen before. This ability is called generalization, and it is the beginning of what could be considered necessary for “AI”. The ability to learn from examples and to generalize is the key to solving problems in which explicitly anticipating all possible inputs is impossible. Depending on the problem and the training data, deep learning networks can generalize well. In this sense, marketers of AI are at least somewhat justified in claiming something beyond mere automation. But to be sure, there is no understanding of what a dog or a cat is: only the statistical relationships between patterns of pixel colors and the labels of dog or cat in the training set. If for some reason the training set showed dogs mostly outside and cats mostly inside, the network could easily report “dog” whenever an image shows an outdoor scene. In fact, it might report “dog” for outdoor images even when there is no animal in the image at all! Controlling and assessing exactly what is learned by deep learning networks is not a trivial task. The network learns only the minimum needed to guess the label and essentially throws the rest away.
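One way to see why this is not trivial: even a simple audit of how labels co-occur with incidental attributes of the images can reveal the shortcuts a network is likely to exploit. The sketch below is hypothetical, with made-up metadata, and assumes we have an “indoor/outdoor” tag for every training image, which in practice we usually do not.

```python
# Hypothetical sanity check: how strongly does a background attribute ("indoor"/"outdoor")
# co-occur with each label in the training set? A strong skew means the network may learn
# the background rather than the animal. The data below is made up for illustration.
from collections import Counter

training_metadata = [
    ("dog", "outdoor"), ("dog", "outdoor"), ("dog", "indoor"),
    ("cat", "indoor"), ("cat", "indoor"), ("cat", "outdoor"),
]

counts = Counter(training_metadata)
for label in ("dog", "cat"):
    total = sum(n for (lbl, _), n in counts.items() if lbl == label)
    for scene in ("indoor", "outdoor"):
        frac = counts[(label, scene)] / total
        print(f"{label:>3} / {scene:<7}: {frac:.0%} of '{label}' images")
```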
How information of different forms can best be combined, and how meaning should be assigned to information, are issues that are difficult even to frame within the current language of machine learning. What if we were told, just before this discrimination task, that cats can in fact be photographed outside? This small hint could prevent a very large number of errors. As humans, we could use this knowledge easily; prior to the task, we would not even need to see an image of a cat outdoors. We can bring in this extra information and effortlessly make or break associations with prior knowledge for whatever need arises. Everything we know informs everything else we know. There is no straightforward way, however, to make use of this information in a deep learning network. We would have to go back, correct the unwanted bias in the training set, and essentially start over with training. (Recently, some interesting work has been done on one-shot learning from a single example, but this is not generally compatible with the way most networks are designed or trained.)
Similarly, intelligent systems must adapt to varying context and contingencies within their environments. Failure to incorporate context properly often defines the limits of intelligent decision-making. If you have ever driven a car with cruise control, you’ve had the following experience: you just reach the crest of a hill and suddenly the cruise control stomps on the accelerator, because the car has slowed too much. As an (intelligent) driver, you might accept the slight slow-down, since you can see you are about to head downhill. And, of course, you would not accelerate if there is a car just ahead of you. On the other hand, if there is a truck coming up close behind you, you very well may accelerate. Making use of context is very often required to decide on the correct course of action. Certainly, in our simple example, you can automate the few situations mentioned: add a couple of cameras and detect what other nearby vehicles are doing. But more often, addressing context is highly nuanced, with many interacting contingencies. We contend that handling this requires encoding knowledge in a very general and hierarchical manner, something that is not well addressed by today’s neural network architectures.
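As a hypothetical illustration of how quickly hand-coded context becomes unwieldy, the sketch below encodes just the contingencies from the paragraph above as explicit rules. Every function name and threshold here is invented for illustration; the point is that each new piece of context interacts with the others, and the rule set grows combinatorially.

```python
# Hypothetical sketch of hand-coding the few contingencies mentioned above.
# Each new piece of context multiplies the cases; the rule set quickly becomes brittle.
def should_accelerate(speed_deficit_kmh: float,
                      cresting_hill: bool,
                      car_close_ahead: bool,
                      truck_close_behind: bool) -> bool:
    if car_close_ahead:
        return False                 # never close the gap to the car in front
    if truck_close_behind:
        return True                  # keep pace so the truck does not tailgate
    if cresting_hill:
        return False                 # the downhill grade will restore speed for free
    return speed_deficit_kmh > 5     # otherwise, only react to a meaningful slowdown

# A few of the situations described in the paragraph above:
print(should_accelerate(8, cresting_hill=True,  car_close_ahead=False, truck_close_behind=False))  # False
print(should_accelerate(8, cresting_hill=True,  car_close_ahead=False, truck_close_behind=True))   # True
print(should_accelerate(8, cresting_hill=False, car_close_ahead=True,  truck_close_behind=False))  # False
```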
“Can you tell a self-driving car to be extra careful when little people holding balls are detected near the side of the road?”
Similar to the thermostat that records usage patterns, attempts are being made to develop self-driving cars using deep learning networks trained on recordings of human drivers in different situations. Importantly, as in the dog and cat example, the idea is that with enough training data, the network can learn the statistical regularities of driving, how to respond in many different situations, and how to generalize to situations it has not seen. Self-driving car companies proudly claim to have hundreds of millions of car-miles of recorded driving, clearly hoping that such a massive data set will be sufficient to cover any driving scenario. Time will tell. We believe that intelligent systems should be able to accommodate specific information in a principled manner. Can you tell a self-driving car to be extra careful when little people holding balls are detected near the side of the road?
Other architectures with different strengths fall under the umbrella of deep learning. Perhaps the most promising are “generative” networks, which have seen increased study in the past couple of years. These are still mainly a research topic, but they have important properties that could provide a more suitable framework for learning about the world. They are designed so that the network produces outputs similar to the training examples. For instance, the network might produce images of faces, or images of a room. Importantly, these are not the training examples themselves, but new images with similar statistics (e.g., the faces mostly have two eyes). The key point is that, to produce new examples, the network must have learned some model of the training examples.
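For readers who want a concrete picture, the sketch below is a toy variational autoencoder, one common kind of generative network; the dimensions and the random training batch are placeholders, not real image data. After training on real examples, the `generate` step would produce novel outputs that share the statistics of the training set, which is only possible because the network has internalized a model of those examples.

```python
# Minimal sketch of a "generative" network: a tiny variational autoencoder (VAE).
# Sizes and data are illustrative; real face or room images would be far larger.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, data_dim=64, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU())
        self.to_mu = nn.Linear(32, latent_dim)      # mean of the learned latent code
        self.to_logvar = nn.Linear(32, latent_dim)  # (log) variance of the latent code
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # sample a latent code
        return self.decoder(z), mu, logvar

    def generate(self, n):
        # New examples come from the learned model, not from the training set itself.
        z = torch.randn(n, self.to_mu.out_features)
        return self.decoder(z)

vae = TinyVAE()
x = torch.randn(16, 64)                       # stand-in for a batch of training examples
recon, mu, logvar = vae(x)
recon_loss = ((recon - x) ** 2).mean()
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # keep latents near a standard normal
loss = recon_loss + kl
loss.backward()

with torch.no_grad():
    samples = vae.generate(4)                 # novel outputs with statistics like the training data
```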
An internal world model is valuable in many ways. A system with an internal model can have expectations: it can compare newly arriving information with its model. It can also make inferences about the world given incomplete information, because it can use the expectations from the model to supply what is not available externally. One might imagine that a model of the world that explicitly includes indoors, outdoors, cats, and dogs would make it possible to efficiently form or break an association between dog and outdoors. Thus far, deep learning research has only begun to explore how such models can be formed. Existing networks are very narrow in function and difficult to train, and it is not clear whether such approaches will provide the broad solutions necessary to satisfy the various requirements of intelligent systems that we have touched on here.
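A toy sketch of what having expectations buys a system, under deliberately simple assumptions: the “internal model” below is nothing more than a constant-velocity guess, but it is enough to flag surprising inputs and to fill in a missing observation from the model rather than from the outside world.

```python
# Hypothetical sketch of using an internal model: predict what should come next,
# compare it with what actually arrives, and fall back on the prediction when data is missing.
def expected_next(history):
    # Trivial internal "model": assume the last change repeats (constant velocity).
    return history[-1] + (history[-1] - history[-2])

observations = [10.0, 12.0, 14.0, None, 22.0]   # None = information not available externally
history = observations[:2]

for obs in observations[2:]:
    prediction = expected_next(history)
    if obs is None:
        history.append(prediction)              # inference: let the model supply the missing value
        print(f"missing input, model supplies {prediction:.1f}")
    else:
        surprise = abs(obs - prediction)        # expectation vs. reality
        history.append(obs)
        print(f"observed {obs:.1f}, expected {prediction:.1f}, surprise {surprise:.1f}")
```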
Despite recent successes, astonishingly little of what is known about brains is incorporated into deep learning networks. Deep learning grew out of ideas from mathematical optimization rather than from any particular model of the brain or of intelligence. While some deep learning practitioners like to point out similarities to the human brain, they are mainly trying to create interest among a broader audience, and these similarities are at best painted in very broad strokes. The rank and file of deep learning greatly prefer to stick to the language and conceptual frameworks of machine learning. They see no reason to model what they do after any other system, especially when what they do seems to be so successful.
Our perspective differs quite a bit. We see the great value in the machine learning perspective, but we also see the value in understanding how our brains are built. Having started our training at a time when neural network research was out of favor, we chose to study neuroscience: specifically, how neurons process, store, and transmit information, and how networks of these neurons assemble dynamically. Even a single neuron is a marvelously complex machine. Yet the artificial “neurons” in deep learning are built on a basic model of the neuron dating to 1943 (the McCulloch-Pitts unit). Sadly, the deep learning community finds it easy to dismiss 70+ years of accumulated knowledge in neurophysiology as so much noise. Moreover, the dominant model of “learning” in deep learning, called backpropagation of errors, does not map easily onto the architecture or the components of biological brains. Given the incredible capabilities of our own brains, it is unclear why an approach that differs in these fundamental properties should be presumed superior. While it is possible that some of the complexity of biological brains is unnecessary for building a workable AI system, our position is that identifying and understanding the specific computational problems that brains solve can only help us get there.
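For contrast with that biological complexity, here is essentially the entire 1943-style artificial neuron that deep learning builds on: a weighted sum followed by a threshold (modern networks swap the hard threshold for a smooth nonlinearity, but the spirit is the same).

```python
# The 1943 McCulloch-Pitts-style unit that today's artificial "neurons" descend from:
# a weighted sum followed by a threshold. Everything a biological neuron does beyond
# this (dendritic computation, spike timing, neuromodulation, ...) is simply absent.
def artificial_neuron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Example: a unit that fires only when both of its inputs are active (logical AND).
print(artificial_neuron([1, 1], weights=[1.0, 1.0], threshold=2.0))  # -> 1
print(artificial_neuron([1, 0], weights=[1.0, 1.0], threshold=2.0))  # -> 0
```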
Today’s AI systems have a minimal understanding of the problems they address and minimal flexibility in how they can learn and adapt. Our goal at Mad Street Den is to fundamentally change that.