Over the decades since the inception of artificial intelligence, research in the field has fallen into two main camps. The “symbolists” have sought to build intelligent machines by coding in logical rules and representations of the world. The “connectionists” have sought to construct artificial neural networks, inspired by biology, to learn about the world. The two groups have historically not gotten along.
But a new paper from MIT, IBM, and DeepMind shows the power of combining the two approaches, perhaps pointing a way forward for the field. The team, led by Josh Tenenbaum, a professor at MIT’s Center for Brains, Minds, and Machines, created a computer program called a neuro-symbolic concept learner (NS-CL) that learns about the world (albeit a simplified version) just as a child might—by looking around and talking.
The system consists of several pieces. One neural network is trained on a series of scenes made up of a small number of objects. Another neural network is trained on a series of text-based question-answer pairs about the scene, such as “Q: What’s the color of the sphere?” “A: Red.” This network learns to map the natural language questions to a simple program that can be run on a scene to produce an answer.
The NS-CL system is also programmed to understand symbolic concepts in text such as “objects,” “object attributes,” and “spatial relationships.” That knowledge helps NS-CL answer new questions about a different scene—a type of feat that is far more challenging using a connectionist approach alone. The system thus recognizes concepts in new questions and can relate them visually to the scene before it.
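The pipeline described above can be pictured as a toy example: a perception network turns an image into a list of objects with attributes, a language network turns a question into a short program, and a symbolic executor runs that program on the scene. The sketch below is illustrative only; the operation names, scene encoding, and program format are assumptions for clarity, not the actual domain-specific language used in the paper.

```python
def execute(program, scene):
    """Run a small symbolic program (a list of operations) on a scene.

    The scene stands in for the output of a perception network: each
    object is a dict of attributes. The program stands in for the output
    of the question-parsing network.
    """
    objects = scene
    for op in program:
        if op[0] == "filter":
            # Keep only objects whose attribute matches the given value.
            _, attr, value = op
            objects = [o for o in objects if o.get(attr) == value]
        elif op[0] == "query":
            # Read an attribute off the single remaining object.
            _, attr = op
            assert len(objects) == 1, "query expects a unique object"
            return objects[0][attr]
    return objects

# A toy scene with two objects, as a vision network might describe it.
scene = [
    {"shape": "sphere", "color": "red", "size": "large"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

# "What's the color of the sphere?" parsed into a two-step program.
program = [("filter", "shape", "sphere"), ("query", "color")]

print(execute(program, scene))  # red
```

Because the program is explicit rather than buried in network weights, the same concepts (“sphere,” “color”) can be recombined to answer questions about scenes the system has never encountered.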
“This is an exciting approach,” says Brenden Lake, an assistant professor at NYU. “Neural pattern recognition allows the system to see, while symbolic programs allow the system to reason. Together, the approach goes beyond what current deep learning systems can do.”
In other words, the hybrid system addresses key limitations of both earlier approaches by combining them. It overcomes the scalability problems of symbolism, which has historically struggled to encode the complexity of human knowledge in an efficient way. But it also tackles one of the most common problems with neural networks: the fact that they need huge amounts of data.
It is possible to train just a neural network to answer questions about a scene by feeding in millions of examples as training data. But a human child doesn’t require such a vast amount of data in order to grasp what a new object is or how it relates to other objects. Also, a network trained that way has no real understanding of the concepts involved—it’s just a vast pattern-matching exercise. So such a system would be prone to making very silly mistakes when faced with new scenarios. This is a common problem with today’s neural networks and underpins shortcomings that are easily exposed (see “AI’s language problem”).
Connectionism purists may object to the fact that the system requires some knowledge to be hard-coded in. But the work is important because it nudges us closer to engineering a form of intelligence that seems more like our own. Cognitive scientists believe that the human mind goes through some similar steps, and that this underpins the flexibility of human learning.
More practically, it could also unlock new applications of AI because the new technology requires far less training data. Robot systems, for example, could finally learn on the fly, rather than spend significant time training for each unique environment they’re in.
“This is really exciting because it’s going to get us past this dependency on huge amounts of labeled data,” says David Cox, the scientist who leads the MIT-IBM Watson AI lab.
The researchers behind the study are now developing a version that works on photographs of real scenes. This could prove valuable for many practical applications of computer vision.