Deep learning systems are changing how we understand and mimic a vast array of processes, with applications as diverse as video games and healthcare. They have also proved useful for clarifying processes that are otherwise difficult to study. A team of researchers from MIT’s Center for Brains, Minds, and Machines (CBMM) and Computer Science and Artificial Intelligence Laboratory (CSAIL) has designed a study that seeks to answer some of these questions, centered on language learning in young children. At the core of this work is a technique known as semantic parsing.
Semantic parsing converts language into a logical, machine-readable form. The researchers employ deep learning algorithms to mimic how children learn through observation. The team will present the details in a paper at this year’s Empirical Methods in Natural Language Processing conference in Brussels, Belgium, between November 2nd and 4th. The team trained the system on video. Candace Ross, a graduate student in the Department of Electrical Engineering and Computer Science and CSAIL and first author of the paper, said, “There are temporal components — objects interacting with each other and with people — and high-level properties you wouldn’t see in a still image or just in language.”
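To make the idea of semantic parsing concrete, here is a toy sketch of mapping a caption to a logical form. This is purely illustrative — the function, the fixed sentence pattern, and the output notation are assumptions for the example, not the representation used in the MIT paper.

```python
# Illustrative toy semantic parser: turns a simple
# "The <subject> <verb> the <object>." caption into a logical form.
# This is NOT the representation used in the researchers' system.
def toy_semantic_parse(caption):
    tokens = caption.lower().rstrip(".").split()
    # Assume the fixed five-word pattern "the X Y the Z".
    subject, verb, obj = tokens[1], tokens[2], tokens[4]
    return f"{verb}({subject}, {obj})"

print(toy_semantic_parse("The person lifted the box."))
# → lifted(person, box)
```

A real semantic parser must handle arbitrary syntax and ambiguity, which is exactly what makes learning one from video captions, rather than hand-written rules, appealing.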
A total of 400 videos depicting the tasks were used, and 1,200 captions were added through Mechanical Turk, a crowdsourcing platform. The scientists then divided the captions into two groups: 840 were used for training and tuning, while the remaining 360 were reserved for testing. Co-author Andrei Barbu, a researcher at CSAIL and CBMM, said, “you don’t need nearly as much data — although if you had [the data], you could scale up to huge data sets.”
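The split described above can be sketched as follows. This is a minimal, hypothetical illustration of partitioning 1,200 captions into 840 for training/tuning and 360 for testing; the function name and the use of a seeded shuffle are assumptions, not details from the paper.

```python
import random

# Hypothetical sketch of the article's data split:
# 1,200 captions -> 840 train/tune, 360 held-out test.
def split_captions(captions, n_train=840, seed=0):
    rng = random.Random(seed)      # seeded for reproducibility
    shuffled = captions[:]         # copy so the input list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

captions = [f"caption {i}" for i in range(1200)]
train, test = split_captions(captions)
print(len(train), len(test))  # → 840 360
```

Holding out the 360 test captions entirely, rather than reusing them during tuning, is what lets the researchers measure how the parser generalizes to unseen descriptions.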
The research offers the possibility of deepening our understanding of some of the fundamental learning processes in which children engage. Children at different stages of development have apparent difficulty articulating some of these nuances, and AI is playing a significant role in this matter. Boris Katz, a principal research scientist and head of the InfoLab Group at CSAIL, said, “A child has access to redundant, complementary information from different modalities, including hearing parents and siblings talk about the world, as well as tactile information and visual information, [which help him or her] to understand the world. It’s an amazing puzzle, to process all this simultaneous sensory input. This work is part of a bigger piece to understand how this kind of learning happens in the world.”
The process of acquiring language is very complex, and it requires a multidisciplinary approach that takes into account the world children inhabit. Ross further added, “Children interact with the environment as they’re learning. Our idea is to have a model that would also use perception to learn.”