In a groundbreaking development, researchers have created a speaker system equipped with seven “self-deploying” microphones that can dynamically reconfigure themselves to create distinct “speech zones” within a room. This innovative technology enables the accurate tracking and identification of different voices, even as they move.
Published in the journal Nature Communications, this unconventional speaker system comprises what the researchers call a “robotic acoustic swarm.” The “self-deploying” microphones are thimble-sized miniature robots that can communicate with one another. They navigate the room autonomously on small wheels, like diminutive Roombas, and can return to a charging station as needed.
Co-lead author of the study, Malek Itani, from the Paul G. Allen School of Computer Science & Engineering, emphasized the significance of this innovation, stating, “For the first time, using what we’re calling a robotic ‘acoustic swarm,’ we’re able to track the positions of multiple people talking in a room and separate their speech.”
To navigate their surroundings effectively, these prototype robots employ a technique reminiscent of high-frequency echolocation. This mobility is crucial as it allows the microphones to be distributed widely, facilitating more precise calculations. Presently, the robots are confined to tabletop environments, as their localization capabilities are limited to two-dimensional space.
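The echolocation principle the robots rely on can be illustrated with a few lines of code. This is a hedged sketch of the general idea, not the paper's implementation: a robot emits a high-frequency chirp, times how long the echo takes to return, and converts that round-trip time into a distance using the speed of sound. The function name and the example delay are illustrative choices, not values from the study.

```python
# Illustrative sketch of echolocation-style ranging (not the
# researchers' actual code): distance is inferred from how long
# an emitted sound takes to bounce back.

SPEED_OF_SOUND = 343.0  # metres per second in air at ~20 °C

def distance_from_echo(round_trip_seconds: float) -> float:
    """Distance to a reflector, given the echo's round-trip time."""
    # The sound travels to the obstacle and back, so halve the path.
    return SPEED_OF_SOUND * round_trip_seconds / 2.0

# A 5-millisecond round trip corresponds to roughly 0.86 metres.
print(distance_from_echo(0.005))
```

Doing this in several directions lets each robot build a rough picture of nearby obstacles and of its fellow swarm members, which is what allows them to spread out across a surface without colliding.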
Co-lead author Tuochao Chen elaborated on their approach: “We developed neural networks that use these time-delayed signals to separate what each person is saying and track their positions in a space. So you can have four people having two conversations and isolate any of the four voices and locate each of the voices in a room.”
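The "time-delayed signals" Chen describes are the key physical cue: a voice reaches each microphone at a slightly different moment depending on where the speaker stands. The study feeds these delays into neural networks; the classical starting point they build on can be sketched with simple cross-correlation, shown below under assumed parameters (a 16 kHz sample rate and a synthetic noise signal standing in for speech). The function `estimate_delay` is a hypothetical name for this illustration.

```python
import numpy as np

# Hedged sketch: classical time-delay estimation between two
# microphones via cross-correlation. The paper's neural networks
# are far more sophisticated; this only shows the underlying cue.

def estimate_delay(sig_a, sig_b, sample_rate):
    """Return the delay (in seconds) of sig_b relative to sig_a."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(corr) - (len(sig_a) - 1)  # peak location = lag in samples
    return lag / sample_rate

rate = 16_000                       # assumed 16 kHz sampling
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)  # stand-in for a speech snippet
shift = 40                          # 40 samples = 2.5 ms at 16 kHz
delayed = np.concatenate([np.zeros(shift), source])[: len(source)]

print(estimate_delay(source, delayed, rate))  # → 0.0025
```

With seven microphones spread across a table, many such pairwise delays are available at once, and together they pin down where each voice originates — which is why the wide, self-arranged spacing of the swarm matters.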
The practicality of this technology was validated through real-world experiments conducted in various settings, including offices and kitchens, with three to five people engaged in conversations. Remarkably, the system localized voices with 90 percent accuracy even when speakers were within 1.6 feet of one another, and the median error across all scenarios was just under six inches.
However, the system’s processing speed presents a minor limitation: it takes an average of 1.82 seconds to process three seconds’ worth of sound, introducing a delay that could hinder live applications such as video conferencing.
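A quick bit of arithmetic on the figures reported above puts that limitation in context. A common yardstick is the real-time factor (processing time divided by audio duration): below 1, the system keeps up with live audio in throughput, and the remaining concern is latency. The interpretation here is my own framing, not the paper's.

```python
# Real-time factor from the reported figures: 1.82 s of compute
# per 3 s of audio. Below 1 means throughput keeps pace with live
# sound; the ~1.8 s lag itself is what a live call would notice.

processing_time = 1.82  # seconds of computation
audio_duration = 3.0    # seconds of audio processed

rtf = processing_time / audio_duration
print(round(rtf, 2))  # → 0.61
```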
In the future, the researchers intend to apply these muting and separation techniques in real-time physical spaces. They envision utilizing the localizing microphones to achieve what noise-canceling headphones do for individual ears but on a room-wide scale.
This innovative robotic acoustic swarm technology holds the potential to revolutionize audio processing and localization in various domains, including communication and surveillance.