Summary: Researchers have developed a wearable interface called EchoSpeech, which recognizes silent speech by tracking lip and mouth movements with acoustic sensing and AI. The device requires minimal user training and recognizes 31 unvocalized commands. The system could give a voice to people who are unable to vocalize sound, or be used to communicate silently with others.
Source: Cornell University
Researchers at Cornell University have developed a silent-speech recognition interface that uses acoustic-sensing and artificial intelligence to continuously recognize 31 nonvocalized commands based on lip and mouth movements.
The low-power, wearable interface — called EchoSpeech — requires only a few minutes of user training data before it can recognize commands and run them on a smartphone.
Ruidong Zhang, doctoral student in informatics, is lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-Obstructive Eyewear Powered by Acoustic Sensing,” which will be presented at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (CHI) this month in Hamburg, Germany.
“For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back,” Zhang said of the technology’s potential uses with further development.
In its current form, EchoSpeech could be used to communicate with others via smartphone in places where speech is inconvenient or inappropriate, such as a noisy restaurant or a quiet library. The silent speech interface can also be paired with a stylus and used with design software such as CAD, largely eliminating the need for a keyboard and mouse.
Outfitted with a microphone and a pair of speakers smaller than a pencil eraser, the EchoSpeech glasses become a wearable AI-powered sonar system, sending and receiving sound waves across the face and sensing mouth movements. A deep learning algorithm then analyzes these echo profiles in real time with approximately 95% accuracy.
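To make the sonar idea concrete, here is a minimal sketch of how an echo profile could be computed in principle: a known probe signal is played, and the microphone recording is cross-correlated with it so that each delay shows how strongly the probe came back. This is not the EchoSpeech implementation. The sample rate, the 16 to 20 kHz sweep, the simulated echoes and every function name are assumptions chosen only for illustration.

```python
import numpy as np

SAMPLE_RATE = 50_000   # Hz; assumed value for illustration only
CHIRP_SAMPLES = 600    # probe length in samples; also an assumption


def make_chirp(f_start=16_000.0, f_end=20_000.0):
    """Linear frequency sweep used as the transmitted probe (hypothetical band)."""
    t = np.arange(CHIRP_SAMPLES) / SAMPLE_RATE
    duration = CHIRP_SAMPLES / SAMPLE_RATE
    sweep_rate = (f_end - f_start) / duration
    phase = 2 * np.pi * (f_start * t + 0.5 * sweep_rate * t**2)
    return np.sin(phase)


def simulate_received(transmitted, delays, gains, frame_len, noise=0.01, seed=0):
    """Toy microphone frame: delayed, attenuated copies of the probe plus noise."""
    rng = np.random.default_rng(seed)
    frame = noise * rng.standard_normal(frame_len)
    for delay, gain in zip(delays, gains):
        frame[delay:delay + len(transmitted)] += gain * transmitted
    return frame


def echo_profile(received, transmitted):
    """Cross-correlate the received frame with the probe.

    Entry k of the result measures how strongly the probe appears in the
    recording at a delay of k samples.
    """
    corr = np.correlate(received, transmitted, mode="full")
    # In "full" mode, zero lag sits at index len(transmitted) - 1;
    # keep only the non-negative lags.
    return np.abs(corr[len(transmitted) - 1:])


chirp = make_chirp()
# Two simulated reflections: a strong one at 40 samples, a weaker one at 95.
frame = simulate_received(chirp, delays=[40, 95], gains=[1.0, 0.4], frame_len=1200)
profile = echo_profile(frame, chirp)
strongest_delay = int(np.argmax(profile))
print("strongest echo at delay (samples):", strongest_delay)
```

In a real system, a sequence of such profiles over time would be the input to the recognition model. Here the profile is only inspected for its strongest peak.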
“We’re taking sonar to the body,” said Cheng Zhang, assistant professor of informatics and director of Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab.
“We’re very excited about this system,” he said, “because it really pushes the field forward on performance and privacy. It’s small, low-power and privacy-sensitive, which are all important features for deploying new, wearable technologies in the real world.”
Credit: Ruidong Zhang
Cheng Zhang said that most technology in silent-speech recognition is limited to a select set of predetermined commands and requires the user to face or wear a camera, which is neither practical nor feasible. There are also major privacy concerns associated with wearable cameras — both for the user and those with whom the user interacts, he said.
Acoustic-sensing technology like EchoSpeech removes the need for wearable video cameras. And because audio data is much smaller than image or video data, it requires less bandwidth to process and can be relayed via Bluetooth to a smartphone in real time, said informatics professor François Guimbretière.
“And because data is processed locally on your smartphone rather than uploaded to the cloud,” he added, “privacy-sensitive information never leaves your control.”
Summary written with the help of ChatGPT AI technology.
About this AI research news
Author: Becca Bowyer
Source: Cornell University
Contact: Becca Bowyer – Cornell University
Image: The image is in the public domain
Original Research: The findings will be presented at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (CHI).