

Electrodes and AI bring 'silent speech' one step closer to reality

It's remarkably accurate.

Mihai Andrei
December 2, 2020 @ 7:37 pm


Every time you speak, your neck and facial muscles move in a specific way. Many people with speech impairments can still move these muscles, even though they cannot produce clear speech. Now, researchers are exploring a new way to use technology to reverse engineer these muscle movements and translate them into a synthetic, audible voice.

Electromyography (EMG) electrodes placed on the face can detect muscle movements from speech articulators. Image credits: Gaddy & Klein (2020).

The approach developed by UC Berkeley researchers uses electrodes placed on the face and throat. Broadly speaking, the method is called electromyography (or EMG) — electrode sensors collect information about muscle activity. An algorithm then builds a model of the muscle data and generates synthetic speech. It's a sort of electronic lip reading, except that it tracks the electrical activity of the muscles rather than visible lip movements.

“Digitally voicing silent speech has a wide array of potential applications,” the team’s paper reads. “For example, it could be used to create a device analogous to a Bluetooth headset that allows people to carry on phone conversations without disrupting those around them. Such a device could also be useful in settings where the environment is too loud to capture audible speech or where maintaining silence is important.”

It's not the first time something like this has been developed. Silent speech interfaces have been around for a few years, but there's still plenty of room to improve their performance. This is where the new approach innovates: the algorithm transfers audio outputs "from vocalized recordings to silent recordings of the same utterances." In other words, this is the first model trained on EMG data collected during silent speech, rather than 'real' (vocalized) speech. This approach offers better performance, the researchers note in the study.
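The transfer step relies on aligning a silent recording with a vocalized recording of the same utterance, so that the audio from the vocalized version can serve as a training target for the silent EMG. A common way to align two such sequences is dynamic time warping (DTW); the sketch below is illustrative only — the function names and toy data are not from the paper's codebase.

```python
import numpy as np

def dtw_align(silent_feats, vocalized_feats):
    """Map each silent-recording frame to its best-matching vocalized frame.

    Once aligned, the audio captured during vocalized speech can be
    transferred onto the silent EMG recording as a training target.
    """
    n, m = len(silent_feats), len(vocalized_feats)
    # Pairwise Euclidean distances between every silent/vocalized frame pair
    dist = np.linalg.norm(
        silent_feats[:, None, :] - vocalized_feats[None, :, :], axis=-1
    )
    # Cumulative-cost table for the standard DTW recurrence
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j],      # silent frame repeats
                cost[i, j - 1],      # vocalized frame repeats
                cost[i - 1, j - 1],  # both sequences advance
            )
    # Backtrack to recover one vocalized index per silent frame
    alignment = np.zeros(n, dtype=int)
    i, j = n, m
    while i > 0:
        alignment[i - 1] = j - 1
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return alignment

# Toy example: 5 silent frames against a time-stretched 7-frame "vocalized" copy
rng = np.random.default_rng(0)
silent = rng.normal(size=(5, 8))
vocalized = np.repeat(silent, [2, 1, 2, 1, 1], axis=0)
idx = dtw_align(silent, vocalized)
print(idx)  # a monotonically non-decreasing frame mapping
```

Even though the two recordings differ in length and pacing, the warping path pairs each silent frame with the vocalized frame it most resembles, which is what makes the audio-target transfer possible.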

“Our method greatly improves intelligibility of audio generated from silent EMG compared to a baseline that only trains with vocalized data,” the researchers add.

According to the measured data, the word interpretations produced this way were more accurate than those of existing technology. In one experiment, the transcription word error rate dropped from 64% to 4%; in another, which used a different vocabulary, it dropped from 88% to 68%.

The paper has been posted to the preprint server arXiv and had not yet been peer reviewed at the time of this writing. However, it received an award at the Conference on Empirical Methods in Natural Language Processing (EMNLP), held online last week, in recognition of its results.

To support further research in this field, the researchers have open-sourced a dataset of nearly 20 hours of facial EMG data.
