“Our central goal was to make an ‘artificial vocal tract,’ ” Chartier told The Washington Post via email on Friday. “Not a real physical one, but a computer one that could generate full sentences, not just words.”
“Our goal is to help those who cannot speak to say what they wish to say,” he said.
The team worked with five volunteers at the UCSF Epilepsy Center who had had electrodes implanted on the surface of their brains in preparation for neurosurgery to control their seizures. The volunteers read hundreds of sentences aloud while researchers monitored activity in their brains’ speech centers, which control people’s ability to talk.
The next step was to decode the brain signals. The researchers used machine learning to simulate the movements of a person’s vocal tract from the signals recorded from the volunteers’ brains, then translated those simulated movements into synthesized speech, Chartier explained.
Researchers then needed to determine whether the synthesized sentences made sense to listeners. They produced several sample sentences and asked testers on Amazon Mechanical Turk to identify individual words and transcribe full sentences.
Simpler generated sentences — such as “Is this seesaw safe?” and “Bob bandaged both wounds with the skill of a doctor” — were generally intelligible to testers, but more complex phrases, such as “At twilight on the twelfth day we’ll have Chablis,” gave them more trouble.
The neural decoder could be an important step toward improving the tools available to help those who cannot speak to be heard.
People who have neurological conditions that make it difficult or impossible to communicate through speech have access to tools that can use movement of their eyes or heads, or a device that controls a cursor, to select letters one by one. But these technologies can be cumbersome, the researchers write in their study. “Although these systems can enhance a patient’s quality of life,” they say, “most users struggle to translate more than 10 words per minute, far slower than the average of 150 words per minute of natural speech.”
The researchers behind the neural decoder hope to one day help people with conditions such as amyotrophic lateral sclerosis, or those whose ability to speak was damaged by a stroke, regain their ability to engage in spoken dialogue. The technology, however, has to advance before it can be used on people who cannot speak.
“In this study, we knew what participants were trying [to] say since they could talk, however we will need to adapt our algorithms to work with people who cannot talk,” Chartier said.
Marc Slutzky, a neurologist at Northwestern University, told Nature magazine that the study is “a really important step.”
But “there’s still a long way to go before synthesized speech is easily intelligible,” he said.