
UCSF clinical research coordinator Max Dougherty connects a neural data port in Ann’s head on Monday, May 22, 2023, in El Cerrito, Calif. Ann is a participant in Dr. Eddie Chang’s study of speech neuroprostheses. (Photo by Noah Berger/UCSF 2023)
BERKELEY, Calif. — For the millions of people worldwide who have lost their ability to speak due to stroke, ALS, or other neurological injuries, a new technology is breaking down barriers to communication. Researchers have developed a system that translates brain activity directly into speech in real time, allowing those with severe paralysis to communicate naturally once again.
Moving beyond previous technologies that forced uncomfortable pauses in conversation, this new “brain-to-voice neuroprosthesis” works almost simultaneously with the user’s intent to speak. The system processes brain signals in tiny 80-millisecond chunks, producing speech that flows naturally as the person thinks about forming words.
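The streaming idea can be sketched in a few lines: instead of buffering a whole utterance, the decoder consumes fixed-length windows and emits output window by window. The sketch below is illustrative only — the 80-sample window stands in for the 80-millisecond chunks described in the article, and `decode_chunk` is a hypothetical stand-in for the study's neural decoder.

```python
# Illustrative sketch of chunked streaming decoding (not the study's code).
# decode_chunk is a hypothetical stand-in for the trained neural decoder.

def stream_decode(samples, chunk_len, decode_chunk):
    """Yield decoder output per fixed-length window instead of waiting
    for the full utterance — this is what keeps latency low."""
    for start in range(0, len(samples), chunk_len):
        yield decode_chunk(samples[start:start + chunk_len])

# Toy usage: 400 "samples" in 80-sample windows, with a dummy decoder
# that just reports each window's boundaries.
windows = list(stream_decode(list(range(400)), 80, lambda w: (w[0], w[-1])))
print(windows)  # [(0, 79), (80, 159), (160, 239), (240, 319), (320, 399)]
```

The same loop shape applies whether the per-window output is text tokens or synthesized audio; only the stand-in decoder changes.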
“Our streaming approach brings the same rapid speech decoding capacity of devices like Alexa and Siri to neuroprostheses,” says Gopala Anumanchipalli, a professor of electrical engineering and computer sciences at the University of California, Berkeley, and co-principal investigator of the study, in a statement. “Using a similar type of algorithm, we found that we could decode neural data and, for the first time, enable near-synchronous voice streaming. The result is more naturalistic, fluent speech synthesis.”
A Second Chance at Communication
The research focused on a 47-year-old woman, referred to as “Ann,” who experienced a brainstem stroke 18 years before the study. This devastating event left her with quadriplegia and anarthria – the inability to coordinate speech muscles despite having full cognitive ability to understand and formulate language. After years of communicating through a transparent letter board and eye-tracking devices at a painfully slow rate of 2.6 words per minute, the new technology offered her a chance to speak again at rates approaching normal conversation.
“This new technology has tremendous potential for improving quality of life for people living with severe paralysis affecting speech,” says neurosurgeon Edward Chang, senior co-principal investigator of the study. Chang leads the clinical trial at UCSF that aims to develop speech neuroprosthesis technology using high-density electrode arrays that record neural activity directly from the brain surface. “It is exciting that the latest AI advances are greatly accelerating BCIs for practical real-world use in the near future.”
The technology works through a 253-channel electrode array surgically implanted on the surface of her brain, covering the brain region that controls speech muscles. As she attempts to silently “mime” words without making sound, the system captures and interprets the neural signals, converting them into both audible speech and text in real-time.
“We are essentially intercepting signals where the thought is translated into articulation and in the middle of that motor control,” explains co-lead author Cheol Jun Cho, a UC Berkeley Ph.D. student in electrical engineering and computer sciences. “So what we’re decoding is after a thought has happened, after we’ve decided what to say, after we’ve decided what words to use and how to move our vocal-tract muscles.”

Edward Chang, MD, of UCSF. (Credit: Photo by Noah Berger/UCSF 2023)
Breaking the Delay Barrier
Previous technologies collected all neural data during a speech attempt before producing anything, causing delays of about 8 seconds for a single sentence. The new system processes information in small increments, allowing speech to emerge almost as soon as it’s conceived in the brain – much like how we naturally speak.
“We can see relative to that intent signal, within 1 second, we are getting the first sound out,” says Anumanchipalli. “And the device can continuously decode speech, so Ann can keep speaking without interruption.”
When tested with a set of 50 common phrases focused on caregiving needs, the system achieved an impressive 90.9 words per minute. With a larger 1,024-word vocabulary, it still managed 47.5 words per minute. For context, natural conversation typically flows at 120-150 words per minute, showing this technology is approaching practical speeds for everyday interaction.
For the smaller phrase set, about 88% of words were correctly interpreted. Even with the larger vocabulary where error rates increased, the system still provided meaningful communication at speeds that make conversation viable.
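The accuracy figures quoted here reflect the standard speech-recognition metric, word error rate: the word-level edit distance between the decoded sentence and the intended one, divided by the length of the intended sentence. A minimal reference implementation follows; the example sentences are invented for illustration and are not from the study.

```python
# Standard word error rate (WER): edit distance over reference length.

def word_error_rate(reference, hypothesis):
    """WER between a reference sentence and a decoded hypothesis,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of five intended words -> 20% WER.
print(word_error_rate("i need some water please", "i need water please"))  # 0.2
```

By this measure, the reported 12.3% error rate on the 50-phrase set corresponds to roughly 88% of words decoded correctly.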
To ensure the system wasn’t simply memorizing familiar phrases, the researchers tested it with completely unfamiliar vocabulary – words from the NATO phonetic alphabet like “Alpha,” “Bravo,” and “Charlie.” The system achieved 46% accuracy on these novel words, suggesting it was learning fundamental aspects of speech production rather than just recognizing patterns.
“We wanted to see if we could generalize to the unseen words and really decode Ann’s patterns of speaking,” says Anumanchipalli. “We found that our model does this well, which shows that it is indeed learning the building blocks of sound or voice.”
Beyond a Single Solution
The research extends beyond a single implementation. The team successfully applied their approach to other recording methods, including single-unit recordings from another person with paralysis and surface electrodes measuring muscle activity from healthy speakers mimicking silent speech.
“By demonstrating accurate brain-to-voice synthesis on other silent-speech datasets, we showed that this technique is not limited to one specific type of device,” explains Kaylo Littlejohn, Ph.D. student and co-lead author. “The same algorithm can be used across different modalities provided a good signal is there.”
Ann, the participant from the study, reported that “streaming synthesis was a more volitionally controlled modality,” according to Anumanchipalli. “Hearing her own voice in near-real time increased her sense of embodiment.”
Future work will focus on adding expressivity to the output voice, capturing changes in tone and emphasis that occur during natural speech. “That’s ongoing work, to try to see how well we can actually decode these paralinguistic features from brain activity,” says Littlejohn. “This is a longstanding problem even in classical audio synthesis fields and would bridge the gap to full and complete naturalism.”
Restoring the Human Connection
This brain-to-voice technology marks a significant advance in giving natural communication capabilities back to those with severe paralysis. While previous interfaces showed promise in decoding intended speech, they struggled with speed, vocabulary range, and conversational flow. The streaming nature of this new approach addresses these challenges directly.
For millions affected by conditions that impair speech – from stroke and traumatic brain injury to ALS and other neurodegenerative diseases – this research offers tangible hope. The vision is a speech neuroprosthesis that operates seamlessly, allowing people to join conversations with the same ease and flow as those without disabilities.
Though not yet ready for widespread clinical use, the researchers will continue refining their approach, working toward improved accuracy, even lower latency, and systems suitable for daily use outside research settings. If development proceeds as planned, the technology could bring us one step closer to ensuring that no one is truly silenced by paralysis.
Paper Summary
Methodology
The research team implanted a high-density electrocorticography (ECoG) array with 253 electrodes directly on the brain surface of the participant during surgery. This array covered key speech-related brain regions in the left hemisphere. During testing, the participant would see sentences on a screen and attempt to silently speak them when given a visual “GO” cue – essentially trying to move her vocal tract muscles without making sound.

To collect the data needed to train their algorithm, the researchers first had Ann look at a prompt on the screen — like the phrase: “Hey, how are you?” — and then silently attempt to speak that sentence. “This gave us a mapping between the chunked windows of neural activity that she generates and the target sentence that she’s trying to say, without her needing to vocalize at any point,” said Littlejohn.

The system captured high-frequency brain activity patterns and processed them through specialized deep learning models that could recognize speech intentions. Since Ann cannot vocalize, the researchers solved this challenge using AI. “We used a pretrained text-to-speech model to generate audio and simulate a target,” said Cho. “And we also used Ann’s pre-injury voice, so when we decode the output, it sounds more like her.”
Results
The system showed impressive performance on multiple fronts. When decoding from a set of 50 common phrases, it reached speeds of 90.9 words per minute with a word error rate of just 12.3%. When challenged with a much larger 1,024-word vocabulary, it still managed 47.5 words per minute, though accuracy declined with the increased complexity (58.8% word error rate).

The delay between thought and speech was remarkably low – about 1-2 seconds – creating a near-natural conversation experience. To measure latency, the researchers employed speech detection methods, which allowed them to identify the brain signals indicating the start of a speech attempt.

Perhaps most surprisingly, the system demonstrated some ability to decode words it had never seen before, achieving 46% accuracy on novel words. This suggests the technology is capturing fundamental aspects of speech production rather than simply memorizing patterns. Furthermore, when tested in continuous operation over several minutes, the system could automatically detect when the participant was trying to speak, eliminating the need for explicit prompts or cues.
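The automatic detection of speech attempts can be illustrated with a much simpler stand-in than the study's learned models: thresholding a sliding-window average of signal activity to find when an attempt begins. The threshold, window size, and toy signal below are assumptions for illustration only.

```python
# Toy onset detector: return the first sliding window whose mean exceeds
# a threshold. A simplified stand-in for the study's learned
# speech-detection models, which operate on neural features.

def detect_onset(signal, threshold, window=4):
    """Return the first index where the windowed mean crosses threshold,
    or None if it never does."""
    for i in range(len(signal) - window + 1):
        if sum(signal[i:i + window]) / window > threshold:
            return i
    return None

# 20 quiet samples followed by a burst of activity starting at index 20.
quiet, burst = [0.1] * 20, [1.0] * 10
print(detect_onset(quiet + burst, 0.5))  # 18 — triggers as the window overlaps the burst
```

Measured latency is then the gap between this detected onset and the first synthesized audio sample; the study reports that gap at roughly one second.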
Limitations
Despite its breakthroughs, this research has important constraints. The study focused on a single participant, leaving open how the approach would transfer to other individuals with different speech impairments. While the architecture showed promise when tested offline with other participants and data types, real-time implementation with multiple participants would provide stronger evidence of broad applicability. The error rates, particularly for the larger vocabulary set (58.8%), remain higher than ideal for reliable communication, though they still allow meaningful exchanges. Expressivity is another open problem: the output voice does not yet reflect the changes in tone, pitch, or loudness that occur during natural speech. The current implementation also requires surgical implantation of electrodes directly on the brain surface – a significant procedure that limits widespread adoption.
Funding and Disclosures
In addition to the National Institute on Deafness and Other Communication Disorders (NIDCD), support for this research was provided by the Japan Science and Technology Agency’s Moonshot Research and Development Program, the Joan and Sandy Weill Foundation, Susan and Bill Oberndorf, Ron Conway, Graham and Christina Spencer, the William K. Bowes, Jr. Foundation, the Rose Hills Innovator and UC Noyce Investigator programs, and the National Science Foundation. Several researchers are listed as inventors on pending provisional patent applications related to the neural decoding approaches. Two researchers are co-founders of Echo Neurotechnologies, a company developing neural decoding technologies, representing a potential conflict of interest that was disclosed in the publication.
Publication Details
The study, “A streaming brain-to-voice neuroprosthesis to restore naturalistic communication,” was published in Nature Neuroscience on March 31, 2025. The research was led by Kaylo T. Littlejohn and Cheol Jun Cho as co-first authors, with Edward F. Chang and Gopala K. Anumanchipalli serving as joint senior authors. The work was conducted as part of the BCI Restoration of Arm and Voice (BRAVO) clinical trial at the University of California, San Francisco and the University of California, Berkeley. The clinical trial registration number is NCT03698149 on ClinicalTrials.gov.