A previous version of this article mistakenly referred to a video streaming service as Nexflix. In fact, it is Netflix. This version has been corrected.
Even the most widespread artificial voices based on real people, like Apple’s Siri and Amazon’s Alexa, sound fake. But a wave of start-ups are deploying artificially intelligent voice-cloning services for digital assistants, video games and movie studios.
The generated voices have gotten more realistic in the age of deepfakes, a technology that uses AI to manipulate content to look and sound deceptively real. The resulting audio is sometimes so convincing that it is tough to tell human voices from their synthetic counterparts.
Five years after a tracheotomy during his treatment for throat cancer left Kilmer struggling to speak, his representatives contacted the AI voice start-up Sonantic to digitally restore his lost voice.
“So that’s what we did,” said Zeena Qureshi, CEO and co-founder of Sonantic. “Val’s team wanted to give him his voice back so that he could continue creating.”
The project started in December 2020 after Kilmer finished taping “Val,” a documentary about his Hollywood career and battle with cancer.
Sonantic’s AI technology was not featured in the documentary itself. However, the firm released a clip on YouTube that has more than 18,000 views.
Kilmer’s project comes a month after documentary filmmaker Morgan Neville revealed he used unidentified voice-cloning software to imitate the late chef Anthony Bourdain in the documentary “Roadrunner.” Neville drew criticism from Ottavia Bourdain, the chef’s widow, who disputed being approached about re-creating her husband’s voice through AI.
Sonantic would not reveal the other actors it is working with. The three-year-old company works primarily with gaming brands, such as Xbox Game Studios’ Obsidian Entertainment and Remedy Entertainment, and often licenses its synthetic voice service to studios, allowing them to edit and direct artificial voices much as directors guide human actors.
“We like to think of it as Photoshop for voice, where you can go in and touch up little areas,” said John Flynn, the firm’s chief technology officer.
The firm’s audio engineers typically require three hours of audio to re-create a voice within 24 hours. But because of movie licensing restrictions, Sonantic had to re-create Kilmer’s voice with fewer than 30 minutes of audio.
The company says engineers pulled samples from old footage and “cleaned” them to cancel out background noise. They created a script based on the material, linked the audio and text together in “short chunks” and ran the data through their “voice engine” algorithms, which learn to speak by listening to the recordings, Flynn said.
The voice engine derives meaning from the written words and can use these cues to “illustrate intense anger and emotional pain,” according to a statement. In April, Sonantic demonstrated how the audio service can portray a couple in the throes of a heated argument: in the demo, two voices have an ordinary conversation that quickly escalates into a shouting match. Used in real productions, the company said in a release, the technology would preserve “actors’ vocal cords” and allow “them to earn passive income.”
The firm says it created 40 versions of Kilmer’s voice and selected the highest-quality one, the version that best captured the actor’s expression. The result is a desktop-based text-to-speech program Sonantic says can mimic Kilmer’s projection levels and emotion.
The voice software can read lines of text aloud, supposedly capturing Kilmer’s former subtleties in speech, expression and tone. Kilmer, beloved for his role as Iceman in “Top Gun,” is free to use the tech however he wants, Sonantic says.
“It’s exclusively his model. He could use it for personal use or professional use if he wants to,” Qureshi said.
As in Kilmer’s case, the technology has possibilities for people who have difficulty speaking or actors needing to rest their vocal cords after long screaming sessions in the studio.
But the technology also sparks legal, ethical and economic concerns, particularly among voice actors who worry their livelihoods could dry up.
Deepfake technology has been used to make videos of politicians such as Donald Trump and Barack Obama, spotlighting the dangers of technology designed to make it appear as if people are saying things they never said.
“When I’m an actor, I get to decide whether I support the content,” said Jay Britton, a voice actor who plays animated characters in Netflix’s “Go! Go! Cory Carson” and a long list of video games. “It would be a devastating thing to drop on a voice actor, that your voice is out there saying things that you might not necessarily support.”
Sonantic says its product is not meant to replace actors. The firm’s website touts that its product can “reduce production timelines from months to minutes” and deliver “compelling, lifelike performances for games and films with fully expressive AI-generated voices” — a pitch that could cut down on the number of hours human actors are paid to perform in-studio.
There are no laws in the United States prohibiting companies from generating synthetic voices.
There is, however, a legal framework to deter those seeking to cash in on a famous person’s likeness. In a voice-theft case from the early 1990s, the raspy-voiced singer Tom Waits sued Frito-Lay for using a sound-alike in a Doritos ad and was awarded $2.6 million in damages.
“If (companies) go around reproducing voices of recognizable people without permission, they may be violating the right of privacy and subjecting themselves to a lawsuit,” said Peter Raymond, an intellectual property lawyer at Reed Smith LLP in New York City. “If you’re doing it as a parody or an artistic routine, then it’s not a violation. If it’s for commercial benefit, it would be a violation.”
Sonantic’s software garnered praise from Kilmer, who said in a statement that the start-up “masterfully restored my voice in a way I’ve never imagined possible.”