Date: 2021-10-14

The implications could be sweeping. In the grand vision of auto-dubbers, any piece of video content will one day be available in a custom language at the flick of a button and, even better, fully feel like the original. The resulting world would be one of seamless interchangeability: a piece of entertainment would not emanate from a particular place but would pop up unexpectedly as the seeming creation of whatever language its viewer wants to watch it in. As Forrest Gump famously said, das Leben ist wie eine Schachtel Schokolade.
“The potential here is so big,” said Scott Mann, a Hollywood director who co-founded one of these startups, Flawless. “Most of us are not even aware how much great content is in the world. Now we can watch it.”
Yet for all its shimmering cross-culturalism, hidden social implications abound in an easy-dub world. Entertainment could lose its local flavor, for one thing. Or cosmopolitan consumers may never be exposed to the sounds of a foreign language.
Dubbing has long been a painstaking exercise, and that’s before anyone even watches.
Traditional dubbing can work like this. A studio or local distributor, having decided it wants to put out a movie with local-language dialogue, pays to translate a script, hire a set of voice actors to play the characters, rent out engineering equipment, put the actors through numerous voice takes, record them and then splice their readings back into the original video, a mighty grapple to achieve a smooth final product. The whole process could take months.
Auto-dubbing can work like this. The original actor records five minutes of random text in their own language. Then the machines take over. A neural network learns the actor’s voice. A program takes that vocal information and applies it to a digital translation of the script. The AI then spits out perfectly timed lines from the film in the foreign language and drops them into the action. The whole process could take weeks.
“We have the technology to fill a big gap,” said Oz Krakowski, chief marketing officer of deepdub, a Dallas- and Tel Aviv-based startup that employs essentially this process. “We can give studios what they want and give consumers a totally unique experience.” The company is about to put its claim to the test, releasing “Every Time I Die,” a 2019 thriller whose English-language version is on Netflix, in Spanish- and Portuguese-language versions dubbed entirely by AI.
Companies take a range of approaches to auto-dubbing. deepdub focuses more on the audio, digitally re-deploying the original actor’s voice off a machine translation but leaving the video unchanged. Another firm, London-based Papercup, doubles down on this tack, using so-called “synthetic voices.”
Flawless goes the other way, relying on live (and labor-intensive) voice actors but editing on-screen lips and faces so they look like they’re actually speaking the language. All three companies bring humans into the process at various points for quality control.
(There is some research from tech giants like Amazon but no commercial product yet. Several other firms, such as the video-focused Synthesia and voice-centered Respeecher, are also working on related technologies. Amazon founder Jeffrey Bezos owns The Washington Post.)
Though all services use some form of manipulation, most say they’re not engaging in deepfakes, wary either of making their material vulnerable to political manipulation or, at least, of the controversy a recent CNN documentary about Anthony Bourdain generated for using an AI-generated version of his voice. Of course, one man’s deepfake is another man’s digital enhancement.
Venture capital firms have bet on auto-dubbing, with Papercup in December raising $10.5 million from a group of investors that included Arlington, Va.’s Sands Capital Ventures. Flawless recently concluded its undisclosed Series A financing; deepdub is in the middle of one.
It’s easy to understand the interest. Foreign-language content is a vast unmined frontier for Hollywood; even in mainly subtitled versions, Netflix’s Korean survival drama “Squid Game” has become the service’s No. 1 show in many countries, including the United States. If “Squid Game” can do what it’s done largely with subtitles, the auto-dubbers say, imagine what can happen when foreign dialogue is available in everyone’s own language. An endless parade of foreign-language smashes Stateside is not hard to conceive.
(There are already some dubbed versions of “Squid Game,” but they’re…not popular.)
As a kind of new spin on the Tower of Babel, in which everyone speaks different languages but still understands each other, auto-dubbing also means non-English speakers won’t need to learn English to enjoy a Hollywood movie, with the dialogue projected sans subtitles.
But these thrilling possibilities also come with concerns. A world without foreign dialogue in entertainment means a world in which millions of viewers, in both the U.S. and abroad, will never be exposed to a language outside their own.
“If everything you listen to is dubbed, you lose all the phonetics, all the information, all the empathy,” said Siva Reddy, an assistant professor of linguistics and computer science at McGill University. “You can have a monolithic way of looking at everyone.”
The idea of quotable dialogue could also be thrown into question. Some of the most famous lines in film and TV history, like Gump’s “life is like a box of chocolates,” developed because they were heard, and eventually repeated, the same way. A perfect piece of on-screen speech that sounds 30 different ways from the outset may never turn into a classic.
Papercup co-founder Jesse Shemen, whose company has worked with Discovery and Sky, says that the benefits would outweigh these worries.
“I believe hearing thoughts and philosophies that we never would have heard is far more accretive and additive than being limited by your own language,” he said, citing everything from a financial expert in Nigeria to an NBA commentator in South America.
Watching auto-dubbing can be an unsettling affair. It’s almost like the little rips and wrinkles of normal dubbing serve a purpose, reminding us this movie was created in another culture. Slightly jarring, for instance, is watching Robert De Niro ask if you’re talking to him while he’s talking Japanese.
The technical air bubbles are also far from smoothed out. Getting the voices to sound human, with all the required lilts and inflection, is something that AI can still struggle with. And automatic translations tend to be literal, missing context.
“We have to be honest about where the tech is,” Shemen added. “Matching the performance level of humans is not a simple task. And forget emulating a high-quality voice artist.”
Others in the auto-dubbing movement, however, say this could lead to new opportunities, like synthetic voices with signature styles.
“We don’t have to replicate what Hollywood has done until now,” said Flawless co-founder Mann. “I think we can create a lot of new rules for what foreign movies will sound like,” he added, of course alluding to Doc Brown’s iconic line to Marty McFly that, where we’re going, nous n’avons pas besoin de routes.