The Rise of the Robo-Voices
The next time you see a movie or TV show that was dubbed from a foreign language, the voices you hear may not belong to actors who rerecorded dialogue in a sound booth. In fact, they may not belong to actors at all.
Highly sophisticated digital voice manufacturing is coming, and entertainment executives say it could bring a revolution in sound as industry-changing as computer graphics were for visuals. New companies are using artificial intelligence to create humanlike voices from samples of a living actor’s voice—models that not only can sound like specific performers, but can speak any language, cry, scream, laugh, even talk with their mouths full. At the same time, companies are refining the visual technology so actors look like they are really speaking.
As streaming services export American fare globally and foreign markets send their hits to the U.S., dubbing is a bigger business than ever. But the uses of synthetic voices extend well beyond localizing foreign films. AI models can provide youthful voices for aging actors. The technology can resurrect audio from celebrities who have died or lost the ability to speak. And it can tweak dialogue in postproduction without the need for actors.
All the tinkering raises thorny ethical questions. Where is the line between creating an engrossing screen experience and fabricating an effect that leaves audiences feeling duped?
“When I’m sitting in the theater, I’m eating popcorn, I don’t think, ‘Wait a second, this is AI? They didn’t tell me, I should have known,’ ” says Oz Krakowski, chief revenue officer of Deepdub, an AI voice company based in Tel Aviv. He doesn’t think filmmakers necessarily need to inform audiences when using fake voices. “I come to the theater to have a good time and enjoy. And that’s what we’re supplying: an immersive experience.”
The technology is set to hit a new target in the coming months, when foreign-language dubbed versions of the 2019 indie horror movie “Every Time I Die” are released in South America. Those versions mark one of the first times entire dubbed movies use computerized voice clones based on the voices of the original English-speaking cast. So when the film comes out abroad, audiences will hear the original actors “speaking” Spanish or Portuguese. Deepdub created the replicas based on 5-minute recordings of each actor speaking English.
Much was made of the de-aging visuals used on the face of Mark Hamill in "The Mandalorian." Mr. Hamill played Luke Skywalker in the 1977 original "Star Wars" and reappeared as the young Jedi in the show's late 2020 season finale. But the actor's voice aired without fanfare.
“Something people didn’t realize is his voice isn’t real,” Jon Favreau, the show’s creator, says in a behind-the-scenes feature Disney+ released in late August. “It becomes harder and harder to trust your own eyes and ears when it comes to this stuff.”
With the blessing of Mr. Hamill, 70 years old, voice-cloning company Respeecher created young Luke’s dialogue using snippets of the actor’s voice from 40-year-old recordings. The sources included old sound footage from the original “Star Wars” trilogy, an old “Star Wars” radio show and a book on tape from that period read by Mr. Hamill.
“Making something sound good is not similar to making something look good,” says Alex Serdiuk, co-founder and chief executive of Respeecher, based in Kyiv, Ukraine. “When we look at images that are generated, our imagination can add some colors to it, but we usually pick out all the small robotic, metallic sounds in a computer-generated voice.”
Hollywood has been fascinated by synthetic speech since before HAL, the talking computer from the 1968 movie "2001: A Space Odyssey." Until recently, however, such speech was little more than a novelty in film and TV.
The London AI firm Sonantic recently worked with actor Val Kilmer to recreate nearly 2 minutes of his voice in an online demonstration of the technology. The AI-generated model for Mr. Kilmer, who lost his voice to throat cancer, sounded so real it moved his own son to tears, says Sonantic CEO Zeena Qureshi. “That idea of being able to customize voice content, to change emotions, pitch, direction, delivery, style, accents—that’s now possible where it wasn’t before,” she says.
Mr. Kilmer and his son, Jack Kilmer, didn’t respond to requests for comment.
Zohaib Ahmed, founder of the Toronto-based voice cloning company Resemble AI, sees a world of “choose your own adventure” movies with 75 narrative arcs, where the cast’s custom AI models handle the grunt work of recording the different endings. Other AI evangelists see more pedestrian uses, like helping videogame developers make script changes without the time and expense of hiring professional actors. Another audio expert envisions old movies updated for modern audiences with offensive dialogue rewritten.
Among the technology’s biggest risks: the manipulated video or audio of deepfakes. More than one company attempted to replicate Morgan Freeman’s voice in demos unavailable for public use. A representative for the actor says this was done without permission from Mr. Freeman.
Filmmaker Morgan Neville recently acknowledged creating 45 seconds of an AI voice for late chef Anthony Bourdain in the documentary “Roadrunner,” stirring controversy over whether he had consent from the family and whether he should have informed the audience.
Journeyman actors, meanwhile, worry about getting replaced by computers.
In a lawsuit this spring, voice artist Bev Standing alleged that TikTok used an AI text-to-speech feature that she says is based on her voice; she estimates it has been used thousands of times without her permission or payment. She objects to people swearing with her voice—“If you want me to say the F-word, it kind of goes against my brand,” she says—and sees her voice as a piece of personal property being stolen.
Ms. Standing says her voice fell out of her control after she recorded a 2018 job where she read 10,000 lines of nonsense text for AI modeling. (Sample text: “Maybe tomorrow we can rent a car and run over some puppies.”) Last year, friends started telling her they heard her on TikTok. She investigated and, believing the voice was hers, found a lawyer.
TikTok and its parent, ByteDance Ltd., didn’t comment on the substance of Ms. Standing’s complaint.
In the past 18 months, Los Angeles entertainment lawyer Daniel Black has negotiated four contracts that seek to use synthetic voices in place of performers when necessary. Mr. Black made sure his clients had the right to reject the voices if they didn’t like the result. In the end, producers never needed to use the artificial voices, but the Hollywood clients took note.
“All four felt it was creepy,” Mr. Black says. “There was clearly a recognition, you know, ‘We can create your voice without you.’ ”