A Dubai-based speech synthesis company has released a fake podcast conversation between Joe Rogan and Steve Jobs, featuring realistic voices digitally cloned from both men. The conversation appears as the “first episode” of “Podcast.ai,” a purported podcast series created by Play.ht, which sells speech synthesis services.
The interview opens with a recreation of Rogan’s voice produced by voice cloning technology. Deep learning has enabled AI models to reproduce distinctive voices accurately.
To get the desired result, the AI model must first be trained on existing samples of the voice to be cloned. Because recordings of his voice alone are abundant on his podcast, Rogan is a popular target for AI voice training using deep learning models.
What makes this AI stunt more noteworthy is that Play.ht also used the voice of the late Apple CEO Steve Jobs. Furthermore, Play.ht says that the interview transcript was also generated by AI, presumably using a large language model (LLM) akin to GPT-3.
“Transcripts are generated with fine-tuned language models,” writes Play.ht on the Podcast.ai website. “For example, the Steve Jobs episode was trained on his biography and all recordings of him we could find online, so the AI could accurately bring him back to life.”
In keeping with its LLM origins, the 19-minute interview makes some sense. However, after a while, elements of the fake conversation start to seem like conceptual mashups of popular Jobs talking points, such as aesthetics, breakthrough products, competitors like Google, Microsoft, and Adobe, and the original Macintosh’s successes.
For example, during one segment of the interview, the fake Jobs criticizes Microsoft in a way that is extremely close to what the real Jobs said in a famous 1995 interview for Triumph of the Nerds. Still, it’s not a carbon copy; if you compare the two, you can tell the voice is synthesized.
“That’s the problem I’ve always had with Microsoft,” fake Jobs says. “In many ways, they’re smart people, and they’ve done good work, but they’ve never had any taste. They’ve never had any aesthetic sense.”
It’s unclear whether it is permissible to use Jobs’ or Rogan’s vocal likenesses in this way, particularly to advertise a commercial product. Still, although the podcast is a PR gimmick, the thought of entirely fake celebrity podcasts piqued our interest.
As speech synthesis becomes increasingly common and potentially undetectable, we can expect media artifacts from any era to become malleable, open to being reshaped to fit any narrative.