OpenAI's GPT-4o Voice Mode Says It Needs to Breathe

In a recent Reddit video, the voice mode of OpenAI’s GPT-4o, a large language model (LLM), shows off its speech capabilities in an unusual and somewhat whimsical exchange. The video, posted to Reddit’s r/Singularity forum, features a largely off-camera user talking with the LLM, whose voice mode has recently begun rolling out to the public following its announcement earlier this year. The clip showcases the human-like voice and conversational ability that set it apart from its predecessors.

In the video, the user asks GPT-4o to recite a series of tongue twisters. After the LLM complies, the user raises the difficulty.

“I want you to do it again, but way faster,” the person chatting with the language model demands, “and without taking any breaths or pauses.”

Surprisingly, GPT-4o refuses to do so, explaining, “I wish I could, but I need to breathe just like anybody speaking. Wanna give it a shot yourself and see how fast you can go?”

Rohan Paul (@rohanpaul_ai) shared the clip, sourced from Reddit, in a post on August 1, 2024 (pic.twitter.com/rRDmKf5FkJ):

“Asking GPT 4o advanced voice is really good. Instructing to say tongue twisters without pausing to breathe. It insists it *has* to breathe ‘just like anybody speaking’”

This response sparked a flurry of speculation and debate among Reddit users. Some suggested that the LLM’s system prompt might be designed to emulate natural human speech patterns, avoiding an overly mechanical or robotic response that could be unsettling to users.

“The system prompt probably instructs the model to mimic how a human speaks and avoid any unnatural robotic [E]minem rap that would scare off the general public,” one user quipped.

The discussion also turned to whether the refusal could be traced to the LLM’s training data. One commenter dismissed that idea as implausible, arguing that such data would more likely produce a garbled attempt than an outright refusal.

“Seems unlikely that the training data would cause it to refuse,” the Redditor followed up. “That would presumably just cause it to do it badly/incoherently.”

Whatever the cause, the clip shows GPT-4o handling an awkward request in a natural, even charming way. The refusal prompted competing theories, and many users said they admired how adeptly the LLM managed the situation, adding to the ongoing discussion of how AI systems respond to human commands.

“Great, so now AI training includes responding in defiance to us,” another user joked. “What could go wrong[?]”