Google’s recent demonstration of its artificial intelligence model, Gemini, in a widely viewed YouTube video raised eyebrows as it showcased seemingly unparalleled capabilities. However, a closer examination reveals that the impressive back-and-forth interaction between the AI and a human was not as spontaneous as it appeared.
The Gemini demo video, which has 1.6 million views, presents a scenario in which Google’s AI responds dynamically to voice and video prompts. Yet a disclaimer in the video’s description admits that responses were sped up for the sake of the demonstration. In a subsequent blog post, Google explains how the video was actually made.
The AI, it turns out, was not responding to real-time voice or video prompts. Instead, the demo was made using “still image frames from the footage, and prompting via text,” as a Google spokesperson confirmed. The video shows a person talking to the AI and asking questions about objects on the screen, but the responses were actually generated from still images and typed prompts.
For instance, when the demonstrator holds up a rubber duck, the AI seemingly identifies the material in real time after the user squeezes it. In reality, the AI was shown a still image of the duck, and a text prompt explained the squeaking noise, resulting in the correct identification.
Another impressive moment involves a cups-and-balls magic trick, in which the AI accurately determines the location of a hidden ball. Yet this result came from showing the AI a series of still images of the cups being swapped, not a real-time video.
The video also features an intriguing scene where the user asks the AI to invent a game based on a world map, incorporating emojis. The AI seemingly generates a game called “guess the country.” However, Google’s blog reveals that the AI did not invent the game; it followed specific instructions and generated clues based on stills of a map.
While the Gemini demo showcases the AI’s capabilities, it raises questions about transparency and about whether the interactions were as real-time as they appeared. Google defends its approach, stating the demo was created to test Gemini’s abilities on diverse challenges. However, prompting with still images and text invites comparison with OpenAI’s GPT-4, and the timing of the video’s release, amid recent upheavals in the AI space, adds to the intrigue.
As the AI race progresses, Google must strike a delicate balance between innovation and authenticity.