Google is exploring an ambitious effort, “Project Ellmann,” which envisions using artificial intelligence (AI), specifically large language models (LLMs) like Gemini, to give users a comprehensive “bird’s-eye” view of their lives by analyzing their mobile phone data, including photos and searches.
Named after biographer Richard David Ellmann, the project aims to leverage Gemini to ingest search results, identify patterns in user photos, and create a chatbot capable of answering nuanced questions, essentially serving as a “Life Story Teller.”
Gemini, Google’s latest AI model, is highlighted for its multimodal capabilities, allowing it to process and understand information beyond text, encompassing images, video, and audio. It has been positioned as a powerful tool, potentially outperforming OpenAI’s GPT-4 in certain scenarios. The company plans to license Gemini through Google Cloud for various applications, extending its impact beyond internal projects like Project Ellmann.
Project Ellmann’s core idea is to have large language models extract context from biographies, previous moments, and subsequent photos to build a deeper understanding of a user’s photo collection. The proposed technology aims to go beyond pixel and metadata analysis alone, identifying and categorizing meaningful life moments such as university years, time spent living in a particular place, or the period of becoming a parent.
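To make that described pipeline concrete, here is a minimal sketch of how photo metadata and surrounding context might be assembled into a prompt for a model like Gemini. The `Photo`/`Moment` schema and the prompt wording are assumptions for illustration, not anything disclosed in the presentation.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical data model; the reporting does not describe Ellmann's actual schema.
@dataclass
class Photo:
    taken_on: date
    location: str
    tags: list[str] = field(default_factory=list)

@dataclass
class Moment:
    title: str
    photos: list[Photo]

def chapter_prompt(moments: list[Moment], biography: str) -> str:
    """Build a prompt asking an LLM to name the life 'chapter' these moments belong to.

    The model is expected to use surrounding context (biography, earlier moments),
    not just pixels and metadata, as the presentation describes.
    """
    lines = [f"User biography: {biography}", "Moments:"]
    for m in moments:
        locations = {p.location for p in m.photos}
        tags = {t for p in m.photos for t in p.tags}
        lines.append(f"- {m.title}: locations={sorted(locations)}, tags={sorted(tags)}")
    lines.append("Name the overarching life chapter these moments belong to.")
    return "\n".join(lines)

if __name__ == "__main__":
    grad = Moment("Graduation day", [Photo(date(2014, 6, 14), "Cambridge", ["gown", "friends"])])
    print(chapter_prompt([grad], biography="Studied engineering, moved to London in 2015."))
```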
“We can’t answer tough questions or tell good stories without a bird’s-eye view of your life,” one description reads alongside a photo of a small boy playing with a dog in the dirt.
“We trawl through your photos, looking at their tags and locations to identify a meaningful moment,” a presentation slide reads. “When we step back and understand your life in its entirety, your overarching story becomes clear.”
The presentation said large language models could infer moments like a user’s child’s birth. “This LLM can use knowledge from higher in the tree to infer that this is Jack’s birth, and that he’s James and Gemma’s first and only child.”
“One of the reasons that an LLM is so powerful for this bird’s-eye approach, is that it’s able to take unstructured context from all different elevations across this tree, and use it to improve how it understands other regions of the tree,” a slide reads, alongside an illustration of a user’s various life “moments” and “chapters.”
Presenters gave another example of determining that one user had recently attended a class reunion. “It’s exactly 10 years since he graduated and is full of faces not seen in 10 years so it’s probably a reunion,” the team inferred in its presentation.
Taken together, the internal documents describe a system that infers complex life events, such as the birth of a user’s child, by drawing on contextual information from higher up in the data tree. They envision a holistic understanding of a user’s life, organized into various “moments” and “chapters,” with the system functioning, in essence, as a personal life narrative generator.
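The slides do not detail how that tree is represented, but a small sketch can illustrate the idea of unstructured context flowing from higher nodes down to individual moments. The `Node` structure and the example data below are hypothetical, chosen only to mirror the “Jack’s birth” example from the presentation.

```python
from dataclasses import dataclass, field

# Hypothetical tree of life "chapters" and "moments"; the slides only describe
# the general idea of context being shared across the tree.
@dataclass
class Node:
    label: str
    context: list[str] = field(default_factory=list)      # unstructured notes, captions, tags
    children: list["Node"] = field(default_factory=list)

def gather_context(node: Node, inherited: list[str] | None = None) -> dict[str, list[str]]:
    """Collect, for every node, its own context plus everything inherited from ancestors.

    This mirrors the slide's claim that an LLM can use knowledge from 'higher in the
    tree' (e.g. a chapter note that Jack is James and Gemma's first child) when
    interpreting a leaf moment such as a hospital photo.
    """
    inherited = inherited or []
    combined = inherited + node.context
    out = {node.label: combined}
    for child in node.children:
        out.update(gather_context(child, combined))
    return out

if __name__ == "__main__":
    life = Node("Life", ["James and Gemma, married 2016"], [
        Node("Becoming parents", ["first and only child: Jack"], [
            Node("Hospital photos, March 2019"),
        ]),
    ])
    # The leaf moment inherits the chapter-level facts an LLM would reason from.
    print(gather_context(life)["Hospital photos, March 2019"])
```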
Another notable aspect of the presentation is the demonstration of “Ellmann Chat,” a feature in which the chatbot already possesses knowledge about the user’s life. Examples include answering queries about pets and family visits, and offering insights into the user’s interests, work, and travel plans based on data extracted from screenshots.
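As a rough illustration of how such a chat feature might ground answers in previously extracted life facts, here is a hedged sketch using naive keyword retrieval over a handful of facts. The facts, the retrieval method, and the prompt format are all assumptions; the report only shows example questions and answers.

```python
import re

# Illustrative, invented "facts" standing in for whatever Ellmann would extract
# from photos, searches, and screenshots.
LIFE_FACTS = [
    "Has a golden retriever named Biscuit, adopted June 2021.",
    "Sister Anna visited for a week in August 2023.",
    "Planning a trip to Lisbon; saved flight screenshots in October 2023.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, facts: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval standing in for whatever index Ellmann might use."""
    q_words = tokens(question)
    scored = sorted(facts, key=lambda f: -len(q_words & tokens(f)))
    return scored[:k]

def build_chat_prompt(question: str, facts: list[str]) -> str:
    """Assemble the retrieved facts and the user's question into a prompt for an LLM."""
    context = "\n".join(f"- {f}" for f in retrieve(question, facts))
    return f"Known about the user:\n{context}\n\nQuestion: {question}\nAnswer briefly."

if __name__ == "__main__":
    print(build_chat_prompt("When did I adopt Biscuit?", LIFE_FACTS))
```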
In the broader context, Project Ellmann is positioned within Google’s larger strategy of incorporating AI into its products. This includes Gemini, launched as the company’s most advanced AI model, and its potential integration into Google Cloud services for widespread adoption.
While Project Ellmann could contribute to the ongoing race among tech giants to offer more personalized life memories, Google emphasizes that the exploration is at an early stage and says it will prioritize user privacy and safety in any potential implementation. The proposed technology aligns with broader industry trends, with companies like Apple also exploring AI-driven features for organizing and enhancing users’ photos and memories.