Google CEO Sundar Pichai has ignited a discussion about the ethical use of data in AI development. In a recent interview, Pichai suggested that OpenAI, a research firm renowned for its powerful language models, may have violated YouTube’s terms of service when training its groundbreaking text-to-video model, Sora.
Sora, which derives its name from the Japanese word for “sky,” has stunned the artificial intelligence community by producing high-quality videos from simple text prompts. The source of the data used to train the model, however, remains unknown. OpenAI’s CTO, Mira Murati, has been evasive on the specifics, saying the company used “publicly available data and licensed data” while sidestepping questions about platforms like YouTube and Instagram.
This lack of transparency is concerning for YouTube, especially considering its CEO, Neal Mohan, has already declared using YouTube content for AI training a “clear violation” of the platform’s terms. Creators have a right to expect their work is respected, and YouTube’s terms specifically prohibit unauthorized downloading of transcripts or video portions.
The situation highlights the challenges faced by data-hungry AI companies. Training these complex models requires vast amounts of information, and sourcing that data ethically can be difficult. Amazon-backed Anthropic offers one potential approach: it has opted to train its models using self-generated data.
This isn’t the only controversy OpenAI has faced recently. Actress Scarlett Johansson expressed outrage after a voice in the company’s newly released GPT-4o model sounded “eerily similar” to hers. Johansson had previously rejected OpenAI CEO Sam Altman’s offer to voice the model. Facing public backlash, OpenAI temporarily suspended the voice option in question.
These incidents raise crucial questions about the ethics of AI development. How should AI companies handle copyrighted content and the voices of creators? Should there be stricter regulations on data acquisition in the AI industry? As AI continues to evolve, finding solutions that balance innovation with ethical considerations will be paramount.