OpenAI’s Sora, a remarkable advancement in video AI technology, has garnered widespread attention. However, questions persist about the data used to train the model. Amid the speculation, OpenAI’s COO, Brad Lightcap, addressed the issue during an interview, touching on the complexities surrounding data ethics and AI development.
Speaking at the Bloomberg Technology Summit about the business potential of AI, Lightcap cited Sora as a notable example. Yet when asked whether YouTube videos were used in Sora’s training, he avoided a direct answer, underscoring how sensitive questions of data accountability and transparency have become.
Pressed on the use of YouTube data in Sora’s training, Lightcap instead outlined the need for a strong framework governing data usage in AI development. He stressed that content creators should retain ownership of their work, including the choice of whether it is used in training datasets. While acknowledging the complexity of the situation, he declined to confirm or deny whether YouTube content played a role in Sora’s training.
To ease concerns about content reliability, OpenAI has published a piece on “understanding the source of what we see and hear online.” The article, however, does not mention YouTube by name; instead, it focuses on OpenAI’s initiatives to develop guidelines for content authentication and identification.
Earlier reports have suggested that OpenAI may have trained previous models, such as GPT-4, on YouTube content, which would have violated the platform’s policies. Despite these allegations, OpenAI has not disclosed any specifics about Sora’s training data.
Despite the criticism, OpenAI insists that Sora will launch later this year, signaling its confidence in the potential of its video AI technology.
As stakeholders continue to wrestle with these questions, transparency and accountability will remain central to how AI technology develops.