In the realm of generative AI, prominent tech companies are taking a contradictory approach to online content. Microsoft-backed OpenAI, along with Google and Google-backed Anthropic, have been incorporating content from various companies and websites to train their generative AI models without explicit permission.
While these companies may argue that their actions qualify as fair use, they refuse to allow their own output to be used to train other AI models, which raises the question of why they should be permitted to freely use everyone else's content. For example, Anthropic's terms of service for its AI assistant, Claude, explicitly state that users cannot use it to develop competing products or services, including artificial intelligence or machine learning algorithms.
Similarly, Google's terms of use for its generative AI prohibit using the services to develop machine learning models or related technology. OpenAI, the company behind ChatGPT, likewise bars users from employing its services to create models that compete with its own.
These tech companies are well aware that high-quality content is crucial for training new AI models, which is why they protect their own output. That makes it all the more striking that other websites and companies have allowed these tech giants to use their content for training free of charge. Some are beginning to push back. Reddit, for instance, whose data has been extensively used in AI model training, plans to start charging for access to it, arguing that such valuable content should not be handed to the largest companies in the world for free.
Elon Musk previously accused Microsoft, OpenAI's primary backer, of unlawfully using Twitter's data to train AI models, hinting at potential legal action; a Microsoft spokesperson disputed the accusation. OpenAI's CEO, Sam Altman, has acknowledged the need for change and says the company is working on new AI models that respect copyright, with the aim of compensating content creators when their work or style is used by AI systems.
Publishers, including Insider, have a vested interest in this matter and are advocating for tech companies to pay for the use of their content in training AI models.
The current approach to training AI models has also drawn criticism from former Microsoft executive Steven Sinofsky, who argues that it undermines the web because it returns no value to creators and copyright holders.