Generative AI Is Running Out Of Text To Train Itself On, Reports Say

Shaheer Shahzad

3 years ago

ChatGPT and other AI-powered bots may soon be “running out of text in the universe” that trains them to know what to say, according to a new report. Stuart Russell, a professor of computer science at the University of California, Berkeley, said that the technology that hoovers up mountains of text to train artificial intelligence bots like ChatGPT is “starting to hit a brick wall.” In other words, there’s only so much digital text for these bots to ingest.

This could significantly change how generative AI researchers source data and train their systems in the coming years. Russell said that although AI will continue to displace people in many occupations, the way it does so may change.

“The language-in, language-out jobs are going to be the first to go,” Russell said. “But as we run out of text, we’re going to have to find new ways to train AI.”

Here’s a hunch: AI builders may soon turn to synthetic data. That’s right, data created by machines, not people. Surprisingly enough, this type of data can be realistic as hell and may still teach AI models almost as effectively as real-world records do.

AI gurus may take things up a notch by drawing on data from other sources, such as social media and sensors. As of yet, we can’t say for sure how this dearth of textual material will affect generative AI. But it’s high time developers tackled this problem before it gets out of hand.

Beyond the shortage of text data, generative AI developers are drawing closer scrutiny from social media executives and creatives. Social media executives are upset that data from their platforms is being used freely, while some creatives worry that their work is being copied without their permission. How these disputes will play out is still uncertain. But it’s undeniable that generative AI is a tremendous tool with the potential to transform a wide range of sectors. It will be interesting to see how the technology is used and how it affects society as it develops further.

Several lawsuits have been filed against OpenAI in recent weeks alleging that the company used datasets containing personal data and copyrighted materials to train ChatGPT. One of the biggest lawsuits was filed by 16 unnamed plaintiffs, who claim OpenAI used sensitive data such as private conversations and medical records. Another lawsuit was filed by lawyers for comedian Sarah Silverman and two additional authors, who accused OpenAI of copyright infringement due to ChatGPT’s ability to write up accurate summaries of their work.

OpenAI has not commented publicly on the lawsuits brought against it. Although its CEO, Sam Altman, has stayed silent on the accusations, he has previously expressed a wish to stay out of legal trouble. Altman told the audience at an Abu Dhabi tech conference in June that he had no intention of taking OpenAI public through an initial public offering (IPO), citing the possibility of conflicts with investors.

“I don’t really want to be like sued by a bunch of like public market, Wall Street whatevers,” Altman said.
