OpenAI, the organization behind the popular ChatGPT software, is facing criticism from major news outlets for allegedly using their articles to train the artificial intelligence tool without compensating them, according to a report from Bloomberg.
News Corp.'s Dow Jones unit, which owns the Wall Street Journal, has stated that anyone who wants to use the work of its journalists to train AI should be properly licensing the rights to do so from Dow Jones. The company added that it is reviewing the situation and takes the misuse of its journalists' work seriously.
The concerns were raised when computational journalist Francesco Marconi posted a tweet revealing that the work of news outlets was being used to train ChatGPT.
Marconi asked the chatbot for a list of news sources it was trained on and received a response naming 20 outlets. It's unclear whether OpenAI has agreements with all of these publishers, and scraping data without permission would break the publishers' terms of service.
CNN also believes that using its articles to train ChatGPT violates its terms of service and plans to approach OpenAI about being paid to license the content. If the matter can't be settled amicably, expect a lawsuit, and either way, more media outlets will likely pile on.
This follows a case in November 2022 when GitHub, Microsoft Corp., and OpenAI were sued for GitHub Copilot, which was accused of plagiarizing human developers in violation of their licenses.
Developers v GitHub Copilot
The case, Doe v. GitHub, Inc., U.S. District Court for the Northern District of California, No. 4:22-cv-06823, seeks compensation for developers.
Microsoft, GitHub, and OpenAI have told a San Francisco federal court that a proposed class-action lawsuit for improperly monetizing open-source code to train their AI systems cannot be sustained. The complaint, filed by a group of anonymous copyright owners, alleged that the companies trained GitHub's Copilot with code from GitHub repositories without complying with open-source licensing terms, and that Copilot unlawfully reproduced their code.
However, the companies said the complaint lacked specific allegations, failed to identify particular copyrighted works they misused, and did not outline any specific injuries suffered from their actions. They also cited the doctrine of fair use, which allows unlicensed use of copyrighted works in some situations.
It's unclear how these legal actions will pan out for OpenAI and Microsoft, but they're not alone, as lawsuits continue to pile up across the sector.
Artists v Stable Diffusion
Similarly, a group of artists sued AI generators Stability AI Ltd., MidJourney Inc., and DeviantArt Inc. in January 2023 (Andersen et al v. Stability AI Ltd. et al No. 3:2023cv00201), claiming that those companies downloaded and used billions of copyrighted images without compensating or obtaining the consent of the artists.
Meanwhile, Getty Images filed suit against Stability AI, Getty Images (US), Inc. v Stability AI, Inc., D. Del., No. 1:99-mc-09999, claiming it wasn't properly compensated for its watermarked work being used by AI data-scraping companies. Getty is known for its editorial imagery, something not easily mimicked by artificial intelligence. For example, you could generate an image of Snoop Dogg, but not one showing where he is today.
Still, the company owns stock illustration and photo sites like iStock that compete against the likes of Shutterstock and Adobe Stock, each of which has had its own AI-related controversies recently. Shutterstock does not allow human contributors to sell AI-generated images on its site, yet it has a partnership with OpenAI to integrate DALL-E 2 and sell AI-generated images through a custodial Shutterstock AI account that competes with its human artists. This is a slap in the face to a community already painfully undercompensated at $0.10 per image sale.
Meanwhile, Adobe Stock released generative AI image rules that allow such content on its marketplace, but a change in its terms of service related to AI has many artists wondering exactly what Adobe is scraping and for what purpose.
The use of artificial intelligence in the news industry has raised concerns over the potential takeover of jobs and the spread of misinformation. Recently, publications like CNET and Men's Journal have been forced to correct AI-written articles that were riddled with errors. BuzzFeed's use of the technology to create its infamous quizzes, on the other hand, gave the company a massive stock boost to near-IPO levels.
The controversy surrounding OpenAI's use of news articles to train ChatGPT highlights the importance of proper licensing and the protection of copyright. It's unclear whether it's safe to use generative AI text and images, but the race is on regardless.
Where do you stand in the AI debate?