Reddit brands Anthropic as 'anything but' a white knight, heating up AI scraping wars

A clash between established online content providers and artificial intelligence upstarts is heating up again as AI-driven large language models gobble information in a race to dominate the web’s frontier.

The latest battle in the AI scraping wars is between Reddit (RDDT) and AI startup Anthropic (ANTH.PVT), a company backed by tech giants Amazon (AMZN) and Google (GOOG, GOOGL) that created the AI language model Claude.

Reddit is claiming in a new lawsuit that Anthropic intentionally scraped Reddit users’ personal data without their consent and then put their data to work training Claude.

Photo: Reddit app is seen on a smartphone in this illustration taken July 13, 2021. REUTERS/Dado Ruvic/Illustration/File Photo

Reddit said in its complaint that Anthropic “bills itself as the white knight of the AI industry” and argues that “it is anything but.”

Anthropic said last year that it had blocked its bots from Reddit’s website, according to the complaint. But Reddit said Anthropic “continued to hit Reddit’s servers over one hundred thousand times.”

An Anthropic spokesperson said, “We disagree with Reddit’s claims and will defend ourselves vigorously.”

Anthropic is also defending itself against a separate suit from music publishers, including Universal Music Group (0VD.F), ABKCO, and Concord, alleging that Anthropic infringed on copyrights for Beyoncé, the Rolling Stones, and other artists as it trained Claude on lyrics to more than 500 songs.

The confrontation between Reddit and Anthropic adds to a growing number of high-profile cases where copyright holders have tried to guard their works from the reach of technology firms.

Photo: Beyoncé walks on stage to accept the Innovator award during the iHeartRadio Music Awards at Dolby Theatre in Los Angeles, California, U.S., April 1, 2024. REUTERS/Mario Anzuoni

A question at the heart of all these lawsuits: Can artificial intelligence companies use copyrighted material to train generative AI models without asking the owner of that data for permission?

Courts haven’t settled on the answer. However, last February, the US District Court for the District of Delaware handed copyright holder Thomson Reuters a win in a case that could affect what data AI models can legally be trained on.

The court granted Thomson Reuters’ request for summary judgment, saying that its competitor, Ross, infringed on its copyrights by using lawsuit summaries to train its AI model.

The court rejected Ross’s argument that it could use the summaries under the concept of fair use, which allows copyrighted material to be used for purposes such as news reporting, teaching, research, criticism, and commentary.

One big name featuring prominently in some of these clashes is OpenAI (OPAI.PVT), the ChatGPT creator run by Sam Altman and backed by Microsoft (MSFT).

Comedian Sarah Silverman has accused OpenAI in a lawsuit of copying material from her book and 7 million pirated works in order to train its AI systems. Parenting website Mumsnet has also accused OpenAI of scraping its six billion-word database without consent.

But perhaps the most prominent case targeting OpenAI is from The New York Times (NYT), which in 2023 filed a lawsuit accusing OpenAI and Microsoft of illegally using millions of the news outlet’s published stories to train OpenAI’s language models.

The newspaper has said that ChatGPT, which trained on millions of its articles, at times generates query answers that closely mirror its original publications.

Last week, OpenAI called the lawsuit “baseless” and appealed a judge’s recent order in that case requiring the AI developer to preserve “output data” generated by ChatGPT.

Photo: The headquarters of the New York Times is pictured on 8th Avenue in New York, February 5, 2008. REUTERS/Gary Hershorn

OpenAI and Microsoft are using a defense similar to those raised in other AI training copyright disputes: that the Times’ publicly available content falls under the fair use doctrine and, therefore, can be used to train their models.

Getty Images is trying to chip away at that same argument in lawsuits in the US and United Kingdom filed in 2023 against AI image generation startup Stability.

The UK case went to trial on Monday. Stability argues that fair use (or “fair dealing” as it’s known in the UK) justified training its technology, Stable Diffusion, on copyrighted Getty material.

That same defense bears the hallmarks of a justification that Google has asserted for the past two decades to fight lawsuits claiming it violated copyright laws when pulling information into results for users’ search queries.

In 2005, the Authors Guild sued Google over millions of books that the tech giant scanned and made available in “snippets” to online searchers. Google didn’t pay for the copyrighted information but did provide word-for-word pieces of the copyrighted works in search results.

Photo: Google signage is seen at the Google headquarters in the Manhattan borough of New York City, New York, U.S., December 19, 2018. REUTERS/Shannon Stapleton

The US Court of Appeals for the Second Circuit reasoned in a decision that Google’s scanning project tested the limits of fair use but was “transformative” and therefore protected under fair use law.

In 2016, Getty Images sued Google over similar claims, alleging that Google violated its copyrights and antitrust law by displaying Getty’s high-resolution images in Google search results.

The practice, Getty argued, promoted piracy and kept its prospective customers from visiting its site and paying for its content.

Google and Getty settled before trial. Under the agreement, Google agreed to display copyright holders’ information more prominently, enter into a licensing partnership with Getty, and remove a “view image” button from Google Search, a change that made it more difficult to download high-quality images.

Google could be drawn into the new AI scraping wars as part of a landmark antitrust case against Google’s Search monopoly that was won last year by the US Justice Department.

The DOJ has argued to a judge considering remedies to rectify Google’s monopoly that the search giant could further entrench its dominance by training its AI model, Gemini, on its vast index of internet content.
