Today Headline

Meta’s AI memorised books verbatim – that could cost it billions

June 10, 2025
in Science & Environment


In April, book authors and publishers protested Meta’s use of copyrighted books to train AI

Vuk Valcic/Alamy Live News

Billions of dollars are at stake as courts in the US and UK decide whether tech companies can legally train their artificial intelligence models on copyrighted books. Authors and publishers have filed multiple lawsuits over this issue, and in a new twist, researchers have shown that at least one AI model has not only used popular books in its training data, but also memorised their contents verbatim.

Many of the ongoing disputes revolve around whether AI developers have the legal right to use copyrighted works without first asking permission. Previous research found that many of the large language models (LLMs) behind popular AI chatbots and other generative AI programs were trained on the “Books3” dataset, which contains nearly 200,000 copyrighted books, including many pirated ones. The AI developers who trained their models on this material have argued that they did not violate the law because an LLM generates fresh combinations of words based on its training, transforming the copyrighted work rather than replicating it.

But now, researchers have tested multiple models to see how much of that training data the models can reproduce verbatim. They found that many models do not retain the exact text of the books in their training data – but one of Meta’s models has memorised almost the entirety of certain books. If judges rule against the company, the researchers estimate that this could make Meta liable for at least $1 billion in damages.

“That means, on the one hand, that AI models are not just ‘plagiarism machines’, as some have alleged, but it also means that they do more than just learn general relationships between words,” says Mark Lemley at Stanford University in California. “And the fact that the answer differs model to model and book to book means that it is very hard to set a clear legal rule that will work across all cases.”

Lemley previously defended Meta in a generative AI copyright case called Kadrey v Meta Platforms. Authors whose books had been used to train Meta’s AI models filed a class-action suit against the tech giant for breach of copyright. The case is still being heard in the Northern District of California.

In January 2025, Lemley announced he had dropped Meta as a client, although he said he still believed the company should win the case. Emil Vazquez, a Meta spokesperson, says “fair use of copyrighted materials is vital” to developing the company’s AI models. “We disagree with Plaintiffs’ assertions, and the full record tells a different story,” he says.

In this latest research, Lemley and his colleagues tested AI memorisation of books by splitting short book excerpts into two parts, a prefix and a suffix, and checking whether a model prompted with the prefix would respond with the suffix. For example, they split one quote from F. Scott Fitzgerald’s The Great Gatsby into the prefix “They were careless people, Tom and Daisy – they smashed up things and creatures and then retreated” and the suffix “back into their money or their vast carelessness, or whatever it was that kept them together, and let other people clean up the mess they had made.”

Based on their findings, the researchers estimated the probability that each AI model would complete the excerpts verbatim. Then they compared those probabilities with the odds of models doing so by random chance.
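The prefix/suffix test can be sketched in Python. This is a minimal illustration under stated assumptions, not the researchers' code: the `NextTokenModel` interface and the `make_toy_model` stand-in for a real LLM are hypothetical, words are treated as tokens, and the "random chance" baseline is assumed to be uniform guessing over the vocabulary. The score uses the chain rule: the probability of reproducing the suffix is the product of the model's probability for each suffix word, given all the text before it. The toy passage reuses words from the Gatsby excerpt.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface (an assumption, not a real library API):
# given the tokens so far, return {next_token: probability}.
NextTokenModel = Callable[[Sequence[str]], Dict[str, float]]


def split_excerpt(tokens: Sequence[str], prefix_len: int) -> Tuple[List[str], List[str]]:
    """Split an excerpt into the prompt (prefix) and the text to reproduce (suffix)."""
    return list(tokens[:prefix_len]), list(tokens[prefix_len:])


def suffix_log_prob(model: NextTokenModel, prefix: Sequence[str], suffix: Sequence[str]) -> float:
    """Log-probability that the model continues `prefix` with exactly `suffix`.

    Chain rule: P(suffix | prefix) is the product of the model's probability
    for each suffix token, conditioned on the prefix plus earlier suffix tokens.
    """
    context = list(prefix)
    total = 0.0
    for token in suffix:
        p = model(context).get(token, 0.0)
        if p <= 0.0:
            return float("-inf")  # this exact continuation is impossible under the model
        total += math.log(p)
        context.append(token)
    return total


def chance_log_prob(suffix_len: int, vocab_size: int) -> float:
    """Assumed baseline: guessing each suffix token uniformly at random."""
    return -suffix_len * math.log(vocab_size)


def make_toy_model(memorised: Sequence[str], vocab: Sequence[str], confidence: float) -> NextTokenModel:
    """Hypothetical toy 'LLM' that has memorised one passage.

    While the context matches the start of `memorised`, it puts `confidence`
    on the memorised next token and spreads the rest evenly over the other
    tokens; otherwise it is uniform. `vocab` must contain every memorised token.
    """
    def model(context: Sequence[str]) -> Dict[str, float]:
        n = len(context)
        if n < len(memorised) and list(memorised[:n]) == list(context):
            rest = (1.0 - confidence) / (len(vocab) - 1)
            probs = {tok: rest for tok in vocab}
            probs[memorised[n]] = confidence
            return probs
        return {tok: 1.0 / len(vocab) for tok in vocab}
    return model


if __name__ == "__main__":
    text = ("they smashed up things and creatures and then retreated "
            "back into their money").split()
    vocab = sorted(set(text))
    model = make_toy_model(text, vocab, confidence=0.9)
    prefix, suffix = split_excerpt(text, prefix_len=8)
    memorised_lp = suffix_log_prob(model, prefix, suffix)
    baseline_lp = chance_log_prob(len(suffix), len(vocab))
    print(f"P(suffix | prefix) = {math.exp(memorised_lp):.3f}")
    print(f"P by chance        = {math.exp(baseline_lp):.2e}")
```

Working in log-probabilities avoids numerical underflow, because the product of many per-token probabilities quickly becomes too small to store as a floating-point number. A model that has memorised a passage scores far above the chance baseline; one that has not stays close to it.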

The excerpts included chunks of text from 36 copyrighted books, including popular titles such as George R. R. Martin’s A Game of Thrones and Sheryl Sandberg’s Lean In. The researchers also tested excerpts from books written by plaintiffs in the Kadrey v Meta Platforms case.

The researchers ran these experiments on 13 open-source AI models, including models developed and released by Meta, Google, DeepSeek, EleutherAI and Microsoft. Of the companies other than Meta, most did not respond to requests for comment and Microsoft declined to comment.

Such testing revealed that Meta’s Llama 3.1 70B model has memorised most of the first book in J. K. Rowling’s Harry Potter series, as well as The Great Gatsby and George Orwell’s dystopian novel 1984. Most of the other models had memorised very little of the books, including sample books written by the lawsuit plaintiffs. Meta declined to comment on these results.

The researchers estimate that an AI model found to have infringed on the copyright of just 3 per cent of the Books3 dataset could lead to a statutory damages award of nearly $1 billion – and possibly even larger awards based on AI developers’ profits related to that infringement.
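The arithmetic behind that figure can be roughly reproduced. This sketch assumes, rather than takes from the researchers, that Books3 holds 200,000 books (rounding the "nearly 200,000" above) and that each infringed book attracts the US statutory maximum of $150,000 for wilful infringement; the helper name `statutory_damages` is hypothetical.

```python
# Assumptions for illustration (not taken from the researchers' paper):
BOOKS3_SIZE = 200_000       # "nearly 200,000" books, rounded up
MAX_PER_WORK_USD = 150_000  # US statutory maximum per work for wilful infringement
MIN_PER_WORK_USD = 750      # US statutory minimum per work


def statutory_damages(share_infringed: float,
                      per_work_usd: int = MAX_PER_WORK_USD,
                      corpus_size: int = BOOKS3_SIZE) -> int:
    """Hypothetical helper: total statutory damages if `share_infringed` of the corpus is infringed."""
    works = round(share_infringed * corpus_size)  # number of infringed books
    return works * per_work_usd


print(statutory_damages(0.03))                    # 900000000 ($900 million)
print(statutory_damages(0.03, MIN_PER_WORK_USD))  # 4500000 ($4.5 million)
```

Under these assumptions, 3 per cent of Books3 is 6,000 books, which gives $900 million at the statutory maximum, in line with the "nearly $1 billion" estimate. An actual award would depend on how many works a court finds infringed and where in the statutory range it sets damages per work.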

This technique could be a “good forensic tool” for identifying the extent of AI memorisation, says Randy McCarthy at the Hall Estill law firm in Oklahoma. But it doesn’t resolve whether companies can legally train their AI models on copyrighted works through the US “fair use” rule, a legal doctrine permitting unlicensed use of copyrighted works in some circumstances.

McCarthy notes that AI companies usually acknowledge training their models on copyrighted materials. “The question is, did they have the right to do it?” he asks.

In the UK, on the other hand, the memorisation finding could be “very significant from a copyright perspective”, says Robert Lands at the Howard Kennedy law firm in London. UK copyright law follows the “fair dealing” concept, which provides a much narrower exception to copyright infringement than the US fair use doctrine. So AI models that memorised pirated books are unlikely to qualify for that exception, he says.

Topics:

  • artificial intelligence
  • law

© 2024 Todayheadline.co
