BravenNow
Encyclopedia Britannica is suing OpenAI for allegedly ‘memorizing’ its content with ChatGPT
USA | technology | ✓ Verified - theverge.com

#Encyclopedia Britannica #OpenAI #lawsuit #copyright infringement #GPT-4 #AI training #Merriam-Webster #memorization

📌 Key Takeaways

  • Encyclopedia Britannica and Merriam-Webster sue OpenAI for copyright infringement.
  • Lawsuit alleges OpenAI used copyrighted content to train AI models like GPT-4 without permission.
  • The complaint alleges that GPT-4 has 'memorized' Britannica's content and can output near-verbatim copies of significant portions of it.
  • OpenAI accused of generating responses 'substantially similar' to the publishers' copyrighted material.

📖 Full Retelling

On Friday, Encyclopedia Britannica and dictionary publisher Merriam-Webster filed a lawsuit against OpenAI alleging that it used their copyrighted content to train its AI, then generated responses that were "substantially similar" to that content, as previously reported by Reuters. According to Britannica, OpenAI repeatedly copied its content without permission: "GPT-4 itself has 'memorized' much of Britannica's copyrighted content and will output near-verbatim copies of significant portions on demand. The memorized examples are unauthorized copies that [OpenAI] used to train their models, including GPT-4." The lawsuit goes on … Read the full story at The Verge.

🏷️ Themes

Copyright Law, AI Training

📚 Related People & Topics

Encyclopædia Britannica

General knowledge encyclopaedia

The Encyclopædia Britannica (Latin for 'British Encyclopaedia') is a general-knowledge English-language encyclopaedia. It has been published since 1768, and after several ownership changes is currently owned by Encyclopædia Britannica, Inc. The 2010 version of the 15th edition, which spans 32 volume...

OpenAI

Artificial intelligence research organization

OpenAI is an American artificial intelligence (AI) research organization headquartered in San Francisco, California. The organization operates under a unique hybrid structure, comprising the non-profit OpenAI, Inc. and its controlled for-profit subsidiary, OpenAI Global, LLC (a...

Machine learning

Study of algorithms that improve automatically through experience

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Within a subdiscipline in machine learning, advances i...

Deep Analysis

Why It Matters

This lawsuit is important because it addresses the core legal and ethical issues of AI training on copyrighted material without permission, potentially setting a precedent for how AI companies use existing content. It affects publishers like Encyclopedia Britannica and Merriam-Webster, whose business models rely on licensing their authoritative content, as well as the broader AI industry, which may face increased scrutiny and costs for data sourcing. The outcome could influence future AI development, intellectual property laws, and the balance between innovation and copyright protection, impacting creators, consumers, and tech companies alike.

Context & Background

  • OpenAI and other AI firms have faced multiple lawsuits from publishers, authors, and media companies alleging unauthorized use of copyrighted works to train large language models (LLMs), such as cases from The New York Times and Getty Images.
  • The legal debate centers on whether AI training constitutes 'fair use' under U.S. copyright law, which allows limited use of copyrighted material for purposes like criticism or research, but courts have not yet definitively ruled on its application to AI.
  • Encyclopedia Britannica, first published in 1768, is a long-standing reference work known for its curated, fact-checked content. Merriam-Webster, established in 1831 and owned by Britannica since 1964, is a prominent dictionary publisher. Both rely on subscription and licensing revenue in the digital age.
  • Large language models like GPT-4 are typically trained on vast datasets that include text gathered from the internet, such as books, articles, and websites (OpenAI has not fully disclosed GPT-4's training data). This raises concerns about transparency, compensation for creators, and the potential for AI to reproduce protected content verbatim.

What Happens Next

The lawsuit will proceed through the courts, with hearings and motions likely in the coming months, and could end in a settlement or a trial that clarifies copyright standards for AI training. If the case advances, it may influence legislative efforts on data usage and intellectual property, such as the EU's AI Act and proposed AI legislation in the U.S. OpenAI and other AI companies might adjust their data-sourcing practices, seek more licensing agreements, or develop technical safeguards to reduce memorization, which would affect future model releases and industry norms.

Frequently Asked Questions

What does 'memorizing' content mean in this context?

In this context, 'memorizing' refers to the allegation that GPT-4 absorbed copyrighted text, such as Britannica's articles, during training closely enough that it can later reproduce near-verbatim copies in its responses. The lawsuit claims these outputs are unauthorized copies of the original content.

How could this lawsuit affect everyday users of ChatGPT?

If the lawsuit leads to stricter copyright enforcement, OpenAI might limit ChatGPT's responses or implement filters to avoid reproducing copyrighted material, potentially reducing the detail or accuracy of information on certain topics. It could also result in higher costs for AI services if companies must pay licensing fees, impacting subscription prices or access.

What is the 'fair use' defense that OpenAI might use?

OpenAI might argue that training on copyrighted content is 'fair use,' a legal doctrine that permits limited use of copyrighted material without permission for purposes such as education or research. Its argument would be that training transforms the data into a new, non-infringing AI system. The publishers counter that verbatim copying for commercial gain does not qualify as fair use, which makes this a key question for the courts to decide.

Are there similar cases against AI companies?

Yes, there are multiple similar cases, including lawsuits from The New York Times, authors, and artists alleging that AI companies used their copyrighted works without permission for training models. These cases collectively challenge the data practices of the AI industry and could lead to broader legal standards for content usage.

What might be the long-term impact on AI development?

Long-term, this lawsuit could force AI companies to adopt more transparent and licensed data sources, potentially slowing innovation or increasing costs, but also encouraging ethical practices and partnerships with content creators. It may spur new technologies for training AI without memorization or lead to industry-wide standards for copyright compliance in AI models.

Source

theverge.com