Nearly 400 Local Newspapers Hit OpenAI and Microsoft with Massive Copyright Lawsuit Over Copilot Training Data

Nearly 400 local and regional newspapers across dozens of U.S. states filed a federal lawsuit against OpenAI and Microsoft on June 24, 2026, in the Southern District of New York, accusing the tech giants of using millions of copyrighted news articles without permission to train their generative AI models, including Microsoft Copilot. The suit, which represents one of the largest collective actions by publishers against AI companies to date, alleges that the systematic scraping and reproduction of journalistic content constitutes massive copyright infringement and violates the Digital Millennium Copyright Act (DMCA) by stripping copyright management information from the articles.

The plaintiffs—spanning community weeklies, mid-sized dailies, and regional chains—argue that OpenAI’s ChatGPT and Microsoft’s Copilot ingested their reporting to build commercially successful products that directly compete with the struggling news industry, all without compensation or consent. The case injects fresh urgency into the global debate over AI training data, fair use, and the survival of local journalism.

A Coalition of the Small and Midsized Fights Back

The lawsuit consolidates claims from 398 newspaper titles operating in 47 states, from the Mountain View Gazette in Colorado to the Pensacola Ledger in Florida. Unlike previous high-profile actions by large publishers such as The New York Times or The Wall Street Journal, this coalition is composed primarily of outlets with small newsrooms and limited legal budgets. They coordinated through a newly formed advocacy group called the Local Press Copyright Alliance (LPCA).

“These are not billion-dollar corporations with armies of lawyers,” said LPCA executive director Marisol Vega in a statement accompanying the filing. “Our members are the backbone of civic information. They cannot afford to let their lifeblood—original reporting—be siphoned off by Silicon Valley for profit.”

The complaint details how the defendants scraped newspaper websites and databases, bypassing paywalls and ignoring robots.txt directives, to assemble a corpus of at least six million articles dating back to the early 2000s. It cites internal OpenAI and Microsoft engineering documents—some of which have surfaced in prior litigation—that allegedly show deliberate efforts to incorporate high-quality, fact-checked news content because it improved the reliability and coherence of AI outputs.
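
For readers unfamiliar with robots.txt, the sketch below shows how a crawler that chooses to honor the convention would check a site's rules before fetching an article. It is a minimal illustration using Python's standard library, not a description of any defendant's crawler. The robots.txt contents, the example-gazette.test domain, and the may_fetch helper are all invented for this example; GPTBot is the user-agent token OpenAI publishes for its web crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative newspaper site. These rules are
# invented for this sketch and are not taken from the complaint.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /subscriber/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example-gazette.test/news/school-board-vote"
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, article))
```

Compliance with robots.txt is voluntary: the file states a publisher's wishes, and nothing in the protocol itself prevents a crawler from skipping the check.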

The Copilot Connection

Microsoft’s Copilot is a central target because it integrates deeply into the company’s ubiquitous productivity suite and web browser, potentially exposing millions of users to regurgitated news content in real time. The plaintiffs claim that Copilot often reproduces verbatim passages from their articles, especially when users ask for summaries of local events or background on civic issues, without attributing the source or linking back to the publisher’s site.

One exhibit in the filing shows a Copilot-generated response about a school board meeting that lifts over 200 words directly from a copyrighted story by the Buckeye Valley Sentinel in Ohio. Another demonstrates how ChatGPT, when prompted with a hyperlocal crime story headline, reconstructed the entire lead and several paragraphs word-for-word. Such outputs, the newspapers argue, not only infringe their exclusive rights but also erode the incentive for readers to visit their websites, undercutting both subscription and advertising revenue.
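
One rough way to quantify claims like these is to measure the longest run of consecutive words that a model's output shares with the original article. The sketch below does that with Python's standard difflib module. It is an assumption about how such overlap could be measured, not the method used in the filing, and the sample sentences and the longest_verbatim_run helper are invented for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def longest_verbatim_run(article: str, output: str) -> List[str]:
    """Return the longest run of consecutive words that appears in both texts."""
    # Compare lowercase word sequences so capitalization does not hide a match.
    a = article.lower().split()
    b = output.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    # Invented sample texts, not drawn from any exhibit.
    article = ("The school board voted 5-2 on Tuesday to close "
               "Lincoln Elementary at the end of the year")
    output = ("According to reports, the school board voted 5-2 on Tuesday "
              "to close a local school")
    run = longest_verbatim_run(article, output)
    print(len(run), "words:", " ".join(run))
```

A long shared run is only a signal. Short common phrases match by chance, and whether any particular overlap amounts to infringement is a legal question this kind of count cannot answer.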

The suit emphasizes that Microsoft, through its partnership with OpenAI and its own Copilot rollout, has amplified the harm. By baking GPT-based models into Edge, Bing, Windows, and Office, “Microsoft has made itself a distributor of infringing content on an unprecedented scale,” the complaint reads.

Legal Theories: Copyright, DMCA, and Beyond

The plaintiffs advance multiple legal claims, each carrying distinct risks for the AI industry:

Direct copyright infringement: Unauthorized reproduction of articles during training and in outputs. The newspapers hold registered copyrights for the works, which allows them to seek statutory damages of up to $150,000 per work if willful infringement is proved, a potential exposure that could run into the billions.
Contributory and vicarious infringement: Microsoft and OpenAI are accused of enabling and profiting from direct infringement by users who prompt the systems to generate infringing text.
Violation of the DMCA’s Section 1202: The defendants are alleged to have intentionally removed or altered copyright management information (CMI), such as author names, publication dates, and copyright notices, from the articles before feeding them into training pipelines. Because each Section 1202 violation carries statutory damages of $2,500 to $25,000 under the statute’s remedies provision, Section 1203, this claim could multiply the financial stakes dramatically.
Unjust enrichment and unfair competition: The complaint argues that the AI companies have unfairly profited from the newspapers’ labor and investment, creating products that directly compete with the plaintiffs’ own news distribution.
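
To see how the per-work and per-violation figures above scale, the short sketch below multiplies them out for a hypothetical count. The dollar amounts are the statutory figures quoted in the claims above. The count of 10,000 works and the exposure helper are assumptions chosen purely for illustration, and the results are theoretical ceilings and floors, not a prediction of what a court would award.

```python
# Statutory figures in U.S. dollars, as quoted in the article.
WILLFUL_CAP_PER_WORK = 150_000      # copyright, willful infringement, per work
DMCA_MIN_PER_VIOLATION = 2_500      # Section 1202 claim, per violation
DMCA_MAX_PER_VIOLATION = 25_000


def exposure(registered_works: int, cmi_violations: int) -> dict:
    """Multiply hypothetical counts by the statutory per-unit figures."""
    return {
        "copyright_max": registered_works * WILLFUL_CAP_PER_WORK,
        "dmca_min": cmi_violations * DMCA_MIN_PER_VIOLATION,
        "dmca_max": cmi_violations * DMCA_MAX_PER_VIOLATION,
    }


if __name__ == "__main__":
    # Hypothetical: 10,000 registered works, one CMI violation per work.
    for name, dollars in exposure(10_000, 10_000).items():
        print(f"{name}: ${dollars:,}")
```

Even at this hypothetical scale, far below the six million articles the complaint describes, the theoretical copyright ceiling alone reaches $1.5 billion, which is why the per-work multiplier matters so much to both sides.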

The newspapers are seeking an injunction to stop the defendants from using their content without a license, destruction of all models trained on infringing data, and significant monetary damages.

The Fair Use Question

The case is likely to turn on the doctrine of fair use, which allows limited use of copyrighted material without permission for purposes such as criticism, comment, news reporting, teaching, or research—and, as tech companies argue, for transformative uses like training AI. Courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the potential market for or value of the work.

OpenAI and Microsoft have consistently maintained that training on publicly available web content is fair use. In earlier litigation with The New York Times, they argued that language models do not “copy” works in the traditional sense; instead, they learn statistical patterns to generate new content, a process that is transformative and does not supplant the original market for news. The companies point to the fact that outputs often synthesize information from multiple sources rather than reproducing entire articles verbatim.

However, the local newspapers’ complaint highlights instances of near-verbatim reproduction, which undercut the transformative argument. If the models can be shown to memorize and regurgitate large chunks of copyrighted text, the fair use defense becomes harder to sustain. The Supreme Court’s 2023 ruling in Andy Warhol Foundation for the Visual Arts v. Goldsmith, which held that a commercial use sharing substantially the same purpose as the original weighs against fair use under the first factor, may also work against the AI makers, particularly as their products directly compete with news outlets.

A Different Kind of Plaintiff

What sets this lawsuit apart is the plaintiffs’ focus on the survival of local news. The complaint includes detailed economic analyses showing that since GPT-3’s release in 2020, digital traffic to participating newspapers has declined by an average of 23%, with some small outlets seeing drops of over 40%. While myriad factors affect readership, the suit attributes a significant portion of the decline to users obtaining information from AI chatbots instead of visiting news websites.

“This isn’t just about lost clicks,” the filing states. “It’s about the hollowing out of local democracy. When a newspaper closes a city hall bureau because its revenue model collapses, the community loses more than a website—it loses an accountability mechanism.”

The LPCA has released supporting data showing that nearly one in five local newspapers in its coalition has reduced staff or frequency of publication since 2023, a trend the group says coincides with the rise of AI-driven search and summarization tools. The group adds that several hundred local newspapers have closed permanently.

Industry and Policy Implications

The lawsuit lands at a time when policymakers are grappling with AI regulation. In the U.S., the Copyright Office continues to study the issue but has yet to issue definitive guidance. Congress has held hearings but passed no comprehensive AI copyright legislation. Meanwhile, the European Union’s AI Act imposes transparency requirements on general-purpose AI models, including a public summary of the content used for training. Those requirements could force U.S. companies to alter their practices globally.

Tech companies have pursued licensing deals with large publishers: OpenAI has agreements with Axel Springer, the Associated Press, and others; Microsoft has paid some outlets for access to archives. But these deals typically exclude small and midsized newspapers, leaving them with no compensation mechanism. The lawsuit seeks to redress that imbalance, demanding that any licensing framework cover all content creators, not just major names.

Should the newspapers prevail, the ruling could force a fundamental restructuring of how AI models are trained. Companies might be required to obtain licenses for any copyrighted material used in training, or to implement robust opt-out mechanisms and content filtering. That could slow innovation and increase costs, but it could also create a revenue stream for cash-strapped publishers.

Microsoft’s Dual Role

Microsoft’s position is particularly complex. The company has invested billions in OpenAI and positioned Copilot as the AI entry point for its massive enterprise and consumer base. Simultaneously, Microsoft has a long history of working with news organizations through its Microsoft News platform and has publicly championed journalism sustainability. The complaint calls out this tension: “Microsoft cannot on the one hand proclaim its support for local news while on the other systematically dismantling the economic foundation of those same organizations.”

In a statement, a Microsoft spokesperson said the company “respects copyright and is committed to working with publishers to build a sustainable future for news in the AI era.” OpenAI did not immediately respond to a request for comment, but in prior cases it has argued that training data usage is lawful and that the company offers publishers tools to control how their content is accessed.

What Comes Next

The case, Local Press Copyright Alliance v. OpenAI LP et al., will likely proceed in parallel with other major copyright battles. A judge has yet to be assigned, but legal observers expect a long discovery process featuring intensive technical depositions about exactly how training datasets were compiled and how often models emit verbatim text. The newspapers have requested a jury trial.

Early motions to dismiss are anticipated, with the defendants likely to argue that the DMCA claims fail because the removal of CMI was not intentional or because the information was not conveyed as part of a “work” as defined by the statute. They may also challenge standing and the sufficiency of the pleadings, asserting that the newspapers cannot identify which specific articles were used or show substantial similarity for each.

But the coalition’s sheer size gives it leverage. With hundreds of registered copyrights and documented examples of verbatim output, the plaintiffs may survive dismissal and push toward a landmark settlement or ruling. News industry analysts do not expect a quick resolution; the case could take years to work its way through trial and appeals.

For the hundreds of millions of people who use AI tools, the case raises fundamental questions about who owns the knowledge these systems ingest. The answers will shape not only the future of journalism but also the very architecture of the knowledge economy.