A coalition of local and regional newspaper publishers filed a federal lawsuit in New York on June 24, 2026, against OpenAI and Microsoft, alleging the companies unlawfully used content from nearly 400 U.S. newspapers to train their AI models, including those behind ChatGPT and Microsoft Copilot. The complaint, lodged in the Southern District of New York, accuses the tech giants of copyright infringement and seeks both damages and a permanent injunction to halt further unauthorized use of the publications’ work. The suit is one of the largest collective challenges yet from the news industry against the generative AI sector, underscoring mounting tensions over the boundaries of fair use in training large language models.
The plaintiffs are a diverse array of small and mid-sized newspapers, daily and weekly outlets from all 50 states, that belong to several press associations. These publications argue their journalism was scraped from the web without consent, compensation, or proper attribution, and then fed into models that power widely used tools. Unlike previous suits from individual publishers like The New York Times, this case highlights the plight of local newsrooms, which often operate on razor-thin margins and have seen their content become a core data source for AI systems that now compete with them for audience attention.
The Allegations: Systemic Copyright Violations
At the heart of the complaint is the claim that OpenAI and Microsoft copied millions of newspaper articles en masse by using automated crawlers to hoover up web content, including stories behind paywalls. The publishers allege this activity violated their copyrights and the terms of service of their websites. Once ingested, the text was used to train generative AI models that can reproduce substantial portions of articles verbatim, paraphrase them, or synthesize new content closely mimicking the originals, all without any licensing agreement.
The lawsuit specifically targets the training of two product families: OpenAI’s GPT models (including those behind ChatGPT) and Microsoft’s Copilot assistant, which is deeply integrated into Windows, Edge, and the Microsoft 365 suite. Plaintiffs argue that Microsoft, as both a direct developer and an investor in OpenAI, bears responsibility for the infringement. They point to Copilot’s ability to surface news snippets and summaries within its chat interface as evidence that the AI tool directly competes with their own digital news products, effectively siphoning away readers and advertising revenue.
Microsoft Copilot in the Crosshairs
Microsoft has positioned Copilot as a productivity and search companion, embedding it into the Windows taskbar and making it a default feature in its Edge browser. The tool relies heavily on large language models trained on vast datasets that include copyrighted news articles. For the newspaper plaintiffs, this integration creates a particularly direct threat: a Windows user can ask Copilot for today’s headlines or a summary of local events, and the AI will generate a response that may pull facts, quotes, and narrative structure from stories published that very morning by the local paper—without ever sending traffic to the paper’s site.
The complaint details instances where Copilot’s responses included near-verbatim excerpts from articles behind paywalls, suggesting that the underlying model had memorized substantial portions of the training data. Such reproduction, the lawsuit contends, goes far beyond transformative use and cuts to the heart of a publisher’s economic model. For cash-strapped community newspapers, which rely on digital subscriptions and page views to survive, the rise of AI-powered news summarization could accelerate an already grim decline in local journalism.
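One rough way to quantify a "near-verbatim excerpt" is to measure the longest run of consecutive words that an AI response shares with a published article. The Python sketch below is an illustration only, not the method used in the complaint; the function name `longest_shared_run` and both sample texts are invented for this example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def longest_shared_run(article: str, response: str) -> int:
    """Return the length, in words, of the longest word sequence
    that appears consecutively in both texts."""
    a, b = tokenize(article), tokenize(response)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


# Hypothetical texts, made up for illustration.
article = (
    "The city council voted 5-2 on Tuesday to approve the new "
    "downtown parking plan after months of debate"
)
response = (
    "According to reports, the city council voted 5-2 on Tuesday to "
    "approve the new downtown parking plan, drawing criticism"
)

print(longest_shared_run(article, response))
```

A long shared run does not by itself establish infringement; it is only a signal that a response tracks the wording of a source rather than just its facts.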
A Growing Wave of Legal Action
This filing joins a growing list of copyright lawsuits against AI developers. Major media organizations like The New York Times, Getty Images, and a group of eight newspapers owned by Alden Global Capital have already sued OpenAI and others. But the new case stands out for its scale—it consolidates claims from hundreds of small publishers—and for its explicit focus on Microsoft’s Copilot, which the plaintiffs describe as a “plagiarism engine” that launders original reporting. Earlier suits have often centered on ChatGPT alone; this one argues that Microsoft’s deep integration of Copilot across its ecosystem makes it an equally culpable party.
Legal experts note that the outcome of these cases could reshape the AI industry. If courts find that training AI models on copyrighted news content without licensing is not protected by fair use, it could force companies to strike expensive data deals with publishers or fundamentally alter how models are trained. Some publishers have already opted for licensing agreements, such as News Corp’s deal with OpenAI, but the vast majority of local outlets have not been offered such terms.
Fair Use vs. Economic Harm
OpenAI and Microsoft are expected to mount a robust defense centered on the doctrine of fair use, arguing that training AI models on publicly available data is a transformative, non-expressive use that does not supplant the market for the original works. In an earlier response to similar lawsuits, OpenAI asserted that its models do not memorize or reproduce training data in a way that violates copyright, and that any incidental appearance of training data is a rare bug. Microsoft has likewise maintained that Copilot is designed to respect publishers’ rights, offering tools like robots.txt directives and opt-out mechanisms.
The publishers, however, contend that fair use does not apply because the AI models generate content that directly competes with the original articles and does so at a scale that dwarfs human consumption. They point to the commercial nature of the AI products—ChatGPT and Copilot are premium, for-profit services—and the potential market harm as key factors weighing against fair use. The complaint also notes that the scraped content includes not only hard news but also op-eds, features, and investigative pieces that required significant investment to produce.
Impact on Local Journalism
Local newspapers have been hit hardest by the digital revolution. More than 2,500 newspapers have closed since 2005, and thousands more have cut staff and print frequency. The lawsuit paints a stark picture: by training AI on their content without compensation, OpenAI and Microsoft are “accelerating the demise” of an industry already reeling from revenue loss to social media platforms and search engines. The plaintiffs argue that if AI assistants can instantly and freely provide users with the same information that reporters spent hours uncovering, fewer people will subscribe to local news, leading to further news deserts.
For many of these small publishers, the lawsuit is a last-ditch effort to establish a legal framework that ensures they are paid for the use of their journalism. They seek not only monetary damages but also an order requiring the AI companies to delete any models trained on their content and to obtain explicit consent before future use. The case could test whether the generative AI boom was built on a foundation of intellectual property that legally belongs to thousands of creators and rights holders.
Microsoft’s Dual Role as Investor and Developer
Microsoft’s involvement adds layers of complexity. As a major investor in OpenAI—having reportedly poured billions into the startup—Microsoft has access to the same models that power ChatGPT. But the complaint also targets Microsoft’s own development of the Copilot experience, which may have included additional fine-tuning or custom training on news content. Internal emails cited by the plaintiffs allegedly show Microsoft executives discussing the use of news articles to improve response quality, though the details remain under seal.
The lawsuit further claims that Microsoft’s crawler, Bingbot, was used to scrape news sites even when they were marked with “noai” or similar directives in robots.txt files, a practice that would run counter to the Robots Exclusion Protocol, a voluntary but widely honored web convention. Microsoft has previously stated that it respects such directives, but the publishers say their logs prove otherwise. If the allegation is proven, it could undermine the company’s defense that it acted in good faith.
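For readers unfamiliar with how such a dispute could be checked, the Python sketch below uses the standard library's `urllib.robotparser` to compare logged crawler requests against a site's robots.txt rules. Everything in it is assumed for illustration: the robots.txt content, the request list, and the helper name `find_violations` are hypothetical and do not come from the complaint. The sketch also uses standard `User-agent` and `Disallow` rules as a stand-in, since "noai" is not one of the directives this parser understands.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a local news site: block one crawler
# entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical (crawler token, requested path) pairs, as might be
# extracted from a server access log. Real logs carry full User-Agent
# strings, which would need to be reduced to a product token first.
LOGGED_REQUESTS = [
    ("bingbot", "/news/council-vote"),
    ("examplebot", "/news/council-vote"),
    ("bingbot", "/sports/friday-scores"),
]


def find_violations(robots_txt: str, requests: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the logged requests that the robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in requests
            if not parser.can_fetch(agent, path)]


for agent, path in find_violations(ROBOTS_TXT, LOGGED_REQUESTS):
    print(f"{agent} requested disallowed path {path}")
```

A check like this shows only that a request conflicted with the published rules. Whether that conflict carries legal weight is a separate question for the court.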
What Comes Next
The case is likely to proceed slowly through the federal courts, with initial motions expected within months. Both sides will probably engage in discovery, which could force OpenAI and Microsoft to reveal details about their training sets—data they have been reluctant to make public. The outcome may rest on two pivotal questions: whether AI training constitutes fair use, and whether the outputs produced by ChatGPT and Copilot are infringing derivatives of copyrighted works.
In the meantime, the lawsuit could accelerate negotiations between AI companies and publishers. Microsoft has already launched a program to pay some news organizations for content, but the terms have been criticized as inadequate for small outlets. The case might pressure the industry to develop standardized licensing frameworks, much as happened with Google News and its related publisher funds in Europe under neighboring-rights laws for press publishers.
For Windows users, the lawsuit brings attention to how deeply AI is woven into the operating system. Copilot is no longer an optional add-on but a central interface for interacting with Windows. If the plaintiffs prevail, Microsoft might need to modify Copilot’s news capabilities or strip out certain training data, potentially reducing the tool’s effectiveness. That could slow the adoption of AI-enhanced features in operating systems, where tech companies have bet big that users want all-in-one assistants.
Broader Implications for the AI Industry
This case is not just about newspapers; it tests the legal footing of the entire generative AI business model. Microsoft and OpenAI are not alone—Google, Meta, and Amazon have trained models on similarly vast corpora. A ruling against fair use could expose them to a cascade of litigation from photographers, book authors, musicians, and software developers. It might also spur regulatory action, as Congress and the Copyright Office weigh updates to IP law for the AI age.
Conversely, a ruling broadly in favor of fair use could embolden AI companies to deepen their integration with news content, potentially leading to consolidation where only the largest tech platforms control access to real-time information. That prospect worries not only publishers but also antitrust regulators who see AI as a new frontier for platform dominance.
As the lawsuit unfolds, it will serve as a bellwether for the relationship between journalism and AI. The local newspapers bankrolling this case hope to reclaim value from the very technology that threatens to displace them—a David-versus-Goliath fight that could define the information economy for decades to come.