A group of high-profile authors, including Kai Bird, Jia Tolentino, and Daniel Okrent, has filed suit against Microsoft, alleging the company used nearly 200,000 pirated books to train one of its artificial intelligence models.
The complaint, lodged in New York federal court, accuses the tech giant of building its Megatron AI system on unlicensed literary material, a move the authors claim constitutes large-scale copyright infringement.
The lawsuit is the latest in a wave of legal actions targeting major tech firms over the use of protected content to build generative AI tools.
The plaintiffs are seeking statutory damages of up to $150,000 per misused work, along with a court order to block further alleged infringement.
According to the filing, Microsoft relied on a dataset of illegally obtained digital books to train Megatron, an AI model that responds to text prompts by generating output that mimics the syntax, voice, and themes of its training data.
“The model is built on the work of thousands of creators,” the complaint states, “and designed to imitate their expression.”
Microsoft has not responded publicly to the allegations. An attorney for the authors declined to comment.
The timing of the lawsuit aligns with a growing legal debate over how far tech firms can go in training AI systems on copyrighted material.
Just a day earlier, a federal judge in California ruled that AI firm Anthropic may have fairly used copyrighted works under U.S. law—but could still be liable for sourcing those works through piracy.
In a related case, Meta prevailed in court this week over similar allegations, though the judge’s opinion attributed the outcome more to the plaintiffs’ weak arguments than to the strength of the company’s legal defense.
These legal skirmishes are beginning to test the boundaries of U.S. copyright law as it applies to AI.
While tech companies argue that such training constitutes transformative “fair use,” rights holders are increasingly pushing back—often with the backing of major publishers, music labels, and media outlets.
The stakes are rising fast. The New York Times has sued OpenAI over the use of its archived articles.
Dow Jones, Disney, and NBCUniversal have filed or joined lawsuits targeting various AI firms for allegedly misusing text, music, and visual content.
Getty Images is also in litigation with Stability AI over the AI company’s alleged use of Getty’s licensed photo library to train image-generation models.
Industry leaders such as OpenAI’s Sam Altman have defended the practice.
“The creation of ChatGPT would have been impossible without the use of copyrighted works,” Altman told regulators earlier this year, framing the issue as a necessary trade-off in building transformative technology.
The Microsoft lawsuit—especially given the number of works and the company’s high profile—adds new urgency to the broader regulatory questions surrounding AI development.
Legal precedent is only beginning to form, but the flood of litigation suggests that copyright law could become one of the most consequential battlegrounds in the race to shape the future of artificial intelligence.