The New York Times Is Spending Millions To Fight AI. Will It Be Worth It?
Imagine you’re the CEO of a large media company. You’ve guided your company through the shift to a digital media ecosystem, from the decline of print to the advent of the all-powerful social media algorithm. Critics have predicted your company’s death a million times, but somehow, you’ve found a way to survive. Maybe you paywalled your content and built a loyal subscriber base. Maybe you’ve discovered a new revenue stream through producing word games. It hasn’t been easy, but you’re still standing.
Then along comes ChatGPT, the large language model (LLM) from OpenAI that can provide information about any topic under the sun. While your journalists may take months to develop a story, ChatGPT presents the same information in seconds, tailored to the specific needs of the user. How did this AI learn so much so fast?
The answer, unfortunately for you, is from your content. If your outlet publishes content on the web, OpenAI likely scraped it, threw it into a massive database with billions of other pieces of online content, and used that database to “train” ChatGPT. ChatGPT is essentially a prediction machine. When a user asks a question, it draws from billions of data points, including your journalism, to predict the right answer. While OpenAI itself is secretive about the exact data it uses to train its models, the dataset is so large that it almost certainly includes content from every major media outlet on the internet.
How would a media CEO like yourself respond to such a jarring new technology? Would you strike a deal with OpenAI, allowing them to use your content for training purposes in exchange for $16 million annually, like Dotdash Meredith? Would you then give interviews to tech journalists citing the inevitability of OpenAI and the importance of collaborating with them on a shared future, like The Atlantic? Or maybe you’d try the New York Times approach, suing OpenAI and its partner Microsoft on several counts of copyright infringement and trademark dilution. While its rival outlets cut deals, The New York Times spent $10.8 million on AI litigation costs in 2024 alone. Fourteen months after its initial Complaint and still months away from any potential trial, the paper seems willing to spend much more than that.
The NYT’s allegations and OpenAI/Microsoft defenses
The New York Times enlisted the help of Susman Godfrey, an elite boutique litigation firm that netted Dominion Voting Systems a $787.5 million defamation settlement with Fox News. In a Complaint filed in December 2023, the NYT told the Southern District of New York that it provides a service that “has grown even more valuable to the public” in a “damaged information ecosystem that is awash in unreliable content.” OpenAI’s use of NYT content is a “free-ride on The Times’s massive investment in its journalism” and the company “quickly became a multi-billion dollar for-profit business built in large part on the unlicensed exploitation of copyrighted works belonging to The Times and others.”
In essence, the NYT alleged that (1) OpenAI and Microsoft violated NYT copyright by creating copies of its content and using them for training purposes, (2) the user-facing ChatGPT or Microsoft AI then becomes a “contributory” copyright infringer by returning content to the user that is either a copy of or derivative of NYT content, and (3) the AI sometimes engages in a separate copyright offense by reproducing verbatim or almost verbatim copies of NYT articles.
There’s also an unfair competition claim (why pay for journalism if AI can give it to you for free?) and a trademark dilution claim. The dilution claim comes from perhaps the opposite of stealing content: LLMs “hallucinate” content and then attribute it to the NYT. In response to a query asking about healthy foods, Microsoft’s LLM told a user that the NYT named red wine “one of the 15 most heart healthy foods.” Wouldn’t that be nice?
The NYT did not claim a specific amount of monetary damages, but by almost any calculation it is seeking upwards of $100 million. If the NYT typically licenses articles at $10 each and OpenAI used all 16 million unique records of content available on the NYT website, the total would come to $160 million. If a court found OpenAI liable for willful violations of copyright, that number could climb far higher. Perhaps more importantly, the lawsuit seeks the court-ordered destruction of “all GPT or other LLM models and training sets that incorporate Times Works,” essentially burning all of OpenAI’s generative artificial intelligence technology to the ground.
OpenAI and Microsoft both filed motions to dismiss some of the NYT’s claims, focusing on the contributory infringement and trademark dilution claims. They argued that they were not “contributory infringers” to end-user infringement on NYT copyrights because the NYT provided no actual examples of users stealing proprietary content via AI and because neither defendant has any knowledge of such stealing. Using AI in that way is against their terms of service. The defendants have emphasized that any end-user infringement or misappropriation in the Complaint came from the NYT’s lawyers trying to get the AI to infringe copyright, and cases like that are uncommon. A federal judge heard oral argument on the motions to dismiss this month and a decision is forthcoming.
OpenAI and Microsoft think they can beat back the NYT’s other claims with an affirmative defense called Fair Use. The federal Copyright Act on which the NYT based its claims has a built-in exception for the “fair use” of copyrighted material. Examples of Fair Use specifically named in the law include criticism, news reporting, teaching, and research.
A Fair Use defense requires the application of a multi-factor test. Among the factors considered are the purpose and character of the use and the potential negative effect ChatGPT and other AI tools have on the NYT as a business. The NYT will likely argue that this case exceeds what is allowed under Fair Use, because proprietary content was important to training ChatGPT and because ChatGPT now competes with the NYT as a source of information. OpenAI and Microsoft, for their part, can be expected to claim that NYT content was just one small part of LLM training and that ChatGPT does not draw traffic away from the NYT in any meaningful way.
Either way, this is a novel application of a 1970s copyright law, leaving legal experts split on which interpretation is more convincing. These fair use guidelines were originally written years before Katie Couric and the TODAY Show discovered the meaning of the term “internet.” We are in uncharted territory.
Why go through all this trouble when they could cut a deal?
The NYT almost certainly could have followed other media outlets in cutting a deal with OpenAI for the use of their content. Both sides admit that the lawsuit came about after months of content licensing negotiations failed. The Atlantic CEO Nicholas Thompson says the benefits of cutting a deal are worth letting any potential copyright transgressions lie in the interests of a mutually beneficial future. In addition to training data, The Atlantic’s deal also includes a partnership in which OpenAI will help the magazine implement AI tools on its website. The deal also included language directing OpenAI’s search function, implemented into ChatGPT in November 2024, to attribute and link to Atlantic stories when appropriate. The New York Times isn’t getting that kind of treatment.
There’s also the most practical incentive of all: the financial one. OpenAI has almost certainly already trained its model on any given news site’s data. Why not collaborate with them on the next iteration and bring in an extra few million annually as part of the deal?
The New York Times believes itself to be a special case, which might be why it's forging its own path. According to the Complaint, OpenAI disproportionately used NYT content as training data and weighted it more heavily in its training protocol, perhaps giving the NYT the strongest copyright claim among media outlets. There’s also the fact that the NYT is one of the few media outlets in the country with the financial means to slog through an incredibly expensive legal fight. Facing tough financial headwinds and undergoing several rounds of layoffs, Dotdash Meredith decided it would rather take $16 million annually to work with OpenAI than spend $10 million annually fighting them, and it’s hard to blame the company for that approach.
Then there’s the principle of taking a stand against generative AI, which many journalists regard as an existential threat. Writers’ unions from The Atlantic and Vox released statements following their companies’ respective deal announcements expressing concern over both the concept of partnering with OpenAI and the opacity of the deals themselves. The NYT Complaint mainly frames the copyright claims in the context of overall harm to itself but does acknowledge societal interests as well. “If The Times and its peers cannot control the use of their content, their ability to monetize that content will be harmed,” the Complaint reads. “Less journalism will be produced, and the cost to society will be enormous.”
The NYT’s true motivation probably lies somewhere in the middle of protecting journalism as an industry and themselves as a singular business entity. The two concepts are inextricably linked, and exactly how much of each is at play is a matter of opinion. As Harvard Law technology expert Mason Kortz told Harvard Law Today: “I don’t think The New York Times would be averse to getting a good decision in favor of publishers on this. But I don’t think that’s why they’re engaging in this litigation.”
It’s worth noting that Atlantic CEO Nicholas Thompson thinks his deal ultimately helps the NYT lawsuit because it sets the market for high-quality journalistic content. As he told The Verge: “getting a fair exchange of value is good precedent for our industry.” However, he’s not willing to disclose how much money actually changed hands. The market has been set; we just don’t know where.
The state of the lawsuit, 14 months later
That $10.8 million went to a variety of filing and discovery-related disputes throughout 2024. Did the NYT engage in “prompt hacking” to create misleading evidence of copyright violation? Should Microsoft and OpenAI have access to confidential NYT reporter files related to its reporting? Did OpenAI delete key evidence during discovery? Lawyers fought through all of these issues and more in court while legal fees piled up.
Barring a settlement, none of this is wrapping up anytime soon. Since OpenAI and Microsoft didn’t challenge all of the copyright claims in the original Complaint, at least some of the seven original claims will be allowed to proceed. The NYT, now joined by co-plaintiff the New York Daily News, and the defendants are currently slogging through several dozen discovery disputes. Discovery in cases with this much on the line can take years, and only then does the case proceed to trial, if it gets there at all.
In the meantime, media outlets without the financial means or appetite for a fight with tech giants will continue to either acquiesce or enter deals. This lawsuit may even extend past the end of some of the existing media/OpenAI licensing deals. The Atlantic’s deal with OpenAI only extends through the end of this year. What happens after that? Now that OpenAI has had training access to high-quality journalism for years, will it still be worth as much to use that data again in 2026?
The New York Times chose to skip the short-term deals in favor of an existential legal battle. If it wins, it will open up Microsoft and OpenAI to an apocalyptic level of liability. If it loses, generative AI companies will have expansive new access to journalistic content through fair use. Or the parties will settle, and the two sides will decide just how much money will be required to put these legal, ethical, and philosophical concerns to rest.
I asked ChatGPT about the case and it told me the outcome will depend on various factors. Hard to argue with that.