The New York Times Launches a Very Strong Case Against Microsoft and OpenAI

It seems that The New York Times Company (“The Times”) got fed up with the pace of its negotiations with Microsoft and OpenAI over their use of The Times’ content for training and running their LLMs. So much so that The Times filed a post-Christmas complaint against the two, likely knowing full well it would lay waste to the winter vacations of hundreds of people working for OpenAI and Microsoft. It might be the most well-known AI-related case to date because it isn’t a class action and the plaintiff is globally recognized.

The complaint alleges:

  • Copyright infringement against all defendants (related to handling of the datasets containing content from The Times, handling of models allegedly derivative of the datasets, and the ultimate output)
  • Vicarious copyright infringement (the idea that Microsoft and various OpenAI affiliates directed, controlled and profited from infringement committed by OpenAI OpCo LLC and OpenAI, LLC)
  • Contributory copyright infringement by all defendants (the idea that the defendants contribute to any infringement perpetrated by end users of the models)
  • DMCA Section 1202 violations by all defendants regarding removal of copyright management information from items in the datasets
  • Common law unfair competition by misappropriation by all defendants (related to training AI models on The Times’ content and offering AI services that reproduce The Times’ content in identical or substantially similar form (and without citing The Times or linking to the underlying content))
  • Trademark dilution by all defendants (arguing that the AIs dilute the quality associated with The Times’ trademarks by falsely claiming certain content originates from The Times)

Unlike other complaints, this one doesn’t spend too much time explaining how AI models work or teeing up the analogies the plaintiff plans to use in court. Instead, the complaint includes multiple extremely clear-cut examples of the LLMs spitting out The Times’ content nearly verbatim or stating bald-faced lies about The Times’ content. Many of the other complaints admitted they weren’t able to find clear-cut examples of infringing output, nebulously resting their claims on the idea that all output is, by definition, infringing. Here, Microsoft and OpenAI haven’t just used The Times’ content to teach the AI how to communicate; they’ve launched news-specific services and features that ingest both archived content and brand new articles from The Times. The other plaintiffs also weren’t able to argue that their specific content, out of the trillions of pieces of training data in the datasets, was particularly important for creating quality AIs. Here, The Times convincingly argues that its content was extremely valuable for training the AIs, both because of the quantity involved and because the training process involved instructing the AI to prioritize The Times’ content.

This is probably the strongest AI-related complaint out there. I think it’s quite possible that a judge or jury angry at Microsoft and OpenAI for offering services that compete with and undercut The Times will be more inclined to also find that the training activities constituted copyright infringement and that the model itself is a derivative work of the training data, without thinking too hard about a scenario where the ultimate model doesn’t supplant the business or livelihood of the copyright holders in the training data. It’s definitely a case where “bad facts invite bad law.”

This case is also notable for the fact that it explicitly goes after the defendants for their AIs’ hallucinations. An AI summarizing a news event based on one or more news articles opens a Pandora’s box worth of debate about the line between uncopyrightable facts and copyrightable expression, as well as whether and how those same standards should be applied to a computer “reading” the news. But the hallucinations aren’t facts; they’re lies. And even if the defendants prevail in arguing that the AIs are mostly just providing people with unprotectable facts, there’s very little to shield them from liability for the lies, both with respect to the trademark dilution claims and with respect to potential libel or privacy-related claims that might be brought by other individuals. Copyright law can forgive a certain amount of infringement under certain circumstances, but these other areas of law are far less flexible.

The other really interesting thing about this complaint is the extent to which it describes the business of The Times – how much work the journalists put in to create the articles, the physical risks they take during reporting, the value of good journalism in general, and The Times’ struggle with adjusting to an online world. The complaint paints a picture of an honorable industry repeatedly pants-ed by the tech industry, which historically has only come to heel under enormous public pressure and the Herculean efforts of The Times to continue to survive. It’s interesting because US copyright law decisively rejects the idea that copyright protection is due for what is commonly referred to as “sweat of the brow.” In other words, the fact that it takes great effort or resources to compile certain information (like a phonebook) doesn’t entitle that work to any copyright protection – others may use it freely. And where there is copyrightable expression, the difficulty in creating it is irrelevant. So, is all this background aimed solely at supporting the unfair competition claim? Is it a quiet way of asking the court to ignore the “sweat of the brow” precedent, to the extent that it’s ultimately argued by the defendants, in favor of protecting the more sympathetic party? Maybe the plaintiff is truly concerned that the courts no longer recognize the value of journalism and need a history lesson? No other AI-related complaint has worked so hard to justify the very existence, needs, and frustrations of its plaintiffs.

Unless Microsoft and OpenAI hustle to strike a deal with The Times, this is definitely going to be the case to watch in the next year or two. Not only does it embody some of the strongest legal arguments related to copyright, it is also likely to become a lightning rod for many interests that will use it to wage their own proxy wars. The case, and especially the media coverage of the case, will likely embitter the public and politicians even further against big tech, treating its success as a zero-sum game vis-à-vis journalists and creators more broadly. It’s the kind of case that ultimately results in federal legislation, either codifying a judgment or statutorily reversing it.
