
Thomson Reuters v. Ross Intelligence: Copyright, Fair Use, and AI (Round One)

Competitor's use of copyrighted material to train a legal research AI tool was not fair use—but questions remain for other AI cases

By James Rosenfeld, Sarah Wood, Haley Zoffer, Shannon L. McNeal, and Christopher W. Savage

02.14.25

This month, a federal judge rejected an AI startup's claim that using copyrighted material to train its AI system was permissible under the fair use doctrine. The decision—Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., No. 1:20-CV-613-SB (D. Del. Feb. 11, 2025)—marks the first time a court has rejected a fair use defense in this context.

The district court's ruling is surely only the first round in the ongoing legal battles between rights owners and the developers of large language models and other generative AI systems.

Background

Thomson Reuters has a database of nearly every judicial decision from anywhere in the United States. It creates a headnote—a concise statement of a court's decision on a legal issue—for every issue addressed in every decision. The headnotes give lawyers a quick summary of the points of law in a decision. But in addition to creating headnotes, Thomson Reuters also assigns each headnote a Key Number, based on the specific legal issue the headnote (and thus the decision) deals with. Every decision in the database addressing a given legal issue has a headnote with that same Key Number. Lawyers looking at a decision can use the Key Number System to quickly find other decisions addressing the same legal issues.

Around 2015, Ross Intelligence, Inc. (Ross) began building an AI legal search tool that would use natural language processing (NLP) to let users retrieve decisions relevant to their research by presenting questions in plain, conversational language. To make its NLP AI system work, Ross had to train the AI using a "supervised learning" approach—posing a large number of natural-language legal research questions, looking at the decisions the AI returned in response, and telling the AI whether its responses were correct. This process trains a neural-network-based AI so that over time its responses become increasingly reliable.
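For readers who want a concrete picture of the supervised learning described above, the following is a minimal sketch, not Ross's actual system. It assumes a toy model with a single word-overlap feature trained by logistic regression; the function names, the sample questions, and the sample decision texts are all hypothetical and chosen only for illustration. The point it shows is that each training example must arrive with a label saying whether a decision is relevant to a question, which is the "preexisting understanding" a developer needs.

```python
import math

def features(question, decision):
    """Toy features (an assumption): fraction of question words found in the decision, plus a bias term."""
    q = set(question.lower().split())
    d = set(decision.lower().split())
    overlap = len(q & d) / len(q) if q else 0.0
    return [overlap, 1.0]

def predict(weights, question, decision):
    """Probability the model assigns to 'this decision is relevant to this question'."""
    z = sum(w * x for w, x in zip(weights, features(question, decision)))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Supervised learning loop: compare each prediction to its label and nudge the weights."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for question, decision, label in examples:
            x = features(question, decision)
            error = label - predict(weights, question, decision)  # the 'was the response correct?' signal
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights

# Hypothetical labeled data: (question, decision text, 1 if relevant else 0).
EXAMPLES = [
    ("when is a contract void for duress",
     "contract void for duress where threat was improper", 1),
    ("when is a contract void for duress",
     "landlord must return security deposit within thirty days", 0),
    ("can a landlord keep a security deposit",
     "landlord must return security deposit within thirty days", 1),
    ("can a landlord keep a security deposit",
     "contract void for duress where threat was improper", 0),
]

weights = train(EXAMPLES)
```

After training, calling `predict(weights, question, decision)` returns a higher score for the pairs labeled relevant than for those labeled irrelevant. A production system would use a neural network and far more data, but the dependence on labeled question-and-answer pairs is the same.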

This process necessarily requires a preexisting understanding of which decisions are relevant to a given natural language question, and Thomson Reuters' headnotes and Key Number System provided exactly what Ross needed. So, Ross initially asked to license this Westlaw data to train its AI, but Thomson Reuters declined. Ross then engaged a third party, LegalEase Solutions, to generate "Bulk Memos" summarizing a wide range of legal issues and identifying relevant cases. Ross used the Bulk Memos to train its AI.

But LegalEase created the Bulk Memos using Westlaw's resources—including both the headnotes and the Key Number System, which the Bulk Memos closely resembled. After Thomson Reuters learned what Ross had done, it sued Ross in federal court, claiming that these uses infringed copyrights in its headnotes and Key Number System. Ross argued that its use of the Bulk Memos was fair use and therefore not an infringement. Both parties moved for summary judgment. The court initially denied those motions, but, after reconsidering, 3rd Circuit Judge Stephanos Bibas, sitting by designation on the district court, invited both parties to submit renewed motions for summary judgment. After considering the renewed motions, the judge changed his mind and granted partial summary judgment in favor of Thomson Reuters, ruling that what Ross had done was not fair use.

The Decision

The court first ruled that by using the Bulk Memos, Ross had infringed Thomson Reuters' copyrights in the headnotes.[1] It then found that Ross's activity was not fair use.

Direct Copying of Copyrighted Works

Only "original" works are copyrightable. 17 U.S.C. § 102(a). Ross argued that the headnotes were not sufficiently original to be protected, so that using them could not be infringement. Judge Bibas rejected this claim, concluding that the headnotes, both as individual works and as a compilation, clear the "minimal threshold" of creativity required for copyright protection. The headnotes, he said, "introduce creativity by distilling, synthesizing, or explaining part of an opinion." The judge ruled not only that headnotes embodying content created by Thomson Reuters were subject to copyright, but also that headnotes that merely "quote judicial opinions verbatim" were sufficiently original. He reasoned that even a direct quote from an opinion "is a carefully chosen fraction of the whole," the selection of which "expresses the editor's idea about what the important point of law from the opinion is," regardless of whether the underlying judicial opinion is subject to copyright.

The court then addressed whether Ross had copied these original elements in Thomson Reuters' headnotes, finding that of the nearly 3,000 headnotes amenable to resolution on summary judgment, more than 2,200 had been directly infringed. Using a "substantial similarity" analysis, Judge Bibas identified more than 2,200 headnotes with corresponding Bulk Memo questions that "look[ed] more like a headnote than ... the underlying judicial opinion." Next, noting that the substantial similarity inquiry requires evaluating "whether an ordinary user of a product would find [the allegedly infringing work] substantially similar to the copyrighted work," Judge Bibas found that "[a]s a lawyer and a judge," he himself was uniquely "well positioned" to answer that question—which he did, after "having slogged through all 2,830 headnotes," holding that Ross had actually copied 2,243 headnotes "whose language very closely tracks the language of the Bulk Memo question but not the language of the case opinion."

Fair Use

Having found that Ross had copied the headnotes and that the headnotes were subject to copyright, Judge Bibas moved on to Ross's claims that its use of the headnotes was not infringing because it was fair use. He considered in turn each of the four fair use factors. See 17 U.S.C. § 107.

Purpose and Character

Fair use claims often turn on a defendant's argument that the purpose and character of its use of copyrighted material are sufficiently "transformative" to be permissible and non-infringing. Here, Judge Bibas found it difficult to ignore the simple fact that "Ross took the headnotes to make it easier to develop a competing legal research tool." That is, he concluded that Ross was using the headnotes for essentially the same purpose for which Thomson Reuters had created them: facilitating legal research. This conclusion drove much of the court's fair use analysis.

An AI, even a legal research AI, does not itself read or understand natural language at all, much less natural language questions about legal opinions. Instead, a neural-network-based NLP AI converts natural language into a complicated set of "vectors" that express the mathematical relationships among different words and phrases. It then uses those mathematical relationships to determine which natural language inputs (here, questions about legal issues) correspond most closely to which natural language outputs (here, legal opinions in the AI's database).
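The idea of matching questions to opinions through vectors can be illustrated with a short sketch. It is a simplification under stated assumptions: a real neural system learns dense vectors during training, whereas this example swaps in plain word counts as the vectors and cosine similarity as the measure of closeness. The opinion names and texts are hypothetical.

```python
import math
from collections import Counter

def to_vector(text):
    """Stand-in for a learned embedding (an assumption): a sparse vector of word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 1.0 means same direction, 0.0 means no shared words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(question, opinions):
    """Return opinion names ordered from most to least similar to the question."""
    qv = to_vector(question)
    return sorted(opinions,
                  key=lambda name: cosine(qv, to_vector(opinions[name])),
                  reverse=True)

# Hypothetical two-opinion database, for illustration only.
OPINIONS = {
    "duress_case": "a contract is void for duress where the threat was improper",
    "deposit_case": "a landlord must return the security deposit within thirty days",
}

ranking = rank("when is a contract void for duress", OPINIONS)
```

Here `ranking` lists the opinion names with the closest match first. The system never "reads" the question; it only compares numbers derived from the words.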

Judge Bibas recognized this when he noted that Ross had turned the headnotes "into numerical data about the relationships among legal words to feed into its AI." But while he believed this made the purpose and character factor "much trickier," he ultimately held that Ross's use was not transformative. The key for Judge Bibas was that the ultimate purpose of Ross's use of the headnotes was to create an AI search tool that "retrieves judicial opinions"—which is exactly what Thomson Reuters' headnotes and Key Number System are designed to do. Consequently, Judge Bibas concluded that Ross's use of the headnotes did not "have a 'further purpose or different character' from Thomson Reuters's."

This aspect of the ruling may be what most distinguishes it from other pending AI copyright cases, which ask whether using copyrighted material to train a generative AI model (one that creates new material as output, rather than simply retrieving preexisting material) would be a transformative use. The judge himself recognized this distinction, stressing that "only non-generative AI" was at issue.

Finally, on the "purpose and character" factor, Judge Bibas also rejected Ross's argument that its copying of the headnotes occurred only at a permissible "intermediate step"—training queries for its AI. He distinguished the cases supporting that theory on the grounds that they all concerned computer code, not "written words." In his view, intermediate copying in the code cases was "necessary" to ensure compatibility between computer programs.

Nature of the Work

Although Judge Bibas conceded that the perhaps minimally creative nature of Thomson Reuters' headnotes tipped the scales on this factor in Ross's favor, he downplayed it, noting that this factor "rarely play[s] a significant role in the determination of a fair use dispute."

Amount and Substantiality

This factor also favored Ross. Here, Judge Bibas held that it did not matter whether the unpublished Bulk Memo inputs had copied all or substantial parts of Thomson Reuters' headnotes. What mattered was the fact that Ross's public-facing outputs did not include the protected headnotes at all.

Market Effect

Judge Bibas found this to be the most important fair use factor in this case and concluded that it favored Thomson Reuters. In considering the effects of Ross's copying, Judge Bibas evaluated both the existing market for legal research platforms and a potential derivative market for creating legal-AI training data. As in his analysis of the "purpose and character" of Ross's use, Judge Bibas here zeroed in on the fact that Ross "meant to compete with Westlaw by developing a market substitute."

What's Next

NLPs vs LLMs

As suggested above, the importance of this case in the ongoing conflicts between AI developers and copyright holders may be limited by differences both in the data required to train NLP models like Ross's versus LLMs and in the ways these two types of AI are used. NLP-based retrieval tools of the kind Ross built are designed to enhance research by prioritizing the machine's comprehension and analysis of specific human-generated queries rather than generating human-like text. To accomplish this precise information retrieval, such systems use supervised learning, requiring carefully curated and labeled training data to ensure precise and contextually relevant results. LLMs, on the other hand, are primarily pretrained using unsupervised (self-supervised) learning on vast amounts of unlabeled text data, identifying language patterns without the need for the annotated datasets utilized by Ross.
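The contrast with labeled training data can be made concrete with a toy sketch of learning from unlabeled text. It assumes the simplest possible stand-in for an LLM's training objective: counting which word tends to follow which. Real LLMs use neural networks trained on vastly more text, and the sample sentences and function names here are hypothetical. What the sketch shows is that the "answers" come from the text itself, so no human annotation is required.

```python
from collections import Counter, defaultdict

def next_word_counts(corpus):
    """Learn from raw text alone: for each word, count the words that follow it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Each adjacent pair is a training example whose 'label' is simply the next word.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequently observed follower of `word`, or None if the word was never followed."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical unlabeled corpus: no relevance labels, no annotations.
CORPUS = [
    "the court granted summary judgment",
    "the court denied the motion",
    "the court granted the motion",
]

model = next_word_counts(CORPUS)
```

Calling `most_likely_next(model, "court")` then returns whichever word most often followed "court" in the corpus. Nothing in the corpus was annotated by an editor, which is the key difference from the labeled question-and-decision pairs a supervised retrieval tool depends on.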

A Landmark Moment for Fair Use?

Thomson Reuters represents the first significant court opinion on whether the fair use doctrine protects the use of copyrighted materials to train an AI. The ongoing surge in generative AI applications has presented federal courts with numerous copyright disputes involving computer code, music, literature, images, and artwork. Stakeholders across the AI ecosystem are keen to extract relevant insights from these rulings.

However, architects, developers, and others involved with generative AI may have different fair use defenses than Ross's. Judge Bibas specifically confined his analysis to non-generative AI scenarios. Unlike Ross, AI developers using copyrighted material to train generative AI models—which produce original (though, at present, not copyrightable) content—may argue that their use is transformative. It remains to be seen how courts will respond to those arguments.

Even so, Thomson Reuters does highlight the continued need for AI developers to exercise caution when deciding what to use for training data. The financial burden of this lawsuit forced Ross to cease operations. AI developers should use uncopyrighted data when possible, obtain licenses if available, and, when obtaining training data from a third party, always seek indemnification against claims that using the material to train an AI would violate IP rights. In Thomson Reuters, even though Ross asserted that it had no control over LegalEase and had no knowledge of its use of Westlaw's copyrighted headnotes, the court found Ross to have infringed Thomson Reuters' copyrights. This, too, suggests that developers should seek audit rights and additional warranties around the provenance and quality of training data that they acquire from others.

Finally, this ruling may change the depth of due diligence necessary for seed investors and venture capitalists as they consider financing AI startups. This would mark a shift away from founder-centric diligence and could potentially limit access to capital for boundary-pushing startups.

Implications for Copyright Holders

For these same reasons, copyright holders should not be overly confident as a result of this ruling. The type of copyrighted work involved in Thomson Reuters occupies a unique position between largely functional computer code and more expressive works. Notably, Judge Bibas observed that he and other members of the judiciary are regular users of Westlaw's technology. In cases that do not involve legal research tools, judges may be less familiar with the technology and, therefore, might be less inclined to decide fair use on summary judgment rather than sending the question to a jury.

[1] The court did not resolve the infringement claims regarding Ross's use of the Key Number System or regarding approximately five hundred judicial opinions that apparently contained or reflected Thomson Reuters' own editorial decisions.