
What AI Executives Need to Know About Fair-Use Law

10 minute read
By Chris Ehrlich
How can fair-use law protect companies using AI?

One unresolved legal question could change the entire artificial intelligence (AI) market: Can copyrighted data be used to train AI models and generate AI outputs under fair-use law?

Since generative AI tools surged into the marketplace, content creators, from media companies to artists, have been suing major AI model companies for allegedly infringing the copyrights on their creations: The New York Times vs. Microsoft and OpenAI; Authors Guild vs. OpenAI; Zhang vs. Google; Kadrey, et al. vs. Meta; Concord Music Group vs. Anthropic; Getty Images vs. Stability AI; and more.

Most of the cases are still in the courts. However, a federal judge in New York recently ruled in favor of OpenAI in a copyright-infringement suit by the media companies Raw Story and AlterNet, according to Reuters. While prevailing precedents aren’t set yet, the courts will ultimately give AI model companies, enterprises building on AI models and creators more clarity on the legality of using copyrighted data for AI training and outputs.

Here, we get answers from copyright attorneys and AI leaders on the core arguments around the gray area of fair-use law and how AI executives should navigate this business-defining legal issue.

AI and Fair-Use Law

What is Fair-Use Law?

Emily Poler, a partner at Poler Legal, stressed that there are “hundreds or thousands of interpretations and applications of the fair-use doctrine.” Accordingly, attorneys don't talk about "the fair-use law," because while it is codified, “it also springs from thousands of cases that apply the law to a huge range of situations.”

U.S. courts consider several factors to determine fair use, according to Poler: whether the underlying copyrighted material is being used in a way that is "transformative"; the broad or narrow nature of the copyrighted material; how much of the underlying copyrighted material is being used; and the impact of the potential fair use on the market for the copyrighted work or the copyrighted work’s value.

Allowing Limited Uses

Fair use has “a simple definition that is complex to interpret,” said JD Harriman, a partner at Foundation Law Group. Fair use, he said, allows for “limited use of copyrighted material without the copyright owner's permission for certain purposes,” including commentary, criticism, news reporting, teaching, satire and parody.

Yet, there are “no hard rules for how much of a copyrighted work can be used,” and commercial use is restricted, said Harriman, who described fair-use debates as being “highly circumstantial.”

Defining What is a Creation

Fundamentally, fair use is “about the economics of incentives,” said Gerry Stegmaier, a partner in Reed Smith’s emerging technologies practice. Creators, he said, often seek to assert copyright claims aggressively, because they “fear that the market for their creative activity is threatened.”

Questions such as whether there is a work, whether it is sufficiently expressive for protection and whether it was actually copied are all “predicates to evaluating whether the alleged use of that work should be deemed fair,” Stegmaier said.

Copyright law “rejected the idea” that the mere act of creating content is sufficient for copyright protection, said Stegmaier, noting it is the “expression itself, the creative result, which is often the essence of whether copyright protection is available.” This is partly why the collection and use of facts in the public domain has “become so common.” Instead, he said, attorneys litigate over click-wrap agreements, paywalls and the “steps and techniques used to obtain the information.”

“The law continues to favor information being broadly available and easily accessible and furtherance of incentives to create information and share it,” Stegmaier said. “Fair use furthers those purposes.”

How Can Companies Using AI Protect Themselves Against Copyright Suits?

To prepare for potential copyright issues, AI executives should consider “relying on LLM providers” that are “committed to defending their customers,” said Raj Krishnan, a director at Microsoft. This “serves as a strong initial defense.”

When creating retrieval-augmented generation (RAG) systems or fine-tuning a model, companies should “ensure that the content used is either owned by you or you have permission to use it,” Krishnan said.

AI companies are collaborating with content creators to “prevent the controversial use of proprietary content,” and they’re working with the government to establish “clearer guidelines on whether using openly available content to train models constitutes fair use,” Krishnan said.

Get Rights and Licenses

Chief data officers and chief data and AI officers are primarily focused on “ensuring they have the legal rights to use data for modeling and decision-making, which takes precedence even over regulatory concerns at this time,” said Jack Berkowitz, CDO, Securiti.

AI companies want to continue innovating with AI, Berkowitz said, but they need to make sure they’re “responsibly using existing content and have the appropriate rights and licenses” to use content in “downstream systems,” such as the proper consumer or business user grants and licensing on third-party data. AI companies can reduce their copyright risks, he said, by ensuring clear licensing, maintaining transparency in data usage and adopting governance tools to track compliance.

Practice Data Provenance

Unlike traditional research or creative processes, most large language models (LLMs) lack the ability to provide clear references to sources, “complicating the application of fair-use law,” said Greg Benson, a professor of computer science at the University of San Francisco and chief scientist at SnapLogic. AI companies, he said, must then implement tools and processes to “track data provenance and validate outputs to mitigate risks.” Without these safeguards, AI companies “face a higher likelihood of legal disputes, as the line between fair use and infringement can be ambiguous in the context of generative AI.”
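As a rough illustration of what tracking data provenance could look like in practice, the Python sketch below attaches a source, a license label and a content hash to each document before it reaches a training or retrieval pipeline, then keeps only the documents whose license label has been cleared. The field names, the license labels and the `ALLOWED_LICENSES` set are assumptions made for this example. They are not drawn from Benson's comments or from any particular governance tool.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical license labels an organization has cleared for training use.
ALLOWED_LICENSES = {"owned", "licensed", "public-domain"}


@dataclass(frozen=True)
class ProvenanceRecord:
    source: str   # where the document came from (illustrative field)
    license: str  # license label assigned at intake (illustrative field)
    sha256: str   # hash of the exact text, so it can be audited later


def record_for(text: str, source: str, license: str) -> ProvenanceRecord:
    """Build a provenance record for one document."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ProvenanceRecord(source=source, license=license, sha256=digest)


def cleared_for_training(records):
    """Keep only the records whose license label is on the allowed list."""
    return [r for r in records if r.license in ALLOWED_LICENSES]
```

A record with an unknown or missing license label is simply excluded here. A real process would more likely route such documents to legal review than silently drop them.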

Use Content ID Systems and Perform Due Diligence

AI companies can help protect against using illegal content by implementing technical safeguards that “demonstrate your commitment to mitigating unauthorized use,” said Doug Stephen, president of the enterprise learning division at CGS. Content ID systems, he said, can effectively detect and filter out copyrighted material, and monitoring tools allow AI companies to “actively control the data ingested by your systems.”
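The filtering idea can be sketched in a few lines of Python, with a heavy caveat: production Content ID systems generally rely on fuzzy or perceptual matching, while this example only catches exact matches after whitespace and case are normalized. The function names and the idea of a precomputed set of "blocked" fingerprints are assumptions made for illustration, not a description of any vendor's system.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_ingest(documents, blocked_fingerprints):
    """Split documents into (accepted, rejected) using known fingerprints.

    blocked_fingerprints is a hypothetical set of hashes for works the
    organization has no rights to use.
    """
    accepted, rejected = [], []
    for doc in documents:
        if fingerprint(doc) in blocked_fingerprints:
            rejected.append(doc)
        else:
            accepted.append(doc)
    return accepted, rejected
```

Keeping the rejected list, rather than discarding it, gives a company a log of what was screened out, which fits Stephen's point about demonstrating a commitment to mitigating unauthorized use.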

With certain types of AI models, particularly open-source models, there are often “some underlying restrictions” on commercial use or the volume of commercial use, Stephen said. There is “increasing diligence and concern” by enterprises, he said, about their investment in AI tools and usage being compromised, depending on “how the models are built, the tech they use and the data on which they are trained.”

“The simple reality is that very few organizations have sufficient internal depth in all of these areas to manage all of the sources of risk effectively, but many are getting stronger very quickly and board engagement represents a key way to drive this change,” Stephen said.

‘Fit’ Content Into Fair Use

The best copyright protection for AI companies is to “not use copyrighted materials without a license from the copyright holder,” said Poler with Poler Legal. Otherwise, she said, companies creating AI platforms “need to fit their use of copyrighted materials into as many of the fair-use factors as possible.” For instance, an AI platform could follow several key steps, according to Poler: limit the amount of a copyrighted work ingested; use copyrighted materials in a way that is "transformative"; use materials where the scope of the copyright protection is fairly narrow; or take steps to limit the impact on the copyrighted material’s market or value.

Seek Indemnity

Companies using AI can also seek covenants, representations and warranties relating to non-infringement from AI service providers, with sufficient indemnities in the agreements, said Stegmaier with Reed Smith. This represents, he said, one of the “most common steps in large-scale B2B deals.”

“Offering copyright infringement indemnity is one of the best examples of an effort in AI by the leading GenAI purveyors,” Stegmaier said.

Fair-Use Law is Critical to Companies Creating and Using AI

Until it is confirmed that large language models comply with fair-use laws, the “uncertainty surrounding potential legal implications could pose risks to the AI industry,” said Krishnan with Microsoft.

“Questions — such as whether generating content that mimics an author's style falls under fair use, and if it does, whether you can claim copyright for AI-generated data — are complex and yet to be fully answered,” Krishnan said.

Establishing Derivative or Original Work

Generative AI technologies are highly effective at synthesizing information, but they “present unique challenges if some of the training content contains copyrighted content,” said Benson with USF and SnapLogic.

An LLM can generate a response that is partially derived from copyrighted material, Benson said. Industries and the courts, he said, are “still trying to determine if this is considered derivative work or original work.”

Determining the Fate of Business Models

Many AI platforms seem to have “pinned their hopes or their business models” on courts finding their use of copyrighted materials to be fair use, said Poler with Poler Legal. If courts start to find the platforms' use of copyrighted material isn't fair use, she said, the platforms are “going to have to pay license fees for the materials both retrospectively and prospectively.”

“This could have a huge impact on the platforms' profitability,” Poler said. “If AI companies start losing lawsuits, I would expect that some will be forced out of business, because of the damages from lawsuits and/or the increased costs associated with licensing copyrighted material. It's also possible that AI companies might have to destroy databases they created and used for training.”

If AI companies start to lose copyright lawsuits that deal with ingestion, there are “several paths” forward, said Harriman with Foundation Law Group. The AI companies could enter into licensing agreements with large content holders, such as publishers and stock photo companies. Another approach, he said, is to promote legislation to make ingestion a type of “compulsory mechanical license,” which would mean any AI developer has “the right to ingest” as long as it pays a statutory fee to the copyright holder.

Significant or frequent copyright infringement losses would likely have “a broad chilling effect on enterprise adoption of AI in a variety of different contexts,” said Stegmaier with Reed Smith. Since the law is complex and fact-dependent, he said, “a few significant losses might have a tremendous effect on certain types of adoption.”


AI Copyright Infringement Allegations by Content Creators

Many of the copyright claims against AI companies by content creators center on the fact that AI platforms “seem to have used and are continuing to use vast databases that contain many, many, many copyrighted works, and these works have been used without the copyright holders' permission,” said Poler with Poler Legal.

They Have Sole Right to Copy

A copyright owner has the “sole right” to make copies of a copyrighted work or make derivative works, said Harriman with Foundation Law Group. Typically, when an AI is learning, it ingests material. The ingestion, he said, might “make a temporary or permanent copy of the material,” which is a violation of the copyright owner’s rights. Relatedly, an AI that makes an image, video, audio or writing in the style of a copyright holder “might be considered to be making a derivative work of that copyrighted material.”

The Method and Extent of Copying Matters

With billions of dollars at stake, or entire ways of doing business threatened, “creators often believe they have much to gain and little to lose from aggressively seeking to fence what they believe they've created through blood, sweat and sometimes tears,” said Stegmaier with Reed Smith. Many of the cases “with legs,” he said, rely on allegations of copying entire bodies of works that are well-established as copyright protected.

Creators will rarely argue that the alleged copyist only infringed, Stegmaier said. A goal of those seeking to assert their copyright rights, he said, is often to "catch" the alleged copyist red-handed and illustrate how the conduct is “inherently unfair.” They will point to terms-of-use violations, near-term and long-term economic impacts of the activity and technical steps they took to protect the information. With personal information, they will argue that “stopping and punishing” the alleged copying is central to protecting the privacy rights of consumers.

“It is important to realize, however, that most of the time when things go to court it is because the party alleging infringement has been unable to force the alleged copyist or imitator to stop what they're doing or to make a sufficient payment to stop or get a paid license to continue,” Stegmaier said.


Fair-Use Law’s Protection of AI Technologies

Fair use gives technology companies “an argument that they should be able to use copyrighted materials without paying copyright holders,” said Poler with Poler Legal. Whether courts agree with the technology companies “remains to be seen.”

Successfully litigating copyright claims is “fact-intensive” and “very expensive,” especially if the AI producer has “undertaken careful steps to try to ensure that its use is in fact fair,” said Stegmaier with Reed Smith.


Enabling Innovation

Most IP disputes, including copyright, “don't result in litigation,” Stegmaier said. When there is litigation, it is often “a tool to drive a business outcome.” Without fair use and the “broad construction of fair use,” he said, the threat of infringement liability, especially with the possibility of statutory damages, could “significantly stifle new innovation.” This is because companies that have done well financially and have “decades of future royalties and revenue at stake” have a significant incentive to “threaten or sue to protect that revenue.”

“It is often a difficult game of chicken, and for those accused of illegal copying, fair use often remains a critical aspect of their strategies in the face of those threats,” Stegmaier said.

Questioning Originality

Works of fiction might be a good example of content that is copyright protected, Stegmaier said. However, in areas such as music and journalism, he said, those asserting fair use, such as AI companies, will often “challenge the sufficiency of creative expression or originality” as a basic element for copyright protection.

Harriman with Foundation Law Group added that fair-use law is protecting AI companies “exactly in the same way as it would protect any accused party.”

“There is no need to change fair-use law for AI companies,” Harriman said.


About the Author
Chris Ehrlich

Chris Ehrlich is the former editor in chief and a co-founder of VKTR. He's an award-winning journalist with over 20 years in content, covering AI, business and B2B technologies. His versatile reporting has appeared in over 20 media outlets. He's an author and holds a B.A. in English and political science from Denison University.

Main image: By CHUTTERSNAP.