Former OpenAI Researcher Stirs Fair-Use Debate

A researcher who worked for almost four years at OpenAI, including on ChatGPT, is questioning both how generative AI companies train their models on copyrighted data and the fair-use legal defense for that practice.

Suchir Balaji has drawn reactions online from some in the AI community after his recent interview with The New York Times (which is suing OpenAI for copyright infringement) for the article "Former OpenAI Researcher Says the Company Broke Copyright Law" and after his blog post "When Does Generative AI Qualify for Fair Use?"

Balaji began his career at OpenAI, where he worked from 2020 to 2024. He lives in San Francisco and holds a B.A. in computer science from UC Berkeley.

In his post, Balaji compares legal fair-use factors with AI model training practices.

He claims the process of training a generative model "involves making copies of copyrighted data."

"If these copies are unauthorized, this could potentially be considered copyright infringement, depending on whether or not the specific use of the model qualifies as 'fair use,'" Balaji says.

Balaji claims the training inputs for a model are "full copies of copyrighted data, so the 'amount used' is the entirety of the copyrighted work."

AI products can then "create substitutes that compete with the data they're trained on," Balaji says in an X post.

On the wave of content licensing agreements signed by GenAI companies, Balaji says in the blog post that "it’s unclear why these agreements would be signed if training on this data was fair use."

"Given the existence of a data licensing market, training on copyrighted data without a similar licensing agreement is also a type of market harm, because it deprives the copyright holder of a source of revenue," Balaji says.

Balaji closes by saying that, in his view, "none" of the key legal factors seem to "weigh in favor of ChatGPT being a fair use of its training data."

OpenAI offers this response in the Times article:

“We build our AI models using publicly available data, in a manner protected by fair use and related principles and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators and critical for U.S. competitiveness.”

On X, most of the comments on Balaji's post support his position.

Most of the professionals sharing the Times article on LinkedIn are asking if AI model training is a fair-use practice, and they're siding with creators.

See more: AI Copyright Infringement Quandary: Generative AI on Trial