Artificial intelligence (AI) is the latest chapter in the information revolution, and Americans are adopting the technology far more quickly than they adopted earlier internet technologies. OpenAI launched ChatGPT on Nov. 30, 2022, and by January 2023 the large language model (LLM) had more than 100 million monthly users. Claude, Gemini, and open-source models such as Meta’s Llama quickly followed, and foreign entrants such as DeepSeek and Qwen (both developed in China) have since joined the market.

However, while Silicon Valley expands the frontiers of AI and works to keep the United States at the forefront of the technology, Hollywood and the commercial content industry threaten that progress. Specifically, authors, artists, and publishers are invoking copyright law and alleging that AI developers are stealing their material to train AI models. Developers, however, view this practice as fair use under American copyright law because their models are not designed to produce copies of expressive works.

This dispute is far-reaching, with more than 70 copyright cases currently pending against AI developers in the United States, and its resolution will shape America’s ability to maintain leadership in advancing this critical technology. 

Fair Use in the Age of AI

The crux of the issue is whether using copyrighted material to train an AI model constitutes fair use. American copyright law promotes freedom of expression and encourages innovation by allowing the unlicensed use of copyrighted materials under certain circumstances. When fair use disputes arise, courts evaluate them case by case.

One of the four primary factors the courts consider to determine fair use is whether the new use is “transformative,” that is, whether it creates something new that is not a substitute for the original work. Early court rulings suggest that AI training meets this criterion. In fact, the judge in a 2025 case between authors and AI developers went so far as to note that the technology behind AI models may be the “most transformative many of us will see in our lifetimes.”

Although LLMs are trained on massive datasets that include copyrighted works, the copyrighted material is not stored for end-user retrieval. Instead, it is broken down into statistical patterns and stored in mathematical models with trillions of parameters. As such, an AI model’s “knowledge” is a series of weights and biases that generate predictive values rather than returning specific material. Extracting copyrighted works is not the purpose of an AI model, nor is it easy to accomplish. In fact, regurgitating exact copies of original texts or images is something developers consider a bug, and they create guardrails to minimize such errors.
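The distinction between storing text and storing statistics can be illustrated with a deliberately simplified sketch. The toy bigram model below is orders of magnitude simpler than an LLM (which learns trillions of continuous parameters, not word-pair counts), but it demonstrates the same principle: "training" converts a text into predictive statistics, and the original wording is not retrievable from the resulting model.

```python
from collections import Counter, defaultdict

def train(text):
    """'Train' a toy bigram model: keep only word-pair statistics,
    not a retrievable copy of the text itself."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Convert raw counts into next-word probabilities --
    # the toy analogue of a model's learned parameters.
    return {
        prev: {w: n / sum(c.values()) for w, n in c.items()}
        for prev, c in counts.items()
    }

corpus = "the cat sat on the mat and the cat slept"
model = train(corpus)

# The model holds probabilities about the text, not the text:
# after "the", it predicts "cat" with probability 2/3 and "mat" with 1/3.
print(model["the"])
```

Nothing in `model` contains the sentence itself; only aggregate patterns survive, which is why reconstructing the training text from the parameters is neither the model's purpose nor straightforward.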

This is why it is essential to distinguish between inputs (training data) and outputs (user-generated products) when evaluating AI and fair use under copyright law. Using existing works to build a predictive model is a fundamentally transformative process that incorporates the content into statistical models rather than creating revenue-generating copies. Importantly, however, if a model is used in a way that generates infringing outputs, existing copyright law can fully address the issue, as the courts have a rich history of case law for adjudicating such disputes.

Market Dilution and the Limits of Copyright

Another factor considered when determining fair use is the effect on the potential market. Some copyright holders are focusing on this factor, claiming that AI outputs harm them through “market dilution.” But copyright protects a creator’s specific expressions, not relative market share or prices. That distinction is particularly relevant now, as markets for creative content undergo fundamental change: new content platforms are gaining popularity and altering demand for content from legacy media outlets.

Rightsizing Copyright for the Future of AI

Copyright holders broadly do not view training AI models as fair use, and many are pressing for transparency and licensing fees when AI developers use their material to train models. But with hundreds of billions of web pages used by LLMs—many of which have unclear copyright ownership—tracing ownership and establishing licenses would not be feasible at scale.

Voluntary licensing agreements are emerging as one potential solution to the legal uncertainties surrounding AI training and copyright, particularly for content from well-known companies and platforms like the Associated Press, Reddit, and Shutterstock. This may be an effective approach for high-value content with clear ownership structures, but it is not scalable to the broader data requirements needed for AI training.

Protecting U.S. Leadership in AI

In the global race for leadership in next-generation AI, the United States remains the leader, but China is rapidly closing the gap. Unnecessary regulations or new compulsory licensing requirements that slow American AI development create genuine opportunities for foreign rivals. The courts can—and should—address legitimate concerns about AI models that generate infringing outputs. But the transformative use of copyrighted material for model training should continue unimpeded. This approach strikes an appropriate balance between protecting creators’ rights and securing the promise of AI innovation for the United States.