Getty Images sues OpenAI over AI training data

Getty Images filed a federal lawsuit against OpenAI

Getty Images, the stock photography company, sued OpenAI in federal court, alleging that OpenAI scraped millions of images from Getty's platform to train its image and text AI models without permission or compensation. The suit names OpenAI, Microsoft, and Stability AI as defendants and seeks damages on behalf of a class of copyright holders whose work was used.

Getty does not claim that its images are visible in the models' output. The allegation is that OpenAI copied Getty's images to build its training dataset, then deleted or concealed evidence of that copying. Getty argues this infringes the exclusive rights to reproduce and distribute copyrighted work.

OpenAI has not yet filed a public response. The company has previously argued that training on copyrighted material constitutes fair use under U.S. copyright law, a doctrine that permits limited use of copyrighted work without permission, particularly for transformative purposes.

The ruling could shape the legal foundation of large-scale AI training

This case sits at the intersection of two unresolved legal questions. The first is whether copying entire copyrighted works into a training dataset qualifies as fair use, even if the original works never appear in the model's output. The second is whether the commercial value and scale of the resulting model change that analysis.

Getty's lawsuit is one of several filed against AI vendors by publishers, authors, and visual artists. A federal judge has already ruled that the New York Times lawsuit against OpenAI can proceed past the motion-to-dismiss stage, suggesting that courts view copyright-infringement claims against AI vendors as substantive rather than frivolous.

If Getty prevails, AI vendors face two paths: license training data retroactively (expensive) or retrain models only on licensed or public-domain material (slow and costly). If OpenAI prevails on fair-use grounds, the precedent could shelter most AI training practices from copyright liability, though it may not resolve questions about confidential data or trade-secret misuse.

Assume licensing will be required; plan accordingly

If you are procuring or building AI systems for regulated industries (healthcare, finance, legal), document the provenance of training data now. Courts are likely to scrutinize vendors' ability to certify that training data was licensed or used with permission. Vendors who cannot produce licensing agreements or a credible fair-use argument will lose customer trust before they lose court cases.

For model builders, the safest path is to license training data or use only public-domain or synthetic datasets. This adds cost upfront but reduces litigation risk and makes your model easier to defend in customer contracts.
