News · June 23, 2026 · 2 min read

Getty Images sues OpenAI over AI training data

Getty Images filed suit against OpenAI, accusing the company of scraping millions of images without permission to train its AI models. The case tests whether AI training on copyrighted work constitutes fair use.

Our Take

Getty's lawsuit doesn't claim OpenAI stole images; it claims OpenAI built an $80B company on unlicensed training data, and courts haven't yet ruled whether that's legal.

Why it matters

This case will likely reach the U.S. Supreme Court and determine whether generative AI models, image and text alike, can legally train on copyrighted material. The outcome affects every AI vendor's cost structure and liability exposure.

Do this week

Enterprise AI leads: audit your training-data provenance and document licensing chain-of-title now, before this ruling forces retroactive compliance.

Getty Images filed a federal lawsuit against OpenAI

Getty Images, the stock photography company, sued OpenAI in federal court, alleging that OpenAI scraped millions of images from Getty's platform without permission or compensation to train its image and text AI models. The suit names OpenAI, Microsoft, and Stability AI as defendants and seeks damages on behalf of a class of copyright holders whose work was used.

Getty does not claim that its images are visible in the output of OpenAI's models. The allegation is that OpenAI copied Getty's images to build its training dataset, then deleted or concealed evidence of that copying. Getty argues this infringes its exclusive rights to reproduce and distribute its copyrighted work.

OpenAI has not yet filed a public response. The company has previously argued that training on copyrighted material constitutes fair use under U.S. copyright law, a doctrine that permits limited use of copyrighted work for transformative purposes without permission.

The ruling will determine the legal foundation of large-scale AI training

This case sits at the intersection of two unresolved legal questions. First, whether copying entire copyrighted works into a training dataset qualifies as fair use even if the original work does not appear in the model output. Second, whether the commercial value and scale of the resulting model change the analysis.

Getty's lawsuit is one of several filed against AI vendors by publishers, authors, and visual artists. A federal judge has already ruled that the New York Times lawsuit against OpenAI can proceed past early dismissal, signaling that courts view copyright-infringement claims against AI vendors as substantive rather than frivolous.

If Getty prevails, AI vendors face two paths: license training data retroactively (expensive) or retrain models on only licensed or public-domain material (slow and costly). If OpenAI prevails on fair use grounds, the precedent will shelter most AI training practices from copyright liability, though it may not resolve questions about confidential data or trade secret misuse.

Assume licensing will be required; plan accordingly

If you are procuring or building AI systems for regulated industries (healthcare, finance, legal), document the provenance of training data now. Courts will scrutinize vendors' ability to certify that training data was licensed or used with permission. Vendors who cannot produce licensing agreements or fair-use arguments will lose customer trust before they lose court cases.
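As a rough illustration of what documenting provenance could look like in practice, the sketch below keeps one record per training dataset and flags any record that has no documented legal basis or no agreement on file. The field names, the category list, and the `audit` helper are all hypothetical, chosen for this example; they are not a legal taxonomy or an industry standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical categories for illustration only; not a legal taxonomy.
DOCUMENTED_BASES = {"licensed", "public_domain", "synthetic"}


@dataclass
class DatasetRecord:
    name: str                            # internal dataset name
    source: str                          # where the data came from
    basis: str                           # e.g. "licensed", "public_domain", "synthetic", "unknown"
    agreement_ref: Optional[str] = None  # pointer to the license agreement, if any


def audit(records: List[DatasetRecord]) -> List[Tuple[str, str]]:
    """Return (dataset name, reason) pairs for records lacking documentation."""
    findings = []
    for record in records:
        if record.basis not in DOCUMENTED_BASES:
            findings.append((record.name, "no documented legal basis"))
        elif record.basis == "licensed" and not record.agreement_ref:
            findings.append((record.name, "licensed but no agreement on file"))
    return findings


if __name__ == "__main__":
    # Made-up inventory entries, used only to exercise the audit.
    inventory = [
        DatasetRecord("stock-photos", "vendor A", "licensed", "MSA-001"),
        DatasetRecord("web-crawl", "scraped", "unknown"),
        DatasetRecord("news-archive", "publisher B", "licensed"),
    ]
    for name, reason in audit(inventory):
        print(f"{name}: {reason}")
```

A record that passes this check is only documented, not necessarily lawful; whether a given license or fair-use position holds up remains a question for counsel.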

For model builders, the safest path is to license training data or use only public-domain or synthetic datasets. This adds cost upfront but removes litigation risk and makes your model defensible in customer contracts.
