Our Take
Getty's lawsuit doesn't claim that OpenAI's models reproduce its images; it claims OpenAI built an $80B company on unlicensed training data, and courts haven't yet ruled whether that's legal.
Why it matters
This case will likely reach the U.S. Supreme Court and could determine whether generative AI models can legally be trained on copyrighted material without a license. The outcome affects every AI vendor's cost structure and liability exposure.
Do this week
Enterprise AI leads: audit your training-data provenance and document licensing chain-of-title now, before this ruling forces retroactive compliance.
Getty Images filed a federal lawsuit against OpenAI
Getty Images, the stock photography company, sued OpenAI in federal court, alleging that OpenAI scraped millions of images from Getty's platform without permission or compensation to train its image and text AI models. The suit names OpenAI, Microsoft, and Stability AI as defendants and seeks damages on behalf of a class of copyright holders whose work was used.
Getty does not claim that its images appear in the models' output. The allegation is that OpenAI copied Getty's images to build its training dataset, then deleted or concealed evidence of that copying. Getty argues this infringes its exclusive rights to reproduce and distribute its copyrighted works.
OpenAI has not yet filed a public response. The company has previously argued that training on copyrighted material constitutes fair use under U.S. copyright law, a doctrine that permits limited use of copyrighted work for transformative purposes without permission.
The ruling will determine the legal foundation of large-scale AI training
This case sits at the intersection of two unresolved legal questions. First, whether copying entire copyrighted works into a training dataset qualifies as fair use, even if the original work does not appear in the model output. Second, whether the commercial value and scale of the resulting model change the analysis.
Getty's lawsuit is one of several filed against AI vendors by publishers, authors, and visual artists. A federal judge has already ruled that the New York Times lawsuit against OpenAI can proceed past early dismissal, a sign that at least some courts view copyright-infringement claims against AI vendors as substantive rather than frivolous.
If Getty prevails, AI vendors would face two paths: license training data retroactively (expensive) or retrain models on only licensed or public-domain material (slow and costly). If OpenAI prevails on fair-use grounds, the precedent would shelter most AI training practices from copyright liability, though it may not resolve questions about confidential data or trade-secret misuse.
Assume licensing will be required; plan accordingly
If you are procuring or building AI systems for regulated industries (healthcare, finance, legal), document the provenance of training data now. Courts will scrutinize vendors' ability to certify that training data was licensed or used with permission. Vendors who cannot produce licensing agreements or fair-use arguments will lose customer trust before they lose court cases.
For model builders, the safest path is to license training data or use only public-domain or synthetic datasets. This adds cost upfront but reduces litigation risk and makes your model easier to defend in customer contracts.
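A provenance audit of the kind described above can start as a simple inventory check. The following is a minimal sketch, not a compliance tool: the record fields, license categories, dataset names, and file paths are all hypothetical, and which categories need a signed agreement on file is an assumption you would replace with your own counsel's guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """One training-data source. All field names are hypothetical."""
    name: str
    license_type: Optional[str]   # e.g. "commercial-license", "public-domain", "synthetic"
    agreement_ref: Optional[str]  # pointer to the signed agreement, if any


# Assumption: these categories need a supporting agreement on file.
NEEDS_AGREEMENT = {"commercial-license"}
# Assumption: these categories are treated as documented on their own.
SELF_DOCUMENTING = {"public-domain", "synthetic"}


def audit(records):
    """Return (name, reason) pairs for sources with a provenance gap."""
    gaps = []
    for r in records:
        if r.license_type is None:
            gaps.append((r.name, "no license recorded"))
        elif r.license_type in NEEDS_AGREEMENT and not r.agreement_ref:
            gaps.append((r.name, "license claimed but no agreement on file"))
        elif r.license_type not in NEEDS_AGREEMENT | SELF_DOCUMENTING:
            gaps.append((r.name, "unrecognized license type"))
    return gaps


# Illustrative inventory; every entry is made up.
records = [
    DatasetRecord("stock-photos-a", "commercial-license", "contracts/2023-014.pdf"),
    DatasetRecord("web-scrape-b", None, None),
    DatasetRecord("gov-archive-c", "public-domain", None),
    DatasetRecord("partner-feed-d", "commercial-license", None),
]

for name, reason in audit(records):
    print(f"{name}: {reason}")
```

In this sketch the scraped source and the licensed source with no agreement on file are the two flagged entries, which are the gaps a customer or court would be most likely to ask about.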