Image Credits: Jaque Silva/SOPA Images/LightRocket / Getty Images
4:44 PM PST · December 17, 2025
Like pretty much every other tech company in existence, Adobe has leaned heavily into AI over the past several years. The software firm has launched a number of different AI services since 2023, including Firefly, its AI-powered media-generation suite. Now, however, the company’s full-throated embrace of the technology may have led to trouble, as a new lawsuit claims it used pirated books to train one of its AI models.
A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an author from Oregon, claims that Adobe used pirated versions of numerous books, including her own, to train the company’s SlimLM program.
Adobe describes SlimLM as a small language model series that can be “optimized for document assistance tasks on mobile devices.” It states that SlimLM was pre-trained on SlimPajama-627B, a “deduplicated, multi-corpora, open-source dataset” released by Cerebras in June of 2023. Lyon, who has written a number of guidebooks for non-fiction writing, says that some of her works were included in a pretraining dataset that Adobe had used.
Lyon’s lawsuit, which was originally reported on by Reuters, says that her writing was included in a processed subset of a manipulated dataset that was the basis of Adobe’s program: “The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3),” the lawsuit says. “Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members.”
“Books3,” a huge collection of 191,000 books that have been used to train genAI systems, has been an ongoing source of legal trouble for the tech community. RedPajama has also been cited in a number of litigation cases. In September, a lawsuit against Apple claimed the company had used copyrighted material to train its Apple Intelligence model. The litigation mentioned the dataset and accused the tech company of copying protected works “without consent and without credit or compensation.” In October, a similar lawsuit against Salesforce also claimed the company had used RedPajama for training purposes.
Unfortunately for the tech industry, such lawsuits have, by now, become somewhat commonplace. AI algorithms are trained on massive datasets and, in some cases, those datasets have allegedly included pirated materials. In September, Anthropic agreed to pay $1.5 billion to a number of authors who had sued it and accused it of using pirated versions of their work to train its chatbot, Claude. The case was considered a potential turning point in the ongoing legal battles over copyrighted material in AI training data, of which there are many.
Lucas is a senior writer at TechCrunch, where he covers artificial intelligence, consumer tech, and startups. He previously covered AI and cybersecurity at Gizmodo. You can contact Lucas by emailing lucas.ropek@techcrunch.com.














