The development of powerful AI models requires enormous amounts of data – including text from books. But what happens when this data is copyrighted? This is precisely the subject of a new class-action lawsuit against Apple, just filed in the US. Two authors accuse the company of illegally using their work to train AI models. The lawsuit could set legal standards for the handling of copyrighted material in the AI field.
A lawsuit filed in federal court in Northern California accuses Apple of using books to train its own language models without the consent of the rights holders. The plaintiffs are authors Grady Hendrix and Jennifer Robertson. They seek to represent a class of authors whose works were allegedly part of the training data. The allegations are based, among other things, on technical documents from Apple itself.
Books3 – the controversial dataset
At the center of the allegations is the Books3 dataset, a collection of books that, according to the lawsuit, were digitized and distributed without the permission of the rights holders, in other words pirated copies. Books3 was used in the RedPajama project, among others, and Apple's own publication on the OpenELM models cites RedPajama as a data source. Apple published the OpenELM models the previous year on the Hugging Face platform and explained in detail in an accompanying paper how they were trained. That paper indicates that RedPajama, and thus indirectly Books3, was part of the training material. The plaintiffs argue that Apple knowingly used copyrighted content to develop its own commercial AI model. They are particularly critical of the fact that Apple neither created nor licensed the content itself, but instead relied on a data source that has long been criticized for copyright infringement.
OpenELM and Foundation Language Models in focus
The lawsuit concerns not only OpenELM, Apple's open-source language model, but also the company's so-called Foundation Language Models. According to the lawsuit, these proprietary models were also trained with Books3, which the authors consider a systematic infringement of intellectual property rights. Among other things, the lawsuit seeks the complete destruction of all AI models and datasets containing content from the affected authors. This demand is based on Section 503(b) of the U.S. Copyright Act, which allows a court to order the destruction of infringing copies. In addition, the plaintiffs are seeking class certification, damages, restitution, a permanent injunction against Apple, and the reimbursement of all legal fees.
Legal situation unclear – comparison with other cases
The case against Apple is not the first of its kind. In recent months, several major AI companies have been sued on similar charges. The largest settlement to date was reached with Anthropic: The company agreed to pay $1.5 billion to settle a copyright dispute with authors. The situation was different with Meta. There, the court ruled in Meta's favor, finding that the use of copyrighted books for training AI in that case fell under the so-called "fair use" doctrine. Fair use allows certain uses of protected content without the consent of the rights holders, for example, for research or analysis purposes. The issue has also reached the political sphere. US President Donald Trump recently commented on the ongoing proceedings. He questioned whether it is realistic to acquire licenses for every single text source, arguing that such a requirement would significantly slow the development of AI.
Significance of the case for Apple and the industry
The lawsuit against Apple has the potential to influence legal standards for dealing with copyrighted texts in the AI field. If the court rules in favor of the plaintiffs, numerous language models could become legally vulnerable – not just at Apple. At the same time, as one of the world's leading technology companies, Apple is under special scrutiny. For the authors affected, it's about protecting their creative work. For Apple, it's about the legal framework within which AI development may take place in the future. And for the entire industry, the fundamental question arises: Where does the line between technological progress and intellectual property lie?