Apple in AI lawsuit: Dispute over data and copyright

Apple is at the center of a new AI lawsuit concerning the use of copyrighted content to train artificial intelligence. The case is particularly relevant because, in addition to Apple, several leading technology companies are involved, raising fundamental questions about data collection and use in the field of AI.

The development of modern AI systems relies on massive amounts of data. This is precisely where a growing conflict arises between technology companies and copyright holders. While companies point to research, technical necessity, and complex datasets, publishers and authors see their works being used unlawfully. The current lawsuit illustrates this conflict in concrete terms and highlights how many legal aspects remain unresolved.

The lawsuit at a glance

According to Reuters, the publisher "Chicken Soup for the Soul" has filed a lawsuit in a federal court in the US state of California. The lawsuit targets several major technology companies: Apple, Google, Nvidia, Meta Platforms, OpenAI, Anthropic, Perplexity AI, and Elon Musk's company xAI.

The publisher accuses these companies of misusing its content to train their AI systems. Specifically, the claim is that books were used without permission to teach chatbots how to respond to human input.

Specific allegations of copyright infringement

The lawsuit alleges "deliberate theft." The companies are accused of illegally copying large quantities of copyrighted books. These copies were then allegedly used to develop, train, or optimize large language models.

According to the lawsuit, the works of thousands of authors are affected. These include bestselling authors, Pulitzer Prize winners, and well-known authors of both nonfiction and fiction. The accusation therefore encompasses not just individual pieces of content, but the widespread use of copyrighted works.

Using shadow libraries

A central point of the lawsuit is the origin of the data. The companies are alleged to have obtained content from so-called shadow libraries. Among those named are "The Pile," "LibGen," "Z-Library," and "Anna's Archive."

These platforms offer access to books without the consent of the copyright holders. According to the lawsuit, the companies downloaded, copied, analyzed, and integrated this content into their AI models to accelerate the development of their systems and gain a competitive advantage.

Apple and the "Apple Foundation Models"

Apple is explicitly mentioned in the lawsuit. The complaint specifically concerns the so-called "Apple Foundation Models." According to the complaint, these models are based on datasets such as "The Pile" and "Books3."

The complaint thereby directly accuses Apple of having used the same problematic data sources as the other defendants. Its wording clearly places Apple in the same category as the other tech giants.

Apple's earlier statement

An important point, however, is an earlier statement by Apple about this very dataset. As early as 2024, "The Pile" drew criticism in a different context, including over training data that may have been drawn from YouTube videos.

Apple stated at the time that this dataset was used solely for research purposes. Furthermore, the company clarified that "The Pile" was not used in models powering Apple Intelligence or other production machine learning features.

This account directly contradicts the allegations in the current lawsuit.

Significance for the proceedings

Whether this difference is legally decisive remains open. Further proceedings will have to clarify whether the data were indeed used solely for research purposes or whether they were also employed in commercial AI systems.

This question is central to assessing the allegations. It is also unclear whether use for research purposes alone can raise legal problems, or whether only use in commercial products matters.

Apple and the AI lawsuit: Dispute over training data and copyright

The AI lawsuit against Apple and other large technology companies illustrates how controversial the handling of training data has become. At its core, the issue is whether technological development comes at the expense of intellectual property rights.

Apple denies using the criticized datasets in production systems, while the plaintiff claims exactly that. Which version prevails will be decided in court.

Regardless of the outcome, this case demonstrates that the legal framework for AI is not yet fully defined and the industry faces fundamental decisions.