Five models, distributed across iPhones, Apple servers, and third-party cloud hardware: at WWDC 2026, Apple unveiled the third generation of its Foundation Models. For the first time, one of the models runs not on Apple Silicon but on Nvidia chips in Google's data centers, breaking with a principle that had previously been ironclad.
With the third generation of Apple Foundation Models, abbreviated AFM, Apple is reorganizing the foundation of its entire AI strategy. Instead of a single model, the system now consists of five specialized models that, depending on the task, run on the device, on Apple's own servers, or on third-party infrastructure. This step marks the provisional culmination of the realignment of Apple's AI future around Siri and Google's Gemini technology, which Apple has been preparing over the past few months. Only now is it becoming clear how this architecture is structured in detail, and at which point Apple abandons its strict "everything on its own hardware" principle.
From the 2024 launch to the Google partnership
When Apple first introduced its Foundation Models in 2024, the lineup consisted of two components: a language model with around three billion parameters that ran directly on the device, and a larger, server-based language model. The latter was tied to Private Cloud Compute and ran on servers with Apple Silicon chips.
Private Cloud Compute, or PCC for short, was an ambitious project from the outset. It was designed to provide cloud-based AI capabilities while maintaining the same data privacy guarantees that users are accustomed to from on-device processing. To achieve this, control over all the hardware was essential: PCC ran in Apple's own data centers on Apple Silicon servers, and the data privacy guarantees could be reviewed by independent security researchers.
However, Apple wasn't progressing quickly enough with its own AI ambitions. As a result, the company partnered with Google and is using its Gemini technology as the backbone of its new AI efforts. Apple emphasizes that this isn't simply Gemini on the iPhone, but rather models specifically modified for Apple based on the Gemini platform. Apple presented the results of this collaboration at WWDC 2026.
Five models, two of them on the device
The third generation of AFM comprises five models, distributed across three levels. Two run directly on the device, two on Apple's servers, and one on third-party hardware:
| Model | Where it runs | Task |
|---|---|---|
| AFM 3 Core | On the device | Base model with three billion parameters, significantly improved quality |
| AFM 3 Core Advanced | On the device | Most powerful on-device model, natively multimodal, 20 billion parameters |
| AFM 3 Cloud | Apple Silicon servers | Server workhorse, optimized for speed, efficiency, and performance |
| ADM 3 Cloud (Image) | Apple Silicon servers | Image generation and editing; powers Image Playground, among other things |
| AFM 3 Cloud Pro | Nvidia GPUs in Google Cloud | The most demanding tasks, such as agent-based tool use and complex reasoning |
The "D" in the name ADM 3 Cloud (Image) stands for diffusion, the technology behind image generation. All models except AFM 3 Cloud Pro were designed to run on Apple Silicon, either on the device itself or on Apple's servers; AFM 3 Cloud Pro instead runs on Nvidia GPUs hosted in Google Cloud. This became possible because Apple extended its Private Cloud Compute architecture to third-party infrastructure for the first time, according to Apple without compromising security and privacy protections. The two most interesting models are AFM 3 Core Advanced and AFM 3 Cloud Pro.
AFM 3 Core Advanced: 20 billion parameters on the device
AFM 3 Core Advanced packs 20 billion parameters into a model that runs directly on the device, a remarkable feat given that most on-device models intended for the general public remain in the low single-digit billions. The model is also natively multimodal, enabling features such as more expressive voices and more precise dictation that are immediately noticeable in everyday use. It is available on, and optimized for, Apple's most powerful Apple Silicon systems.
To make a 20-billion-parameter model work effectively on a device, Apple uses a so-called sparse architecture. Instead of keeping all 20 billion parameters active for every query, as a dense architecture does, the model activates at most four billion parameters at a time, selected depending on the input; that is no more than a fifth of its weights per request. Conceptually, this resembles the mixture-of-experts approach, but it is based on a technique Apple developed itself and described a year ago in the study "Instruction-Following Pruning for Large Language Models."
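The following Python sketch illustrates the general idea of input-dependent sparse activation under a fixed parameter budget, using the figures from the article (20 billion parameters stored, at most 4 billion active). It is not Apple's implementation: the split into 100 equally sized "units", the dot-product scoring predictor, and all function names are hypothetical, and the toy feed-forward layer only demonstrates that unselected weights are never read.

```python
"""Toy sketch of input-dependent sparse activation (not Apple's implementation).

Hypothetical assumptions: weights are split into equally sized "units" (e.g.
groups of feed-forward neurons), and a small linear predictor scores each
unit for the current prompt. Only the top-scoring units are used, so the
active parameter count never exceeds a fixed budget.
"""
import random
from typing import List

TOTAL_PARAMS = 20_000_000_000   # 20 billion parameters stored on the device
ACTIVE_BUDGET = 4_000_000_000   # at most 4 billion parameters active per request
NUM_UNITS = 100                 # hypothetical number of prunable units
PARAMS_PER_UNIT = TOTAL_PARAMS // NUM_UNITS


def score_units(prompt_embedding: List[float], predictor: List[List[float]]) -> List[float]:
    """Hypothetical predictor: one dot product per unit yields a relevance score."""
    return [sum(w * x for w, x in zip(row, prompt_embedding)) for row in predictor]


def select_units(scores: List[float], budget: int = ACTIVE_BUDGET) -> List[int]:
    """Return the indices of the highest-scoring units that fit into the budget."""
    max_units = budget // PARAMS_PER_UNIT
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:max_units])


def active_fraction(selected: List[int]) -> float:
    """Share of all stored parameters that the selected units account for."""
    return len(selected) * PARAMS_PER_UNIT / TOTAL_PARAMS


def sparse_ffn(x: List[float], w_in: List[List[float]],
               w_out: List[List[float]], selected: List[int]) -> List[float]:
    """Toy feed-forward layer that reads only the weights of selected units."""
    out = [0.0] * len(w_out[0])
    for i in selected:
        h = max(0.0, sum(w * v for w, v in zip(w_in[i], x)))  # ReLU activation
        for j, w in enumerate(w_out[i]):
            out[j] += h * w
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    predictor = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(NUM_UNITS)]
    for name in ("dictation request", "image question"):
        prompt = [rng.gauss(0, 1) for _ in range(dim)]
        units = select_units(score_units(prompt, predictor))
        print(f"{name}: {len(units)} of {NUM_UNITS} units active "
              f"({active_fraction(units):.0%} of parameters)")
```

Because the selection depends on the prompt embedding, two different requests generally activate different subsets of units while staying within the same budget; this is what distinguishes input-dependent pruning from a model that is simply smaller.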
AFM 3 Cloud Pro and the opening of Private Cloud Compute
AFM 3 Cloud Pro is the model that runs on external infrastructure, and it represents a genuine break with Apple's previous approach. To deliver Gemini-based peak performance without compromising its data privacy commitment, Apple has, for the first time, extended its PCC architecture to hardware outside its own data centers. The scope of this move is evident in the security measures Apple implemented together with Google, which underpin the expansion of Private Cloud Compute to Google Cloud.
Apple explicitly does not rely solely on confidential computing techniques to defend against attacks via privileged access or side channels. Instead, the company includes every component, from firmware and the host and guest operating systems to application code, in its trusted computing base, which is subject to guarantees of verifiable transparency and no privileged access. To counter supply chain attacks, Apple maintains a cryptographically verifiable, extensible directory of all Google Cloud hardware within the PCC network. For particularly sensitive components, software attestation relies on at least two separate trust anchors from independent providers. The processing stack itself follows the same patterns as on Apple Silicon: incoming data is first processed in its own isolated process, shared software is recycled after a short time, and cryptographic keys reside in a separate, isolated environment.
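Two of these safeguards, the verifiable hardware directory and the requirement for at least two independent trust anchors, can be sketched in a few lines of Python. The sketch rests on explicit assumptions: the directory is modeled as an append-only hash chain (Apple's actual data structure is not described in detail), HMAC with per-provider secret keys stands in for real public-key attestation signatures, and every name, identifier, and key below is hypothetical.

```python
"""Toy model of two PCC-style safeguards (illustrative only).

Assumptions: the hardware directory is an append-only hash chain, and HMAC
with per-provider secret keys stands in for real public-key attestation
signatures. All identifiers, keys, and function names are hypothetical.
"""
import hashlib
import hmac
from typing import Dict, List

GENESIS = b"\x00" * 32  # starting value of the hash chain


class HardwareDirectory:
    """Append-only list of approved hardware, summarized by a single head hash."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def prefix_heads(self) -> List[bytes]:
        """Head hash after 0, 1, ..., n entries; the last one is the current head."""
        heads = [GENESIS]
        for hardware_id in self.entries:
            heads.append(hashlib.sha256(heads[-1] + hardware_id.encode()).digest())
        return heads

    def head(self) -> bytes:
        return self.prefix_heads()[-1]

    def append(self, hardware_id: str) -> bytes:
        self.entries.append(hardware_id)
        return self.head()

    def is_extension_of(self, old_head: bytes) -> bool:
        """True if an earlier head is a prefix of this chain, i.e. nothing was rewritten."""
        return old_head in self.prefix_heads()

    def contains(self, hardware_id: str, trusted_head: bytes) -> bool:
        """Membership counts only if the directory matches a head the verifier trusts."""
        return self.head() == trusted_head and hardware_id in self.entries


def sign(key: bytes, measurement: bytes) -> bytes:
    """Stand-in for an attestation signature over a software measurement."""
    return hmac.new(key, measurement, hashlib.sha256).digest()


def attestation_ok(measurement: bytes, evidence: Dict[str, bytes],
                   anchor_keys: Dict[str, bytes], min_anchors: int = 2) -> bool:
    """Accept only if at least `min_anchors` distinct providers vouch for the measurement."""
    valid = {
        provider for provider, tag in evidence.items()
        if provider in anchor_keys
        and hmac.compare_digest(sign(anchor_keys[provider], measurement), tag)
    }
    return len(valid) >= min_anchors


def admit_node(hardware_id: str, measurement: bytes, evidence: Dict[str, bytes],
               directory: HardwareDirectory, trusted_head: bytes,
               anchor_keys: Dict[str, bytes]) -> bool:
    """A node joins only if its hardware is listed and its attestation passes."""
    return (directory.contains(hardware_id, trusted_head)
            and attestation_ok(measurement, evidence, anchor_keys))


if __name__ == "__main__":
    directory = HardwareDirectory()
    old_head = directory.append("node-001")
    new_head = directory.append("node-002")
    keys = {"anchor-a": b"key-a", "anchor-b": b"key-b"}
    measurement = b"firmware|host-os|guest-os|app"
    evidence = {p: sign(k, measurement) for p, k in keys.items()}
    print(directory.is_extension_of(old_head))
    print(admit_node("node-002", measurement, evidence, directory, new_head, keys))
```

A verifier that remembers an earlier head can check that a newer directory only added entries, and a node is admitted only if its hardware is listed and at least two independent anchors vouch for the same software measurement; a single compromised provider is therefore not enough on its own.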
How Apple trained the models
According to Apple's research blog, all five models initially shared a common foundation before being specialized for their respective architectures and use cases. During this specialization, Apple added capabilities such as understanding audio and images, processing long contexts, and generating high-quality images.
For training, Apple used a mix of publicly available information, data licensed or purchased from third parties, open-source data, data collected specifically for this purpose, and synthetic data. The company emphasizes that neither user data nor user interactions were included in the training. Furthermore, web publishers can opt out of having their content used to train the Foundation Models.
What the tests show

To evaluate the third generation, Apple relied on extensive human assessments. Internal testers evaluated the models' responses in categories such as following instructions, accuracy, presentation, and image understanding. Where possible, the new models were pitted against their predecessors.
The comparisons included global English as well as other language groups to demonstrate consistent performance across international variants. For the dictation function, Apple directly compared AFM 3 Core Advanced to the existing dictation system and observed a consistent improvement across seven quality dimensions. For a more in-depth analysis, you can find the complete comparison data on Apple's Machine Learning Research Blog.
Apple's AI architecture between device and third-party cloud
The third generation of AFM makes two things clear: with AFM 3 Core Advanced, Apple is moving a surprising amount of processing directly onto the device, while pragmatically relying on Google's Gemini technology and third-party hardware for the most demanding tasks. The real challenge lies less in the models themselves than in upholding the privacy promise of Private Cloud Compute even when the computation takes place in a third-party data center. Whether this balancing act lives up to Apple's promises in practice will only become clear once the new Siri and the other features are widely rolled out.
Frequently Asked Questions: Apple's Third Generation Foundation Models
What are Apple Foundation Models?
Apple Foundation Models (AFM) are Apple's proprietary AI models that power Apple Intelligence features. The third generation was introduced at WWDC 2026 and consists of five specialized models.
How many models does the third generation include?
Five: AFM 3 Core and AFM 3 Core Advanced run directly on the device, AFM 3 Cloud and ADM 3 Cloud (Image) run on Apple's servers, and AFM 3 Cloud Pro runs on third-party hardware.
Which model does not run on Apple Silicon?
AFM 3 Cloud Pro is the only model of its generation that runs on Nvidia GPUs in Google Cloud instead of Apple Silicon chips. All other models run on Apple Silicon.
What makes AFM 3 Core Advanced special?
It brings 20 billion parameters directly to the device, an unusually high number for an on-device model. A sparse architecture activates only up to four billion parameters at a time, depending on the request.
How does Apple protect data on third-party hardware?
Apple has extended its Private Cloud Compute architecture to third-party infrastructure for the first time and, according to its own statements, carried over the same protection mechanisms, including a verifiable hardware directory and at least two independent trust anchors.
Was user data used for training?
No. Apple emphasizes that neither user data nor user interactions were used in the training. A mix of public, licensed, open-source, custom-collected, and synthetic data was used; web publishers can opt out.
What does the "D" in ADM 3 Cloud (Image) stand for?
Diffusion, the technology behind image generation. This model powers, among other things, image editing tools and Image Playground.



