Apple researchers are introducing LaDiR, a framework that lets language models explore multiple reasoning paths at once while thinking, with impressive results in math and code.
Apple's AI research is once again attracting attention. A team from its in-house machine learning division, in collaboration with the University of California, San Diego, has unveiled a new framework called LaDiR. The core idea sounds simple and compelling: before the model responds, it should consider multiple solutions simultaneously. The result is significantly improved performance – especially for tasks that require genuine thought.
What's behind LaDiR
LaDiR stands for "Latent Diffusion Enhances LLMs for Text Reasoning" and combines two very different AI approaches. Classical language models like the ones behind ChatGPT work autoregressively: they generate text word by word, with each new word building on the previous ones. Diffusion models, by contrast, generate text in several parallel refinement passes, in which an initially random pattern is sharpened step by step into a coherent response.
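To make the contrast concrete, here is a toy sketch of the two decoding styles in Python. It is purely illustrative: the "model" is replaced by a fixed target sequence, and real diffusion language models refine noise with a learned denoiser rather than revealing positions at random.

```python
import random

random.seed(0)

# Stand-in for whatever output a real model would produce.
TARGET = ["2", "+", "2", "=", "4"]

def autoregressive_decode() -> list[str]:
    """Left to right: each position is fixed once, conditioned on the prefix."""
    out: list[str] = []
    for i in range(len(TARGET)):
        out.append(TARGET[i])  # stand-in for sampling from model(prefix)
        print("AR step", i + 1, out)
    return out

def diffusion_decode(steps: int = 3) -> list[str]:
    """All positions start as noise; each pass refines part of the draft in parallel."""
    out = ["?"] * len(TARGET)
    masked = list(range(len(TARGET)))
    for step in range(steps):
        # Stand-in for one denoiser pass: settle several positions at once,
        # each conditioned on the entire current draft.
        k = max(1, len(masked) // (steps - step))
        for i in random.sample(masked, k):
            out[i] = TARGET[i]
            masked.remove(i)
        print("diffusion step", step + 1, out)
    return out

autoregressive_decode()
diffusion_decode()
```

The autoregressive decoder needs as many steps as there are tokens and can never revisit an earlier choice; the diffusion-style decoder touches the whole sequence on every pass and converges in fewer steps.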
The Apple team now proposes combining both methods. During the actual reasoning phase, LaDiR uses diffusion; only for formulating the final answer does it switch back to the classic autoregressive method. The best of both worlds in one system.
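A minimal sketch of that two-phase idea, with every function name and the toy "denoiser" invented here for illustration; the real system refines learned latent representations of reasoning steps, not raw random vectors:

```python
import random

random.seed(1)

def denoise(thought: list[float]) -> list[float]:
    """Toy denoiser pass: damp the remaining noise by half."""
    return [0.5 * x for x in thought]

def reason_with_diffusion(dim: int = 4, steps: int = 8) -> list[float]:
    """Reasoning phase: start from noise, refine the latent 'thought' iteratively."""
    thought = [random.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        thought = denoise(thought)
    return thought

def answer_autoregressively(thought: list[float]) -> str:
    """Answer phase: stand-in for token-by-token decoding conditioned on the thought."""
    return "answer decoded from thought " + str([round(x, 3) for x in thought])

print(answer_autoregressively(reason_with_diffusion()))
```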
Importantly, LaDiR is not a standalone language model, but a framework built on top of existing models. It doesn't change the model itself, but rather the way it approaches problems. This means the approach can, in theory, be applied to many existing language models.
Multiple solutions simultaneously
The real strength of LaDiR lies in its parallelization. Instead of pursuing only one solution path, the framework runs multiple lines of reasoning simultaneously. Each of these paths executes its own diffusion routine and potentially arrives at a different solution.
The system incorporates a clever mechanism that prevents all paths from converging too early: the model is actively encouraged to explore different possibilities. Only once enough options are on the table does the system select the best answer and return it.
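Sketched as best-of-n sampling with a crude diversity nudge, this is roughly what that pattern looks like; the toy task, the resampling rule, and the distance-based scoring are all assumptions made for illustration, not LaDiR's actual mechanism:

```python
import random

random.seed(2)

TARGET = 24  # toy stand-in for a task whose answer can be scored

def reasoning_path() -> int:
    """One line of reasoning: runs its own (toy) refinement and proposes an answer."""
    return sum(random.randint(1, 10) for _ in range(3))

def parallel_reason(n_paths: int = 8) -> int:
    """Explore several paths at once, keep them varied, then pick the best."""
    candidates: list[int] = []
    for _ in range(n_paths):
        c = reasoning_path()
        # Diversity nudge (toy): if a path lands on an answer another path
        # already produced, resample it once so the pool doesn't collapse.
        if c in candidates:
            c = reasoning_path()
        candidates.append(c)
    # Selection: keep the candidate closest to the target; a real system
    # would score candidates with the model itself or a verifier.
    return min(candidates, key=lambda c: abs(c - TARGET))

print(parallel_reason())
```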
To put it in human terms: LaDiR works a bit like a chess player who plays through several variations before each move, choosing the best one. This is slower than just checking one move, but leads to significantly better results.
Impressive results in three categories
Apple researchers tested LaDiR with two well-known language models: Meta's LLaMA 3.1 with 8 billion parameters for math and logic puzzles, and Qwen3-8B-Base for code generation. The results are impressive.
In mathematical tasks, LaDiR achieved higher accuracy than existing approaches. Particularly striking was the fact that the advantages were most evident in difficult problems not included in the training material. This is precisely where many AI models currently fail.
In code generation, LaDiR performed significantly better than the standard models on the HumanEval benchmark. The advantage was particularly noticeable with complex programming tasks – an area where language models often reach their limits.
In logic puzzles like the Countdown number game, LaDiR explored more valid solutions than all comparable models. A specialized AI trained specifically for this task was more accurate in single attempts, but LaDiR found the correct answer more reliably once multiple attempts were allowed.
What Apple's research strategy reveals
LaDiR is just one of many Apple AI papers published in recent months. Cupertino regularly releases research showing that the company is deeply involved in topics such as diffusion models, language understanding, and AI efficiency. In the past, Apple has applied diffusion models to areas such as protein folding and coding.
Apple's focus on efficiency is noteworthy. While OpenAI and Google primarily rely on ever-larger models, Apple is looking for ways to get more out of smaller ones. This aligns with its strategy of running AI functions directly on iPhones, iPads, and Macs – where processing power and energy consumption are crucial.
LaDiR fits seamlessly into this pattern. Instead of changing the model itself, it optimizes the way the model thinks. The result: better answers without a bigger model.
What this could mean for Apple Intelligence
It will be interesting to see if and when such research finds its way into Apple's own products. LaDiR is currently a pure research contribution, but Apple's research pipeline and Apple Intelligence product development are closely intertwined: methods presented in a paper today could land in iOS two or three years from now.
The practical appeal is obvious. Better answers to math problems, coding tasks, and logic puzzles would be a clear advantage for the planned revamped Siri in iOS 27, especially if Siri is to handle more complex queries without relying on external AI services.
The Apple-Google deal also plays a role here. Apple plans to power parts of Siri's personalized features with Google's Gemini models. In-house research approaches like LaDiR could help to get the most out of these external models without relinquishing control over the end-user experience.
Apple's AI profile continues to sharpen
With LaDiR, Apple once again delivers a research contribution that shows where the company's strengths in AI lie: not in the race for the biggest models, but in the targeted improvement of existing systems. It is a low-key strategy, but one that fits Apple's brand image as a company that makes headlines less with announcements than with the quality of its finished products. If LaDiR or similar approaches eventually make it into iOS, it would be concrete proof that Apple's AI strategy isn't lagging behind, but simply taking a different path.