Apple boosts LLM performance thanks to optimized M5 design

by Milan
November 20, 2025
Apple M5 (Image: Apple)

In its latest Machine Learning Research blog post, Apple shows how much the new M5 chip improves the execution of local LLMs. In a direct comparison with the M4, the chip achieves noticeably higher speeds. The focus is on two things: how quickly local language models produce the first token, and how efficiently they generate the tokens that follow. The report provides concrete metrics and explains why the M5 has the edge in both areas.

To put these results into perspective, it helps to look at MLX. Apple released the framework a few years ago to make machine learning natively accessible on Apple Silicon. MLX is open source and built as an array framework with an API modeled on NumPy, and it takes advantage of Apple Silicon's unified memory architecture. Operations can therefore move seamlessly between the CPU and GPU without copying data. MLX also includes packages for neural networks, optimization, automatic differentiation, and computation graph optimization.
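
To illustrate the unified memory model, here is a minimal sketch using the mlx.core array API (assuming the mlx package is installed); the matrix multiplication is just an arbitrary example operation:

```python
# Minimal MLX sketch: arrays live in unified memory, so the same data can be
# used by CPU and GPU kernels without an explicit copy. Requires `pip install mlx`.
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))

# The same arrays can be consumed by either device; only the compute target changes.
c_gpu = mx.matmul(a, b, stream=mx.gpu)   # run on the GPU
c_cpu = mx.matmul(a, b, stream=mx.cpu)   # run on the CPU, no data movement needed

# MLX evaluates lazily: eval() forces the computation.
mx.eval(c_gpu, c_cpu)
print(mx.allclose(c_gpu, c_cpu))
```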

A key component is MLX LM. This package runs Hugging Face models locally and covers both text generation and fine-tuning. MLX LM supports quantization, which reduces a model's memory footprint and speeds up inference, making large models viable even on devices with less RAM. Apple's comparison between the M4 and M5 is built on this stack.
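
As a rough sketch of what running a Hugging Face model locally with MLX LM looks like (the model identifier below is a placeholder, not one of the checkpoints Apple benchmarked):

```python
# Minimal MLX LM sketch (requires `pip install mlx-lm`); the model id is a
# placeholder for any MLX-converted checkpoint on the Hugging Face Hub.
from mlx_lm import load, generate

# Downloads (or reuses) the weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")

prompt = "Explain unified memory in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
print(text)
```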

Background information on MLX and MLX LM

MLX offers a flexible system that covers numerical simulations, scientific computations, and machine learning. For language models, MLX LM provides the appropriate tools. These allow large models to be loaded, executed, and fine-tuned, with quantization playing a key role. Quantization reduces both memory requirements and computational load, and accelerates inference.
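
A hedged sketch of how a BF16 checkpoint might be quantized for MLX LM, using the convert() helper that mlx-lm ships; the exact parameter names and defaults may differ between versions, and the model path is a placeholder:

```python
# Hypothetical quantization sketch using mlx-lm's convert() helper; parameter
# names and defaults may vary across versions of the package.
from mlx_lm import convert

convert(
    hf_path="mlx-community/some-bf16-model",  # placeholder Hugging Face path
    mlx_path="./model-4bit",                  # where the quantized weights are written
    quantize=True,                            # enable weight quantization
    q_bits=4,                                 # 4-bit weights, as in Apple's tests
    q_group_size=64,                          # group size for the quantization scales
)
```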

MLX takes full advantage of the Apple Silicon platform's unified memory architecture. Support for BF16, mixed-precision formats, and automatic differentiation ensures efficient model execution, and the entire memory pool is available for running large models. This is crucial for LLMs, because generating each subsequent token is limited by memory bandwidth rather than raw compute.

M5 compared to the M4

Apple tested several models to highlight the differences between the two chips. These include:

  • Qwen 1.7B in BF16
  • Qwen 8B in BF16
  • Qwen 8B with 4-bit quantization
  • Qwen 14B with 4-bit quantization
  • Qwen 30B, a Mixture-of-Experts model with 3B active parameters, in 4-bit
  • GPT-OSS 20B in MXFP4

All benchmarks used a prompt size of 4096 tokens. The measurement includes both the time to generate the first token and the speed at which 128 additional tokens are produced.
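
The following is a rough sketch of how such a measurement could be reproduced locally; it is not Apple's benchmark harness, assumes mlx-lm's stream_generate() generator (whose return type varies between versions), and uses a placeholder model and prompt instead of the 4096-token prompts from the blog post:

```python
# Rough timing sketch (not Apple's benchmark code): measures time to first
# token and the rate of the following tokens with mlx-lm's stream_generate().
import time
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")  # placeholder

prompt = "..."  # Apple's benchmarks used ~4096-token prompts here
start = time.perf_counter()
first_token_time = None
tokens = 0

for _ in stream_generate(model, tokenizer, prompt=prompt, max_tokens=128):
    if first_token_time is None:
        first_token_time = time.perf_counter() - start  # prefill / time to first token
    tokens += 1

total = time.perf_counter() - start
print(f"time to first token: {first_token_time:.2f} s")
print(f"generation speed: {(tokens - 1) / (total - first_token_time):.1f} tokens/s")
```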

The results show that the M5 is significantly faster at generating the first token. This step is compute-bound and benefits from the redesigned GPU with neural accelerators, which provide dedicated hardware for the matrix multiplications that dominate LLM workloads. For subsequent tokens, however, memory bandwidth matters more. The M5 offers 153 GB per second, while the M4 reaches 120 GB per second, an increase of 28 percent. Overall, this translates into a 19 to 27 percent improvement when generating additional tokens.

A MacBook Pro with 24 GB of RAM can comfortably hold both an 8B model in BF16 and a 30B MoE model in 4-bit. Inference stays below 18 GB for both model architectures and therefore runs stably.
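
A back-of-the-envelope check, assuming the weights dominate the footprint and that 4-bit quantization costs roughly an extra half bit per weight for the quantization scales, shows why both models fit:

```python
# Rough weight-memory estimates; real footprints also include the KV cache,
# activations and framework overhead, so these are lower bounds.
GiB = 1024 ** 3

def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / GiB

print(f"8B model in BF16 (16 bits): ~{weight_gib(8, 16):.1f} GiB")
print(f"30B MoE model in 4-bit:     ~{weight_gib(30, 4.5):.1f} GiB")  # ~0.5 extra bit for scales
```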

Image generation in comparison

Apple also measured image generation performance. Here, the difference is even more pronounced. The M5 performs these tasks more than 3.8 times faster than the M4. The higher bandwidth and optimized units of the new GPU are particularly beneficial in generative image processing.

M5 significantly increases the efficiency of local AI

The M5 shows clear performance gains compared to the M4 in the local execution of large language models. Through improved neural accelerators, higher memory bandwidth, and an optimized GPU, Apple increases the efficiency of LLM inference across the entire Apple Silicon platform. MLX and MLX LM play a central role in this, as they enable the execution of large models in the first place and further accelerate them through quantization.

The results demonstrate that Apple is specifically aligning its hardware with machine learning. LLMs run faster, require less waiting time for the first token, and benefit from a more stable memory connection. The M5 also clearly outperforms the M4 in image generation. This strengthens Apple's use of local AI and expands the possibilities across all devices with Apple Silicon. (Image: Apple)

Tags: Apple Silicon, Developer, Mac