Apple has published a new study demonstrating how Large Language Models can use audio and motion data to identify a person's current activity. The research combines traditional sensor technology with AI and shows that even brief or incomplete data can yield reliable results. Apple thus brings to the fore a topic that could enable relevant applications in fitness, health, and everyday life.
Many devices today collect audio and motion data, but this raw data alone is often insufficient to clearly identify activities. Apple's new study therefore investigates an approach that uses LLMs to draw precise inferences from text descriptions. Instead of directly analyzing audio or motion data, the models are given short texts previously generated by smaller audio models and an IMU model. This allows them to recognize what is happening without needing a specially trained multimodal model.
How Apple uses LLMs
The paper, titled "Using LLMs for Late Multimodal Sensor Fusion for Activity Recognition," describes how Apple combines various information sources. The LLMs receive text about sounds, movements, and class predictions and use this information to infer the activity. This approach is less invasive because the model never accesses the actual audio recordings, only descriptive text labels.
The researchers argue that this approach offers significant advantages. Even if sensors provide only limited data, the LLM can combine the information to create a much clearer picture. This saves memory and computing power because there is no need to train or deploy specially adapted multimodal models.
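To make this concrete, the following minimal Python sketch shows what such a late-fusion prompt could look like: the language model only ever sees short text artifacts, here an audio caption, audio tags, and an IMU class prediction, combined into a single query. The function name, prompt wording, and example values are assumptions for illustration, not Apple's actual pipeline.

```python
# Minimal sketch of the late-fusion idea: the LLM never sees raw audio or motion
# signals, only short text artifacts produced by smaller upstream models.
# Function name, prompt wording, and example values are illustrative assumptions.

ACTIVITIES = [
    "vacuuming", "cooking", "doing laundry", "eating", "playing basketball",
    "playing soccer", "playing with pets", "reading a book",
    "sitting at the computer", "washing dishes", "watching television",
    "exercising or lifting weights",
]

def build_fusion_prompt(audio_caption: str, audio_labels: list[str],
                        imu_prediction: str) -> str:
    """Combine per-clip text artifacts into a single classification prompt."""
    return (
        "The following descriptions were derived from a 20-second clip.\n"
        f"Audio caption: {audio_caption}\n"
        f"Audio tags: {', '.join(audio_labels)}\n"
        f"Motion (IMU) model prediction: {imu_prediction}\n"
        "Which of the following activities is most likely? Answer with exactly one: "
        + ", ".join(ACTIVITIES)
    )

print(build_fusion_prompt(
    audio_caption="running water and clinking plates in a kitchen",
    audio_labels=["water tap", "dishes"],
    imu_prediction="standing with repetitive arm motion",
))
```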
The dataset: Ego4D
For the experiments, Apple used the Ego4D dataset. It contains many hours of video and audio material from a first-person perspective and covers everyday situations. From this material, Apple compiled a set of 20-second examples. Twelve activities were selected: vacuuming, cooking, doing laundry, eating, playing basketball, playing soccer, playing with pets, reading a book, sitting at the computer, washing dishes, watching television, and exercising or lifting weights.
This selection covers typical household, leisure, and sports activities that occur frequently in the dataset. For each example, audio descriptions, audio labels, and predictions from the IMU model were generated.
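After this preprocessing, each 20-second example could be represented roughly as in the following sketch. Only the clip length, the twelve activities, and the kinds of text artifacts come from the study; the field names and example values are assumptions for illustration.

```python
# Hedged sketch of a per-clip record after preprocessing an Ego4D segment.
# Field names and values are illustrative, not taken from Apple's supplementary material.

from dataclasses import dataclass, field

@dataclass
class ClipExample:
    """One 20-second example after preprocessing (illustrative field names)."""
    segment_id: str                      # Ego4D segment identifier
    start_s: float                       # clip start within the source video (seconds)
    end_s: float                         # clip end (start_s + 20)
    audio_caption: str                   # free-text description from an audio captioning model
    audio_labels: list[str] = field(default_factory=list)  # tags from an audio classifier
    imu_prediction: str = ""             # activity predicted by the IMU model
    gold_activity: str = ""              # one of the twelve target activities

example = ClipExample(
    segment_id="ego4d-segment-placeholder",   # placeholder, not a real segment ID
    start_s=132.0,
    end_s=152.0,
    audio_caption="a ball bouncing and sneakers squeaking on an indoor court",
    audio_labels=["basketball bounce", "shouting"],
    imu_prediction="running with abrupt direction changes",
    gold_activity="playing basketball",
)
```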
How the LLMs were tested
The approach was tested on two LLMs: Gemini 2.5 Pro and Qwen 32B. The researchers investigated two scenarios. In the first, the models were given a list of the twelve possible activities; in the second, there was no predefined selection.
Even without specific training, the models achieved F1 scores well above chance level. In zero-shot mode, they were already able to make meaningful classifications, and with a single example per activity (one-shot), accuracy increased further. The study thus demonstrates that LLMs are very good at identifying the correct activity from text-based descriptions.
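These two settings map onto two prompt variants, sketched below as a continuation of the earlier examples: a closed-set prompt that lists the twelve candidates and an open-set prompt that does not, each optionally preceded by a one-shot demonstration. The query_llm placeholder and the macro-averaged F1 evaluation are assumptions, not details confirmed by the paper.

```python
# Continues the sketches above (reuses ACTIVITIES and the ClipExample fields).
# `query_llm` is a placeholder for a call to Gemini 2.5 Pro or Qwen 32B, and the
# macro averaging of F1 is an assumption, not a detail confirmed by the paper.

from sklearn.metrics import f1_score

def query_llm(prompt: str) -> str:
    """Placeholder: plug in the actual model client (Gemini 2.5 Pro, Qwen 32B, ...)."""
    raise NotImplementedError

def classify_clip(clip, closed_set: bool = True, demo: str | None = None) -> str:
    """Ask the LLM for an activity label based only on the clip's text artifacts."""
    parts = []
    if demo:  # one-shot: prepend a worked example
        parts.append(f"Example:\n{demo}\n")
    parts.append(
        f"Audio caption: {clip.audio_caption}\n"
        f"Audio tags: {', '.join(clip.audio_labels)}\n"
        f"IMU prediction: {clip.imu_prediction}"
    )
    if closed_set:
        parts.append("Pick exactly one activity from: " + ", ".join(ACTIVITIES))
    else:
        parts.append("What activity is the person most likely doing? Answer briefly.")
    return query_llm("\n".join(parts)).strip().lower()

def evaluate(clips, closed_set: bool = True) -> float:
    """Macro-averaged F1 over the twelve activities for a list of ClipExample objects."""
    gold = [clip.gold_activity for clip in clips]
    pred = [classify_clip(clip, closed_set=closed_set) for clip in clips]
    return f1_score(gold, pred, labels=ACTIVITIES, average="macro")
```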
Why the results are relevant
Apple emphasizes that this type of late fusion is particularly helpful when raw sensor data alone doesn't provide a clear picture. LLMs can bridge the gap between individual information sources, creating a comprehensive understanding that traditional models can't achieve without additional training data. This allows health features, fitness analytics, and assistive systems to be improved without requiring large amounts of coordinated training data.
Apple also provides supplementary material, including segment IDs, timestamps, prompts, and one-shot examples used in the experiments. This openness makes it easier for researchers to replicate the results and build their own studies upon them.
How Apple meaningfully combines sensors and AI
The new study shows how Apple combines the strengths of sensors and AI. Large Language Models (LLMs) receive short text descriptions derived from audio and motion data, enabling them to reliably recognize various activities. The approach is efficient, flexible, and requires no extensive specialized training. Apple is thus providing important impetus for future applications related to health, exercise, and everyday life, while opening up a field of research that can continue to grow thanks to the materials provided. (Image: Shutterstock / issaro prakalung)