Apple has been working for years to unify and enhance the power of artificial intelligence. With UniGen 1.5, Apple is now introducing an AI model that can understand, generate, and manipulate images, all within a single system. The goal is to combine several previously separate tasks into one model, thereby achieving more consistent and higher-quality results.
AI research at Apple follows a clear approach. Instead of developing a separate model for each individual task, Apple is increasingly relying on unified, multimodal systems. UniGen 1.5 is the logical next step in this strategy. The model builds on previous work and extends it to include image processing without abandoning the unified framework. With this, Apple demonstrates how image understanding can be actively used to improve image generation and processing.
Building upon the original UniGen
In May of last year, a team of Apple researchers published the study "UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation." It presented a large, multimodal language model capable of both understanding and generating images. Crucially, these two capabilities were not distributed across separate models but integrated into a single system.
With the new study "UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning," Apple directly builds upon this work. UniGen 1.5 extends the existing model to include image editing functions, thus fully covering image understanding, image generation, and image editing.
A unified model for three tasks
Combining these three capabilities is technically challenging. Image understanding and image generation require different approaches, while image editing additionally demands a precise grasp of often highly specific instructions. Apple argues, however, that a unified model can strategically leverage its understanding capabilities to enhance the quality of both generation and editing.
Many existing models struggle with image editing. They often fail to fully grasp complex or subtle editing instructions. Changes affecting only small details or those described in great detail are particularly problematic. UniGen 1.5 aims to address precisely this issue.
Edit Instruction Alignment as an additional training step
To improve the model's understanding of edit instructions, UniGen 1.5 adds a new training step after supervised fine-tuning. This step is called Edit Instruction Alignment.
According to Apple, after standard fine-tuning, the model still does not reliably process editing scenarios because its understanding of the instructions is insufficient. Edit Instruction Alignment is intended to close this gap. The model receives the source image and the editing instruction as input and is trained to predict a detailed text description of the desired target image.
Instead of immediately generating an edited image, the model first learns to formulate the semantic content of the target image in words. This intermediate step helps UniGen 1.5 internalize the intended edit before the final image is generated. According to the reported results, this approach significantly improves editing performance.
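A minimal sketch can make this step more concrete. The code below assumes a generic multimodal model with a language-modeling head; all interfaces (the model, tokenizer, and batch fields) are hypothetical stand-ins for illustration and are not taken from Apple's implementation.

```python
# Illustrative sketch of an Edit Instruction Alignment training step.
# The model, tokenizer, and batch fields are hypothetical stand-ins,
# not Apple's actual API.
import torch.nn.functional as F


def edit_instruction_alignment_step(model, tokenizer, batch, optimizer):
    """One step: given a source image and an edit instruction, train the model
    to predict a detailed text description of the desired target image."""
    # Tokenize the edit instruction (input) and the target-image description (label).
    prompt_ids = tokenizer(batch["edit_instruction"])      # e.g. "make the cat black"
    target_ids = tokenizer(batch["target_description"])    # detailed caption of the edited image

    # The model conditions on the source image plus the instruction and
    # autoregressively predicts the target description token by token.
    logits = model(
        image=batch["source_image"],     # tensor of the unedited source image
        input_ids=prompt_ids,
        decoder_ids=target_ids[:, :-1],  # shifted right for next-token prediction
    )

    # Standard next-token cross-entropy over the description tokens.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_ids[:, 1:].reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```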
Reinforcement learning with unified rewards
After Edit Instruction Alignment, the researchers apply reinforcement learning. This is one of the key contributions of Apple's work: UniGen 1.5 uses the same reward system for both image generation and image editing.
Previously, this was difficult because image edits can take very different forms, ranging from minor corrections to comprehensive visual changes. By unifying the rewards, Apple can optimize both tasks in a single training run. The model is rewarded for high-quality, instruction-compliant results and penalized for poorer output.
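As a rough illustration, a unified reward could score both generated and edited images with the same image-text scorer, using the prompt for generation and the target description for edits. The scorer interface in the sketch below is an assumption made for the example, not the reward model described in the paper.

```python
# Illustrative sketch of a unified reward for generation and editing.
# The scorer is a hypothetical stand-in (e.g. a CLIP-style image-text model);
# it is not the reward model from Apple's paper.
def unified_reward(scorer, output_image, text_target):
    """Return a scalar reward measuring how well the output image matches the
    textual target, regardless of whether the image was generated from scratch
    or produced by editing a source image."""
    return scorer.image_text_similarity(output_image, text_target)


def reward_for_sample(scorer, task, sample):
    """Map both task types onto the same reward.

    - generation: the text target is simply the prompt
    - editing:    the text target is the detailed description of the desired
                  target image (cf. Edit Instruction Alignment above)
    """
    if task == "generation":
        text_target = sample["prompt"]
    else:  # "editing"
        text_target = sample["target_description"]
    return unified_reward(scorer, sample["output_image"], text_target)
```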
Benchmark results
In extensive tests, UniGen 1.5 performs very well. In benchmarks measuring instruction adherence, visual quality, and the ability to perform complex edits, the model achieves results that are at least comparable to or better than those of other modern multimodal systems.
On GenEval, UniGen 1.5 achieves a score of 0.89, and on DPG-Bench it reaches 86.83. This clearly surpasses current methods such as BAGEL and BLIP3-o.
UniGen 1.5 also performs strongly in image editing. On the ImgEdit benchmark, the model achieves an overall score of 4.31, placing it above open-source models like OmniGen2 and on par with proprietary models like GPT-Image-1.
Known weaknesses and errors
Apple also clearly points out existing limitations in the study. UniGen 1.5 has problems rendering text in images under certain conditions. This is due to a lightweight, discrete detokenizer that struggles to control the fine structural details necessary for precise text generation.
Furthermore, identity consistency issues arise in some cases. Among other things, changes in a cat's fur texture and facial shape, or differences in the feather color of a bird, have been observed. These identity shifts indicate that UniGen 1.5 needs further improvement in this area.
UniGen 1.5 strengthens Apple's position in AI research
With UniGen 1.5, Apple takes an important step towards unified, multimodal AI systems. The model combines image understanding, image generation, and image editing in a single framework and relies on new training strategies such as Edit Instruction Alignment and a unified reward system in reinforcement learning.
Despite its weaknesses, UniGen 1.5 demonstrates that a unified model can compete with, or even surpass, current open and proprietary solutions. Apple thus lays a solid foundation for further research and underscores its commitment to solving complex AI tasks holistically, rather than in a fragmented way. (Image: agsandrew / DepositPhotos.com)