Apple has been working with AI technologies for some time, but what the company has now unveiled together with research teams from China is something special. The new model is called Matrix3D and creates three-dimensional scenes from just three photos. This could fundamentally change several areas – such as virtual reality, 3D design, and the Apple Vision Pro headset.
Apple is launching Matrix3D, a new AI model that creates 3D reconstructions from a small number of 2D images. It builds on modern photogrammetry but works differently from previous approaches. Instead of using multiple models for individual processing steps, Matrix3D uses a single unified architecture. The technology was developed jointly with Nanjing University and the Hong Kong University of Science and Technology. The source code is publicly available.
What distinguishes Matrix3D from previous photogrammetry methods
Photogrammetry involves extracting measurement data from photographs to create 3D models or maps. In practice, this typically involves multiple specialized models—for example, to estimate camera position or calculate depth information. This division often leads to inaccuracies and complicates the entire process. Matrix3D takes a different approach. It uses a unified architecture that simultaneously processes images, camera parameters (such as angle and focal length), and depth data. This simplifies the workflow and reduces sources of error. At the same time, it increases the accuracy of reconstructions.
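To illustrate why images, camera parameters, and depth data belong together, here is a minimal sketch of classic pinhole back-projection: given a depth value per pixel and a focal length, each pixel can be turned into a 3D point. This is not Matrix3D's code. The function name backproject and the parameters focal, cx, and cy (principal point) are assumptions chosen for illustration only.

```python
def backproject(depth, focal, cx, cy):
    """Turn a depth map into 3D points in camera coordinates.

    depth is a list of rows; each entry is the distance along the
    optical axis for that pixel (0 or less means "no measurement").
    focal is the focal length in pixels, (cx, cy) the principal point.
    All names here are illustrative, not taken from Matrix3D.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth
                continue
            # Standard pinhole model: scale the pixel offset by depth / focal.
            x = (u - cx) * z / focal
            y = (v - cy) * z / focal
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    tiny_depth = [[2.0, 0.0], [2.0, 4.0]]
    for point in backproject(tiny_depth, focal=2.0, cx=0.5, cy=0.5):
        print(point)
```

In a traditional pipeline, depth and camera parameters for a step like this would come from separate models, so an error in either one distorts the resulting points. That is the coupling a unified model is meant to handle in one place.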
The training procedure of Matrix3D
A key aspect of Matrix3D is the way it was trained. The model uses a so-called masked learning strategy: during training, parts of the input data are randomly hidden, and Matrix3D must fill in these gaps on its own. Similar masking methods were used in early Transformer systems, the family of models behind language models like ChatGPT. This training strategy allows Matrix3D to work with incomplete or small datasets, which is particularly useful when only a few photos are available. Even then, it can still generate detailed 3D models.
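The masking idea can be sketched in a few lines. The example below is a simplified illustration, not Apple's training code: it assumes each view is a dictionary with three hypothetical entries ("image", "pose", "depth"), hides a random subset of them, and returns the hidden values as the targets a model would have to predict. The function name mask_inputs and the mask_ratio parameter are assumptions.

```python
import random

# Hypothetical modality labels for one view; not taken from Matrix3D.
MODALITIES = ("image", "pose", "depth")


def mask_inputs(sample, mask_ratio=0.5, rng=None):
    """Randomly hide entries of a multi-view sample.

    Returns (visible, targets): visible mirrors the sample but has None
    where an entry was hidden; targets holds only the hidden entries,
    i.e. what the model would be asked to reconstruct.
    """
    if rng is None:
        rng = random.Random()
    visible, targets = [], []
    for view in sample:
        seen, hidden = {}, {}
        for name in MODALITIES:
            if rng.random() < mask_ratio:
                seen[name] = None          # hidden from the model
                hidden[name] = view[name]  # kept as the training target
            else:
                seen[name] = view[name]
        visible.append(seen)
        targets.append(hidden)
    return visible, targets


if __name__ == "__main__":
    # Three views with placeholder strings standing in for real data.
    views = [
        {"image": f"img{i}", "pose": f"pose{i}", "depth": f"depth{i}"}
        for i in range(3)
    ]
    visible, targets = mask_inputs(views, mask_ratio=0.5, rng=random.Random(0))
    for seen, hidden in zip(visible, targets):
        print("model sees:", seen, "| must predict:", hidden)
```

Because the hidden subset changes on every call, a model trained this way repeatedly practices inferring one kind of data from whatever else happens to be visible, which is what makes sparse inputs such as three photos workable.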
What Matrix3D can do with just three images
Matrix3D requires only three images to reconstruct complex objects or entire scenes in 3D. This opens up many new possibilities, especially for augmented and virtual reality applications. An obvious fit is the Apple Vision Pro headset, which is designed for immersive digital experiences.
Open Source: Apple publishes the code
Unusually for Apple, the Matrix3D source code has been published on GitHub. This means developers can try out the model for themselves or integrate it into their own projects. There's also a companion website where you can watch example videos and explore interactive point clouds, which represent the reconstructed objects and environments.
Why Matrix3D is an important step for Apple
For Apple, Matrix3D represents another step toward integrating AI into everyday technologies. The model fits well with the strategy surrounding the Vision Pro headset, as it delivers content that can be directly used for immersive applications. At the same time, Apple demonstrates that AI is also playing an increasingly important role in areas such as photography, 3D design, and spatial visualization.
Matrix3D as a building block for Apple's mixed reality future?
Matrix3D is an AI model that can create 3D scenes from just three images, combining high accuracy, a simpler workflow, and a masked training strategy. Apple developed the model together with Chinese universities and, unusually for the company, released it openly. The technology could play a central role in future applications, especially in the area of mixed reality. Anyone interested in AI, 3D models, or Apple's technological future should take a closer look at Matrix3D. (Image: Shutterstock / pio3)