             2025-05-14
Apple’s Machine Learning team, in collaboration with researchers from Nanjing University and The Hong Kong University of Science and Technology, has announced an interesting 3D AI model called Matrix3D.
This so-called Large Photogrammetry Model can reconstruct 3D objects and scenes from just a few 2D photos, and unlike current pipelines it does so with a single model. Here’s why that matters.
First things first: photogrammetry is the technique of taking measurements from photographs in order to create 3D models or maps. Currently, this process relies on separate models for steps like pose estimation and depth prediction, which can lead to inefficiencies and errors.
Matrix3D handles all of these steps in one go. It takes in images, camera parameters (such as angle and focal length), and depth data, and processes them using a unified architecture. This not only streamlines the workflow but also improves accuracy.
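To see how those three kinds of input relate, here is a minimal sketch of textbook pinhole-camera geometry: given a depth value and the camera's focal length, each pixel can be placed in 3D space. This is not Matrix3D's code. The function name `backproject` and the parameters `focal_px`, `cx`, and `cy` are assumptions made for illustration.

```python
def backproject(depth, focal_px, cx, cy):
    """Turn a depth map into 3D points in camera space.

    depth    -- list of rows, each a list of depth values (0 means "no data")
    focal_px -- focal length expressed in pixels (assumed equal for x and y)
    cx, cy   -- pixel coordinates of the image centre (principal point)
    """
    points = []
    for v, row in enumerate(depth):          # v = pixel row
        for u, z in enumerate(row):          # u = pixel column
            if z <= 0:                       # skip pixels without depth
                continue
            x = (u - cx) * z / focal_px      # pinhole model, solved for X
            y = (v - cy) * z / focal_px      # pinhole model, solved for Y
            points.append((x, y, z))
    return points


# A tiny 2x2 depth map with one missing value.
cloud = backproject([[2.0, 2.0], [0.0, 4.0]], focal_px=2.0, cx=0.5, cy=0.5)
```

The result is a small point cloud, the same kind of output shown in the interactive demos, only at toy scale.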
Even more interesting is how the model was trained. Researchers used a masked learning strategy, very similar to the one used by early Transformer-based AI systems that helped pave the way for the first versions of ChatGPT.
They randomly hid parts of the input data during training, which forced Matrix3D to learn how to fill in the gaps. This technique is key because it enables Matrix3D to train effectively even with smaller or incomplete datasets.
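The idea of hiding inputs can be sketched in a few lines. The example below is only an illustration of random masking in general, not the researchers' training code. The modality names, the `mask_sample` function, and the `mask_ratio` parameter are all hypothetical.

```python
import random

# Hypothetical names for the three kinds of input the article mentions.
MODALITIES = ("image", "pose", "depth")


def mask_sample(sample, mask_ratio=0.5, rng=None):
    """Hide a random subset of (view, modality) entries.

    Returns (visible, targets):
      visible -- copy of the sample with hidden entries set to None
      targets -- the original values a model would be asked to predict
    """
    if not sample:
        return [], {}
    rng = rng or random.Random()
    slots = [(v, m) for v in range(len(sample)) for m in MODALITIES]
    n_hidden = max(1, round(mask_ratio * len(slots)))
    n_hidden = min(n_hidden, len(slots) - 1)   # always keep something visible
    hidden = set(rng.sample(slots, n_hidden))
    visible = [
        {m: (None if (v, m) in hidden else view[m]) for m in MODALITIES}
        for v, view in enumerate(sample)
    ]
    targets = {(v, m): sample[v][m] for (v, m) in hidden}
    return visible, targets


# Three views, each with placeholder strings standing in for real data.
sample = [{"image": f"img{i}", "pose": f"pose{i}", "depth": f"depth{i}"}
          for i in range(3)]
visible, targets = mask_sample(sample, mask_ratio=0.5, rng=random.Random(0))
```

Because a different subset is hidden each time, a sample that lacks, say, depth data can still be used: the missing entry is simply treated as one more hidden slot.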
The results are impressive. With just three input images, Matrix3D can generate detailed 3D reconstructions of objects and even entire environments, which could have interesting applications for immersive headsets like the Apple Vision Pro.
The researchers made the source code for Matrix3D available on GitHub, and published their paper on arXiv. They also created a website where you can watch more sample videos and even interact with a few point cloud recreations of objects and environments.
Source: 9to5mac