New Apple Study Teaches Robots How to Act by Watching First-Person Videos of Humans
2025-05-22
Posted by 3uTools

 

In a new paper called “Humanoid Policy ∼ Human Policy,” Apple researchers propose an interesting way to train humanoid robots. And it involves wearing an Apple Vision Pro.

 

Robot see, robot do

 

The project is a collaboration between Apple, MIT, Carnegie Mellon, the University of Washington, and UC San Diego. It explores how first-person footage of people manipulating objects can be used to train general-purpose robot models.

 

In total, the researchers gathered over 25,000 human demonstrations and 1,500 robot demonstrations (a dataset they called PH2D) and used them to train a unified AI policy that could then control a real humanoid robot in the physical world.

 

As the authors explain:

 

  • Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive and requires expensive teleoperated data collection, which is difficult to scale.
  • This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning.

 

Their solution? Let humans show the way.

 

Cheaper, faster training

 

To collect the training data, the team developed an Apple Vision Pro app that captures video from the device’s bottom-left camera, and uses Apple’s ARKit to track 3D head and hand motion.
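
Apple hasn't released the research app itself, but visionOS exposes the relevant tracking through ARKit's hand- and world-tracking providers. Here is a minimal sketch of the kind of capture loop such an app might run; the RecordedFrame type and the recording logic are placeholders of mine, not code from the paper:

```swift
import ARKit
import Foundation
import QuartzCore
import simd

// Placeholder container for one captured frame of a demonstration (not from the paper).
struct RecordedFrame {
    let timestamp: TimeInterval
    let headTransform: simd_float4x4      // device (head) pose in world space
    let wristTransform: simd_float4x4     // tracked hand pose in world space
    let chirality: HandAnchor.Chirality   // left or right hand
}

func recordDemonstration() async throws {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()
    let worldTracking = WorldTrackingProvider()

    // Start hand and head (world) tracking; on visionOS this has to run inside an
    // immersive space with the user's hand-tracking permission granted.
    try await session.run([handTracking, worldTracking])

    var frames: [RecordedFrame] = []

    for await update in handTracking.anchorUpdates {
        let hand = update.anchor
        guard hand.isTracked else { continue }

        let now = CACurrentMediaTime()
        // Query the head pose at (roughly) the same timestamp as the hand update.
        guard let device = worldTracking.queryDeviceAnchor(atTimestamp: now) else { continue }

        frames.append(RecordedFrame(
            timestamp: now,
            headTransform: device.originFromAnchorTransform,
            wristTransform: hand.originFromAnchorTransform,
            chirality: hand.chirality
        ))
    }
    // In a real app, `frames` would be serialized alongside the recorded video for training.
}
```

Per the paper, these head and wrist poses are recorded together with the camera footage, giving each video frame a matching 3D motion label.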

 

However, to explore a more affordable solution, they also 3D-printed a mount to attach a ZED Mini Stereo camera to other headsets, like the Meta Quest 3, offering similar 3D motion tracking at a lower cost.

 

The result was a setup that let them record high-quality demonstrations in seconds, a pretty big improvement over traditional robot tele-op methods, which are slower, more expensive, and harder to scale.

 

And here’s one last interesting detail: since people move way faster than robots, the researchers slowed down the human demos by a factor of four during training, just enough for the robot to keep up without needing further adjustments.
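
The article doesn't spell out exactly how that slowdown is applied, but the basic idea is easy to picture: stretch each human trajectory in time so the same motion unfolds at a speed the robot can physically follow. A minimal sketch, assuming demonstrations are stored as timestamped pose vectors (the DemoSample type and slowDown function are mine, not the paper's):

```swift
import Foundation

// One sample of a recorded demonstration: a timestamp plus a flat vector of pose
// values (head position, wrist positions, ...). Names are illustrative only.
struct DemoSample {
    var time: TimeInterval
    var values: [Double]
}

/// Slow a human demonstration down by `factor` (4x in the study) by stretching
/// its time axis, so the motion plays back at a pace the robot can keep up with.
func slowDown(_ demo: [DemoSample], by factor: Double = 4.0) -> [DemoSample] {
    demo.map { DemoSample(time: $0.time * factor, values: $0.values) }
}
```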

 

The Human Action Transformer (HAT)

 

The key to the whole study is the HAT, a model trained on both human and robot demonstrations in a shared format.

 

Instead of splitting the data by source (humans vs. robots), HAT learns a single policy that generalizes across both types of bodies, making the system more flexible and data-efficient.
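
The article doesn't reproduce the paper's exact data schema, but a rough sketch of what a shared, cross-embodiment training sample could look like helps make the idea concrete; all type and field names below are assumptions, not taken from the study:

```swift
import simd

// Which body produced the demonstration.
enum Embodiment {
    case human     // egocentric Vision Pro recording of a person
    case humanoid  // teleoperated robot demonstration
}

// Rough sketch of a unified sample: regardless of the source, the state is a head
// pose plus left/right "hand" poses (human wrists or robot end-effectors), and the
// action is the next pair of target hand poses. Field names are illustrative only.
struct CrossEmbodimentSample {
    let source: Embodiment
    let headPose: simd_float4x4
    let leftHandPose: simd_float4x4
    let rightHandPose: simd_float4x4
    let nextLeftHandPose: simd_float4x4   // action target for the left hand
    let nextRightHandPose: simd_float4x4  // action target for the right hand
}
```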

 

In some tests, this shared training approach outperformed more traditional methods, helping the robot handle more challenging tasks, including ones it hadn't seen before.

 

Overall, the study is pretty interesting and worth checking out if you are into robotics.

 

Is the idea of a house humanoid robot scary, exciting, or pointless to you? Let us know in the comments.

 

Source: 9to5Mac
