From Generated Human Videos to Physically Plausible Robot Trajectories

1UC Berkeley, 2New York University, 3Johannes Kepler University
* Equal contribution, † Equal advising


GenMimic enables zero-shot humanoid control from generated videos.

Abstract

Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, and they hold the potential to serve as high-level planners for contextual robot control. To realize this potential, a key research question remains open: how can a humanoid execute human actions from generated videos in a zero-shot manner?

This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation more difficult than with real video. To address this, we introduce a two-stage pipeline. First, we lift video pixels into a 4D human representation and retarget it to the humanoid morphology. Second, we propose GenMimic, a physics-aware reinforcement learning policy conditioned on 3D keypoints and trained with symmetry regularization and keypoint-weighted tracking rewards. As a result, GenMimic can mimic human actions from noisy generated videos. We curate GenMimicBench, a synthetic human-motion dataset generated with two video generation models across a spectrum of actions and contexts, establishing a benchmark for assessing zero-shot generalization and policy robustness. Extensive experiments demonstrate improvements over strong baselines in simulation and confirm coherent, physically stable motion tracking on a Unitree G1 humanoid robot without fine-tuning. This work offers a promising path toward realizing the potential of video generation models as high-level policies for robot control.
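To make the reward and regularization terms named above concrete, the sketch below shows one plausible form of a keypoint-weighted tracking reward and a symmetry-consistency penalty. It is an illustration under our own assumptions (the function names, the exponential reward shape, the scale sigma, and the mirror helpers are hypothetical), not the formulation used in the paper.

import numpy as np

def keypoint_tracking_reward(robot_kps, ref_kps, weights, sigma=0.25):
    # Illustrative keypoint-weighted tracking reward (assumed form, not the paper's).
    # robot_kps, ref_kps: (K, 3) 3D keypoints of the humanoid and the retargeted reference.
    # weights:            (K,) per-keypoint weights, e.g. emphasizing hands and feet.
    # sigma:              assumed scale controlling how fast the reward decays with error.
    sq_err = np.sum((robot_kps - ref_kps) ** 2, axis=-1)   # (K,) squared distances
    weighted = np.sum(weights * sq_err) / np.sum(weights)  # weighted mean keypoint error
    return float(np.exp(-weighted / sigma**2))             # reward in (0, 1]

def symmetry_consistency_loss(policy, obs, mirror_obs, mirror_act):
    # Illustrative symmetry regularizer: the action for a mirrored observation
    # should be the mirror of the original action. mirror_obs / mirror_act are
    # hypothetical helpers that swap left/right joints and flip lateral axes.
    act = policy(obs)
    act_mirrored = mirror_act(policy(mirror_obs(obs)))
    return float(np.mean((act - act_mirrored) ** 2))

In a typical setup of this kind, the tracking reward is evaluated at each simulation step against the retargeted reference motion, while the symmetry term is added to the policy objective with a small coefficient.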



Method Overview

GenMimicBench Examples

Given an initial frame and text prompt, we can generate videos of humans performing a wide range of actions.




BibTeX

@misc{ni2025generatedhumanvideosphysically,
  title={From Generated Human Videos to Physically Plausible Robot Trajectories}, 
  author={James Ni and Zekai Wang and Wei Lin and Amir Bar and Yann LeCun and Trevor Darrell and Jitendra Malik and Roei Herzig},
  year={2025},
  eprint={2512.05094},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.05094}, 
}