[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80851":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":14,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":28,"discoverSource":29},80851,"PhyMotion","h6kplus\u002FPhyMotion","h6kplus","Official implementation of paper \"PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation\"","https:\u002F\u002Fphy-motion.github.io\u002F",null,"Python",38,5,2,0,1,43.03,"MIT License",false,"main",true,[23,24],"reinforcement-learning","video-generation","2026-06-12 04:01:30","# PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation\n\n* Authors: [Yidong Huang](https:\u002F\u002Fowenh-unc.github.io\u002F)\\*, [Zun Wang](https:\u002F\u002Fzunwang1.github.io\u002F)\\*, [Han Lin](https:\u002F\u002Fhl-hanlin.github.io\u002F), [Dong-Ki Kim](https:\u002F\u002Fdkkim93.github.io\u002F), [Shayegan Omidshafiei](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fshayegan\u002F), [Jaehong Yoon](https:\u002F\u002Fjaehong31.github.io\u002F), [Jaemin Cho](https:\u002F\u002Fj-min.io\u002F), [Yue Zhang](https:\u002F\u002Fzhangyuejoslin.github.io\u002F) and [Mohit Bansal](https:\u002F\u002Fwww.cs.unc.edu\u002F~mbansal\u002F) (UNC Chapel Hill, FieldAI, NTU Singapore, AI2, Johns Hopkins University)\n\n\\* Equal contribution.\n\n* [Project page](https:\u002F\u002Fphy-motion.github.io) · [Arxiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.14269) · 
[Model](https:\u002F\u002Fhuggingface.co\u002F6kplus\u002FPhyMotion-CausalForcing-1.3B) · [Dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002F6kplus\u002FPhyMotion-MotionX-Prompts)\n\nGenerating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. To address this, we propose PhyMotion, a structured, fine-grained motion reward that grounds recovered 3D human trajectories in a physics simulator and evaluates motion quality along multiple dimensions of physical feasibility. Concretely, we recover SMPL body meshes from generated videos, retarget them onto a humanoid in the MuJoCo physics simulator, and evaluate the resulting motion along three axes: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each component provides a continuous and interpretable signal tied to a specific aspect of motion quality, allowing the reward to capture which aspects of motion are physically correct or violated. Experiments show that PhyMotion achieves stronger correlation with human judgments than existing reward formulations. These gains carry over to RL-based post-training, where optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain). 
Ablations show that the three axes provide complementary supervision signals, while the reward preserves overall video generation quality with only modest training overhead.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\".\u002Fassets\u002Fteaser.jpg\" alt=\"teaser image\"\u002F>\n\u003C\u002Fp>\n\n\n## Pretrained Checkpoints and Data\n\n| Asset | Hugging Face | Notes |\n|---|---|---|\n| **PhyMotion-CausalForcing-1.3B** LoRA  | [`6kplus\u002FPhyMotion-CausalForcing-1.3B`](https:\u002F\u002Fhuggingface.co\u002F6kplus\u002FPhyMotion-CausalForcing-1.3B) (model) | LoRA adapter for the Causal Forcing 1.3B base, post-trained with the PhyMotion reward. |\n| **MotionX prompts** (train 21,348 \u002F test 1,123) | [`6kplus\u002FPhyMotion-MotionX-Prompts`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002F6kplus\u002FPhyMotion-MotionX-Prompts) (dataset) | `train.txt` is used for RL rollout during post-training; `test.txt` is used for evaluation. |\n\nDownload both:\n\n```bash\n# LoRA adapter\nhuggingface-cli download 6kplus\u002FPhyMotion-CausalForcing-1.3B \\\n  --local-dir checkpoints\u002Fphymotion-causalforcing\n\n# Train + test prompt splits\nhuggingface-cli download 6kplus\u002FPhyMotion-MotionX-Prompts \\\n  --repo-type dataset --local-dir dataset\u002Fmotionx\n```\n\n\n## Environment Setup\n\n1. Create the Python environment and install dependencies. 
`requirements.txt` covers the full stack, including MuJoCo 3.3.6 and SMPL-X, so no separate steps are needed.\n\n```bash\nconda create -n phymotion python=3.10 -y\nconda activate phymotion\npip install torch==2.6.0 torchvision==0.21.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip install -r requirements.txt\npip install flash-attn==2.7.4.post1 --no-build-isolation\n```\n\n\nRun a quick sanity check of the environment:\n\n```bash\npython -c \"import torch, flash_attn, mujoco, smplx; \\\nprint(f'torch={torch.__version__} cuda={torch.cuda.is_available()}, flash_attn={flash_attn.__version__}, mujoco={mujoco.__version__}')\"\n# Expected output:\n# torch=2.6.0+cu124 cuda=True, flash_attn=2.7.4.post1, mujoco=3.3.6\n```\n\n2. Install GVHMR. The reward calls GVHMR in-process to recover SMPL-X meshes from generated frames.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fzju3dv\u002FGVHMR.git ~\u002FGVHMR\nexport GVHMR_ROOT=~\u002FGVHMR\n```\n\nDownload the GVHMR checkpoint bundle (~9 GB) from Hugging Face:\n\n```bash\nfor ckpt in \\\n  gvhmr\u002Fgvhmr_siga24_release.ckpt \\\n  hmr2\u002Fepoch=10-step=25000.ckpt \\\n  vitpose\u002Fvitpose-h-multi-coco.pth \\\n  yolo\u002Fyolov8x.pt; do\n  huggingface-cli download camenduru\u002FGVHMR \"$ckpt\" \\\n    --local-dir $GVHMR_ROOT\u002Finputs\u002Fcheckpoints\ndone\n```\n\n**SMPL-X body models (required):** The GVHMR bundle does *not* include the SMPL-X body model files; these must be obtained separately.\n\n1. Register (free academic license) at https:\u002F\u002Fsmpl-x.is.tue.mpg.de\u002F and download the SMPL-X model zip.\n2. 
Extract and place the following three files:\n\n```\n$GVHMR_ROOT\u002Finputs\u002Fcheckpoints\u002Fbody_models\u002Fsmplx\u002FSMPLX_NEUTRAL.npz\n$GVHMR_ROOT\u002Finputs\u002Fcheckpoints\u002Fbody_models\u002Fsmplx\u002FSMPLX_MALE.npz\n$GVHMR_ROOT\u002Finputs\u002Fcheckpoints\u002Fbody_models\u002Fsmplx\u002FSMPLX_FEMALE.npz\n```\n\nThe training script and reward module read `GVHMR_ROOT` from the environment.\n\nAfter GVHMR's pip dependencies are resolved, pin scipy to avoid a numpy\u002Fufunc incompatibility:\n\n```bash\npip install --force-reinstall scipy==1.15.2\n```\n\nThe humanoid MJCF model used to retarget SMPL is bundled inside this repo\n(`astrolabe\u002Fscorers\u002Fvideo\u002F`), so no additional asset is required.\n\n3. Download the **Wan2.1 T2V-1.3B** base components (transformer config, VAE, and UMT5-XXL text encoder). ~17 GB.\n\n```bash\nhuggingface-cli download Wan-AI\u002FWan2.1-T2V-1.3B --local-dir wan_models\u002FWan2.1-T2V-1.3B\n```\n\n4. Download the **Causal Forcing 1.3B** sampler weights (the autoregressive distilled version of Wan2.1 T2V-1.3B). PhyMotion's RL post-training starts from this. (~5.3 GB)\n\n```bash\nhuggingface-cli download zhuhz22\u002FCausal-Forcing \\\n  chunkwise\u002Fcausal_forcing.pt \\\n  --local-dir checkpoints\u002Fcasualforcing\n# Result: checkpoints\u002Fcasualforcing\u002Fchunkwise\u002Fcausal_forcing.pt\n```\n\n5. 
(Optional) Download our pretrained PhyMotion-CausalForcing-1.3B LoRA and the MotionX prompt splits from Hugging Face:\n\n```bash\n# LoRA adapter (700 MB)\nhuggingface-cli download 6kplus\u002FPhyMotion-CausalForcing-1.3B \\\n  --local-dir checkpoints\u002Fphymotion-causalforcing\n\n# Prompt splits: train.txt (21,348) and test.txt (1,123)\nhuggingface-cli download 6kplus\u002FPhyMotion-MotionX-Prompts \\\n  --repo-type dataset --local-dir dataset\u002Fmotionx\n```\n\nTo train on your own prompt list instead, save your one-prompt-per-line files as\n`dataset\u002Fmotionx\u002Ftrain.txt` and `dataset\u002Fmotionx\u002Ftest.txt`.\n\n\n\n\n## Stage 1: PhyMotion Reward\n\nThe reward grounds each generated video in a 3D body and scores it along three feasibility axes (kinematic, contact, dynamic). It is implemented as a single function in `astrolabe\u002Frewards.py`.\n\n| Axis | Sub-scores |\n|---|---|\n| **Kinematic** | joint velocity, joint acceleration, self-penetration |\n| **Contact**   | foot slip, ground penetration, foot float, balance |\n| **Dynamic**   | joint torque, ground reaction force, metabolic effort |\n\nThe final reward is the mean of the three axes. All feasibility code (joint-based kinematics and MuJoCo-based contact \u002F dynamics) lives in a single file: `astrolabe\u002Fscorers\u002Fvideo\u002Fsmpl_feasibility.py`.\n\nTo wire the reward into a config:\n\n```python\nconfig.reward_fn = {\"phymotion_score\": 1.0}\n```\n\nTo combine with a perceptual reward (e.g. 
HPSv3) for balanced training:\n\n```python\nconfig.reward_fn = {\n    \"phymotion_score\":   1.0,\n    \"video_hpsv3_local\": 1.0,\n}\n```\n\n\n## Stage 2: RL Post-Training\n\nLaunch RL post-training of Causal Forcing 1.3B with the PhyMotion reward.\n\n```bash\nexport GVHMR_ROOT=\u002Fpath\u002Fto\u002FGVHMR\ntorchrun --nproc_per_node=8 scripts\u002Ftrain_nft_wan.py \\\n  --config configs\u002Fnft_casual_forcing.py:casual_forcing_video_phymotion\n```\n\n* `--nproc_per_node`: number of GPUs on a single node.\n\n* `--config`: a `\u003Cfile>:\u003Centry>` selector. The entry `casual_forcing_video_phymotion` uses the PhyMotion reward (see `configs\u002Fnft_casual_forcing.py` for other entries that mix in perceptual rewards).\n\nOutputs are written to `logs\u002Fnft\u002F\u003Cbase_model>\u002F\u003Crun_name>_\u003Ctimestamp>\u002F`:\n\n* `checkpoints\u002Fcheckpoint-\u003Cstep>\u002Flora\u002F`: PEFT LoRA adapter (rank 256 on `CausalWanAttentionBlock`).\n\n* `optimizer.pt`, `scaler.pt`, and W&B \u002F TensorBoard logs.\n\n\n## Stage 3: Inference\n\nRoll out a trained LoRA on a list of prompts.\n\n```bash\n# Using the released PhyMotion-CausalForcing-1.3B LoRA\ntorchrun --nproc_per_node=1 scripts\u002Finference_wan.py \\\n  --base_model checkpoints\u002Fcasualforcing\u002Fchunkwise\u002Fcausal_forcing.pt \\\n  --lora_path  checkpoints\u002Fphymotion-causalforcing \\\n  --prompt_file prompts\u002Fsample.txt \\\n  --output_dir outputs\u002Ftest \\\n  --num_frames 45 --height 480 --width 832 \\\n  --guidance_scale 3.0 \\\n  --denoising_steps \"1000,750,500,250\" \\\n  --num_frame_per_block 3 \\\n  --mixed_precision bf16 --seed 42\n```\n\nTo use your own freshly trained LoRA, point `--lora_path` at a checkpoint directory written by Stage 2:\n\n```bash\n--lora_path  logs\u002Fnft\u002Fwan_casual_chunk\u002Fcasual_forcing_video_phymotion_\u003CTS>\u002Fcheckpoints\u002Fcheckpoint-\u003Cstep>\n```\n\n* `--base_model`: path to the Causal Forcing 1.3B checkpoint.\n\n* `--lora_path`: a 
`checkpoint-\u003Cstep>\u002F` folder or its `lora\u002F` subdirectory.\n\n* `--prompt_file`: a one-prompt-per-line text file.\n\n* `--output_dir`: directory for the generated MP4 files. Expect ~5 seconds per video on a single A100.\n\n\n## Hardware and Reference Runtimes\n\nOur reported numbers were produced on:\n\n* **Hardware**: 1 node with 8× NVIDIA A100 80 GB\n* **CUDA**: 12.4; **Python**: 3.10; **PyTorch**: 2.6.0; **flash-attn**: 2.7.4.post1.\n\nApproximate per-stage compute \u002F wall-clock:\n\n| Stage | Hardware | Wall clock |\n|---|---|---|\n| Stage 2 (RL post-training) | 8× A100 80 GB | ~60 hours for 330 steps (≈ 11 min\u002Fstep at batch 8) |\n| Stage 3 (inference, 45 frames @ 480×832) | 1× A100 \u002F RTX 4090 | ~5 seconds per video |\n| Stage 1 (reward, 1 video) | 1× A100 (GVHMR + MuJoCo) | ~3 seconds per video |\n\n\n## Acknowledgements\n\nThis codebase builds on several excellent open-source projects. We thank the authors and maintainers of [Astrolabe](https:\u002F\u002Fgithub.com\u002Ffranklinz233\u002FAstrolabe) for the RL \u002F reward-training infrastructure, [FastVideo](https:\u002F\u002Fgithub.com\u002Fhao-ai-lab\u002Ffastvideo) for efficient video generation and training utilities, and [GVHMR](https:\u002F\u002Fgithub.com\u002Fzju3dv\u002FGVHMR) for human mesh recovery used in our 3D motion reward pipeline. 
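Their publicly released code made this work possible.\n\n## Citation\n\nIf you find this work useful, please consider citing:\n\n```bibtex\n@article{huang2026phymotion,\n  title={PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation},\n  author={Huang, Yidong and Wang, Zun and Lin, Han and Kim, Dong-Ki and Omidshafiei, Shayegan and Yoon, Jaehong and Cho, Jaemin and Zhang, Yue and Bansal, Mohit},\n  journal={arXiv preprint arXiv:2605.14269},\n  year={2026}\n}\n```\n\n","PhyMotion is a human video generation project based on physics simulation that aims to improve the realism of human motion in generated videos through a structured 3D motion reward. Its core function is to recover human body meshes from generated videos using the SMPL model, retarget them onto a humanoid model in the MuJoCo physics simulator, and then evaluate motion quality in terms of kinematic plausibility, contact and balance consistency, and dynamic feasibility. This multi-dimensional evaluation reflects the realism of human motion more accurately. PhyMotion is suited to human-motion video generation scenarios that demand high realism, such as virtual reality, game development, or film visual effects, and it also offers a new direction for applying reinforcement learning to video post-processing.","2026-06-11 04:02:33","CREATED_QUERY"]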