[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79911":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},79911,"smp","SUZ-tsinghua\u002Fsmp","SUZ-tsinghua",null,"Python",181,16,89,1,0,14,69,7,53.59,false,"master",true,[],"2026-06-12 04:01:25","# SMP — Score-Matching Motion Priors (reproduction)\n\nA reproduction of **SMP: Reusable Score-Matching Motion Priors for Physics-Based\nCharacter Control** (Mu et al., 2025) on the **Unitree G1** humanoid — the\noriginal MimicKit implementation does not include a G1 setup, so this repo ports\nthe method to G1 end to end (motion features, priors, tasks, and rewards).\n\nA small diffusion model (DDPM) is pretrained on motion windows; its **frozen\nscore** is then reused as an SDS-style *guidance reward* during PPO, so a policy\nlearns naturalistic motion for a downstream task without any per-task motion\nclip or adversarial discriminator.\n\nThis is a reproduction for a course project. It re-implements the SMP idea on top\nof [**mjlab**](https:\u002F\u002Fgithub.com\u002Fmujocolab\u002Fmjlab) (the `ManagerBasedRlEnv` and\n`mjlab.scripts.train` \u002F `play` entrypoints are reused). The original method and\nreference implementation are:\n\n- **Paper:** SMP, Mu et al. 
2025 — [arXiv:2512.03028](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03028) · [project page](https:\u002F\u002Fyxmu.foo\u002Fsmp-page\u002F)\n- **Original code:** [`xbpeng\u002FMimicKit`](https:\u002F\u002Fgithub.com\u002Fxbpeng\u002FMimicKit) (see `docs\u002FREADME_SMP.md`)\n\n> The main intentional divergence from the original is the reward composition —\n> see [Reward design](#reward-design-task--smp) below.\n\n## Provided pretrained priors\n\nTo let you skip pretraining and run RL directly, **three pretrained diffusion\npriors are shipped** in `datasets\u002Fpretrain_ckpt\u002F`. Each task's env config already\npoints its `init_smp_state` event at the right one, so no setup is needed:\n\n| Checkpoint                       | Trained on            | Used by                          |\n| -------------------------------- | --------------------- | -------------------------------- |\n| `pretrained_loco.pt`             | walk \u002F jog \u002F run      | `Smp-Forward-G1`                 |\n| `pretrained_lafan_run.pt`        | LAFAN run subset      | `Smp-Steering-G1`, `Smp-Location-G1` |\n| `pretrained_getup_f2s2.pt`       | get-up (fall→stand)   | `Smp-Getup-G1`                   |\n\n## Setup\n\n[`uv`](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) is the canonical package manager; dependencies\n(including the pinned `mjlab` git rev) are locked in `uv.lock`.\n\n```bash\nuv sync\n```\n\n## Pipeline\n\n1. **Data processing** (CSV → windowed NPZ → normalization stats) — _TODO (docs pending)._\n2. **Diffusion pretraining** (DDPM ε-predictor on motion windows) — _TODO (docs pending)._\n   You can skip this entirely using the [shipped checkpoints](#provided-pretrained-priors).\n3. 
**RL** (PPO with the frozen prior as a guidance reward) — documented below.\n\n---\n\n## RL\n\nFour downstream tasks are registered with `mjlab.tasks.registry` (importing\n`smp.rl.tasks` self-registers them):\n\n| Task              | Demo | Description                              |\n| ----------------- | :--: | ---------------------------------------- |\n| `Smp-Forward-G1`  | \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FSUZ-tsinghua\u002Fsmp\u002Fassets\u002Fforward.gif\" width=\"200\"\u002F> | walk \u002F jog \u002F run at a commanded `+x` speed |\n| `Smp-Steering-G1` | \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FSUZ-tsinghua\u002Fsmp\u002Fassets\u002Fsteering.gif\" width=\"200\"\u002F> | track a commanded velocity + facing direction |\n| `Smp-Location-G1` | \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FSUZ-tsinghua\u002Fsmp\u002Fassets\u002Flocation.gif\" width=\"200\"\u002F> | walk to a world-frame xy goal |\n| `Smp-Getup-G1`    | \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FSUZ-tsinghua\u002Fsmp\u002Fassets\u002Fgetup.gif\" width=\"200\"\u002F> | stand up from a fallen pose |\n\n### Train \u002F play\n\n```bash\n# Train (checkpoints land under logs\u002F)\nuv run scripts\u002Ftrain.py Smp-Forward-G1 --env.scene.num-envs=4096\n\n# Play a trained policy from a W&B run\nuv run scripts\u002Fplay.py Smp-Forward-G1 --wandb-run-path \u003Corg>\u002F\u003Cproject>\u002F\u003Crun> --num-envs 4\n```\n\nSwap the task id for any of the four. 
Because the priors are shipped and already\nwired into each env config, no editing is required before training.\n\n### Reward design: `task × SMP`\n\nEvery task uses a single **multiplicative** reward term, `task_smp_product`:\n\n```\nr  =  ( Σᵢ wᵢ · taskᵢ(s) )  ×  r_smp(s)\n```\n\nwhere `r_smp = exp(−wₛ\u002F|K| · Σ_{i∈K} ‖ε̂_i − ε_i‖²)` is the SDS guidance reward\n(computed from the frozen denoiser's ε-prediction error over a fixed set of diffusion timesteps\n`K`, normalized per timestep).\n\nThis is the **key divergence from the original SMP \u002F MimicKit**, which combines\nthe two **additively** and balances them with separate weights\n(`task_reward_weight`, `smp_reward_weight`):\n\n```\n# original (additive):     r = task_reward_weight · task  +  smp_reward_weight · r_smp\n# here     (multiplicative): r = task · r_smp\n```\n\nWe want the policy to **complete the task _while_ keeping the SMP reward high** —\nwhich is exactly what a product expresses: it is large only when *both* factors\nare large, and collapses toward 0 if *either* drops. This makes reward tuning\n**easier and more robust**:\n\n- **No task-vs-prior weight to balance.** The additive form needs a\n  `task_reward_weight : smp_reward_weight` ratio whose sweet spot shifts per task\n  (and per training stage); the product removes that knob entirely.\n- **Neither term can be farmed alone.** Additively, a policy can maximize one term and\n  ignore the other — e.g. stand still looking natural (high prior, no task\n  progress) or lunge at the goal off-manifold (high task, low prior). With the\n  product, both failure modes score ≈ 0, so the only way to earn reward is to do\n  the task *and* stay on the motion manifold.\n\nPer-task `taskᵢ` components (weighted, summed, then multiplied by `r_smp`):\n\n- **Forward** — velocity tracking only: `exp(−s·‖v_cmd − v_xy‖²)`, zeroed when the\n  velocity projects backwards onto the target direction. 
Fixed `+x` heading,\n  commanded speed 0.5–5 m\u002Fs.\n- **Steering** — `0.5·` velocity tracking `+ 0.5·` facing alignment\n  `max(face_dir · heading, 0)`; randomized target direction + facing, speed 0.5–2 m\u002Fs.\n- **Location** — position tracking only: `exp(−s·‖xy_goal − xy_robot‖)` toward a\n  periodically resampled world-frame goal (this task uses `ws=4`, the SMP weight wₛ in `r_smp`).\n- **Get-up** — `0.7·` upward head velocity `+ 0.3·` head-height tracking, each\n  `exp(−s·max(target − ·, 0)²)`, starting from a fallen pose drawn via GSI.\n\n### Generative State Initialization (GSI)\n\nOn every reset, an init state is drawn from a pool of windows pre-sampled from the\nfrozen prior; its last frame seeds the sim state and the whole window primes the\nonline feature buffer, so `r_smp` is meaningful from step 0. Each env is reset to\nits own scene origin while the feature buffer is kept **env-origin-relative**, so\nthe guidance reward is invariant to where the env sits in the world grid.\n\n### Motion features\n\nThe guidance reward scores a rolling window of motion features rebuilt online by\n`smp.rl.utils.MotionFeatureBuffer`, matching the pretraining layout (59-dim\u002Fframe\nfor G1), anchored to the last frame's yaw-only local frame:\n\n```\n[root_pos(3), root_rot(6), joint_pos(29), ee_pos(15), root_lin_vel(3), root_ang_vel(3)]\n```\n\n## Citation & acknowledgements\n\nThis repository reproduces SMP; please cite the original work and credit the\nreference implementation:\n\n- **SMP** — Mu et al., *Reusable Score-Matching Motion Priors for Physics-Based Character Control*, 2025. 
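[arXiv:2512.03028](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03028)\n- **MimicKit** — the original SMP implementation: \u003Chttps:\u002F\u002Fgithub.com\u002Fxbpeng\u002FMimicKit>\n- **mjlab** — RL environment backbone: \u003Chttps:\u002F\u002Fgithub.com\u002Fmujocolab\u002Fmjlab>\n","The SMP project is a reproduction, on the Unitree G1 humanoid robot, of a physics-based character control method built on Score-Matching Motion Priors. Its core idea is to pretrain a small diffusion model (DDPM) on motion windows and reuse its frozen score as a guidance reward in PPO, so that the policy learns natural motion without task-specific motion clips or an adversarial discriminator. The project suits research and development scenarios that use reinforcement learning to make robots perform natural motions such as walking and steering. On the implementation side, it is built on the mjlab framework and ships three pretrained diffusion priors that can be used directly in the RL stage, simplifying the experimental workflow.",2,"2026-06-11 03:58:29","CREATED_QUERY"]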