[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11362":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":12,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":16,"compositeScore":17,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":8,"pushedAt":8,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":24,"discoverSource":25},11362,"MocapAnything","phongdaot\u002FMocapAnything","phongdaot",null,"Python",271,27,8,2,0,54,6,4.34,false,"main",[],"2026-06-12 02:02:31","# MoCapAnything V2\n\n**End-to-End Motion Capture for Arbitrary Skeletons from Monocular Videos**\n\n[Project Page](https:\u002F\u002Fanimotionlab.github.io\u002FMoCapAnythingV2\u002F) · [Paper (arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.28130) · [Install & Run](RUN.md)\n\n> ⚠️ **Unofficial code release.** This repository is a reimplementation based on the paper — use as a reference, not a reproduction.\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fanimotionlab.github.io\u002FMoCapAnythingV2\u002F\" title=\"Click to watch the 90-second teaser on the project page\">\n    \u003Cimg src=\"assets\u002Fteaser_play.png\" width=\"92%\" alt=\"MoCapAnything V2 teaser — click to watch the video on the project page\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Csub>▶ Click the image to watch the 90-second teaser on the project page.\u003C\u002Fsub>\u003C\u002Fp>\n\n## Highlights\n\n- 🔗 **Fully end-to-end.** Video → Pose → Rotation jointly optimized — no analytical IK in the loop.\n- ⚓ **Reference-anchored rotation.** A single reference pose–rotation pair from the 
target asset defines the rotation coordinate system, turning pose-to-rotation into a well-constrained problem.\n- ⚡ **Mesh-free and fast.** Joints predicted directly from video, ~20× faster than mesh-based pipelines.\n\n## Pipeline\n\nThe V2 main model is **`video2pose2rot`** — a single end-to-end network that maps a video directly to BVH-ready joint rotations. Internally it composes two subtasks, `video2pose` and `pose2rot`, that share weights and are jointly fine-tuned; they can be run standalone (e.g. for ablations or debugging), but normal usage is the joint model. The V1 mesh-based pipeline (`video2mesh` + `mesh2pose`) is included as a baseline for comparison.\n\n| Stage | Role | Input | Output |\n| --- | --- | --- | --- |\n| **`video2pose2rot`** | **V2 — main end-to-end model** | Image sequence + reference | Joint rotations (BVH) |\n| &nbsp;&nbsp;↳ `video2pose` | V2 subtask (standalone-runnable) | Image sequence + reference pose | Joint positions |\n| &nbsp;&nbsp;↳ `pose2rot` | V2 subtask (standalone-runnable) | Joint positions + rest pose \u002F reference pose-rot pair | Joint rotations |\n| `video2mesh` | V1 baseline — mesh sequence (TripoSG) | Image sequence | Mesh (`.glb` \u002F latent) |\n| `mesh2pose` | V1 baseline — joints from per-frame meshes | Mesh sequence + reference pose | Joint positions |\n\nA reference frame from a matching species guides the per-species skeleton and scale, enabling generalization to unseen animals.\n\n## Install & Run\n\nEnvironment setup, dataset layout, training commands, and inference (including in-the-wild mode) live in **[RUN.md](RUN.md)**.\n\n## Citation\n\nIf you use this code, please consider citing:\n\n```bibtex\n@article{gong2026mocapanythingv2,\n  title   = {MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons},\n  author  = {Gong, Kehong and Wen, Zhengyu and Phong, Dao Thien and\n             Xu, Mingxi and He, Weixia and Wang, Qi and Zhang, Ning and\n             Li, Zhengyu and Hou, Guanli and Lian, 
Dongze and He, Xiaoyu and\n             Zhang, Mingyuan and Zhang, Hanwang},\n  journal = {arXiv preprint arXiv:2604.28130},\n  year    = {2026}\n}\n```\n\nIf you build on the V1 baselines, please also cite the underlying papers — `mesh2pose` is from **MoCapAnything (V1)**, and `video2mesh` is from **SWiT-4D**:\n\n```bibtex\n@article{gong2025mocapanything,\n  title   = {MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos},\n  author  = {Gong, Kehong and Wen, Zhengyu and He, Weixia and Xu, Mingxi and\n             Wang, Qi and Zhang, Ning and Li, Zhengyu and\n             Lian, Dongze and Zhao, Wei and He, Xiaoyu and Zhang, Mingyuan},\n  journal = {arXiv preprint arXiv:2512.10881},\n  year    = {2025}\n}\n\n@article{gong2025swit4d,\n  title   = {SWiT-4D: Sliding-Window Transformer for Lossless and Parameter-Free Temporal 4D Generation},\n  author  = {Gong, Kehong and Wen, Zhengyu and Xu, Mingxi and He, Weixia and Wang, Qi and\n             Zhang, Ning and Li, Zhengyu and Li, Chenbin and Lian, Dongze and\n             Zhao, Wei and He, Xiaoyu and Zhang, Mingyuan},\n  journal = {arXiv preprint arXiv:2512.10860},\n  year    = {2025}\n}\n```\n\n## License\n\nSee the repository for license information.\n","MoCapAnything V2 is an end-to-end system for capturing the motion of arbitrary skeletons from monocular video. Its core features include predicting joint positions and rotations directly from video without an intermediate IK step, and a rotation coordinate system defined by a reference pose-rotation pair, which turns pose-to-rotation into a well-constrained problem. In addition, the system does not rely on mesh models and runs about 20× faster than mesh-based methods. It suits scenarios that require fast, accurate extraction of skeletal animation data from video, such as animation production, virtual reality, and biological motion research.","2026-06-11 03:31:44","CREATED_QUERY"]