[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76164":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},76164,"minWM","shengshu-ai\u002FminWM","shengshu-ai","A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models",null,"Python",569,8,11,5,0,19,33,452,57,93.36,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:20","# 🌍 minWM: The First Full-Stack Open-Source World Model Framework\n\n>  ***A full-stack framework and tutorial for newcomers, rather than a specific model.***\n\n**minWM** is our contribution to the world-model community: a **full-stack open-source framework** that walks you end-to-end through turning a bidirectional T2V foundation model into an action-conditioned video world model — with example data, runnable scripts, **Claude skills** capturing our hands-on experience, and **onboarding knowledge** for newcomers. We hope more researchers and developers join us in growing the community together.\n\n## 🎬 Demo\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F74d1f157-66a7-4f88-9f57-d76382ef55c2\n\n## 🔥 News\n\n- **2026-05-17** 🚀 We release **minWM** — the first full-stack open-source world model framework.\n\n## 📋 Table of Contents\n\n- [🎬 Demo](#-demo)\n- [🔥 News](#-news)\n- [✨ Why minWM?](#-why-minwm)\n  - [1. Full-Stack Framework](#1-full-stack-framework)\n  - [2. 
Multi-Backbone Support](#2-multi-backbone-support)\n  - [3. Multi-Condition Injection](#3-multi-condition-injection)\n  - [4. Claude Skills — Modify the Framework with an LLM Assistant](#4-claude-skills--modify-the-framework-with-an-llm-assistant)\n  - [5. Onboarding Knowledge — for Newcomers to World Models](#5-onboarding-knowledge--for-newcomers-to-world-models)\n- [🛠️ Installation](#️-installation)\n- [🧱 Model Checkpoints](#-model-checkpoints)\n- [🚀 Quick Start](#-quick-start)\n- [⚙️ Data & Training & Reproduction](#️-data--training--reproduction)\n- [📚 Citation](#-citation)\n- [Contact](#contact)\n- [🙏 Acknowledgements](#-acknowledgements)\n\n## ✨ Why minWM?\n\n### 1. Full-Stack Framework\n\nThe complete **data → training → inference** pipeline is open-sourced; every stage exposes input\u002Foutput checkpoints so you can stop, swap, or fork anywhere.\n\n**1.1 Data.** We walk you through constructing training-ready datasets paired with camera poses, and through the full data-processing pipeline that turns them into latents.\n\n**1.2 Training.** Covers FSDP + sequence parallelism, single-\u002Fmulti-node training, and the full distillation pipeline from a bidirectional diffusion model to a 4-step autoregressive (AR) student:\n\n```\nPhase 1                            Phase 2 — Distillation to Causal Few-Step\n─────────────────────              ────────────────────────────────────────────\nBidirectional SFT      ──▶   Stage 1   Teacher Forcing AR Diffusion\n                             Stage 2a  Causal ODE  (proposed in [Causal Forcing](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.02214))\n                             Stage 2b  Causal CD   (proposed in [Causal Forcing++](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15141))\n                             Stage 3   Asymmetric DMD with Self Rollout\n                                                ▼\n                                         4-step real-time\n```\n\n**1.3 Inference.**\n\n- ✅ 4-step DMD inference for HY Action2V 
\u002F HY TI2V \u002F Wan Action2V, multi-GPU sequence parallelism, camera-trajectory control via pose strings (`\"a*4,w*8,s*7\"`) or JSON files\n- 🚧 Inference acceleration [to do]\n\n### 2. Multi-Backbone Support\n\nminWM supports two paths to an interactive world model.\n\n#### 2.1 From Scratch: Bidirectional T2V Foundation → Real-Time World Model\n\nThe HunyuanVideo 1.5 and Wan 2.1 lines walk through the full 4-stage pipeline — starting from a bidirectional T2V foundation model and ending at a 4-step autoregressive world model.\n\n\n| Backbone             | Architecture          | Params | Training       | Inference    |\n| -------------------- | --------------------- | ------ | -------------- | ------------ |\n| **HunyuanVideo 1.5** | MMDiT                 | 8 B    | ✅ all 4 stages | ✅ 4-step DMD |\n| **Wan 2.1**          | Cross-attention + DiT | 1.3 B  | ✅ all 4 stages | ✅ 4-step DMD |\n\n\nBoth lines share the same trainer \u002F loss \u002F dataset abstractions, so adding a third backbone is structurally a wrapper-and-config exercise.\n\n#### 2.2 Finetuning an Existing Video World Model 🚧 [to do]\n\nThe forthcoming `worldplay-finetune` entry will let you start from an already-trained video world model and adapt it to new conditions, scenes, or resolutions — without rerunning the 4-stage pipeline from scratch.\n\n### 3. Multi-Condition Injection\n\nWe aim to support both multiple condition types and multiple injection methods, mixable along either axis.\n\n#### 3.1 Supported Conditions\n\n- ✅ Camera pose\n- 🚧 Human pose [to do]\n\n#### 3.2 Supported Injection Methods\n\n- ✅ ProPE\n- 🚧 Latent concat [to do]\n- 🚧 Cross-attention [to do]\n\n### 4. 
Claude Skills — Modify the Framework with an LLM Assistant\n\n> 🚧 *In development.*\n\nWe are packaging our project experience across the CF \u002F CF++ pipeline as Claude skills, so that an LLM assistant can help users debug failures and integrate new models without reverse-engineering the whole repo.\n\n**Planned skills:**\n\n- 🐛 **`debug-world-model`** — collected failure modes from the training pipeline (loss NaN, frame-to-frame jitter, camera drift, memory attenuation, distillation collapse, …). Claude diagnoses likely root causes from your symptoms instead of guessing.\n- 🔌 **`integrate-new-backbone`** — step-by-step recipe for plugging a new video DiT into minWM, grounded in the HunyuanVideo and Wan reference integrations — e.g. *\"look at how HY does teacher forcing here, do the same for your model there\"*.\n\n### 5. Onboarding Knowledge — for Newcomers to World Models\n\n> 🚧 *In development.*\n\nA third Claude skill aimed at researchers entering the world-model space for the first time. 
Two parts:\n\n- 🎓 **Foundations** — the minimal background to follow the pipeline: Teacher Forcing for AR diffusion training and Causal Forcing & Causal Forcing++ for AR diffusion distillation.\n- 🪤 **Pitfalls** — the non-obvious mistakes we hit while building minWM, distilled so you don't repeat them.\n\nIntended audience: graduate students, independent researchers, and junior labs that want to enter the world-model space without spending three months reverse-engineering existing repos.\n\n## 🛠️ Installation\n\n```bash\nconda create -n minwm python=3.10 -y \nconda activate minwm\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\nexport PYTHONPATH=\"$PWD\u002FHY15:$PWD\u002FWan21:$PWD\u002Fshared:$PYTHONPATH\"\n```\n\n## 🧱 Model Checkpoints\n\nAll weights live under `.\u002Fckpts\u002F` after download.\n\n\n| Checkpoint                                                                | Backbone | Stage                               | Use case                               | Download                                              |\n| ------------------------------------------------------------------------- | -------- | ----------------------------------- | -------------------------------------- | ----------------------------------------------------- |\n| `HunyuanVideo-1.5` (base)                                                 | HY 1.5   | —                                   | Required by both HY pipelines          | [HF](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo-1.5) |\n| `Wan2.1-T2V-1.3B` (base)                                                  | Wan 2.1  | —                                   | Required by Wan pipeline               | [HF](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002FWan2.1-T2V-1.3B)   |\n| `HY15\u002FAction2V\u002Fbidirectional`                                             | HY 1.5   | Phase 1 SFT                         | Starting point for HY Action2V Phase 2 | 
[HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n| `HY15\u002FAction2V\u002Far_diffusion_tf`                                           | HY 1.5   | Phase 2 Stage 1                     | Teacher Forcing AR diffusion           | [HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n| `HY15\u002FAction2V\u002Fcausal_ode`                                                | HY 1.5   | Phase 2 Stage 2a (proposed in Causal Forcing)   | DMD initialization               | [HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n| `HY15\u002FAction2V\u002Fcausal_cd`                                                 | HY 1.5   | Phase 2 Stage 2b (proposed in Causal Forcing++) | DMD initialization               | [HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n| `HY15\u002FAction2V\u002Fdmd`                                                       | HY 1.5   | Phase 2 Stage 3                     | **4-step real-time inference**         | [HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n| `HY15\u002FTI2V\u002F{bidirectional,ar_diffusion_tf,causal_ode,causal_cd,dmd}`      | HY 1.5   | Same 4 stages, TI2V variant         | TI2V pipeline                          | [HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n| `Wan21\u002FAction2V\u002F{bidirectional,ar_diffusion_tf,causal_ode,causal_cd,dmd}` | Wan 2.1  | Same 4 stages                       | Wan pipeline                           | [HF](https:\u002F\u002Fhuggingface.co\u002FMIN-Lab\u002FminWM)            |\n\n\n## 🚀 Quick Start\n\n> The fastest path: install → download three DMD checkpoints → run three demo commands. Full reproduction (all 4 training stages × 3 model lines) is in [§ Data & Training & Reproduction](#️-data--training--reproduction).\n\n### 1. 
Download the demo checkpoints\n\n```bash\n# HY base + text\u002Fvision encoders (required by HY pipelines)\nhf download tencent\u002FHunyuanVideo-1.5 --local-dir .\u002Fckpts\u002FHunyuanVideo-1.5 \\\n    --include \"vae\u002F*\"  \"scheduler\u002F*\" \"transformer\u002F480p_i2v\u002F*\"\nhf download Qwen\u002FQwen2.5-VL-7B-Instruct --local-dir .\u002Fckpts\u002FHunyuanVideo-1.5\u002Ftext_encoder\u002Fllm\nhf download google\u002Fbyt5-small           --local-dir .\u002Fckpts\u002FHunyuanVideo-1.5\u002Ftext_encoder\u002Fbyt5-small\nmodelscope download --model AI-ModelScope\u002FGlyph-SDXL-v2 \\\n    --local_dir .\u002Fckpts\u002FHunyuanVideo-1.5\u002Ftext_encoder\u002FGlyph-SDXL-v2\nhf download black-forest-labs\u002FFLUX.1-Redux-dev \\\n    --local-dir .\u002Fckpts\u002FHunyuanVideo-1.5\u002Fvision_encoder\u002Fsiglip --token \u003Cyour_hf_token>\n\n# Wan base (T2V-1.3B)\nhf download Wan-AI\u002FWan2.1-T2V-1.3B --local-dir .\u002Fckpts\u002FWan2.1-T2V-1.3B \n\n# Code hardcodes the load path; create a symlink.\nmkdir -p Wan21\u002Fwan_models\nln -s \"$(realpath .\u002Fckpts\u002FWan2.1-T2V-1.3B)\" Wan21\u002Fwan_models\u002FWan2.1-T2V-1.3B\n\n# Three 4-step DMD checkpoints\n## HY Action2V (DMD, 4-step)\nhf download MIN-Lab\u002FminWM --local-dir .\u002Fckpts \\\n    --include \"HY15\u002FAction2V\u002Fdmd\u002F*\"\n\n## HY TI2V (DMD, 4-step)\nhf download MIN-Lab\u002FminWM --local-dir .\u002Fckpts \\\n    --include \"HY15\u002FTI2V\u002Fdmd\u002F*\"\n\n## Wan Action2V (DMD, 4-step)\nhf download MIN-Lab\u002FminWM --local-dir .\u002Fckpts \\\n    --include \"Wan21\u002FAction2V\u002Fdmd\u002F*\"\n```\n\n\n### 2. 
Run the three demos\n\n```bash\n# 2.1  HY Action2V (4-step DMD, camera control)\nTRANSFORMER_DIR=.\u002Fckpts\u002FHY15\u002FAction2V\u002Fdmd \\\nOUTPUT_DIR=.\u002Foutputs\u002Fquickstart_hy_action2v \\\n    bash HY15\u002Fscripts\u002Finference\u002Frun_infer_causal_camera.sh\n\n# 2.2  HY TI2V (4-step DMD)\nTRANSFORMER_DIR=.\u002Fckpts\u002FHY15\u002FTI2V\u002Fdmd \\\nOUTPUT_DIR=.\u002Foutputs\u002Fquickstart_hy_ti2v \\\n    bash HY15\u002Fscripts\u002Finference\u002Frun_infer_causal.sh\n\n# 2.3  Wan Action2V (4-step DMD, camera control)\nOUTPUT_FOLDER=.\u002Foutputs\u002Fquickstart_wan_action2v \\\nTRAJECTORY_PATH=\"Wan21\u002Fprompts\u002Ftrajectories.txt\" \\\n    bash Wan21\u002Fscripts\u002Finference\u002Frun_infer_causal_camera.sh\n```\n\n> **Camera control.** For HY Action2V, trajectories are read per-sample from `assets\u002Fexample.json` under the `\"trajectory\"` field. Format: `w\u002Fs\u002Fa\u002Fd` keys with `*N` repeats; comma-separated segments — e.g. `\"a*4,w*8,s*7\"`.\n\n## ⚙️ Data & Training & Reproduction\n\nThree model lines × two phases × four stages, each documented as **(1) Model download → (2) Data preparation → (3) Training script → (4) Validation**. 
Full reproduction guides are split by backbone:\n\n- 📘 [`training_hunyuan.md`](training_hunyuan.md) — **HY Action2V** + **HY TI2V** (HunyuanVideo 1.5 backbone)\n- 📗 [`training_wan.md`](training_wan.md) — **Wan Action2V** + **Wan T2V** (Wan 2.1 backbone)\n\n## 📚 Citation\n\nIf minWM helps your research, please cite:\n\n```bibtex\n@article{zhu2026causal,\n  title={Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation},\n  author={Zhu, Hongzhou and Zhao, Min and He, Guande and Su, Hang and Li, Chongxuan and Zhu, Jun},\n  journal={arXiv preprint arXiv:2602.02214},\n  year={2026}\n}\n\n@article{zhao2026causal,\n  title={Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation},\n  author={Zhao, Min and Zhu, Hongzhou and Zheng, Kaiwen and Zhou, Zihan and Yan, Bokai and Li, Xinyuan and Yang, Xiao and Li, Chongxuan and Zhu, Jun},\n  journal={arXiv preprint arXiv:2605.15141},\n  year={2026}\n}\n\n```\n\n## Contact\n\nFor questions, suggestions, or collaboration, please open a GitHub issue or contact: [gracezhao1997@gmail.com](mailto:gracezhao1997@gmail.com).\n\n## 🙏 Acknowledgements\n\nminWM stands on the shoulders of giants. 
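We thank the authors and maintainers of [HunyuanVideo 1.5](https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHunyuanVideo-1.5), [HY-WorldPlay](https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHY-WorldPlay), [Wan 2.1](https:\u002F\u002Fgithub.com\u002FWan-AI\u002FWan), [Causal-Forcing](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FCausal-Forcing), and [FastVideo](https:\u002F\u002Fgithub.com\u002Fhao-ai-lab\u002FFastVideo) for their open-source contributions, which made this framework possible.\n","minWM is a full-stack open-source framework and tutorial for real-time interactive world models. It provides a complete pipeline from data preparation through training to inference, supports multiple backbones and multi-condition injection, and uses Claude skills so that users can modify the framework with the help of an LLM assistant. The project also provides onboarding knowledge for newcomers, helping them understand and use world models. It suits researchers and developers who want to get started quickly and study world models in depth, especially when building action-conditioned video world models.",2,"2026-06-11 03:54:41","CREATED_QUERY"]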