[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1583":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},1583,"MultiWorld","CIntellifusion\u002FMultiWorld","CIntellifusion","Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models","https:\u002F\u002Fmulti-world.github.io\u002F",null,"Python",225,12,3,1,0,6,24,9,51.74,false,"main",true,[25,26,27,28,29,30,31,32],"action-conditioned","diffusion-models","game-generation","interactive-video","multi-agent","robotics","video-generation","worldmodel","2026-06-12 04:00:10","\n\u003Cdiv align=\"center\">\n\n\u003Ch1>MultiWorld: Scalable Multi-Agent Multi-View Video World Models \u003C\u002Fh1>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.18564\">\n\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-multiworld-darkred' alt='Paper PDF'>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002FMulti-World.github.io\u002F\">\n\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-orange' alt='Project Page'>\u003C\u002Fa>\n\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHaoyuwu\u002FMultiWorldData\">\n\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Dataset-yellow?logo=huggingface' alt='HuggingFace Dataset'>\u003C\u002Fa>\n\u003Ca 
href=\"https:\u002F\u002Fmodelscope.cn\u002Fdatasets\u002FHaoyuWuRUC\u002FMultiWorldData\">\n\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Dataset-blue' alt='ModelScope Dataset'>\u003C\u002Fa>\n\n[Haoyu Wu](https:\u002F\u002Fcintellifusion.github.io\u002F)$^{1*}$, [Jiwen Yu](https:\u002F\u002Fyujiwen.github.io\u002F) $^{1}$, [Yingtian Zou](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=APA-glsAAAAJ&hl=en)$^{2}$, [Xihui Liu](https:\u002F\u002Fxh-liu.github.io\u002F) $^{1}†$\n\n$^1$ The University of Hong Kong $^2$ SReal AI\n\n(† Corresponding Author)\n\n\u003C\u002Fdiv>\n\n\n## 🎯 Overview \n\n![](assets\u002Fmain.png)\n\nWe present **MultiWorld**, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency.\n\nIn the Multi-Agent Condition Module (Sec. 3.2), Agent Identity Embedding and Adaptive Action Weighting are employed to achieve multi-agent controllability. In the Global State Encoder (Sec. 3.3), we use a frozen VGGT backbone to extract implicit 3D global environmental information from partial observations, thereby improving multi-view consistency. MultiWorld scales effectively across varying agent counts and camera views, and supports autoregressive inference to generate beyond the training context length (Sec. 3.4).\n\n\n## 🚀 News\n\n- [2026\u002F4\u002F21] The paper, code, data, and project page are available. Feel free to try them out. \n\n\n## Setup Environments \n\n```shell\nconda create -n multiworld python=3.13 \nconda activate multiworld\n# install torch \npip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \\\n    --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n\npip install -r requirements.txt\n```\n\n### Dataset Download\n\nThe MultiWorld release contains two parts: **It Takes Two** game videos and **Robotics** videos.  
\nAll `.tar` archives are stored flat in the same dataset repository.\n\n#### ModelScope Download\n\n```bash\nmodelscope login \u003CYOUR_API_KEY>\nmodelscope download --dataset HaoyuWuRUC\u002FMultiWorldData \\\n    --local_dir .\u002Fdata\nbash preprocess\u002Funtar_chunks.sh\n```\n\n#### HuggingFace Download\n\n```bash\nhf auth login\nhf download Haoyuwu\u002FMultiWorldData --repo-type dataset \\\n    --local-dir .\u002Fdata\nbash preprocess\u002Funtar_chunks.sh\n```\n\nAfter running `preprocess\u002Funtar_chunks.sh`, the archives are extracted to:\n- `data\u002Fittakestwo_release\u002F` — It Takes Two dataset\n- `data\u002Frobots_release\u002F` — Robotics dataset\n\n### Checkpoint Download \n\n```bash \nmodelscope login \u003CYOUR_API_KEY>\nmodelscope download --model HaoyuWuRUC\u002FMultiWorldCheckpoint \\\n    multiworld_480p_fulldata.safetensors --local_dir .\u002Fcheckpoints\nmodelscope download --model HaoyuWuRUC\u002FMultiWorldCheckpoint \\\n    multiworld_480p_toydata.safetensors --local_dir .\u002Fcheckpoints\nmodelscope download --model HaoyuWuRUC\u002FMultiWorldCheckpoint \\\n    multiworld_320p_robots.safetensors --local_dir .\u002Fcheckpoints\n```\n\n```bash\nhf auth login\nhf download Haoyuwu\u002FMultiWorldCheckpoint multiworld_480p_fulldata.safetensors --local-dir .\u002Fcheckpoints --repo-type model\nhf download Haoyuwu\u002FMultiWorldCheckpoint multiworld_480p_toydata.safetensors --local-dir .\u002Fcheckpoints --repo-type model\nhf download Haoyuwu\u002FMultiWorldCheckpoint multiworld_320p_robots.safetensors --local-dir .\u002Fcheckpoints --repo-type model\n```\n\n## Inference\n\nInference checkpoint trained on full dataset. 
\n```shell\npython -m torch.distributed.run --nproc_per_node=8 \\\n    ittakestwo\u002Fparallel_inference.py \\\n    --inference-seed 0 \\\n    --num-inference-steps 50 \\\n    --config-path ittakestwo\u002Fconfigs\u002Finference_480P_full.yaml \\\n    --model-path checkpoints\u002Fmultiworld_480p_fulldata.safetensors \\\n    --output-dir outputs\u002Feval_480P_full \n```\n\nInference checkpoint trained on toy dataset. \n\n```shell\npython -m torch.distributed.run --nproc_per_node=8 \\\n    ittakestwo\u002Fparallel_inference.py \\\n    --inference-seed 0 \\\n    --num-inference-steps 35 \\\n    --config-path ittakestwo\u002Fconfigs\u002Finference_480P_toy.yaml \\\n    --model-path checkpoints\u002Fmultiworld_480p_toydata.safetensors \\\n    --output-dir outputs\u002Feval_480P_toy\n```\n\n## Robotics\n\nInference on robotics dataset.\n\n```shell\npython -m torch.distributed.run --nproc_per_node=8 \\\n    robots\u002Fparallel_inference.py \\\n    --config-path robots\u002Fconfigs\u002Finference.yaml \\\n    --model-path checkpoints\u002Fmultiworld_320p_robots.safetensors \\\n    --output-dir outputs\u002Ftest_robotics_output \n```\n\n\n## Acknowledgements\nThis codebase is built on top of the open-source implementation of [DiffSynth-Studio](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fdiffsynth-studio), [VGGT](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvggt) and the [Wan2.2](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.2) repo.\n\n\n## Contact\n\nWelcome to have a discussion on the project and Video World Models. 
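You can reach me at `wuhaoyu556@connect.hku.hk`.\n\n\n## 📜 Citation\n\nIf you find our work useful for your research, please consider citing our paper:\n```\n@article{wu2025multiworld,\n  title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},\n  author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},\n  journal={arXiv preprint arXiv:2604.18564},\n  year={2026}\n}\n```\n\n","MultiWorld is a unified framework for multi-agent multi-view video world modeling that aims to achieve accurate control of multiple agents while maintaining multi-view consistency. Its core features include Agent Identity Embedding and Adaptive Action Weighting to strengthen multi-agent controllability, and a frozen VGGT backbone that extracts implicit 3D global environmental information from partial observations to improve multi-view consistency. MultiWorld also supports autoregressive inference, which lets it generate content beyond the training context length, and it scales effectively across different numbers of agents and camera views. The project suits research scenarios that require multi-agent collaboration and control in complex dynamic environments, such as game generation and robotics.",2,"2026-06-11 02:44:51","CREATED_QUERY"]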