[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74066":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},74066,"lingbot-world","Robbyant\u002Flingbot-world","Robbyant","Advancing Open-source World Models","https:\u002F\u002Ftechnology.robbyant.com\u002Flingbot-world",null,"Python",3890,348,44,25,0,29,52,188,87,109.63,"Apache License 2.0",false,"main",[26,27,5,28,29],"aigc","image-to-video","video-generation","world-models","2026-06-12 04:01:13","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\">\n\n\u003Ch1>LingBot-World: Advancing Open-source World Models\u003C\u002Fh1>\n\nRobbyant Team\n\n\u003C\u002Fdiv>\n\n\n\u003Cdiv align=\"center\">\n\n[![Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%8C%90%20Project%20Page-Demo-00bfff)](https:\u002F\u002Ftechnology.robbyant.com\u002Flingbot-world)\n[![Tech 
Report](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Paper&message=PDF&color=red&logo=arxiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.20540)\n[![Model](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=yellow)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Frobbyant\u002Flingbot-world)\n[![Model](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=%F0%9F%A4%96%20Model&message=ModelScope&color=purple)](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FRobbyant\u002Flingbot-world-base-cam)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache--2.0-green)](LICENSE.txt)\n\n\n\u003C\u002Fdiv>\n\n-----\n\nWe are excited to introduce **LingBot-World**, an open-sourced world simulator stemming from video generation. Positioned\nas a top-tier world model, LingBot-World offers the following features. \n- **High-Fidelity & Diverse Environments**: It maintains high fidelity and robust dynamics in a broad spectrum of environments, including realism, scientific contexts, cartoon styles, and beyond. \n- **Long-Term Memory & Consistency**: It enables a minute-level horizon while preserving contextual consistency over time, which is also known as long-term memory. \n- **Real-Time Interactivity & Open Access**: It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. 
We believe our release will empower the community with practical applications across areas like content creation, gaming, and robot learning.\n\n## 🎬 Video Demo\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fea4a7a8d-5d9e-4ccf-96e7-02f93797116e\" width=\"100%\" poster=\"\"> \u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n## 🔥 News\n- Apr 10, 2026: 🎉 We release user-friendly scripts for the **LingBot-World-Base (Act)**.\n- Apr 7, 2026: 🎉 We release the **LingBot-World-Fast** inference scripts.\n- Apr 2, 2026: 🎉 We release the **LingBot-World-Fast** model weights.\n- Mar 2, 2026: 🎉 We release the **LingBot-World-Base (Act)** model weights.\n- Jan 29, 2026: 🎉 We release the technical report, code, and models for LingBot-World.\n\n\u003C!-- ## 🔖 Introduction of LingBot-World\nWe present **LingBot-World**, an **open-sourced** world simulator stemming from video generation. Positioned\nas a top-tier world model, LingBot-World offers the following features. \n- It maintains high fidelity and robust dynamics in a broad spectrum of environments, including realism, scientific contexts, cartoon styles, and beyond. \n- It enables a minute-level horizon while preserving contextual consistency over time, which is also known as **long-term memory**. \n- It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. We believe our release will empower the community with practical applications across areas like content creation, gaming, and robot learning. -->\n\n## ⚙️ Quick Start\nThis codebase is built upon [Wan2.2](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.2). 
Please refer to their documentation for installation instructions.\n### Installation\nClone the repo:\n```sh\ngit clone https:\u002F\u002Fgithub.com\u002Frobbyant\u002Flingbot-world.git\ncd lingbot-world\n```\nInstall dependencies:\n```sh\n# Ensure torch >= 2.4.0\npip install -r requirements.txt\n```\nInstall [`flash_attn`](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention):\n```sh\npip install flash-attn --no-build-isolation\n```\n### Model Download\n\n| Model | Control Signals | Resolution | Download Links |\n| :---  | :--- | :--- | :--- |\n| **LingBot-World-Base (Cam)** | Camera Poses | 480P & 720P | 🤗 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-world-base-cam) 🤖 [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FRobbyant\u002Flingbot-world-base-cam) |\n| **LingBot-World-Base (Act)** | Actions | 480P & 720P | 🤗 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-world-base-cam) |\n| **LingBot-World-Fast**       | Camera Poses | 480P & 720P | 🤗 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-world-fast)  |\n\nDownload models using huggingface-cli:\n```sh\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download robbyant\u002Flingbot-world-base-cam --local-dir .\u002Flingbot-world-base-cam\n```\nDownload models using modelscope-cli:\n ```sh\npip install modelscope\nmodelscope download robbyant\u002Flingbot-world-base-cam --local_dir .\u002Flingbot-world-base-cam\n```\n### Inference\nBefore running inference, you need to prepare:\n- Input image\n- Text prompt\n- Control signals (optional, can be generated from a video using [ViPE](https:\u002F\u002Fgithub.com\u002Fnv-tlabs\u002Fvipe))\n  - `intrinsics.npy`: Shape `[num_frames, 4]`, where the 4 values represent `[fx, fy, cx, cy]`\n  - `poses.npy`: Shape `[num_frames, 4, 4]`, where each `[4, 4]` represents a transformation matrix in OpenCV coordinates\n\nWe provide the following reference inference 
scripts:\n- `LingBot-World-Base (Cam)`:\n  - 480P:\n  ``` sh\n  torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples\u002F00\u002Fimage.jpg --action_path examples\u002F00 --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 161 --prompt \"The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls.\"\n  ```\n  - 720P:\n  ``` sh\n  torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 720*1280 --ckpt_dir lingbot-world-base-cam --image examples\u002F00\u002Fimage.jpg --action_path examples\u002F00 --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 161 --prompt \"The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls.\"\n  ```\n  Alternatively, you can run inference without control signals:\n  ``` sh\n  torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples\u002F00\u002Fimage.jpg --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 161 --prompt \"The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. 
The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls.\"\n  ```\n- `LingBot-World-Base (Act)`:\n  - 480P:\n  ``` sh\n  torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples\u002F05\u002Fimage.jpg --action_path examples\u002F05 --allow_act2cam --sample_steps 20 --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 121 --prompt \"The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls.\"\n  ```\n  - 480P with **user-friendly** action string control:\n  ``` sh\n  torchrun --nproc_per_node=8 generate.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples\u002F05\u002Fimage.jpg --action_path examples\u002F05 --action_string \"w-10,a-10,d-10,iw-15,none-10,j-10,l-10,s-15\" --allow_act2cam --sample_steps 20 --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt \"The video presents a soaring journey through a fantasy jungle. The wind whips past the rider's blue hands gripping the reins, causing the leather straps to vibrate. The ancient gothic castle approaches steadily, its stone details becoming clearer against the backdrop of floating islands and distant waterfalls.\"\n  ```\nTips:\nIf you have sufficient CUDA memory, you may increase the `frame_num` parameter to a value such as 961 to generate a one-minute video at 16 FPS. Otherwise if the CUDA memory is not sufficient, you may use ``--t5_cpu`` to decrease the memory usage.\n\n### Fast Inference\nWe provide `generate_fast.py` for accelerated causal inference with KV caching, which processes video frames chunk-by-chunk instead of all at once:\n\nDownload models using huggingface-cli. 
(If you have not already downloaded `lingbot-world-base-cam`, please download it first.)\n```sh\nhuggingface-cli download robbyant\u002Flingbot-world-fast --local-dir .\u002Flingbot-world-base-cam\u002Flingbot_world_fast\n```\n\n- `LingBot-World-Fast` — 480P, multi-GPU:\n  ``` sh\n  torchrun --nproc_per_node=8 generate_fast.py --task i2v-A14B --size 480*832 --ckpt_dir lingbot-world-base-cam --image examples\u002F03\u002Fimage.jpg --action_path examples\u002F03 --dit_fsdp --t5_fsdp --ulysses_size 8 --frame_num 81 --prompt \"A serene lakeside scene with a lone tree standing in calm water, surrounded by distant snow-capped mountains under a bright blue sky with drifting white clouds — gentle ripples reflect the tree and sky, creating a tranquil, meditative atmosphere.\"\n  ```\n\nYou can also use the provided `run_fast.sh` script:\n``` sh\nbash run_fast.sh \u003Cweights_dir> \u003Cframe_num>\n# e.g. bash run_fast.sh lingbot-world-base-cam 201\n```\n\n### Quantized Model for Limited GPU Resources\nWe sincerely thank the community for their valuable support and contributions in LingBot-World. For users with limited GPU memory, we recommend using a **4-bit quantized version** of LingBot-World-Base (Cam), which significantly reduces GPU memory consumption while maintaining competitive visual quality for inference.\n\n👉 Download link: https:\u002F\u002Fhuggingface.co\u002Fcahlen\u002Flingbot-world-base-cam-nf4\n\n> ⚠️ Note: This quantized model is intended **for inference only**. 
Minor degradation in visual fidelity and temporal consistency may occur compared to the full-precision model.\n\n### 🎬 Demo Results\n\n#### ⚡ Real-Time Interactive Demo Videos (Lingbot-World-Fast)\n\n> These videos showcase **Lingbot-World-Fast** responding to user inputs and rendering results in real time.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F79b99272-5258-43b4-a466-8f3ac966fb8f\" width=\"100%\" poster=\"\">\u003C\u002Fvideo>\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbfcdaecd-0a48-4a9f-bfdd-3f8e11b40227\" width=\"100%\" poster=\"\">\u003C\u002Fvideo>\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5e231315-cde0-478d-9567-f8e92e877fdd\" width=\"100%\" poster=\"\">\u003C\u002Fvideo>\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3941fed7-42dc-40cc-a72a-adf6fab37f28\" width=\"100%\" poster=\"\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n#### 🔍 Comparison Demo Videos (Lingbot-World-Base, Camera Pose Version)\n\n> Camera parameters are estimated by [ViPE](https:\u002F\u002Fgithub.com\u002Fnv-tlabs\u002Fvipe) from original videos downloaded from [Genie3](https:\u002F\u002Fdeepmind.google\u002Fblog\u002Fgenie-3-a-new-frontier-for-world-models\u002F).\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffc95ee9e-e8a9-4f70-9aa2-9536c8365ccc\" width=\"100%\" poster=\"\">\u003C\u002Fvideo>\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbac89021-b394-4f68-a688-9a0b90e30241\" width=\"100%\" poster=\"\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n## 📚 Related Projects\n- [HoloCine](https:\u002F\u002Fholo-cine.github.io\u002F)\n- [Ditto](https:\u002F\u002Fezioby.github.io\u002FDitto_page\u002F)\n- [WorldCanvas](https:\u002F\u002Fworldcanvas.github.io\u002F)\n- 
[RewardForcing](https:\u002F\u002Freward-forcing.github.io\u002F)\n- [CoDeF](https:\u002F\u002Fqiuyu96.github.io\u002FCoDeF\u002F)\n\n## 📜 License\nThis project is licensed under the Apache 2.0 License. Please refer to the [LICENSE file](LICENSE.txt) for the full text, including details on rights and restrictions.\n\n## ✨ Acknowledgement\nWe would like to express our gratitude to the Wan Team for open-sourcing their code and models. Their contributions have been instrumental to the development of this project.\n\n## 📖 Citation\nIf you find this work useful for your research, please cite our paper:\n\n```\n@article{lingbot-world,\n      title={Advancing Open-source World Models}, \n      author={Robbyant Team and Zelin Gao and Qiuyu Wang and Yanhong Zeng and Jiapeng Zhu and Ka Leong Cheng and Yixuan Li and Hanlin Wang and Yinghao Xu and Shuailei Ma and Yihang Chen and Jie Liu and Yansong Cheng and Yao Yao and Jiayi Zhu and Yihao Meng and Kecheng Zheng and Qingyan Bai and Jingye Chen and Zehong Shen and Yue Yu and Xing Zhu and Yujun Shen and Hao Ouyang},\n      journal={arXiv preprint arXiv:2601.20540},\n      year={2026}\n}\n```\n","LingBot-World is an open-source world simulator focused on video generation. It creates high-fidelity, diverse environments, including realistic, scientific, and cartoon-style scenes, and preserves contextual consistency over long horizons, providing minute-level long-term memory. LingBot-World also supports real-time interaction, with latency under 1 second when generating 16 frames per second. The model is suited to content creation, game development, and robot learning, and aims to narrow the gap between open-source and closed-source technology. By making its code and models publicly available, LingBot-World supports practical community applications in these areas.",2,"2026-06-11 03:48:39","high_star"]