[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74187":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},74187,"dreamzero","dreamzero0\u002Fdreamzero","dreamzero0","Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals","https:\u002F\u002Fdreamzero0.github.io\u002F",null,"Python",2237,193,17,20,0,26,75,310,78,108.86,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:13","# NVIDIA DreamZero: World Action Models Are Zero-Shot Policies\nA research project from [NVIDIA GEAR Lab](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fgear\u002F).\n\n[![NVIDIA](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNVIDIA-76B900?style=flat&logo=nvidia&logoColor=white)](https:\u002F\u002Fwww.nvidia.com) [![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-blue.svg)](LICENSE) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2602.15922-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15922)\n\n[[Project Page](https:\u002F\u002Fdreamzero0.github.io\u002F)] [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15922)]\n\nDreamZero is a World Action Model that jointly predicts actions and videos, achieving strong zero-shot performance on unseen tasks. 
This release package contains everything needed to load a pretrained DreamZero model and run distributed inference via a WebSocket server.\n\n## News\n\n- **02\u002F27:** DreamZero is **#1 on both [MolmoSpaces](https:\u002F\u002Fmolmospaces.allen.ai\u002Fleaderboard) and [RoboArena](https:\u002F\u002Frobo-arena.github.io\u002Fleaderboard)**! DreamZero-DROID is trained *from scratch* using only the DROID dataset, with no pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs\u002FWAMs).\n- **02\u002F27:** Released the **DreamZero-AgiBot checkpoint** and **post-training code** for efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place (see the YAM experiments in our paper for more detail).\n- **02\u002F20:** Released the **full training codebase, preprocessed dataset, and guide for new embodiments** to replicate the DreamZero-DROID checkpoint and train on your own robot. 
See [Adding a New Embodiment to DreamZero](docs\u002FDATASET_TO_GEAR_AND_TRAIN.md) for a step-by-step walkthrough.\n\n## Features\n\n**Available Now**\n- Pretrained DreamZero-DROID model checkpoint [[Huggingface](https:\u002F\u002Fhuggingface.co\u002FGEAR-Dreams\u002FDreamZero-DROID)]\n- Pretrained DreamZero-AgiBot checkpoint (for post-training on new embodiments) [[Huggingface](https:\u002F\u002Fhuggingface.co\u002FGEAR-Dreams\u002FDreamZero-AgiBot)]\n- Distributed WebSocket inference server (GB200, H100)\n- DiT caching for optimized inference (~0.6s on GB200, ~3s on H100)\n- DROID simulation evaluation support\n- [RoboArena](https:\u002F\u002Frobo-arena.github.io\u002F) integration (DROID real)\n- Video generation and saving (MP4)\n- LoRA and full fine-tuning training scripts\n- Training on new embodiments (AgiBot, YAM) — see [guide](docs\u002FDATASET_TO_GEAR_AND_TRAIN.md)\n\n**Coming Soon**\n- [PolaRiS](https:\u002F\u002Fpolaris-evals.github.io\u002F) simulation environment support\n- [Genie 3.0](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.02078) sim environment support for DreamZero-AgiBot\n\n## Testing Out DreamZero in Simulation with API\nWe provide an inference script that directly evaluates a hosted DreamZero-DROID policy on [`sim_evals`](https:\u002F\u002Fgithub.com\u002Farhanjain\u002Fsim-evals). To test out the policy, first request access to the API via this form [link](https:\u002F\u002Fforms.gle\u002FzCj5zjDvHsoeuMXU7). 
Then, follow these instructions to install [`sim_evals`](https:\u002F\u002Fgithub.com\u002Farhanjain\u002Fsim-evals) and launch evaluation.\n\n```bash\n# Clone repository\ngit clone --recurse-submodules https:\u002F\u002Fgithub.com\u002Farhanjain\u002Fsim-evals.git\ncd sim-evals\n\n# Install uv\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n\n# Create and activate the uv environment\nuv sync\nsource .venv\u002Fbin\u002Factivate\n\n# [Optional] Update PyTorch versions\npip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu129\n\n# Download assets (may need to export HF_TOKEN=\u003CYOUR_HUGGINGFACE_TOKEN> first)\nuvx hf download owhan\u002FDROID-sim-environments --repo-type dataset --local-dir assets\n\n# Run eval script\ncd ..\npython eval_utils\u002Frun_sim_eval.py --host \u003CAPI_HOST> --port \u003CAPI_PORT>\n```\n\nThe outputs are saved in the `runs` directory.\n\n\n## Quick Start\n\n### Prerequisites\n\n- **Python**: 3.11\n- **Hardware**: Multi-GPU setup (tested on GB200, H100)\n  - Minimum: 2 GPUs for distributed inference\n- **CUDA**: Compatible GPU with CUDA 12.9+\n\n### Installation\n\n1. **Create conda environment:**\n```bash\nconda create -n dreamzero python=3.11\nconda activate dreamzero\n```\n\n2. **Install dependencies (PyTorch 2.8+ with CUDA 12.9+):**\n```bash\npip install -e . --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu129\n```\n\n3. **Install flash attention:**\n```bash\nMAX_JOBS=8 pip install --no-build-isolation flash-attn\n```\n\n4. **[GB200 ONLY, SKIP FOR H100] Install Transformer Engine:**\n```bash\npip install --no-build-isolation \"transformer_engine[pytorch]\"\n```\n\n5. 
**[GB200 ONLY FOR TENSORRT, SKIP FOR H100] Install TensorRT:**\n```bash\npip install tensorrt==10.13.2.6 tensorrt_cu13==10.13.2.6 tensorrt_cu13_libs==10.13.2.6 tensorrt_cu13_bindings==10.13.2.6 --no-deps\npip install transformer_engine==2.10.0 transformer_engine_cu12==2.10.0 transformer_engine_torch==2.10.0\n```\n\n## Downloading Pretrained Checkpoints\n\n### DreamZero-DROID (for inference)\n\nWe release a 14B pretrained DROID checkpoint on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FGEAR-Dreams\u002FDreamZero-DROID). To download the checkpoint, run:\n\n```bash\nhf download GEAR-Dreams\u002FDreamZero-DROID --repo-type model --local-dir \u003Cpath\u002Fto\u002Fcheckpoint>\n```\n\n### DreamZero-AgiBot (for fine-tuning on new embodiments)\n\nTo fine-tune DreamZero on a new embodiment (e.g. YAM, AgiBot), download the pretrained [DreamZero-AgiBot](https:\u002F\u002Fhuggingface.co\u002FGEAR-Dreams\u002FDreamZero-AgiBot) checkpoint (~45GB) to `.\u002Fcheckpoints\u002FDreamZero-AgiBot`:\n\n```bash\ngit clone https:\u002F\u002Fhuggingface.co\u002FGEAR-Dreams\u002FDreamZero-AgiBot .\u002Fcheckpoints\u002FDreamZero-AgiBot\n```\n\nOr with the Hugging Face CLI:\n\n```bash\nhf download GEAR-Dreams\u002FDreamZero-AgiBot --repo-type model --local-dir .\u002Fcheckpoints\u002FDreamZero-AgiBot\n```\n\nThe YAM and AgiBot training scripts use `pretrained_model_path=.\u002Fcheckpoints\u002FDreamZero-AgiBot` by default. 
See the [new embodiment guide](docs\u002FDATASET_TO_GEAR_AND_TRAIN.md) for usage.\n\n## Running the Inference Server\n\n### Command Overview\n\nThe inference server is launched with `torch.distributed.run`, PyTorch's distributed launcher, and parallelizes the model across multiple GPUs:\n\n```bash\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --standalone --nproc_per_node=2 socket_test_optimized_AR.py --port 5000 --enable-dit-cache --model-path \u003Cpath\u002Fto\u002Fcheckpoint>\n```\n\n(Optional, GB200 only) TensorRT enables faster generation:\n```bash\nexport LOAD_TRT_ENGINE=\u003Cpath\u002Fto\u002Fcheckpoint>\u002Ftensorrt\u002Fwan\u002FWanModel_nvfp4.trt\nexport DYNAMIC_CACHE_SCHEDULE=true\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --standalone --nproc_per_node=2 socket_test_optimized_AR.py --port 8000 --enable-dit-cache --model-path \u003Cpath\u002Fto\u002Fcheckpoint>\n```\n\nTo verify the server is working, run a test client against the port the server was started on. The first few inferences will take a few minutes to warm up. 
After warming up, inference takes ~0.6s on GB200 and ~3s on H100.\n\n```bash\npython test_client_AR.py --port 5000\n```\n\n### Command-line Arguments\n\n- `--port`: Port number for the WebSocket server (default: 8000)\n- `--model-path`: Path to the pretrained model checkpoint directory\n- `--enable-dit-cache`: Enable caching in DiT layers for faster inference (recommended)\n- `--max-chunk-size`: Override `max_chunk_size` for inference (optional)\n- `--timeout-seconds`: Server timeout in seconds (default: 50000)\n- `--index`: Index for output directory naming (default: 0)\n\n\n### Output\n\nThe server saves:\n- **Videos**: Generated video predictions as MP4 files in `{model_path}\u002Freal_world_eval_gen_{date}_{index}\u002F{checkpoint_name}\u002F`\n- **Input observations**: Saved per message in `{output_dir}\u002Finputs\u002F{msg_index}_{timestamp}\u002F`\n\n\n## Training\n\n> **Training on a new embodiment?** See [Adding a New Embodiment to DreamZero](docs\u002FDATASET_TO_GEAR_AND_TRAIN.md) for a complete guide on converting your dataset, configuring modalities, and launching training. \u003Cem>Make sure to align the order of the 3 camera views to ensure positive transfer.\u003C\u002Fem>\n\n### Downloading Pretrained Base Model Weights\n\nDreamZero is built on top of [Wan2.1-I2V-14B-480P](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002FWan2.1-I2V-14B-480P) and uses the [umt5-xxl](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fumt5-xxl) tokenizer. 
Download both before training:\n\n```bash\npip install \"huggingface_hub[cli]\"\n\n# You may need to set your HuggingFace token:\n# export HF_TOKEN=\u003CYOUR_HUGGINGFACE_TOKEN>\n\n# Download Wan2.1 model weights (~28GB)\nhf download Wan-AI\u002FWan2.1-I2V-14B-480P --local-dir .\u002Fcheckpoints\u002FWan2.1-I2V-14B-480P\n\n# Download umt5-xxl tokenizer\nhf download google\u002Fumt5-xxl --local-dir .\u002Fcheckpoints\u002Fumt5-xxl\n```\n\n> **Note:** The training script will auto-download these if they are not found at the configured paths, but pre-downloading is recommended to avoid delays at launch.\n\n### DROID Dataset\n\nWe release the preprocessed DROID dataset used to train DreamZero on HuggingFace: [GEAR-Dreams\u002FDreamZero-DROID-Data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FGEAR-Dreams\u002FDreamZero-DROID-Data).\n\nThis dataset is derived from the [DROID 1.0.1](https:\u002F\u002Fdroid-dataset.github.io\u002F) dataset with the following modifications:\n- Converted from RLDS\u002FTFDS format to [LeRobot](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot) v2.0 format\n- Idle frames removed using [Physical Intelligence's idle frame detector](https:\u002F\u002Fgithub.com\u002FPhysical-Intelligence\u002Fopenpi\u002Fblob\u002Fmain\u002Fexamples\u002Fdroid\u002FREADME_train.md#data-filtering) (`droid_sample_ranges_v1_0_1.json`)\n- Episodes without language annotations are filtered out\n- Successful episodes only (episodes with non-zero reward)\n- 3 camera views: `exterior_image_1_left`, `exterior_image_2_left`, `wrist_image_left`\n\n**To download the preprocessed dataset (~131GB):**\n\n```bash\nhuggingface-cli download GEAR-Dreams\u002FDreamZero-DROID-Data --repo-type dataset --local-dir .\u002Fdata\u002Fdroid_lerobot\n```\n\nIf you want to reproduce the dataset conversion from raw DROID 1.0.1 yourself (or modify the filtering), see [docs\u002FDROID_CONVERSION.md](docs\u002FDROID_CONVERSION.md).\n\n### Running Training\n\n```bash\n# Configure 
paths (override defaults as needed)\nexport DROID_DATA_ROOT=\".\u002Fdata\u002Fdroid_lerobot\"\nexport OUTPUT_DIR=\".\u002Fcheckpoints\u002Fdreamzero_droid\"\nexport NUM_GPUS=4\n\n# Point to your downloaded model weights (if not using default paths)\nexport WAN_CKPT_DIR=\".\u002Fcheckpoints\u002FWan2.1-I2V-14B-480P\"\nexport TOKENIZER_DIR=\".\u002Fcheckpoints\u002Fumt5-xxl\"\n\n# Launch training\nbash scripts\u002Ftrain\u002Fdroid_training.sh\n```\n\n**Using Wan2.2-TI2V-5B backbone (5B params, lower VRAM):** To train with the smaller Wan2.2-TI2V-5B model instead of Wan2.1-I2V-14B, see [docs\u002FWAN22_BACKBONE.md](docs\u002FWAN22_BACKBONE.md) and run `bash scripts\u002Ftrain\u002Fdroid_training_wan22.sh`.\n\n### Training Configuration\n\nThe training script uses Hydra for configuration and DeepSpeed ZeRO Stage 2 for distributed training. Key defaults:\n\n| Parameter | Default | Description |\n|---|---|---|\n| `NUM_GPUS` | 4 | Number of GPUs |\n| `per_device_train_batch_size` | 1 | Batch size per GPU |\n| `learning_rate` | 1e-5 | Learning rate |\n| `max_steps` | 10 | Max training steps (increase for full training) |\n| `warmup_ratio` | 0.05 | Warmup ratio |\n| `weight_decay` | 1e-5 | Weight decay |\n| `image_resolution_width` | 320 | Image width |\n| `image_resolution_height` | 176 | Image height |\n| `num_frames` | 33 | Number of video frames |\n| `action_horizon` | 24 | Action prediction horizon |\n| `save_lora_only` | true | Only save LoRA weights |\n| `bf16` | true | Use bfloat16 precision |\n\n> **Note:** `max_steps=10` is set for a quick sanity check. 
For full training, increase this to your desired number of steps and configure `save_steps` \u002F `save_strategy` accordingly.\n\n\n## Citation\n\nIf you use DreamZero in your research, please cite:\n\n```bibtex\n@misc{ye2026worldactionmodelszeroshot,\n      title={World Action Models are Zero-shot Policies}, \n      author={Seonghyeon Ye and Yunhao Ge and Kaiyuan Zheng and Shenyuan Gao and Sihyun Yu and George Kurian and Suneel Indupuru and You Liang Tan and Chuning Zhu and Jiannan Xiang and Ayaan Malik and Kyungmin Lee and William Liang and Nadun Ranawaka and Jiasheng Gu and Yinzhen Xu and Guanzhi Wang and Fengyuan Hu and Avnish Narayan and Johan Bjorck and Jing Wang and Gwanghyun Kim and Dantong Niu and Ruijie Zheng and Yuqi Xie and Jimmy Wu and Qi Wang and Ryan Julian and Danfei Xu and Yilun Du and Yevgen Chebotar and Scott Reed and Jan Kautz and Yuke Zhu and Linxi \"Jim\" Fan and Joel Jang},\n      year={2026},\n      eprint={2602.15922},\n      archivePrefix={arXiv},\n      primaryClass={cs.RO},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15922}, \n}\n```\n\n## License\n\nThis project is licensed under the [Apache License 2.0](LICENSE).\n\n## Support\n\nFor issues and questions:\n- Check the troubleshooting section above\n- Review server logs for detailed error messages\n- Verify your checkpoint is compatible with this release\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=dreamzero0\u002Fdreamzero&type=Date)](https:\u002F\u002Fstar-history.com\u002F#dreamzero0\u002Fdreamzero&Date)\n","DreamZero is a World Action Model developed by NVIDIA GEAR Lab that jointly predicts actions and videos and achieves strong zero-shot performance on unseen tasks. The project's core functionality covers pretraining, fine-tuning, and evaluating DreamZero models, with distributed inference served through a WebSocket server. It also supports the DROID simulation environment and integrates with RoboArena, making it suited to robotics research and development, especially applications that need efficient adaptation to new tasks or new robot embodiments. In addition, DreamZero shows that training from scratch on the DROID dataset alone can reach top-tier performance, underscoring the importance of video models as a foundation for generalist robot policies.",2,"2026-06-11 03:49:25","high_star"]