[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72455":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72455,"giga-brain-0","open-gigaai\u002Fgiga-brain-0","open-gigaai","GigaBrain-0: A World Model-Powered Vision-Language-Action Model",null,"Python",2540,197,144,10,0,3,7,26,9,70.99,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:05","![GigaBrain-0 Overview](docs\u002Fsource\u002Fimgs\u002Fgiga_brain_0_overview.png)\n\n\u003Cdiv align=\"center\" style=\"font-family: charter;\">\n    \u003Ch1> GigaBrain-0: A World Model-Powered Vision-Language-Action Model \u003C\u002Fh1>\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Project](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-99cc2)](https:\u002F\u002Fgigabrain0.github.io\u002F)\n[![Papers](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-ArXiv-99cc2)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19430)\n[![Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModels-Huggingface-red?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fopen-gigaai\u002Fmodels)\n\n\u003C\u002Fdiv>\n\n## 📰 News\n- **`[2026\u002F03\u002F10]`** We will host the [GigaBrain Challenge 2026 @ CVPR 
2026](https:\u002F\u002Fgigaai-research.github.io\u002FGigaBrain-Challenge-2026\u002F) with three competition tracks: RoboTwin (simulation), GigaWorld (World Model), and RoboChallenge (real robot). We also have a call for papers on [OpenReview](https:\u002F\u002Fopenreview.net\u002Fgroup?id=thecvf.com\u002FCVPR\u002F2026\u002FWorkshop\u002FGigaBrain_Challenge) and will present a Best Paper Award.\n- **`[2026\u002F02\u002F13]`** Released [GigaBrain-0.5M* technical report](https:\u002F\u002Fgigabrain05m.github.io\u002F). GigaBrain-0.5M* is a VLA that learns from world model-based reinforcement learning.\n- **`[2026\u002F02\u002F09]`** 🎉 GigaBrain-0.1 achieved 1st place on the RoboChallenge leaderboard.\n- **`[2026\u002F02\u002F02]`** Released GigaBrain-0.1 model weights, which follow the same usage as GigaBrain-0 but achieve better performance.\n- **`[2025\u002F11\u002F27]`** Released GigaBrain-0 model weights. This version of the model excludes depth images and intermediate 2D manipulation trajectories to keep it simple to use. However, the code supports these features: if your dataset contains them and you wish to use them, enable the corresponding options in the configuration.\n- **`[2025\u002F11\u002F27]`** Released the model architecture, as well as the pre-training and post-training implementations.\n\n## ✨ Introduction\n\nTraining Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of data collection severely limits the scalability and generalization capacity of current VLA systems. Therefore, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data. By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. 
Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearances (e.g., textures, colors), object placements, and camera viewpoints.\n\n![GigaBrain-0 Architecture](docs\u002Fsource\u002Fimgs\u002Fgiga_brain_0_architecture.png)\n\n## 💾 Data\nGigaBrain-0 was trained using approximately 1k hours of real-world robot data together with a large amount of World Model–generated data. GigaBrain-0.1 scales the training data to 10k hours, with the detailed data composition shown in the figure below.\n\n![GigaBrain-0.1 Data Composition](docs\u002Fsource\u002Fimgs\u002Fgiga_brain_0.1_data.png)\n\n## 📊 Results\n\nLeveraging the efficient world-model data engine and innovations in model architecture, GigaBrain-0.1 has demonstrated rapid performance improvements. 
GigaBrain-0.1 outperforms GigaBrain-0 across all real-robot tasks and achieves performance on complex long-horizon tasks comparable to $\\pi_{0.5}$.\n\n![GigaBrain-0 Performance](docs\u002Fsource\u002Fimgs\u002Fgiga_brain_0.1_performance.png)\n\n- Paper Towel Preparation*: Compared to the release with GigaBrain-0, the \"Paper Towel Preparation\" task has been re-evaluated under a new setting.\n\n\n## 🤖 RoboChallenge\n\nUsing GigaBrain-0.1 to train on RoboChallenge tasks, we achieved 1st place on the leaderboard.\n\n![GigaBrain-0 Performance](docs\u002Fsource\u002Fimgs\u002Fgiga_brain_0.1_robochallenge.png)\n\n## ⚡ Installation\n\nGigaBrain-0 depends on the following three frameworks:\n\n- [GigaTrain](https:\u002F\u002Fgithub.com\u002Fopen-gigaai\u002Fgiga-train): An Efficient and Scalable Training Framework for AI Models.\n- [GigaDatasets](https:\u002F\u002Fgithub.com\u002Fopen-gigaai\u002Fgiga-datasets): A Unified and Lightweight Framework for Data Curation, Evaluation and Visualization.\n- [GigaModels](https:\u002F\u002Fgithub.com\u002Fopen-gigaai\u002Fgiga-models): A Comprehensive Repository for Multi-modal, Generative, and Perceptual Models.\n\nWe recommend a fresh conda environment.\n\n```bash\nconda create -n giga_brain_0 python=3.11.10 -y\nconda activate giga_brain_0\n\npip3 install giga-train\npip3 install giga-datasets\npip3 install lerobot==0.3.2\npip3 install matplotlib\npip3 install numpydantic\n\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-gigaai\u002Fgiga-models.git\ncd giga-models\npip3 install -e .\n\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-gigaai\u002Fgiga-brain-0.git\ncd giga-brain-0\n\n```\n\n## 🚀 Quick Start\n\n### 1. Data preparation (LeRobot format) and normalization\n\nTo begin, convert your data to the LeRobot format. 
For reference, see `scripts\u002Fconvert_from_hdf5.py`, which demonstrates how to convert AgileX data (HDF5 files) to LeRobotDataset.\n\n```bash\npython scripts\u002Fconvert_from_hdf5.py \\\n  --data-path \u002Fpath\u002Fto\u002Fraw_hdf5_data_path \\\n  --out-dir \u002Fpath\u002Fto\u002Flerobot_dataset \\\n  --task \"Task prompt here\"\n```\n\nIf your dataset is already in LeRobot format, compute normalization stats for `observation.state` and `action` using our script:\n\n```bash\n\npython scripts\u002Fcompute_norm_stats.py \\\n  --data-paths \u002Fpath\u002Fto\u002Flerobot_dataset1 \u002Fpath\u002Fto\u002Flerobot_dataset2 \\\n  --output-path \u002Fpath\u002Fto\u002Fnorm_stats.json \\\n  --embodiment-id {embodiment-id} \\\n  --delta-mask {delta-mask} \\\n  --sample-rate 1.0 \\\n  --action-chunk 50 \\\n  --action-dim 32\n\n```\n\nFor AgileX Cobot Magic:\n\n- embodiment_id = 0\n- delta_mask = \\[True, True, True, True, True, True, False, True, True, True, True, True, True, False\\]\n\nFor Agibot G1:\n\n- embodiment_id = 1\n- delta_mask = \\[True, True, True, True, True, True, True, False, True, True, True, True, True, True, True, False, True, True, True, True\\]\n\nTo support data from a custom robot type, add a newly initialized action-specific linear layer, train only that layer, and freeze all other weights.\n\nThen point your training config to the produced `norm_stats.json` (see examples in `configs`).\n\n### 2. 
Download GigaBrain-0\u002F0.1 checkpoints from Hugging Face\n\n|         Model         |                                  HF Link                                   |                                                           Description                                                            |\n| :-------------------: | :------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: |\n| GigaBrain-0.1-3.5B-Base | 🤗 [Huggingface](https:\u002F\u002Fhuggingface.co\u002Fopen-gigaai\u002FGigaBrain-0.1-3.5B-Base) | More generalizable, more robust, and more powerful. |\n| GigaBrain-0-3.5B-Base | 🤗 [Huggingface](https:\u002F\u002Fhuggingface.co\u002Fopen-gigaai\u002FGigaBrain-0-3.5B-Base) | The current release of the model excludes depth images and intermediate 2D manipulation trajectories to keep it simple to use. |\n\n\n### 3. Training\n\nWe provide ready-to-use configs for GigaBrain-0. Adjust `gpu_ids`, `batch_size_per_gpu`, `data_paths`, and `norm_stats_path` as needed.\n\nLogs, configs, and checkpoints are stored under the path given by `project_dir`.\n\nPre-training:\n\n```bash\npython scripts\u002Ftrain.py --config configs.giga_brain_0_from_scratch.config\n```\n\nFine-tuning:\n\n```bash\npython scripts\u002Ftrain.py --config configs.giga_brain_0_agilex_finetune.config  # or\n\npython scripts\u002Ftrain.py --config configs.giga_brain_0_agibot_finetune.config\n```\n\nConfiguration details are described in [configure_introduction.md](docs\u002Fconfigure_introduction.md).\n\n### 4. 
Inference\n\nRun inference on a LeRobot dataset and optionally visualize predictions.\n\nFor the same model weights, we provide three different scripts to support different output information.\n\n- Inference continuous action:\n\n  ```bash\n  python scripts\u002Finference.py \\\n    --model-path \u002Fpath\u002Fto\u002Fgiga_brain_0_checkpoints \\\n    --data-path \u002Fpath\u002Fto\u002Flerobot_dataset \\\n    --norm-stats-path \u002Fpath\u002Fto\u002Fnorm_stats.json \\\n    --output-path \u002Ftmp\u002Fvis_path \\\n    --delta-mask \u003CDELTA_MASK> \\\n    --embodiment-id \u003CEMBODIMENT_ID> \\\n    --action-chunk 50 \\\n    --original-action-dim \u003CACTION_DIM> \\\n    --tokenizer-model-path google\u002Fpaligemma2-3b-pt-224 \\\n    --fast-tokenizer-path physical-intelligence\u002Ffast \\\n    --device cuda\n  ```\n\n- Inference subgoal prediction:\n\n  ```bash\n  python scripts\u002Finference_task_planning.py \\\n    --model-path \u002Fpath\u002Fto\u002Fgiga_brain_0_checkpoints \\\n    --data-path \u002Fpath\u002Fto\u002Flerobot_dataset \\\n    --norm-stats-path \u002Fpath\u002Fto\u002Fnorm_stats.json \\\n    --delta-mask \u003CDELTA_MASK> \\\n    --embodiment-id \u003CEMBODIMENT_ID> \\\n    --original-action-dim \u003CACTION_DIM> \\\n    --tokenizer-model-path google\u002Fpaligemma2-3b-pt-224 \\\n    --fast-tokenizer-path physical-intelligence\u002Ffast \\\n    --device cuda\n  ```\n\n- Inference discrete action in autoregressive mode (usually for debugging):\n\n  ```bash\n  python scripts\u002Finference_discrete_action.py \\\n    --model-path \u002Fpath\u002Fto\u002Fgiga_brain_0_checkpoints \\\n    --data-path \u002Fpath\u002Fto\u002Flerobot_dataset \\\n    --norm-stats-path \u002Fpath\u002Fto\u002Fnorm_stats.json \\\n    --output-path \u002Ftmp\u002Fvis_path \\\n    --delta-mask \u003CDELTA_MASK> \\\n    --embodiment-id \u003CEMBODIMENT_ID> \\\n    --original-action-dim \u003CACTION_DIM> \\\n    --tokenizer-model-path google\u002Fpaligemma2-3b-pt-224 \\\n  
  --fast-tokenizer-path physical-intelligence\u002Ffast \\\n    --device cuda\n  ```\n\n### 5. Robot deployment\n\n- Run the server:\n\n  ```bash\n  python scripts\u002Finference_server.py \\\n    --model-path \u002Fpath\u002Fto\u002Fgiga_brain_0_checkpoints \\\n    --tokenizer-model-path google\u002Fpaligemma2-3b-pt-224 \\\n    --fast-tokenizer-path physical-intelligence\u002Ffast \\\n    --delta-mask \u003CDELTA_MASK> \\\n    --embodiment-id \u003CEMBODIMENT_ID> \\\n    --norm-stats-path \u002Fpath\u002Fto\u002Fnorm_stats.json \\\n    --original-action-dim \u003CACTION_DIM> \\\n    --autoregressive-mode-only False\n  ```\n\n- Run the client:\n\n  ```bash\n  python scripts\u002Finference_client.py\n  ```\n\nThis is a minimal client example. It generates random observations to demonstrate the end-to-end request\u002Fresponse flow with the server. You can copy the relevant client code onto your robot and replace the random inputs with real onboard sensor data (e.g., cameras, proprioception) and your robot's control interface. 
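Ensure input shapes and field names remain consistent with the server's expectations.\n\nWe also provide an inference client script for AgileX robots: `scripts\u002Finference_agilex_client.py`.\n\nMake sure the host and port are the same in both server and client.\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## Citation\n\n```bibtex\n@article{gigaai2025gigabrain0,\n  title={GigaBrain-0: A World Model-Powered Vision-Language-Action Model},\n  author={GigaAI},\n  year={2025},\n  eprint={2510.19430},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19430},\n}\n```\n","GigaBrain-0 is a world model-based Vision-Language-Action (VLA) model that aims to reduce reliance on large-scale real-world robot data by using synthetic data, thereby improving cross-task generalization. Its core features include generating diverse data with world models, RGBD input modeling, and embodied Chain-of-Thought supervision; together these improve the model's understanding of spatial geometry, recognition of object states, and handling of long-horizon dependencies when executing complex tasks. It suits generalist robot scenarios that need efficient training and good generalization, and shows notable advantages in dexterous manipulation, long-horizon tasks, and mobile manipulation.",2,"2026-06-11 03:42:08","high_star"]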