[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74261":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":23,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},74261,"giga-world-policy","open-gigaai\u002Fgiga-world-policy","open-gigaai","GigaWorld-Policy: An Efficient Action-Centered World–Action Model",null,"Python",1280,98,75,7,0,4,9,33,12,66.79,false,"main",true,[],"2026-06-12 04:01:14","# GigaWorld-Policy\n\n> **GigaWorld-Policy: An Efficient Action-Centered World-Action Model**\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2603.17240-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17240)\n[![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-Project_Page-blue.svg)](https:\u002F\u002Fgigaai-research.github.io\u002FGigaWorld-Policy\u002F)\n\n## 📖 Overview\n\nWorld-Action Models (WAM) initialized from pre-trained video generation backbones have demonstrated remarkable potential for robot policy learning. **GigaWorld-Policy** is an action-centered WAM that learns 2D pixel-action dynamics while enabling efficient action decoding, with optional video generation.\n\n### Key Features\n\n- 🚀 **9x faster** inference compared to Motus (leading WAM baseline)\n- 📈 **7% higher** task success rate than Motus\n- 💪 **95% improvement** over pi-0.5 on RoboTwin 2.0\n\n## 🛠️ Installation\n\n### 1. 
Create Conda Environment\n\n```bash\nconda create -n gigaworld-policy python==3.11\nconda activate gigaworld-policy\n```\n\n### 2. Install Dependencies\n\n```bash\npip install .\u002Fthird_party\u002Fgiga-train\npip install .\u002Fthird_party\u002Fgiga-models\npip install .\u002Fthird_party\u002Fgiga-datasets\n```\n\n## 📊 Data Preprocessing\n\nBefore training, compute the normalization statistics and pre-compute the T5 text embeddings for your dataset.\n\n### 1. Compute Normalization Statistics\n\nThis generates a `norm_stats_delta.json` file, which the policy requires.\n\n```bash\npython -m scripts.compute_norm_stats \\\n  --data_paths \"\u002Fpath\u002Fto\u002Fdataset_dir\" \\\n  --output_path \"\u002Fpath\u002Fto\u002Fnorm_stats_delta.json\" \\\n  --embodiment_id {embodiment-id} \\\n  --delta-mask {delta-mask} \\\n  --sample-rate 1.0 \\\n  --action-chunk 48\n```\n\n*   `--embodiment_id`: Check `compute_norm_stats.py` for the mapping from robot type to ID.\n*   `--delta-mask`: A boolean mask indicating which action dimensions are deltas (True) vs. absolute values (False).\n\n### 2. 
Compute T5 Embeddings\n\nThis will pre-compute and save the T5 text embeddings for the language instructions in your dataset.\n\n```bash\npython -m scripts.compute_t5_embedding \\\n  --repo_id \"\u002Fpath\u002Fto\u002Fdataset_dir\" \\\n  --root \"\u002Fpath\u002Fto\u002Fdataset_dir\" \\\n  --wan_path \"\u002Fpath\u002Fto\u002FWan2.2-TI2V-5B\" \\\n  --device \"cuda\" \\\n  --text_len 512 \\\n  --t5_folder_name \"t5_embedding\"\n```\n\n## ⚙️ Configuration\n\nAfter completing the data preprocessing steps, modify the config file `world_action_model\u002Fconfigs\u002Fexample.py` to point to the generated files and your model weights:\n\n| Parameter | Description |\n|-----------|-------------|\n| `models.pretrained` | Path to your pretrained model weights |\n| `transform.norm_path` | Path to the generated `norm_stats_delta.json` |\n| `data_dir` | Path to your dataset |\n\n## 🚀 Training\n\nOnce the data is preprocessed and the configuration is set, you can start training:\n\n```bash\npython -m scripts.train --config world_action_model.configs.example.config\n```\n\n## 🚀 Inference\n\nWe provide an inference server and a simple open-loop evaluation client. Open-loop here means we sample observations (images\u002Fstate) from an offline dataset and run inference, without executing actions in a real environment to collect the next observations.\n\n### 1. 
Start Server\n\n```bash\npython -m scripts.inference_server \\\n  --model_id \"\u002Fpath\u002Fto\u002Fhuggingface_model_dir_or_id\" \\\n  --transformer_path \"\u002Fpath\u002Fto\u002Ftransformer_checkpoint_dir\" \\\n  --stats_path \"\u002Fpath\u002Fto\u002Fnorm_stats_delta.json\" \\\n  --t5_embedding_pkl \"\u002Fpath\u002Fto\u002Ft5_embedding.pt\"\n```\n\nOptionally, add `--return_images` to enable video visualization during inference (videos are saved under the directory given by `--vis_dir`):\n\n```bash\npython -m scripts.inference_server \\\n  --model_id \"\u002Fpath\u002Fto\u002Fhuggingface_model_dir_or_id\" \\\n  --transformer_path \"\u002Fpath\u002Fto\u002Ftransformer_checkpoint_dir\" \\\n  --stats_path \"\u002Fpath\u002Fto\u002Fnorm_stats_delta.json\" \\\n  --t5_embedding_pkl \"\u002Fpath\u002Fto\u002Ft5_embedding.pt\" \\\n  --return_images \\\n  --vis_dir \".\u002Fvis\"\n```\n\n### 2. Run Open-loop Client\n\n```bash\npython -m scripts.inference_client \\\n  --dataset_paths \"\u002Fpath\u002Fto\u002Fdataset_dir\" \\\n  --save_dir \".\u002Fvis\"\n```\n\n## 📅 Roadmap\n\n| Component | Status |\n|-----------|--------|\n| Inference Code | ✅ |\n| Training Code | ✅ |\n| Pre-trained Weights | 🔲 |\n\n## 📚 Citation\n\n```bibtex\n@article{ye2026gigaworld,\n  title={GigaWorld-Policy: An Efficient Action-Centered World-Action Model},\n  author={Ye, Angen and Wang, Boyuan and Ni, Chaojun and Huang, Guan and Zhao, Guosheng and Li, Hao and Li, Hengtao and Li, Jie and Lv, Jindi and Liu, Jingyu and Cao, Min and Li, Peng and Deng, Qiuping and Mei, Wenjun and Wang, Xiaofeng and Chen, Xinze and Zhou, Xinyu and Wang, Yang and Chang, Yifan and Li, Yifan and Zhou, Yukun and Ye, Yun and Liu, Zhichao and Zhu, Zheng},\n  journal={arXiv preprint arXiv:2603.17240},\n  year={2026}\n}\n```\n","GigaWorld-Policy is an efficient action-centered world-action model that uses a pre-trained video generation backbone to learn 2D pixel-action dynamics, with fast action decoding and optional video generation. Its headline results include inference 9x faster than Motus, the leading existing baseline, a 7% higher task success rate, and a 95% improvement over pi-0.5 on RoboTwin 2.0. The project is written in Python and suits applications that need efficient robot policy learning and execution, such as automated control and robot navigation. With detailed data preprocessing guidance, configuration instructions, and training and inference scripts, GigaWorld-Policy gives researchers and developers a powerful toolkit.",2,"2026-06-11 03:49:44","high_star"]