[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81024":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":13,"forks30d":13,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":13,"starSnapshotCount":13,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},81024,"AutoMoT","OscarHuangWind\u002FAutoMoT","OscarHuangWind","[ICML'26] This is the official repository of AutoMoT, an asynchronous VLA as E2E Model","",null,"Python",40,0,29,2,5,10,11,15,53.1,false,"release",[],"2026-06-12 04:01:31","\n# [ICML'26]AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Ficml.cc\u002F\">\n    \u003Cimg src=\".\u002Fassets\u002Ficml_logo.svg\" alt=\"ICML\" height=\"40\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.14851\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2603.14851-b31b1b?style=flat-square&logo=arxiv\" alt=\"arXiv\">\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fautomot-website.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-AutoMoT-blueviolet?style=flat-square&logo=googlechrome&logoColor=white\" alt=\"Project Page\">\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FOscar-Huang\u002FAutoMoT\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97_Weights-AutoMoT-yellow?style=flat-square\" alt=\"Weights\">\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOscar-Huang\u002FNuSync\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97_Datasets-NuSync-orange?style=flat-square\" alt=\"Datasets\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fdcd08673-5ea5-49a1-8dca-5d4b4b8d91fa\n\n**AutoMoT** is an asynchronous vision-language-action (VLA) agent for end-to-end autonomous driving, accepted at **ICML 2026**.\n\n> **Current release**: Closed-loop inference on Bench2Drive (220 routes); the model checkpoints and the NuSync dataset are open-sourced. Training code is coming soon; see [TODO](#todo-list).\n\n---\n\n## TODO List \u003Ca name=\"todo-list\">\u003C\u002Fa>\n\n- [x] Bench2Drive closed-loop inference (220 routes, CARLA 0.9.15)\n- [x] Model checkpoint release ([HuggingFace](https:\u002F\u002Fhuggingface.co\u002FOscar-Huang\u002FAutoMoT))\n- [x] NuSync dataset release ([HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOscar-Huang\u002FNuSync))\n- [ ] Training code release\n- [ ] Action Refiner code release\n\n---\n\n## Table of Contents\n\n1. [Method Overview](#method-overview)\n2. [Repository Structure](#repository-structure)\n3. [Environment Setup](#environment-setup)\n4. [Model Weights](#model-weights)\n5. [Running Evaluation](#running-evaluation)\n6. [Benchmark Results](#benchmark-results)\n7. [TODO List](#todo-list)\n8. 
[Citation](#citation)\n\n---\n\n## Method Overview \u003Ca name=\"method-overview\">\u003C\u002Fa>\n\nAutoMoT uses an **Asynchronous Mixture-of-Transformers** design: a slow Understanding Expert (4B) performs low-frequency reasoning, while a fast Action Expert (1.6B) runs at high frequency to decode 3-second decisions and spatial-temporal waypoints via KV-cache bridging.\n\n---\n\n## Repository Structure \u003Ca name=\"repository-structure\">\u003C\u002Fa>\n\n```\nBench2Drive_opensource\u002F\n├── Automot\u002F                          # AutoMoT model and agent utilities\n│   ├── mot\u002F\n│   │   ├── modeling\u002F\n│   │   │   ├── automot\u002F              # Core model: AutoMoT, configs, connectors\n│   │   │   ├── bev_encoder\u002F          # BEV encoder backbone \n│   │   │   ├── cache_utils\u002F          # KV-cache utilities\n│   │   │   └── qwen3\u002F                # Qwen3 text backbone\n│   │   ├── data\u002Freasoning\u002F           # Special token handling\n│   │   └── evaluation\u002F               # Inference engine (slow\u002Ffast KV-cache)\n│   ├── team_code\u002F                    # UKF, LiDAR preprocessing, prompt builders\n│   └── checkpoints\u002F                  # Model weights (downloaded separately)\n│       ├── model.safetensors         # All weights: AutoMoT\n│       ├── config.json               # Qwen3-VL model config\n│       ├── tokenizer*.json           # Tokenizer files\n│       ├── preprocessor_config.json  # Vision preprocessor\n│       └── bev_config.json           # BEV encoder GlobalConfig\n├── leaderboard\u002F                      # Bench2Drive evaluation harness\n│   ├── team_code\u002F\n│   │   ├── mot_b2d_agent.py          # Main CARLA agent entry point\n│   │   ├── automot_utils.py          # Model loading + prompt utilities\n│   │   └── bev_data_utils.py         # LiDAR → BEV histogram features\n│   ├── data\u002Fbench2drive220\u002F          # 220 route XML files\n│   └── scripts\u002F\n│       └── 
run_evaluation_route.sh   # Route-by-route evaluation\n├── eval_json\u002F                        # Route JSON files for evaluation\n│   ├── b2d_all_routes.json           # All 220 routes\n│   ├── b2d_all_routes_split1.json    # Routes 1–110 (for multi-GPU)\n│   ├── b2d_all_routes_split2.json    # Routes 111–220 (for multi-GPU)\n│   └── b2d_all_routes_merged.json    # Route ID index (used by run script)\n├── scenario_runner\u002F                  # CARLA scenario execution\n```\n\n---\n\n## Environment Setup \u003Ca name=\"environment-setup\">\u003C\u002Fa>\n\n### 1. CARLA 0.9.15\n\n```bash\nmkdir carla && cd carla\nwget https:\u002F\u002Fcarla-releases.s3.us-east-005.backblazeb2.com\u002FLinux\u002FCARLA_0.9.15.tar.gz\ntar -xvf CARLA_0.9.15.tar.gz\ncd Import && wget https:\u002F\u002Fcarla-releases.s3.us-east-005.backblazeb2.com\u002FLinux\u002FAdditionalMaps_0.9.15.tar.gz\ncd .. && bash ImportAssets.sh\nexport CARLA_ROOT=\u002Fpath\u002Fto\u002Fcarla  # set to the directory containing CarlaUE4.sh\n```\n\n### 2. Create the `automot` environment\n\n```bash\nconda create -n automot python=3.10\nconda activate automot\n```\n\n### 3. PyTorch\n\n```bash\npip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128 \\\n    --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n```\n\n### 4. Python dependencies\n\n```bash\n# Install all requirements\npip install -r requirements.txt\n\n# CARLA Python API (Python 3.10, available on PyPI)\npip install carla==0.9.15\n\n# flash-attn (requires torch to be installed first)\npip install flash-attn==2.8.3 --no-build-isolation\n```\n\n### 5. 
Environment variables\n\n```bash\nexport CARLA_ROOT=\u002Fpath\u002Fto\u002Fcarla\nexport PYTHONPATH=$CARLA_ROOT\u002FPythonAPI\u002Fcarla:$PYTHONPATH\n```\n\n---\n\n## Model Weights \u003Ca name=\"model-weights\">\u003C\u002Fa>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FOscar-Huang\u002FAutoMoT\">\n    \u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Fmodel-on-hf-md.svg\" alt=\"Model on HuggingFace\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\nAll weights are hosted at **[Oscar-Huang\u002FAutoMoT](https:\u002F\u002Fhuggingface.co\u002FOscar-Huang\u002FAutoMoT)**.\n\n| File | Local destination | Description | Size |\n|------|------------------|-------------|------|\n| `model.safetensors` | `Automot\u002Fcheckpoints\u002Fmodel.safetensors` | All model weights | ~13 GB |\n| `config.json` | `Automot\u002Fcheckpoints\u002F` | Qwen3-VL model config | \u003C 1 MB |\n| `tokenizer*.json` | `Automot\u002Fcheckpoints\u002F` | Tokenizer files | \u003C 1 MB |\n| `preprocessor_config.json` | `Automot\u002Fcheckpoints\u002F` | Vision preprocessor | \u003C 1 MB |\n| `bev_config.json` | `Automot\u002Fcheckpoints\u002F` | BEV encoder config | \u003C 1 MB |\n\n```bash\nhuggingface-cli download Oscar-Huang\u002FAutoMoT \\\n    --local-dir Automot\u002Fcheckpoints \\\n    --repo-type model\n```\n\n---\n\n## Running Evaluation \u003Ca name=\"running-evaluation\">\u003C\u002Fa>\n\n### Route-by-route evaluation\n\n```bash\ncd leaderboard\u002Fscripts\nbash run_evaluation_route.sh\n```\n\nThis script:\n- Runs all 220 routes sequentially, skipping already completed ones\n- Saves per-route JSON to `leaderboard\u002Fscripts\u002Fv_2json_open\u002F`\n\n---\n\n## Benchmark Results \u003Ca name=\"benchmark-results\">\u003C\u002Fa>\n\nBench2Drive 220-route closed-loop evaluation (DS↑ \u002F SR↑):\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fb2d_final.png\" 
alt=\"Bench2Drive Results\" width=\"85%\">\n\u003C\u002Fp>\n\n**AutoMoT achieves DS=87.34 \u002F SR=70.00**\n\n---\n\n## Citation \u003Ca name=\"citation\">\u003C\u002Fa>\n\n```bibtex\n@article{huang2026automot,\n  title   = {AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving},\n  author  = {Wenhui Huang and Songyan Zhang and Qihang Huang and Zhidong Wang and Zhiqi Mao and Collister Chua and Zhan Chen and Long Chen and Chen Lv},\n  journal = {arXiv preprint arXiv:2603.14851},\n  year    = {2026},\n  url     = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.14851}\n}\n\n@inproceedings{jia2024bench,\n  title     = {Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},\n  author    = {Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},\n  booktitle = {NeurIPS 2024 Datasets and Benchmarks Track},\n  year      = {2024}\n}\n```\n\n---\n\n## Acknowledgements\n\nWe thank [TransFuser++](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fcarla_garage), [SimLingo](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fsimlingo), and [BAGEL](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FBAGEL) for their open-source contributions, which this work builds upon.\n","AutoMoT is an asynchronous vision-language-action model for end-to-end autonomous driving. Its core function is to combine a low-frequency Understanding Expert (4B parameters) with a high-frequency Action Expert (1.6B parameters) through an asynchronous Mixture-of-Transformers architecture, enabling efficient decision-making and path planning. Its technical features include a KV-cache bridging mechanism that optimizes inference speed and accuracy. It suits autonomous driving scenarios that require efficient, real-time navigation in complex environments. The project has released its model weights and the NuSync dataset and provides closed-loop inference results on Bench2Drive; training code will be released in the future.","2026-06-11 04:03:13","CREATED_QUERY"]