[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11367":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":30,"discoverSource":31},11367,"molmoact2","allenai\u002Fmolmoact2","allenai","Official Repository for MolmoAct2","https:\u002F\u002Fallenai.org\u002Fblog\u002Fmolmoact2",null,"Shell",595,20,7,2,0,18,33,348,54,7.97,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:02:31","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002FMolmoAct2.svg\" alt=\"MolmoAct2 Logo\" width=\"800\" style=\"margin-left:'auto' margin-right:'auto' display:'block'\"\u002F>\n  \u003Cbr>\n  \u003Cbr>\n  \u003Ch1>MolmoAct2: Action Reasoning Models for Real-world Deployment\u003C\u002Fh1>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fallenai\u002Fmolmoact2\u002Fblob\u002Fmain\u002FLICENSE\">\n    \u003Cimg alt=\"GitHub License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fallenai\u002Fmolmoact2\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fallenai.org\u002Fblog\u002Fmolmoact2\">\n    \u003Cimg alt=\"Blog Post\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-Post-F0529C\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.02881\">\n    \u003Cimg alt=\"Paper URL\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.02881-red?logo=arxiv\">\n  
\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmoact2-models-69f81e05242e2499606b1be6\">\n    \u003Cimg alt=\"Base Models\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-Base%20Models-yellow?logo=huggingface\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmoact2-finetuned-models-69f81e23d5a7b34fde34f2ce\">\n    \u003Cimg alt=\"Finetuned Models\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-Finetuned%20Models-yellow?logo=huggingface\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmoact2-bimanualyam-dataset-69f81e17b140ec34f430a35e\">\n    \u003Cimg alt=\"MolmoAct2-BimanualYAM Dataset\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-MolmoAct2--BimanualYAM%20Dataset-yellow?logo=huggingface\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmoact2-datasets-69f81e316ec3daafe3f9555c\">\n    \u003Cimg alt=\"Robotics Datasets\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-Robotics%20Datasets-yellow?logo=huggingface\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmo2-er-datasets-69f8d605d92d46a5fc24ced2\">\n    \u003Cimg alt=\"ER Datasets\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-ER%20Datasets-yellow?logo=huggingface\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\nMolmoAct2 is Ai2's open family of action reasoning models for robot control and real-world deployment. It builds on the Molmo2-ER embodied-reasoning vision-language backbone, adds robot state and action modeling, and connects the VLM to a flow-matching continuous action expert for closed-loop manipulation. 
The release includes base checkpoints for continued training, fine-tuned robot policies for evaluation and deployment, and the datasets used to build MolmoAct2 and Molmo2-ER.\n\n---\n### Updates\n- **[2026\u002F05\u002F17]** 🔥 We have released FastAPI inference servers for MolmoAct2 using DROID and YAM setups at [**Inference Servers**](#5-inference-servers) (implemented by [Jie Wang](https:\u002F\u002Fgithub.com\u002FEverloom-129)).\n- **[2026\u002F05\u002F14]** 🔥 We have released the MolmoAct2 LeRobot workflow for fine-tuning and inference. [**Check it out**](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flerobot\u002Ftree\u002Fmolmoact2-policy).\n- **[2026\u002F05\u002F06]** 🔥 Detailed implementation and setup for Franka, SO-100\u002F101, and bimanual YAM have been released at [**Real-world Deployment**](#4-real-world-deployment).\n- **[2026\u002F05\u002F05]** 🔥 [**MolmoAct2**](https:\u002F\u002Fallenai.org\u002Fblog\u002Fmolmoact2) has been released!\n\n\n## 1. Models\n\n### Base Models\n\nWe provide base checkpoints at every training stage for continued MolmoAct2 training and robot fine-tuning. These are foundation checkpoints rather than one-size-fits-all deployment policies.\n\n| Model | Use Case | Description | Checkpoint Path |\n| --- | --- | --- | --- |\n| MolmoAct2 | Fine-tuning | Post-trained MolmoAct2 model with a continuous flow-matching action expert. Use as the default foundation checkpoint for adapting to a target robot embodiment or benchmark. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2 |\n| MolmoAct2-Think | Fine-tuning | MolmoAct2 foundation checkpoint with depth-token reasoning. Use when downstream policies should reason over compact depth predictions before acting. 
| https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-Think |\n| MolmoAct2-Pretrain | Post-training | Pre-trained discrete autoregressive VLA backbone before the continuous action expert is attached. Intended for continuing MolmoAct2 training stages, not direct continuous-control inference. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-Pretrain |\n| Molmo2-ER | Pre-training | Embodied-reasoning VLM backbone used as the starting point for MolmoAct2 action models. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmo2-ER |\n\n### Finetuned Models\n\nWe also provide fine-tuned checkpoints for common robot platforms and benchmarks. These models are intended to run directly in their target setting, or to serve as a stronger starting point for closely related robots. As with any robot policy, performance depends on hardware, cameras, calibration, action conventions, and language\u002Ftask distribution.\n\n| Model | Use Case | Description | Checkpoint Path |\n| --- | --- | --- | --- |\n| MolmoAct2-DROID | Inference \u002F Fine-tuning | MolmoAct2 fine-tuned on the filtered DROID Franka mixture with absolute joint-pose control. Intended for DROID-style policy inference or further fine-tuning. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-DROID |\n| MolmoAct2-BimanualYAM | Inference \u002F Fine-tuning | MolmoAct2 fine-tuned on the bimanual YAM mixture with absolute joint-pose control and annotated language instructions. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-BimanualYAM |\n| MolmoAct2-SO100_101 | Inference \u002F Fine-tuning | MolmoAct2 fine-tuned on SO-100\u002FSO-101 datasets with absolute joint-pose control and annotated language instructions. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-SO100_101 |\n| MolmoAct2-LIBERO | Inference \u002F Fine-tuning | MolmoAct2 fine-tuned on the full LIBERO training mixture, combining Spatial, Object, Goal, and Long suites. 
| https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-LIBERO |\n| MolmoAct2-Think-LIBERO | Inference \u002F Fine-tuning | MolmoAct2-Think fine-tuned on LIBERO with depth-and-action examples and adaptive depth reasoning. | https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-Think-LIBERO |\n\n## 2. Datasets\n\n| Data | Description | Dataset Path |\n| --- | --- | --- |\n| MolmoAct2-BimanualYAM Dataset | Collection of bimanual YAM datasets and related resources used for MolmoAct2 bimanual training and evaluation. | https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmoact2-bimanualyam-dataset-69f81e17b140ec34f430a35e |\n| MolmoAct2 Robotics Datasets | Robotics datasets for MolmoAct2 training and fine-tuning, including SO-100\u002FSO-101, DROID, MolmoAct Dataset, BC-Z, Bridge, and RT-1. | https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmoact2-datasets-69f81e316ec3daafe3f9555c |\n| Molmo2-ER Datasets | Embodied reasoning datasets used for Molmo2-ER and MolmoAct2 backbone training, including spatial, 3D, robotics, and visual reasoning data. | https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fallenai\u002Fmolmo2-er-datasets-69f8d605d92d46a5fc24ced2 |\n\nNote that all of the robotics datasets for pre-training and post-training are in LeRobot v3.0 format, paired with extra language annotations.\n\n## 3. LeRobot Integration\n\nMolmoAct2 is integrated into LeRobot as a policy implementation, so users can train, evaluate, and deploy MolmoAct2 with standard LeRobot datasets and workflows. 
This repository includes the LeRobot integration as a Git submodule at `lerobot\u002F`, pinned to the branch [`allenai\u002Flerobot:molmoact2-policy`](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flerobot\u002Ftree\u002Fmolmoact2-policy).\n\nFor training, although all of our experiments start from the base checkpoint [`allenai\u002FMolmoAct2`](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2), we recommend starting from the fine-tuned checkpoints listed in the [Finetuned Models](#finetuned-models) section above if your embodiment is similar to [Bimanual YAM](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-BimanualYAM), [DROID Franka](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-DROID), or [SO-100\u002FSO-101](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-SO100_101), as they can provide better initialization and downstream performance. For generic use, use the base checkpoint.\n\nAfter cloning this repository, initialize the submodule from the repo root:\n\n```bash\ngit submodule update --init --recursive\ncd lerobot\n```\n\nFor training, evaluation, and deployment instructions, see the MolmoAct2 LeRobot documentation at [`docs\u002Fsource\u002Fmolmoact2.mdx`](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flerobot\u002Fblob\u002Fmolmoact2-policy\u002Fdocs\u002Fsource\u002Fmolmoact2.mdx). To reproduce the original LIBERO benchmark results exactly with the v0.5.1 evaluation stack, use the pinned inference branch [`allenai\u002Flerobot:molmoact2-hf-inference`](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flerobot\u002Ftree\u002Fmolmoact2-hf-inference) with instructions in [MolmoAct2 README](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flerobot\u002Ftree\u002Fmolmoact2-hf-inference#molmoact2).\n\n## 4. 
Real-world Deployment\n\nMolmoAct2 supports out-of-the-box deployment on three robot embodiments:\n\n- **SO-100**\n- **Bimanual YAM**\n- **Franka DROID setup**\n\n### SO-100 Setup\n\nFor the best performance, we recommend using an **SO-100 with the standard wrist configuration** and a **third-person camera**.\n\n### Bimanual YAM Setup\n\nFor the best performance, please build your Bimanual YAM setup following the reference design below:\n\n![Bimanual YAM setup](assets\u002Fm.png)\n\nAll required components can be purchased using this [Bimanual YAM parts list](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F10bg4XJoeIqnuOBLpUlkhJV6QEYn_oK5IZVm5C7_kdbo\u002Fedit?usp=sharing).\n\nImplementation code for setup, data collection, and inference on Bimanual YAM is available [here](https:\u002F\u002Fgithub.com\u002Fwilliamtsai726\u002FYAM).\n\n### Franka Setup\n\nFor the Franka setup, we recommend following the official [DROID implementation](https:\u002F\u002Fgithub.com\u002Fdroid-dataset\u002Fdroid) for best results.\n\n## 5. Inference Servers\n\nThis repository ships two FastAPI inference servers under `examples\u002F`, one each for the DROID and Bimanual YAM fine-tuned checkpoints. 
Each server exposes the same `\u002Fact` wire protocol — `json_numpy`-encoded request\u002Fresponse — but with an embodiment-specific schema (camera count, state dimension, normalisation tag).\n\n| Server | Checkpoint | Default port | State dim | Cameras |\n| --- | --- | --- | --- | --- |\n| [`examples\u002Fdroid\u002Fhost_server_droid.py`](examples\u002Fdroid\u002Fhost_server_droid.py) | [`allenai\u002FMolmoAct2-DROID`](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-DROID) | `8000` | `(8,) = [q1..q7, gripper]` | `external`, `wrist` |\n| [`examples\u002Fyam\u002Fhost_server_yam.py`](examples\u002Fyam\u002Fhost_server_yam.py) | [`allenai\u002FMolmoAct2-BimanualYAM`](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002FMolmoAct2-BimanualYAM) | `8202` | `(14,)` (per-arm 7-D × 2 arms) | `top`, `left`, `right` (order matters) |\n\n### 1. Install [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F)\n\n```bash\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\nexec $SHELL          # reload PATH so the `uv` binary is picked up\nuv --version\n```\n\n### 2. Create the project environment\n\nThe pinned dependencies (CUDA-12.1 PyTorch wheels, `transformers`, `fastapi`, `json-numpy`, …) live in `pyproject.toml`. From the repo root:\n\n```bash\nuv sync                  # creates .venv\u002F and installs all deps\nuv run python -c \"import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))\"\n# expected: True NVIDIA RTX A6000\n```\n\n`uv` reads `.python-version` (3.11) and downloads a matching interpreter if needed. Re-run `uv sync` after pulling new commits.\n\n### 3. 
Download the checkpoint (~22 GB each)\n\n```bash\nexport HF_HUB_ENABLE_HF_TRANSFER=1                       # fast parallel download\nuv run hf download allenai\u002FMolmoAct2-DROID               # for the DROID server\nuv run hf download allenai\u002FMolmoAct2-BimanualYAM         # for the YAM server\n```\n\nTo put the cache on a different disk, set `HF_HOME=\u002Fpath\u002Fto\u002Fcache` before the download (and when starting the server).\n\n### 4. Start a server\n\n```bash\n# DROID (Franka)\nuv run python examples\u002Fdroid\u002Fhost_server_droid.py --host 0.0.0.0 --port 8000 --dtype bfloat16\n\n# Bimanual YAM\nuv run python examples\u002Fyam\u002Fhost_server_yam.py --host 0.0.0.0 --port 8202 --dtype bfloat16\n```\n\nUseful flags (both servers):\n\n- `--dtype bfloat16|float16|float32` — default `bfloat16`. The DROID model card uses `float32` (~88 GB), which only fits on ~96 GB of free VRAM. The YAM model card reports `float32` at ~26 GB (fits on a single A6000), `bfloat16` under 16 GB. `bfloat16` is the safe default for both.\n- `--device cuda:0`\n- `--cuda-graph` — enables CUDA-graph capture for the action expert (~2× faster per call, ~2 GB extra VRAM). Disabled by default so the server coexists with other GPU workloads.\n- `--no-warmup` — skip the dummy forward pass at startup.\n\n#### bf16 patches\n\nLoading in `bfloat16` is not officially supported by the upstream MolmoAct2 code; each server applies two idempotent patches to the cached `modeling_molmoact2.py` at startup:\n\n1. flow-matching trajectory uses the model dtype instead of hardcoded `float32` (otherwise the action expert errors with `mat1 and mat2 must have the same dtype`),\n2. `_to_array` casts to `float32` before `.numpy()` (numpy has no bf16 dtype).\n\nBoth are marked with `# patched_bf16_*` comments and re-applied on every server start, so re-downloading the checkpoint won't break things. Newer snapshot revisions (e.g. 
YAM) have already fixed both upstream; the server will log \"needle not found\" warnings, which are harmless.\n\n### 5. Reach it from the LAN\n\nBound to `0.0.0.0`, the server is reachable on every interface of this host. Health check:\n\n```bash\ncurl http:\u002F\u002F\u003Clan-ip>:8000\u002Fact\n# DROID: {\"status\":\"ok\",\"repo_id\":\"allenai\u002FMolmoAct2-DROID\",\"norm_tag\":\"franka_droid\",...}\n\ncurl http:\u002F\u002F\u003Clan-ip>:8202\u002Fact\n# YAM:   {\"status\":\"ok\",\"repo_id\":\"allenai\u002FMolmoAct2-BimanualYAM\",\"norm_tag\":\"yam_dual_molmoact2\",\"num_cameras\":3,\"state_dim\":14,...}\n```\n\nThe wire format (`json_numpy`-encoded request) is documented in the docstring at the top of each server file. The DROID server expects `external_cam`, `wrist_cam`, `instruction`, `state`; the YAM server expects `top_cam`, `left_cam`, `right_cam`, `instruction`, `state`. Both return `actions` (`(N, D)` float32) and `dt_ms`.\n\n### Firewall \u002F port\n\nIf clients on the LAN can't connect, open the port locally:\n\n```bash\nsudo ufw allow from \u003Csubnet> to any port 8000 proto tcp   # DROID\nsudo ufw allow from \u003Csubnet> to any port 8202 proto tcp   # YAM\n```\n\n## 6. Coming Soon\n\nThe full code for training, fine-tuning, deployment, and evaluation, along with more details, is coming soon.\n\n## 7. License\n\nThis model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https:\u002F\u002Fallenai.org\u002Fresponsible-use).\n\n## 8. Model and Hardware Safety\nMolmoAct2 models generate robot actions from visual observations and language instructions, but their behavior may vary across embodiments, environments, and hardware configurations. Users should carefully validate model outputs before deployment, especially when operating physical robots or other actuated systems. 
Where possible, actions should be monitored through interpretable intermediate outputs (such as the adaptive depth map), simulation rollouts, action limits, or other safety checks before execution on hardware. The model’s action space should be bounded by the training data, robot controller limits, and task-specific safety constraints, including limits on speed, workspace, torque, and contact force. Users should follow the hardware manufacturer’s safety guidelines, use appropriate emergency-stop mechanisms, and operate the system only in a safely configured environment with human supervision.\n\n## 9. Contacts\n\nFor questions, collaborations, or support, please contact:\n```\n{hqfang,duanj1}@cs.washington.edu\n```\nFound a bug or have a feature request? Please open a GitHub issue.\n\n## 10. Citation\n\n```bibtex\n@misc{fang2026molmoact2actionreasoningmodels,\n      title={MolmoAct2: Action Reasoning Models for Real-world Deployment}, \n      author={Haoquan Fang and Jiafei Duan and Donovan Clay and Sam Wang and Shuo Liu and Weikai Huang and Xiang Fan and Wei-Chuan Tsai and Shirui Chen and Yi Ru Wang and Shanli Xing and Jaemin Cho and Jae Sung Park and Ainaz Eftekhar and Peter Sushko and Karen Farley and Angad Wadhwa and Cole Harrison and Winson Han and Ying-Chun Lee and Eli VanderBilt and Rose Hendrix and Suveen Ellawela and Lucas Ngoo and Joyce Chai and Zhongzheng Ren and Ali Farhadi and Dieter Fox and Ranjay Krishna},\n      year={2026},\n      eprint={2605.02881},\n      archivePrefix={arXiv},\n      primaryClass={cs.RO},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.02881}, \n}\n```\n","MolmoAct2 is a family of action reasoning models for robot control and real-world deployment, developed by the Allen Institute for AI. The project builds on the Molmo2-ER embodied-reasoning vision-language backbone, adds robot state and action modeling, and connects the vision-language model to a flow-matching continuous action expert for closed-loop manipulation. Its core offerings include base checkpoints for continued training, fine-tuned robot policies for evaluation and deployment, and the datasets needed to build MolmoAct2 and Molmo2-ER. In addition, MolmoAct2 provides FastAPI inference servers and a LeRobot workflow to support quick integration and real-world use. The project is particularly suited to robot applications that require advanced action understanding and execution, such as industrial automation and service robots.","2026-06-11 03:31:44","CREATED_QUERY"]