[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79672":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":14,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":13,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":25,"discoverSource":26},79672,"HEX","Open-X-Humanoid\u002FHEX","Open-X-Humanoid","HEX is a whole-body vision-language-action framework for full-sized humanoid robots. ",null,"Jupyter Notebook",490,18,3,2,0,1,348,3.84,false,"main",[],"2026-06-12 02:03:54","\u003Ch1 align=\"center\">HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.07993\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2604.07993-b31b1b.svg\" alt=\"arXiv\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhex-humanoid.github.io\u002F\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-2f80ed.svg\" alt=\"Project Page\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FX-Humanoid\u002FHEX-Model\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Model-ffcc4d.svg?logo=huggingface&logoColor=black\" alt=\"Model\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FX-Humanoid\u002FHEX-Datasets\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Data-ffcc4d.svg?logo=huggingface&logoColor=black\" 
alt=\"Data\">\n\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\" alt=\"HEX teaser image\" \u002F>\n\u003C\u002Fp>\n\nHEX is a whole-body vision-language-action framework for full-sized humanoid robots. It combines a Qwen-VL backbone, a Unified Proprioceptive Predictor (UPP), and a flow-matching action head to predict continuous future actions.\nThe key idea of HEX is to align heterogeneous humanoid states into shared body-part slots and learn predictive body dynamics from cross-embodiment humanoid data. This enables the policy to transfer across different humanoid platforms and perform long-horizon whole-body manipulation.\nDuring deployment, HEX directly predicts arm, hand, and waist actions, while providing high-level commands to a low-level RL-based whole-body controller for generating leg actions. This design enables coordinated and stable humanoid manipulation.\n\n## News\n\n- ✅ **2026\u002F05\u002F17**: Pretraining and fine-tuning code for the VLA has been released.\n\n## Installation\n\nFirst, clone this repo and `cd` into it.\n\n```bash\n# Clone the project\ngit clone https:\u002F\u002Fgithub.com\u002FOpen-X-Humanoid\u002FHEX.git\ncd HEX\n```\n\nThen create the Python\u002FPyTorch environment.\n\n```bash\n# Create the conda environment\nconda create -n hex python=3.10 -y\nconda activate hex\n\n# Install env dependencies\nsudo apt update\nsudo apt install libegl1-mesa-dev libglu1-mesa\n\n# Install requirements\npip install -r requirements.txt\n\n# Install FlashAttention2\npip install flash-attn --no-build-isolation\n\n# Install HEX\npip install -e .\n```\n\nIf `flash-attn` fails to install correctly, you can run\n\n```bash\npython hex\u002Futils\u002Ftest_flash_attn.py\n```\n\nto check the versions of PyTorch, CUDA, and the libstdc++ ABI.\nThen, manually download a compatible wheel from the [flash-attn release](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases).\nWe use version 
2.7.3. However, for newer GPUs (e.g., NVIDIA RTX 5090), you should install the latest available release (e.g., version 2.8.3) to ensure compatibility.\nExample:\n\n```bash\nwget https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.7.3\u002Fflash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl\npip install flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl\n```\n\n\n## Quick Start\n\nWe release the pretrained HEX checkpoint on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FCognition2ActionLab\u002FHEX-model).\n\n| Description | Params | Link |\n|:-----------:|:------:|:----:|\n| HEX | 2.4B | 🤗 [HEX-model](https:\u002F\u002Fhuggingface.co\u002FCognition2ActionLab\u002FHEX-model) |\n\n### Download HEX Checkpoints\n\nTo download the HEX checkpoint, first modify the target download path in [`hex\u002Futils\u002Fdownload_model_hex.py`](hex\u002Futils\u002Fdownload_model_hex.py), and then run:\n\n```bash\npython hex\u002Futils\u002Fdownload_model_hex.py\n```\n\n### Download the Base VLM\n\nBefore running inference, please also download the Qwen3-VL base model:\n\n```bash\npython hex\u002Futils\u002Fdownload_model_qwen.py\n```\n\nAfter downloading Qwen3-VL, update the `framework.qwenvl.base_vlm` field in the `config.yaml` file of the downloaded HEX checkpoint to your local Qwen3-VL path.\n\n### Run Inference\n\nOnce both the HEX checkpoint and the Qwen3-VL model are prepared, follow [`notebooks\u002Feval_model.ipynb`](notebooks\u002Feval_model.ipynb) to run model inference.\n\n\n\n## Data\n\n### Data Source\n\nWe open-source the 8 real-world evaluation task datasets collected in HEX, which can be directly used for fine-tuning.\nThe full training data used in this project consists of the following sources:\n\n| Embodiment \u002F Platform | Source | Dataset |\n|:----------------------|:-------|:--------|\n| Tienkung Series | HEX | 🤗 [HF 
Link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FCognition2ActionLab\u002Feai_real_world) |\n| Unitree G1 | [Humanoid Everyday](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08807) | 🤗 [HF Link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002FHumanoid-Everyday-G1) | \n| AgiBot-to-Unitree G1 | [AgiBot World Colosseo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06669) & [TrajBooster](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.11839) | 🤗 [HF Link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fl2aggle\u002FAgibot2UnitreeG1Retarget) |\n| Unitree H1 | [Humanoid Everyday](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08807) | 🤗 [HF Link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002FHumanoid-Everyday-H1) |\n| Leju Kuavo | [RoboCOIN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.17441) | 🤗 [HF Link](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FRoboCOIN\u002Frobocoin) |\n\nTo download all datasets, run:\n\n```bash\nbash scripts\u002Fdownload_datasets.sh\n```\n\nSince HEX still follows the LeRobot v2.1 data format, each dataset should contain a corresponding `modality.json`.  
\nFor each Leju Kuavo dataset, please copy `examples\u002Freal_world\u002Fmodality_leju\u002Fmodality.json` to `\u003Cleju_dataset>\u002Fmeta\u002Fmodality.json`.\n\nThe overall data structure is as follows:\n\n```text\neai_real_world\u002F\n├── dvt217_carry_boxes_and_avoid_obstacles_260113_lerobot\n├── ...\n├── evt12_carry_box_and_tidy_table_260318_lerobot\n├── ...\n├── g1_add_the_seasoning_to_the_pot\n├── ...\n├── g1_humanoid_everyday\n├── h1_humanoid_everyday\n├── leju_robot_box_storage_parcel\n└── ...\n```\n\n\n### Data Collection\n\nDue to commercial restrictions, we are unable to release the data collection pipeline used for the Tienkung series robots.\n\nFor users interested in collecting data on Unitree G1, we recommend referring to the following open-source data collection pipelines:\n\n- [OpenTrajBooster](https:\u002F\u002Fgithub.com\u002FOpenHelix-Team\u002FOpenTrajBooster), which uses a VR headset and handheld joysticks for full-body teleoperation.\n- [Psi0](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FPsi0\u002Ftree\u002Fmain\u002Freal): uses a PICO VR headset with controllers, along with a waist tracker and foot trackers for full-body teleoperation.\n\n\n\n## Pretraining\n\nYou can download our [pretrained HEX model](https:\u002F\u002Fhuggingface.co\u002FCognition2ActionLab\u002FHEX-model) and skip this step if you only want to run inference or evaluation.\n\nBefore pretraining, please download the Qwen3-VL backbone:\n\n```bash\nbash scripts\u002Fdownload_models.sh\n```\n\nThen, update the dataset paths in the following files to match your local directory structure:\n\n- [`hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fmixtures.py`](hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fmixtures.py), Line 9\n- [`hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fdata_config.py`](hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fdata_config.py), Line 1299\n\nNext, modify the following fields in 
[`scripts\u002Fpretrain_hex.sh`](scripts\u002Fpretrain_hex.sh):\n\n- `base_vlm`: path to your downloaded Qwen3-VL model\n- `data_root_dir`: path to your local dataset directory\n- `dataset_name`: the dataset mixture name, which should be consistent with the settings in [`hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fmixtures.py`](hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fmixtures.py)\n\nFinally, start pretraining with:\n\n```bash\nbash scripts\u002Fpretrain_hex.sh\n```\n\n\n## Fine-tuning\n\nAfter obtaining the [pretrained HEX model](https:\u002F\u002Fhuggingface.co\u002FCognition2ActionLab\u002FHEX-model), you can further fine-tune HEX on downstream datasets.\n\nBefore fine-tuning, please modify the following fields in [`scripts\u002Ffine_tune_hex.sh`](scripts\u002Ffine_tune_hex.sh):\n\n- `base_vlm`: path to your Qwen3-VL backbone\n- `data_root_dir`: path to your local dataset directory\n- `dataset_name`: name of the downstream dataset mixture, which should be consistent with the settings in [`hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fmixtures.py`](hex\u002Fdataloader\u002Fgr00t_lerobot\u002Fmixtures.py)\n- `pretrained_models_path`: path to the pretrained HEX checkpoint\n\nThen, start fine-tuning with:\n\n```bash\nbash scripts\u002Ffine_tune_hex.sh\n```\n\n## Deployment\n\nDue to commercial restrictions, the low-level RL-based whole-body controller used for the Tienkung series robots is not open-sourced. 
However, we provide a sample deployment interface in [`examples\u002Freal_world`](examples\u002Freal_world).\n\nIf you want to deploy your own model on Unitree G1, you may refer to the following open-source projects:\n\n- [OpenTrajBooster](https:\u002F\u002Fgithub.com\u002FOpenHelix-Team\u002FOpenTrajBooster): uses [HOMIE](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FOpenHomie) as the low-level RL-based whole-body controller.\n- [Psi0](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FPsi0\u002Ftree\u002Fmain\u002Freal): uses [AMO](https:\u002F\u002Fgithub.com\u002FOpenTeleVision\u002FAMO) as the low-level RL-based whole-body controller.\n\nWhen training your own low-level controller, please make sure that the command space output by the high-level VLA policy matches the input space expected by the low-level controller. The dataset construction process should also follow the same interface for consistent training and deployment.\n\n\n## Simulation\n\nThanks to the cross-embodiment capability of VLA models, HEX can also be evaluated in simulation environments such as LIBERO.\n\nFirst, download the LIBERO datasets:\n\n```bash\npython hex\u002Futils\u002Fdownload_dataset_libero.py --base_dir \u002Fyour\u002Fdataset\u002Fpath\n```\n\nThen, replace the `modality.json` file for each LIBERO suite with the provided template in [examples\u002FLIBERO\u002Fmodality.json](examples\u002FLIBERO\u002Fmodality.json).\n\nNext, modify the following fields in [`scripts\u002Flibero\u002Ftrain_hex_libero.sh`](scripts\u002Flibero\u002Ftrain_hex_libero.sh):\n\n- `base_vlm`: path to your Qwen3-VL backbone\n- `dataset_name`: name of the LIBERO dataset mixture\n- `data_root_dir`: path to your local LIBERO dataset directory\n\nThen start training with:\n\n```bash\nbash scripts\u002Flibero\u002Ftrain_hex_libero.sh\n```\n\nFor evaluation, modify the following fields in [`scripts\u002Flibero\u002Feval_libero.sh`](scripts\u002Flibero\u002Feval_libero.sh):\n\n- 
`ckpt_root`: root directory of the trained checkpoint\n- `ckpt_path`: relative path to the checkpoint file\n\nThen run:\n\n```bash\nbash scripts\u002Flibero\u002Feval_libero.sh\n```\n\n\n## Citation\n\n```\n@article{bai2026hex,\n  title={HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation},\n  author={Bai, Shuanghao and Li, Meng and Lv, Xinyuan and Wang, Jiawei and Wang, Xinhua and Liao, Fei and Hou, Chengkai and Gu, Langzhe and Zhou, Wanqi and Wu, Kun and others},\n  journal={arXiv preprint arXiv:2604.07993},\n  year={2026}\n}\n```\n\n## Acknowledgements\n\nThis project draws inspiration from and builds upon several notable open-source projects, including: [StarVLA](https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA), [Isaac-GR00T](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FIsaac-GR00T), [HiMoE-VLA](https:\u002F\u002Fgithub.com\u002FZhiyingDu\u002FHiMoE-VLA), [LeRobot](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot), [Humanoid Everyday](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FHumanoid-Everyday), [RoboCOIN](https:\u002F\u002Fgithub.com\u002FFlagOpen\u002FRoboCOIN), [AgiBot-World](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FAgiBot-World), and [OpenTrajBooster](https:\u002F\u002Fgithub.com\u002FOpenHelix-Team\u002FOpenTrajBooster).\n","HEX is a whole-body vision-language-action framework for full-sized humanoid robots. It integrates a Qwen-VL backbone, a Unified Proprioceptive Predictor (UPP), and a flow-matching action head to predict continuous future actions. By aligning heterogeneous humanoid states into shared body-part slots and learning predictive body dynamics from cross-embodiment humanoid data, the project enables the policy to transfer across different humanoid platforms and perform long-horizon whole-body manipulation. During deployment, HEX directly predicts arm, hand, and waist actions while providing high-level commands to an RL-based whole-body controller that generates leg actions, ensuring coordinated and stable manipulation. It suits scenarios that require complex human-robot interaction and manipulation, such as home service robots and industrial collaborative robots.","2026-06-11 03:58:12","CREATED_QUERY"]