[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74127":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":30,"discoverSource":31},74127,"Psi0","physical-superintelligence-lab\u002FPsi0","physical-superintelligence-lab","[RSS26'] Welcome to Psi-Zero, a Humanoid VLA towards Universal Humanoid Intelligence.","https:\u002F\u002Fpsi-lab.ai\u002FPsi0\u002F",null,"Python",2627,68,3,2,0,9,24,75,27,27.52,"Other",false,"main",true,[],"2026-06-12 02:03:22","\u003Ch1 align=\"center\">[RSS26'] Ψ₀: An Open Foundation Model \u003Cbr\u002F> Towards Universal Humanoid Loco-Manipulation\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fmedia\u002Fteaser.jpg\" alt=\"Psi0 teaser image\" \u002F>\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2603.12263-df2a2a.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.12263)\n[![Static 
Badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-a)](https:\u002F\u002Fpsi-lab.ai\u002FPsi0)\n[![Model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Model-yellow)](https:\u002F\u002Fhuggingface.co\u002FUSC-PSI-Lab\u002Fpsi-model)\n[![Data](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Data-pink)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002Fpsi-data)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache2.0-blue.svg)](.\u002FLICENSE)\n\n\u003C\u002Fdiv>\n\nContributors: [Songlin Wei](https:\u002F\u002Fsonglin.github.io\u002F), [Hongyi Jing](https:\u002F\u002Fhongyijing.me\u002F), [Boqian Li](https:\u002F\u002Fboqian-li.github.io\u002F), [Zhenyu Zhao](https:\u002F\u002Fzhenyuzhao.com\u002F), [Jiageng Mao](https:\u002F\u002Fpointscoder.github.io\u002F), [Zhenhao Ni](https:\u002F\u002Fnizhenhao-3.github.io\u002F), [Sicheng He](https:\u002F\u002Fhesicheng.net\u002F), [Jie Liu](https:\u002F\u002Fjie0530.github.io\u002F), [Xiawei Liu](https:\u002F\u002Fwww.xiaweiliu.com\u002F), Kaidi Kang, Sheng Zang, [Weiduo Yuan](https:\u002F\u002Fweiduoyuan.com\u002F), [Marco Pavone](https:\u002F\u002Fprofiles.stanford.edu\u002Fmarco-pavone), Di Huang, [Yue Wang](https:\u002F\u002Fyuewang.xyz\u002F)\n\n-------\n\n$\\Psi_0$ is an open vision-language-action (VLA) model for dexterous humanoid loco-manipulation. Our model first learns task semantics and visual representations from large-scale human egocentric videos, and is then post-trained on a smaller amount of real-world teleoperated robot data to learn the general dynamics of the embodiment. \n\nOur foundation model is capable of acquiring new long-horizon dexterous loco-manipulation skills by fine-tuning on as few as 80 trajectories. 
***Our key finding is that scaling the right data in the right way.***\n\nAt the top, the $\\Psi_0$ model consists of two end-to-end trained components: a vision–language backbone (System-2) and a multimodal diffusion transformer (System-1) action expert. The backbone is based on Qwen’s Qwen3-VL-2B-Instruct, which extracts vision–language features from observations and instructions. These features condition a flow-based multimodal diffusion transformer inspired by Stable Diffusion 3. The action expert (≈500M parameters) predicts future whole-body action chunks, enabling efficient fusion of visual, linguistic, and action representations. At the lowest level (System-0), an RL-based tracking controller executes the predicted lower-body action commands, ensuring stable and precise physical control.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fmedia\u002Farch.png\" alt=\"Psi0 model\" \u002F>\n\u003C\u002Fp>\n\n## Table of Contents\n\u003C!-- - [Installation](#-environment-setup) -->\n\u003C!-- - [Pre- & Post- Training](#-) -->\n\u003C!-- - [Data Pre-Processing](#-) -->\n- [Finetune Ψ₀ on Unitree G1 Humanoid Robot](#finetune-psi0)\n  - [Installation](#installation)\n  - [Data Collection](#data-collection)\n  - [Fine-Tuning](#training-real)\n  - [Open-Loop Evaluation](#open-loop-evaluation)\n  - [Deployment](#deployment)\n- [Baselines](#baselines)\n  - [GR00T N1.6](#groot-n16)\n  - [OpenPi π0.5](#openpi-05)\n  - [InternVLA-M1](#internvla-m1)\n  - [H-RDT](#h-rdt)\n  - [EgoVLA](#egovla)\n  - [Diffusion Policy](#diffusion-policy)\n  - [ACT](#act) \n- [Simulation 🚀🚀🚀](#simulation)\n  - [Install SIMPLE](#install-simple)\n  - [Data Generation](#data-generation)\n  - [Fine-Tuning](#training-sim)\n  - [Evaluation in SIMPLE](#evaluation-in-simple)\n- [Reproduce Ψ₀: Pre-Training and Post-Training](#pre-post-train)\n- [Checkpoints](#checkpoints)\n- [Troubleshootings](#troubleshootings)\n- [Citation](#️-citation)\n\n\u003Ca id=\"finetune-psi0\">\u003C\u002Fa>\n## 
Finetune Ψ₀ on Unitree G1 Humanoid Robot\n\n### Installation\n\nClone the project and change directory to the project root:\n```bash\ngit clone git@github.com:physical-superintelligence-lab\u002FPsi0.git \ncd Psi0\n```\nWe use [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F) to manage Python dependencies. Install `uv` if not already installed:\n\n```bash\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n```\n\nSet up the $\\Psi_0$ environment:\n\n> ℹ️ We manage the $\\Psi_0$ environment and all the baselines through `uv` and they all share the same `src\u002F` code.  See [Environment Management](baselines\u002FREADME.md) for more details.\n\n```\nuv venv .venv-psi --python 3.10\nsource .venv-psi\u002Fbin\u002Factivate\nGIT_LFS_SKIP_SMUDGE=1 uv sync \\\n  --group serve \\\n  --group viz \\\n  --group psi \\\n  --index-strategy unsafe-best-match \\\n  --active\nuv pip install flash_attn==2.7.4.post1 --no-build-isolation\n```\n\n> If you want to support `SIMPLE` evaluation, you can use the following commands to install `SIMPLE` along with `Psi0`. See also [quickstart](examples\u002Fquick_start\u002Fpsi.md).\n\n```\ngit submodule update --init --recursive\nGIT_LFS_SKIP_SMUDGE=1 uv sync --all-groups --index-strategy unsafe-best-match --active\nuv pip install flash_attn==2.7.4.post1 --no-build-isolation\nUV_PROJECT_ENVIRONMENT=${PWD}\u002F.venv-psi .\u002Fscripts\u002Finstall_curobo.sh\n```\n\nTest the installation; a version number should be displayed.\n```bash\npython -c \"import psi;print(psi.__version__);\"\n```\n\nVerify the `SIMPLE` installation.\n```bash\npython -c \"import simple; print(simple.__version__)\"\n```\n\nVerify the shared `lerobot` stack is importable.\n```bash\npython -c \"from psi.data.lerobot.compat import LEROBOT_LAYOUT; print(LEROBOT_LAYOUT)\"\n```\n\n### Data Collection\n> 📂 We open-sourced all 9 real-world tasks. 
You can directly download the data and jump to the [Fine-Tuning](#training-real).\n\nSee the detailed teleoperation guide here:  \n[Real-World Deployment Guide](real\u002FREADME.md#real-world-deployment)\n\n\n#### Pre-Processing: Convert Raw Data to LeRobot Format\n\n```\nexport task=Hug_box_and_move\n\nhf download USC-PSI-Lab\u002Fpsi-data \\\n  g1_real_raw\u002F$task.zip \\\n  --local-dir=$PSI_HOME\u002Fdata\u002Freal_teleop_g1 \\\n  --repo-type=dataset\n\nunzip $PSI_HOME\u002Fdata\u002Freal_teleop_g1\u002Fg1_real_raw\u002F$task.zip -d $PSI_HOME\u002Fdata\u002Freal_teleop_g1\u002Fg1_real_raw\u002F$task\n```\nYou should observe similar folder structure:\n\n```\ng1_real_raw\n└── Hug_box_and_move\n    ├── episode_0\n    │   ├── color\n    │   │   ├── frame_000000.jpg\n    │   │   └── ...\n    │   └── data.json\n    └── ...\n```\n\nEdit the task description file with the following format, eg.,\n```\nvim scripts\u002Fdata\u002Ftask_description_dict.json\n```\n```\n{\n  \"Hug_box_and_move\": \"Hug box and move.\"\n}\n```\n\nRun conversion script\n```\npython scripts\u002Fdata\u002Fraw_to_lerobot.py \\\n  --data-root=$PWD\u002Fdata\u002Freal_teleop_g1\u002Fg1_real_raw \\\n  --work-dir=$PWD\u002Fdata\u002Freal \\\n  --repo-id=psi0-real-g1 \\\n  --robot-type=g1 \\\n  --task=$task\n```\n\nCalculate stats\n```\npython scripts\u002Fdata\u002Fcalc_modality_stats.py \\\n  --work-dir=$PSI_HOME\u002Fdata\u002Freal \\\n  --task=$task\n```\n\nCreate **$\\Psi_0$** format stats (simply a copy for now)\n```\ncp $PSI_HOME\u002Fdata\u002Freal\u002F$task\u002Fmeta\u002Fstats.json $PSI_HOME\u002Fdata\u002Freal\u002F$task\u002Fmeta\u002Fstats_psi0.json\n```\n\nNow it's ready to finetune $\\Psi_0$.\n\n> ✈️ If training env is already configured, directly launch training via `scripts\u002Ftrain\u002Fpsi0\u002Ffinetune-real-psi0.sh $task`\n\n\n\u003Ca id=\"training-real\">\u003C\u002Fa>\n### Fine-Tuning\n\n> ✔️ Suppose the data is already collected and processed. 
Now we can proceed to fine-tune the $\\Psi_0$ model.\n\n>  There is a [known issue](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FPsi0\u002Fissues\u002F3) of loading our real data, apply this fix first `python scripts\u002Fdata\u002Fpatch_lerobot_meta.py $PSI_HOME\u002Fdata\u002Freal\u002F$task`\n\n> 📝 Here we illustrate by using the pre-collected data from [Huggingface psi-data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002Fpsi-data\u002Ftree\u002Fmain\u002Freal).\n\nSet up the environment variables following `.env.sample`. The environment variables will be loaded by the `dotenv.load_dotenv()` in python.\n```\ncp .env.sample .env\n# and edit the following env variables \n# HF_TOKEN=\u003CYOUR HF READ TOKEN>\n# WANDB_API_KEY=\u003CAPI KEY for wandb logging>\n# WANDB_ENTITY=\u003Cwandb entity>\n# PSI_HOME=\u003CPath where PSI cache\u002Fcheckpoint\u002Fdata are located by convention>\n\nsource .env\necho $PSI_HOME\n```\n\nDownload the collected real-world data and extract it:\n```\nexport task=Pick_bottle_and_turn_and_pour_into_cup\n\nhf download USC-PSI-Lab\u002Fpsi-data \\\n  real\u002F$task.zip \\\n  --local-dir=$PSI_HOME\u002Fdata \\\n  --repo-type=dataset\n\nunzip $PSI_HOME\u002Fdata\u002Freal\u002F$task.zip -d $PSI_HOME\u002Fdata\u002Freal\n```\n> 👀 If you want to visualize the episode please refer to the [Data Visualization](examples\u002Fvisualize.md) in the examples.\n\nLaunch the training script:\n```\nscripts\u002Ftrain\u002Fpsi0\u002Ffinetune-real-psi0.sh $task\n```\n\n> 🖥️ You can always change the GPUs, e.g., `CUDA_VISIBLE_DEVICES=0,1,2,3 scripts\u002Ftrain\u002F...`.  \n\n> ⚠️ Please try to maintain a reasonable global batch size = device batch size x number of GPUs x gradient accumulation step. 
We use global batch size 128 throughout all the real-world and simulation experiments.\n\n\n### Open-Loop Evaluation\n> Follow the steps in `examples\u002Fsimple\u002Fopenloop_eval.ipynb`\n\nLoad the training dataset, and run model inference to see how model fits the training data.\n\n### Deployment\n\n#### Serve $\\Psi_0$ (RTC mode)\n\n```bash\nbash .\u002Fscripts\u002Fdeploy\u002Fserve_psi0-rtc.sh\n```\n\n#### Start $\\Psi_0$ Client (RTC mode)\n\n```bash\nbash .\u002Freal\u002Fscripts\u002Fdeploy_psi0-rtc.sh\n```\n\nFor detailed real-world deployment environment setup, please also refer to the dedicated documentation:\n\n[Real-World Teleoperation Guide](real\u002FREADME.md)\n\n\n## Baselines\n\n\u003Ca id=\"groot-n16\">\u003C\u002Fa>\n\n### GR00T\nInstall the env \n```bash\ncd src\u002Fgr00t; uv sync\n```\n1. training\n```bash\ncd src\u002Fgr00t\n.\u002Fscripts\u002Ftrain_gr00t.sh --dataset-path \u002Fyour\u002Flerobot\u002Fdataset\n```\n2. serving a checkpoint\n```bash\ncd src\u002Fgr00t\n.\u002Fscripts\u002Fdeploy_gr00t.sh\n```\n\n3. openloop eval on trained checkpoint using gt\n```bash\ncd src\u002Fgr00t\n.\u002Fscripts\u002Fopenloop_eval.sh\n```\n\n\u003Ca id=\"openpi-05\">\u003C\u002Fa>\n\n### OpenPI $\\pi_{0.5}$\n\nPlease see more detailed instructions here: [baselines\u002Fpi05](baselines\u002Fpi05\u002FREADME.md).\n\n### InternVLA-M1\nInstall the env \n```bash\ncd src\u002FInternVLA-M1; uv sync --python 3.10\n```\n1. training\n```bash\ncd src\u002FInternVLA-M1\nbash scripts\u002Ftrain_internvla.sh\n```\n2. 
serving a checkpoint\n```bash\ncd src\u002FInternVLA-M1\n.\u002Fscripts\u002Fdeploy_internvla.sh\n```\n\n### H-RDT\n\nSee quick-start doc for [baseline\u002Fhrdt](examples\u002Fquick_start\u002Fhrdt.md).\n\n### EgoVLA\n\nSee quick-start doc for [baseline\u002Fegovla](examples\u002Fquick_start\u002Fegovla.md).\n\n### Diffusion Policy\nSee dedicated doc here [baseline\u002Fdp](baselines\u002Fdp\u002FREADME.md)\n\n### ACT\nSee dedicated doc here [baseline\u002Fact](baselines\u002Fact\u002FREADME.md)\n\n## Simulation\n\nWe use [SIMPLE](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FSIMPLE) to benchmark $\\Psi_0$ and all the baselines.\n\n> 📢 SIMPLE is an easy-to-use humanoid benchmarking simulator built on the MuJoCo physics engine and Isaac Sim rendering.\n\n### Install SIMPLE\n\nCurrently, there are two options to integrate SIMPLE and Psi-0.\n\n#### [Option 1] Install stand-alone SIMPLE (Best for collecting data through teleoperation)\n\n> We recommend installing [SIMPLE](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FSIMPLE) on a stand-alone desktop with an NVIDIA GPU (3090\u002F4090\u002F5090). \n\nPlease refer to the SIMPLE repo [here](https:\u002F\u002Fgithub.com\u002Fphysical-superintelligence-lab\u002FSIMPLE)\n\n#### [Option 2] Install SIMPLE as a third-party dependency (Best for evaluating Psi-0 and all baselines)\n\nPlease refer to the more detailed steps [here](examples\u002Fquick_start\u002Fpsi.md).\n\n### Data Generation\n> 📂 We also provide 6 pre-collected whole-body humanoid loco-manipulation tasks at [Huggingface psi-data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002Fpsi-data\u002Ftree\u002Fmain\u002Fsimple). 
If you want to use the existing simulation data, jump to [Fine-Tuning](#training-sim).\n\n#### Motion-Planning Based Data Generation\nPlease refer to the SIMPLE docs.\n\n#### Teleoperation in Simulator\nPlease refer to the SIMPLE docs.\n\n\u003Ca id=\"training-sim\">\u003C\u002Fa>\n### Fine-Tuning\n\n> 👉 You can skip fine-tuning and download our released [checkpoints for SIMPLE](https:\u002F\u002Fhuggingface.co\u002FUSC-PSI-Lab\u002Fpsi-model\u002Ftree\u002Fmain\u002Fpsi0\u002Fsimple-checkpoints).\n\nDownload [SIMPLE task data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002Fpsi-data\u002Ftree\u002Fmain\u002Fsimple) and extract it:\n\n> 💡 Don't forget to `source .env` before running the commands below.\n\n```\nexport task=G1WholebodyXMovePickTeleop-v0\n\nhf download USC-PSI-Lab\u002Fpsi-data \\\n  simple\u002F$task.zip \\\n  --local-dir=$PSI_HOME\u002Fdata \\\n  --repo-type=dataset\n\nunzip $PSI_HOME\u002Fdata\u002Fsimple\u002F$task.zip -d $PSI_HOME\u002Fdata\u002Fsimple\n```\n\n> 👀 If you want to visualize the episode please refer to the [Data Visualization](examples\u002Fvisualize.md) in the examples.\n\nStart training:\n\n> Please [set up the environment variables](#training-real) if you have not done so yet.\n\n```\nbash scripts\u002Ftrain\u002Fpsi0\u002Ffinetune-simple-psi0.sh $task\n```\nTraining creates a run directory under `.runs` in the project root.\nIf your GPU has limited VRAM, set `--train.optimizer-foreach=false` to reduce optimizer-step memory usage at the cost of some speed.\n\n### Evaluation in SIMPLE\n\n#### Serve $\\Psi_0$\n```\nexport run_dir=\u003Cthe run dir here under folder .runs>\nexport ckpt_step=\u003Ccheckpoint step>\nuv run --active --group psi --group serve serve_psi0 \\\n  --host 0.0.0.0 \\\n  --port 22085 \\\n  --run-dir=$run_dir \\\n  --ckpt-step=$ckpt_step \\\n  --action-exec-horizon=24 \\\n  --rtc\n```\n\nRun open-loop evaluation 
(offline)\n\n[examples\u002Fsimple\u002Fopenloop_eval.ipynb](examples\u002Fsimple\u002Fopenloop_eval.ipynb)\n\n#### Run the Evaluation in SIMPLE\n\nThis `quick-start` guide assumes running SIMPLE on a Stand-alone workstation with NVIDIA GPU.\n\n> We recommend serving the VLA models on a remote server other than locally as IsaacSim is also resource demanding. \n\n> If the server is started on a remote server, run ssh port forward. eg., `ssh -L 22086:localhost:22086 songlin@nebula100`.\n\n> Once port forward is done, open a new terminal to test if server is up `curl -i http:\u002F\u002Flocalhost:22085\u002Fhealth`\n\nDownload eval tasks from [USC-PSI-Lab\u002Fpsi-data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-PSI-Lab\u002Fpsi-data\u002Ftree\u002Fmain\u002Fsimple-eval).\n\n\n```\ncd \u002Fpath\u002Fto\u002FSIMPLE\nexport task=G1WholebodyXMovePickTeleop-v0\n```\n\nDownload eval data and extract it:\n```\nhf download USC-PSI-Lab\u002Fpsi-data \\\n\tsimple-eval\u002F$task.zip \\\n\t--local-dir=data\u002Fevals \\\n\t--repo-type=dataset\n\nunzip data\u002Fevals\u002Fsimple-eval\u002F$task.zip -d data\u002Fevals\u002Fsimple-eval\n```\n\nNow start SIMPLE eval in the SIMPLE environment:\n\n> We provide three domain randomization levels: `level-0`, `level-1`, `level-2` for each task\n\n```\nexport dr=level-0\n```\nWe use two different entrypoints for evaluating different tasks:\n\nset entrypoint and agent to `eval_decoupled_wbc.py` and `psi0_decoupled_wbc` if the evaluating task ends with `Teleop`, which means the task data is collected using teleoperation:\n```\nexport entry=eval_decoupled_wbc.py\nexport agent=psi0_decoupled_wbc\n```\n\nand set entrypoint and agent to `eval.py` and `psi0` if the evaluating task ends with `MP`, which means the task data is generated using CuRobo Motion planning:\n```\nexport entry=eval.py\nexport agent=psi0\n```\n\nLaunch the evaluation script:\n```\npython src\u002Fsimple\u002Fcli\u002F$entry \\\n\tsimple\u002F$task 
\\\n\t$agent \\\n\t$dr \\\n\t--host=localhost \\\n\t--port=9000 \\\n\t--sim-mode=mujoco_isaac \\\n\t--no-headless \\\n\t--data-format=lerobot \\\n\t--data-dir=data\u002Fevals\u002Fsimple-eval\u002F$task\u002F$dr\n```\n\nThe policy rollout videos are saved in the folder `third_party\u002FSIMPLE\u002Fdata\u002Fevals\u002Fpsi0`.\n\n> The evaluation for a single episode can take 6 to 10 minutes because SIMPLE uses a synchronous rendering API in IsaacSim. See here for [more explanation](#).\n\n\u003Ca id=\"pre-post-train\">\u003C\u002Fa>\n## Reproduce Ψ₀: Pre-Training and Post-Training\n\n\n### Pre-Train VLM\n\nDownload and cache the official `Qwen\u002FQwen3-VL-2B-Instruct` weights.\n```\nscripts\u002Fpredownload_qwen3vl.py\n```\n\nPre-train on the [EgoDex dataset](https:\u002F\u002Fgithub.com\u002Fapple\u002Fml-egodex)\n\nPre-compute `48 DoF EgoDex action`:\n\n> We re-use the pre-process code from [H-RDT EgoDex Pre-Processing](https:\u002F\u002Fgithub.com\u002FHongzheBi\u002FH_RDT?tab=readme-ov-file#data-preprocessing).\n> 1. Change the paths in `src\u002Fh_rdt\u002Fdatasets\u002Fpretrain\u002Fsetup_pretrain.sh`.\n> 2. Increase `NUM_PROCESSES` if you are on a powerful server; values up to 64 have been tried.\n> 3. 
Set `FORCE_OVERWRITE=True` if the processing script is interrupted.\n\n```\nsource src\u002Fh_rdt\u002Fdatasets\u002Fpretrain\u002Fsetup_pretrain.sh\nsource .venv-psi\u002Fbin\u002Factivate\nbash src\u002Fh_rdt\u002Fdatasets\u002Fpretrain\u002Frun_pretrain_pipeline.sh\n```\n\n> [Optional] If you also want to train the `FAST` tokenizer, please refer to [training FAST](src\u002Ffast\u002FREADME.md).\n\n```\nbash scripts\u002Ftrain\u002Fpsi0\u002Fpretrain-egodex-psi0-fast.sh \n```\n\nPre-train on [humanoid everyday dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-GVL\u002Fhumanoid-everyday)\n\n> Please download the pre-processed HE data here:  `hf download USC-PSI-Lab\u002Fpsi-data HE_RAW.zip --repo-type=dataset`\n\n```\nbash scripts\u002Ftrain\u002Fpsi0\u002Fpretrain-he-psi0-fast.sh\n```\n\nSave the pretrained checkpoints once training is done:\n```\npython scripts\u002Fsave_pretrain_qwen3vl_backbone.py\n```\n\n### Post-Train Action Expert\n\nDownload the pre-trained `psi-0` VLM backbone:\n```\npython scripts\u002Fdata\u002Fdownload.py \\\n  --repo-id=USC-PSI-Lab\u002Fpsi-model \\\n  --remote-dir=psi0\u002Fpre.fast.1by1.2601091803.ckpt.ego200k.he30k \\\n  --local-dir=$PSI_HOME\u002Fcache\u002Fcheckpoints\u002Fpsi0\u002Fpre.fast.1by1.2601091803.ckpt.ego200k.he30k \\\n  --repo-type=model\n```\n\nPost-train on [humanoid everyday (HE) dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUSC-GVL\u002Fhumanoid-everyday)\n```\nbash scripts\u002Ftrain\u002Fpsi0\u002Fposttrain-he-psi0.sh\n```\n\nSave the post-trained action expert once training is over:\n```\npython scripts\u002Fsave_posttrain_action_expert.py\n```\n\n## Checkpoints\n\nThe released checkpoints on [HuggingFace Psi-Model](https:\u002F\u002Fhuggingface.co\u002FUSC-PSI-Lab\u002Fpsi-model) are listed below:\n\n| Checkpoint | Description | Remote Directory |\n|---|---|---|\n| $\\Psi_0$ VLM\u003Cbr\u002F>(Baseline) | Pre-trained VLM backbone (EgoDex 200K steps + HE 30K steps) | 
`psi0\u002Fpre.fast.1by1.2601091803.ckpt.ego200k.he30k` |\n| $\\Psi_0$ Action Expert\u003Cbr\u002F>(Baseline) | Post-trained Action Expert On HE | `psi0\u002Fpostpre.1by1.pad36.2601131206.ckpt.he30k` |\n\nand more variants for ablation studies:\n| Checkpoint | Description | Remote Directory |\n|---|---|---|\n| $\\Psi_0$ VLM\u003Cbr\u002F>(Ablation Study) | Pre-trained VLM backbone only on EgoDex 200K steps | `psi0\u002Fpre.fast.egodex.2512241941.ckpt200k` |\n| $\\Psi_0$ VLM\u003Cbr\u002F>(Ablation Study) | Pre-trained VLM backbone only on HE 48K steps  | `psi0\u002Fpre.abl.only.he.2512311516.48k` |\n| $\\Psi_0$ VLM\u003Cbr\u002F>(Ablation Study) | Pre-trained VLM backbone only on 10% EgoDex  | `psi0\u002Fpre.abl.ego.10per.2602021632.46k` |\n| $\\Psi_0$ Action Expert\u003Cbr\u002F>(Ablation Study) | Post-train on HE by picking pre-trained variant `psi0\u002Fpre.abl.only.he.2512311516.48k` | `psi0\u002Fpostpre.abl.only.he.2602050012` |\n| $\\Psi_0$ Action Expert\u003Cbr\u002F>(Ablation Study) | Post-train on HE by picking pre-trained variant `psi0\u002Fpre.abl.ego.10per.2602021632.46k` | `psi0\u002Fpostpre.abl.ego.10per.2602050006` |\n\n\nDownload the selected models\n\n> Edit `.env` to use `HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com` if needed.\n\n```\npython scripts\u002Fdata\u002Fdownload.py \\\n  --repo-id=USC-PSI-Lab\u002Fpsi-model \\\n  --remote-dir=\u003CRemote Directory> \\\n  --local-dir=$PSI_HOME\u002Fcache\u002Fcheckpoints\u002F\u003CRemote Directory> \\\n  --repo-type=model\n```\n\n## Troubleshootings\n\n1. Lerobot dataset issues: `stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column`\n\nThis usually means the environment is still on the legacy PSI `lerobot` stack. 
Resync the PSI env so it uses the\nsame `lerobot` and `datasets` versions as SIMPLE, then verify the import layout:\n\n```bash\nsource .venv-psi\u002Fbin\u002Factivate\nuv sync --group psi --active\npython -c \"from psi.data.lerobot.compat import LEROBOT_LAYOUT; print(LEROBOT_LAYOUT)\"\n```\n\n2. Fail to install `evdev`, `src\u002Fevdev\u002Finput.c:10:10: fatal error: Python.h: No such file or directory`\n\n```\nsudo apt update\nsudo apt install -y python3-dev python3-venv build-essential \\\n    linux-headers-$(uname -r)\n```\n\n3. RuntimeError: Could not load libtorchcodec. Likely causes ...\n```\nsudo apt-get install ffmpeg\n```\n\n4. ImportError: cannot import name 'Deprecated' from 'wandb.proto.wandb_telemetry_pb2' \n\nre-install `wandb`\n```\nsource .venv-pusht\u002Fbin\u002Factivate\nuv pip uninstall wandb\nuv pip install wandb==0.18.0\n```\n\n5. support `sm_120` on newer GPUs like `5090` or `RTX 6000`, UserWarning: Ignoring invalid value for boolean flag CUDA_LAUNCH_BLOCKING: truevalid values are 0 or 1.\n\nupdate `torch` and `flash-attn`\n```\nuv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\nuv pip install flash-attn --no-build-isolation\n```\n\n6. Failed to download and build `lerobot ... 
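`, Use `git lfs logs last` to view the log.\n\n```\nGIT_LFS_SKIP_SMUDGE=1 uv ...\n```\n## Citation\n\n```\n@article{wei2026psi0,\n  title={{$\\Psi_0$}: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation},\n  author={Wei, Songlin and Jing, Hongyi and Li, Boqian and Zhao, Zhenyu and Mao, Jiageng and Ni, Zhenhao and He, Sicheng and Liu, Jie and Liu, Xiawei and Kang, Kaidi and others},\n  journal={arXiv preprint arXiv:2603.12263},\n  year={2026}\n}\n```\n\n## License\n\nThis project is licensed under the Apache License 2.0.\n\nSee the [LICENSE](LICENSE) file for details.\n","Psi-Zero is a vision-language-action (VLA) model aimed at universal humanoid intelligence, designed for dexterous humanoid robot control. The project learns task semantics and visual representations from large-scale human egocentric videos, then post-trains on a small amount of real-world teleoperated robot data to learn the dynamics of the embodiment. Its core design combines a vision-language backbone based on Qwen3-VL-2B-Instruct with a multimodal diffusion transformer, and it can acquire new long-horizon dexterous manipulation skills by fine-tuning on as few as 80 trajectories. The project suits robotics applications that need a high degree of autonomy and adaptability, such as object manipulation and locomotion tasks in complex environments.","2026-06-11 03:48:55","high_star"]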