[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79912":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":5,"homepage":8,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":12,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},79912,"LaST-R1","CHEN-H01\u002FLaST-R1","CHEN-H01","",null,"Python",101,6,1,0,2,3,12,2.54,"MIT License",false,"main",true,[],"2026-06-12 02:03:55","\u003Cdiv align=\"center\">\n\n# LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning\n\n\u003Ca href=\"https:\u002F\u002Fsiriyep.github.io\u002Flast-r1\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Project&message=Page&color=blue&logo=github&style=for-the-badge\">\u003C\u002Fa> &ensp;\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2604.28192\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Arxiv&message=Paper&color=red&logo=arxiv&style=for-the-badge\">\u003C\u002Fa> &ensp;\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Models&message=LaST-R1&color=yellow&logo=huggingface&style=for-the-badge\">\u003C\u002Fa> &ensp;\n\nHao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Shanghang Zhang, Pheng-Ann Heng\n\n![](asset\u002Fmethod.jpg)\n\n\u003Cdiv align=\"left\">\n\n**🤖The Framework of LaST-R1.** (a) 
LaST-R1 VLA is a unified model that takes visual observations and language instructions as input, where a vision foundation model provides physically grounded latent targets to guide latent CoT reasoning before action generation. (b) During LAPO RL post-training, LaST-R1 interacts with the environment in a closed-loop manner, storing latents, actions, and rewards in a rollout buffer for jointly reshaping the latent and action spaces. It further enables adaptive reasoning by learning to emit the `\u003Clatent_end>` token based on predicted probabilities, dynamically adjusting the reasoning horizon across tasks. (c) Through LAPO, LaST-R1 achieves adaptive reasoning lengths across diverse tasks, improving generalization and execution stability.\n\n\u003C\u002Fdiv>\n\n\u003C\u002Fdiv>\n\n## 🔥 News\n\n- [2026\u002F05\u002F06] LaST-R1 is now live on arXiv! The code and model checkpoints for both simulation SFT and RL are publicly available! 🚀 \n\n## 📦 Installation\n\nThe code is built with Python 3.10; we recommend using Python 3.10 or later. 
We require PyTorch >= 2.2.0 and CUDA >= 12.0 (the code may run with lower versions, but we have not tested them).\nWe recommend using [Miniconda](https:\u002F\u002Fdocs.conda.io\u002Fen\u002Flatest\u002Fminiconda.html) to create an environment as follows:\n\n```bash\nconda create -n last-r1 python=3.10 -y\nconda activate last-r1\n\npip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip install -r requirements.txt\n\n# Clone veRL (recommended to place at the same level as LaST-R1, not inside the LaST-R1 folder)\ngit clone -b v0.2.x https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl.git\ncd \u002Fpath\u002Fto\u002Fverl\n# Replace the installed pyproject.toml file with our custom .\u002Fsetup\u002Fpyproject.toml file.\npip install -e .\n\n# Clone LIBERO (recommended to place at the same level as LaST-R1, not inside the LaST-R1 folder)\ngit clone https:\u002F\u002Fgithub.com\u002FLifelong-Robot-Learning\u002FLIBERO.git\ncd \u002Fpath\u002Fto\u002FLIBERO\npip install -e .\n```\n\n## 🧩 Framework\n\nOur code is built on [Qwen3-VL](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-VL) and [veRL](https:\u002F\u002Fgithub.com\u002Fverl-project\u002Fverl), and is organized as follows:\n\n- `verl\u002Ftrainer\u002Fmain_ppo.py`: LAPO training entry point that initializes the trainer and launches the training pipeline\n- `verl\u002Ftrainer\u002Fconfig\u002Fppo_trainer.yaml`: default LAPO training configuration (data, optimization, rollout, logging, and runtime settings)\n- `verl\u002Ftrainer\u002Fppo\u002Fray_trainer.py`: coordinates distributed training, including rollout, reward\u002Fadvantage computation, and actor updates\n- `verl\u002Ftrainer\u002Fppo\u002Fcore_algos.py`: core LAPO utilities and algorithm logic used by the trainer (e.g., advantage\u002Freturn and policy optimization helpers)\n- `verl\u002Fworkers\u002Frollout\u002Frob_rollout.py`: handles environment 
interaction, trajectory collection, and action generation during rollout\n- `verl\u002Fworkers\u002Ffsdp_workers.py`: defines FSDP-based worker abstractions for distributed model execution and training\u002Finference worker behaviors\n- `verl\u002Fworkers\u002Factor\u002Fdp_rob.py`: actor-side training logic, including loss computation, value prediction, and policy updates\n- `verl\u002Fworkers\u002Factor\u002Faction_tokenizer.py`: converts continuous robot actions to discrete tokens and back\n- `transformers\u002Fmodels\u002Fqwen3_vl\u002Fmodeling_qwen3_vl.py`: core Qwen3-VL model implementation, including latent\u002Faction modeling and value head integration\n- `transformers\u002Fintegrations\u002Fsdpa_attention.py`: SDPA attention integration and attention-mask related logic\n\nWe would like to thank [SimpleVLA-RL](https:\u002F\u002Fgithub.com\u002FPRIME-RL\u002FSimpleVLA-RL) 🥳, upon which our repo is built.\n\n## 💡 Usage\n### 🔍 Warmup SFT Model and Post-RL Models\n\nWe release all LIBERO warmup SFT models and post-RL models on [Huggingface 🤗](https:\u002F\u002Fhuggingface.co\u002F) as follows:\n- [last-r1-warmup-libero_spatial-oneshot](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-Warmup\u002Flast-r1-warmup-libero_spatial-oneshot) | [last-r1-rl-libero_spatial](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-RL\u002Flast-r1-rl-libero_spatial)\n- [last-r1-warmup-libero_object-oneshot](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-Warmup\u002Flast-r1-warmup-libero_object-oneshot) | [last-r1-rl-libero_object](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-RL\u002Flast-r1-rl-libero_object)\n- [last-r1-warmup-libero_goal-oneshot](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-Warmup\u002Flast-r1-warmup-libero_goal-oneshot) | 
[last-r1-rl-libero_goal](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-RL\u002Flast-r1-rl-libero_goal)\n- [last-r1-warmup-libero_10-oneshot](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-Warmup\u002Flast-r1-warmup-libero_10-oneshot) | [last-r1-rl-libero_10](https:\u002F\u002Fhuggingface.co\u002Fchenhao01\u002FLaST-R1\u002Ftree\u002Fmain\u002FLaST-R1-RL\u002Flast-r1-rl-libero_10)\n\n\n### 🔍 Training and Evaluation on LIBERO\n1. The main training and evaluation script is `scripts\u002Frun_libero_rl_training.sh`. When editing this script, please pay attention to:\n    - `SFT_MODEL_PATH` (warm-up model checkpoint path), `DATA_STATUS` (dataset statistics `.json` used for action normalization), and `ALIGN_PATH` (runtime environment `align.json` config) must be valid paths.\n    - `VAL_ONLY` controls train\u002Feval mode: set `VAL_ONLY=False` for training and `VAL_ONLY=True` for evaluation.\n    - `DATASET_NAME` should match your benchmark split (`libero_spatial`, `libero_object`, `libero_goal`, or `libero_10`), and related parameters (e.g., `max_prompt_length`, `traj_mini_batch_size`, and `*_max_steps`) should be updated consistently.\n    - `NUM_GPUS` affects several effective batch settings (e.g., `actor.ppo_micro_batch_size=$NUM_GPUS` and recommended `val_batch_size`); if you change GPU count, adjust batch sizes accordingly to avoid OOM or shape mismatches.\n\n2. Hardware-specific initialization note: in `verl\u002Fworkers\u002Frollout\u002Frob_rollout.py`, function `env_worker(...)` (around lines 315-316), we apply `initial_state[12] += 0.038` for `libero_spatial` task 5 to avoid environment initialization failure observed on NVIDIA H20 machines. If your hardware does not show this issue, you may remove this workaround based on your setup.\n\n\n3. 
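We evaluate **LaST-R1** on [LIBERO](https:\u002F\u002Flibero-project.github.io\u002Fmain.html) and achieve state-of-the-art performance.\n\n![](asset\u002Flibero.jpg)\n\n## 📜️ License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 📚 BibTeX\n```bibtex\n@article{chen2026last,\n  title   = {LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models},\n  author  = {Chen, Hao and Liu, Jiaming and Yan, Zhonghao and Han, Nuowei and Zhang, Renrui and Gu, Chenyang and Gao, Jialin and Guo, Ziyu and Qian, Siyuan and Wang, Yinxi and others},\n  journal = {arXiv preprint arXiv:2604.28192},\n  year    = {2026}\n}\n```\n","LaST-R1 is a framework that aims to strengthen robotic manipulation through adaptive physical latent reasoning. It uses a vision foundation model to provide physically grounded latent targets that guide instruction-conditioned latent chain-of-thought reasoning and action generation. During reinforcement learning post-training, LaST-R1 interacts with the environment in a closed loop, storing latent states, actions, and rewards to jointly reshape the latent and action spaces, which further enables it to adjust the reasoning length dynamically per task. The project suits scenarios that need better execution stability and generalization for robots across diverse tasks. The code is built with Python 3.10 and depends on PyTorch 2.2.0 and CUDA 12.0 or later.","2026-06-11 03:58:29","CREATED_QUERY"]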