[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72508":3},{"id":4,"name":5,"fullName":6,"owner":5,"repo":5,"description":7,"homepage":8,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":15,"starSnapshotCount":15,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},72508,"starVLA","starVLA\u002FstarVLA","StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing","https:\u002F\u002Fstarvla.github.io\u002F",null,"Python",2789,347,9,16,0,54,154,474,162,29.62,"Other",false,"starVLA_dev",true,[26,27,28,29,30],"robotic-foundation-model","robotics","vision-language-action-model","vla","vlm","2026-06-12 02:03:04","\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Flogo.svg\" alt=\"StarVLA Logo\" width=\"10%\">\n\u003C\u002Fp>\n\u003Ch1 align=\"center\">StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing\u003C\u002Fh1>\n\n\u003Cp align=\"center\">An open-source research platform for integrating and exploring cutting-edge technologies for generalist robots.\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fstarvla.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-starvla.github.io-blue?style=for-the-badge&logo=github\" alt=\"Project Page\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FStarVLA\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Model%20%26%20Data-orange?style=for-the-badge&logo=huggingface\" alt=\"Model & Data on Hugging 
Face\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.05014\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2604.05014-red?style=for-the-badge&logo=arxiv\" alt=\"Technical Report\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fissues\u002F64#issuecomment-3715403845\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-加入讨论群-brightgreen?style=for-the-badge&logo=wechat\" alt=\"WeChat\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n> **📢 Citation Update:** Our technical report is now on arXiv ([2604.05014](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.05014)). We kindly invite you to use the [updated BibTeX](#citation) for any ongoing or future citations. If you have already cited StarVLA in a previous version of your work, we would greatly appreciate it if you could update the citation entry in your camera-ready or future revisions. Thank you for your understanding and support! 🙏\n\n---\n\nIn StarVLA (also a pun on \"start VLA\" ),  each functional component (model, data, trainer, config, evaluation, etc.) follows a top-down, intuitive separation and high-cohesion, low-coupling principle, enabling plug-and-play design, rapid prototyping, and independent debugging.\n\n## News\n\n> **⚠️ Branch notice:** The `starVLA_dev` branch is where we actively merge new features and may be temporarily unstable. For verified results, use the stable `starVLA` branch. Thanks to StarVLA's low-coupling design, switching between branches is painless. 
We encourage trying `starVLA_dev` and welcome PRs if you spot any issues!\n\n> **💡 Tip:** Files under any `**\u002Fbar\u002F` directory are git-ignored, so you can place your custom scripts there (e.g., `examples\u002FLIBERO\u002Ftrain_files\u002Fbar\u002Fmy_train.sh`) without polluting the repo.\n\n\n\n**[2026\u002F04\u002F09]** 🚀 unified **multi-benchmark co-training** example (combining LIBERO, SimplerEnv, RoboTwin, VLA-Arena, etc.) is coming soon. Stay tuned!\n\n**[2026\u002F05\u002F01]** 🔥 We now provide a step-by-step guide for [integrating your own robot \u002F dataset into StarVLA](docs\u002Fintegrate_your_dataset.md).\n\n**[2026\u002F05\u002F01]** 🔥 We are building [agent skills](docs\u002Fagent_skills) to make StarVLA a powerful substrate for AI coding agents — we have verified that GitHub Copilot (Claude Opus 4.7) can autonomously integrate [examples\u002FRobocasa_365](examples\u002FRobocasa_365) and [examples\u002FRoboChallenge_table30v2](examples\u002FRoboChallenge_table30v2) from scratch. Going forward, StarVLA will be continuously optimised to be equally easy to use for humans and code agents.\n\n**[2026\u002F05\u002F01]** 🙏 Special thanks to the [AllenAI](https:\u002F\u002Fgithub.com\u002Fallenai) team for providing optimised Docker environments and evaluation-acceleration support via [vla-evaluation-harness](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fvla-evaluation-harness)! If you need to speed up large-scale VLA benchmark evaluation, we recommend checking it out.\n\n\n\n**[2026\u002F04\u002F18]** 🔥 StarVLA now supports [DOMINO](examples\u002FDOMINO), a dynamic manipulation benchmark for moving objects and time-varying scenes. Original DOMINO repository is [here](https:\u002F\u002Fgithub.com\u002FH-EmbodVis\u002FDOMINO).\n\n**[2026\u002F04\u002F09]** 🎯 Thanks to the [RLinf](https:\u002F\u002Frlinf.readthedocs.io) team, StarVLA now supports **RL post-training**! 
Check out the [StarVLA × RLinf tutorial](https:\u002F\u002Frlinf.readthedocs.io\u002Fen\u002Flatest\u002Frst_source\u002Fexamples\u002Fembodied\u002Fstarvla.html) to get started.\n\n**[2026\u002F04\u002F09]** 🔥 **WM4A (World Model for Action)** is now integrated! Use pretrained video-generation DiT models (Cosmos-Predict2, Wan2.2) as backbones for action prediction. See [docs\u002FWM4A.md](docs\u002FWM4A.md) for architecture details and training instructions. Pretrained checkpoints are available at the [StarVLA\u002Fworld-model-to-vla](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FStarVLA\u002Fworld-model-to-vla) HuggingFace collection.\n\n\n**[2026\u002F03\u002F29]** 🔥 Thanks to the [ABot-M0](https:\u002F\u002Fgithub.com\u002Famap-cvlab\u002FABot-Manipulation) team for providing the [pre-trained weights](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002Famap_cvlab\u002FABot-M0-Pretrain). For `Qwen3-VL 4B`, you can reload the `qwen_vl_interface` module in various frameworks!\n\n**[2026\u002F03\u002F19]** 🔥 StarVLA now provides a complete real-robot development case with [Franka robot examples](https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fpull\u002F198)!\n\n**[2026\u002F03\u002F03]** 🔥 We now support [**Qwen3.5** as a backbone for VLA](https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fpull\u002F172) — the fastest integration in the community ⚡\nWith more model size options: **0.8B, 2B, 4B, and 9B**! Build your VLA flexibly on top of native multimodal models!\n\n**[2026\u002F01\u002F29]** 🔥 StarVLA [Training Efficiency Report](https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fissues\u002F158) & [Training Curves](https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fissues\u002F68) released!\nTraining configs and efficiency benchmarks for community reference.\n\n**[2026\u002F01\u002F29]** Calvin benchmark experiments were conducted by the UNT team. 
For inquiries, please contact Zhijie Song (1600013008@pku.edu.cn) or Feng Yan (bphengyan@163.com).\n\n**[2025\u002F12\u002F25]** We've simultaneously established pipelines for [Behavior-1K](examples\u002FBehavior), [RoboTwin 2.0](examples\u002FRobotwin), and CALVIN. We'd love to collaborate and share baseline results for more benchmarks with the community!\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Prior Timeline\u003C\u002Fb>\u003C\u002Fsummary>\n\n**[2025\u002F12\u002F25]**  We've released RoboCasa evaluation support, which was trained **without pretraining and reached SOTA performance**. Check out more details in [examples\u002FRobocasa_tabletop](examples\u002FRobocasa_tabletop).\n\n**[2025\u002F12\u002F15]** Completed a release regression check to ensure the public code runs smoothly. Routine updates—including recent support for the LeRobot dataset v3.0 and DeepSpeed ZeRO-3—will continue to appear in the [🚧 Daily Development Log](https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fissues\u002F64#issue-3727060165).\n\n**[2025\u002F12\u002F09]** Became the first open-source repository to support training with [train your vlm](starVLA\u002Ftraining\u002Ftrain_starvlm.py), [train your vla](starVLA\u002Ftraining\u002Ftrain_starvla.py), and [train your vla with vlm](starVLA\u002Ftraining\u002Ftrain_starvla_cotrain.py). Check out how to co-train your VLA with multimodal data in [examples\u002FCoTrainVLM](examples\u002FCoTrainVLM\u002FREADME.md).\n\n**[2025\u002F11\u002F12]** We now support [Florence-2](https:\u002F\u002Fgithub.com\u002Fanyantudre\u002FFlorence-2-Vision-Language-Model) as a smaller VLM for resource-constrained development. StarVLA can now run on a single A100 GPU. See the [🚀Train with a smaller VLM](docs\u002Ffaq.md#how-to-train-with-a-smaller-vlm) section for more details.\n\n**[2025\u002F10\u002F30]:** We released the LIBERO Training & Evaluation README. Results are very promising. 
More details are in [examples\u002FLIBERO](examples\u002FLIBERO).\n\n**[2025\u002F10\u002F25]:** We fixed several script links, so everything runs more smoothly now. Thanks to the community for the feedback.\n\n\u003C\u002Fdetails>\n\n## Overview and Key Features\n\n![Overview of the StarVLA framework](assets\u002FstarVLA_overview.png)\n*Overview of the StarVLA framework. StarVLA organises VLA research as a\ncomposable stack: a shared training infrastructure, pluggable foundation-model\nbackbones (VLM \u002F world model), interchangeable action heads (FAST, OFT,\nflow-matching π, GR00T-style dual-system), and benchmark-agnostic deployment\nhooks. Each axis is decoupled, so a new framework variant typically reduces\nto swapping the backbone or the action head while reusing the rest.*\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Data flow diagram (click to expand)\u003C\u002Fb>\u003C\u002Fsummary>\n\n![StarVLA data flow](assets\u002FstarVLA_dataflow.png)\n*Data flow view of StarVLA. A unified, modular pipeline connects heterogeneous\ndata sources, pluggable dataloaders, and flexible data representations through\na standardised model-forwarding interface, enabling end-to-end training and\ndeployment.*\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Various VLA Frameworks\u003C\u002Fb>\u003C\u002Fsummary>\n\nAll variants share the same data interface and infrastructure; only the action head differs.\n\n- [x] **StarVLA-FAST**: Autoregressive discrete action tokens via a fast tokenizer (à la π₀-fast).\n- [x] **StarVLA-OFT**: Parallel continuous action decoding with an MLP head (à la OpenVLA-OFT\u002FEO).\n- [x] **StarVLA-PI**: Flow-Matching action expert for diffusion-based continuous actions (à la π₀).\n- [x] **StarVLA-GR00T**: Dual-system architecture with the VLM as System 2 and Flow-Matching as System 1 (à la GR00T).\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Fstarvla_variants.png\" alt=\"StarVLA variant architectures\" 
width=\"95%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Various Training Recipes\u003C\u002Fb>\u003C\u002Fsummary>\n\nEvery recipe is paradigm-agnostic and applies uniformly to all supported frameworks.\n\n- [x] Supervised fine-tuning (SFT)\n- [x] Multimodal Multi-objective Co-Training\n- [x] Cross-embodiment Co-Training\n- [ ] Reinforcement Learning Adaptation\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Broad Benchmark Integration\u003C\u002Fb>\u003C\u002Fsummary>\n\nAchieve **state-of-the-art (SOTA) performance** on a variety of benchmarks, as follows:\n\n- [x] **SimplerEnv**\n- [x] **LIBERO**\n- [x] **LIBERO-plus**\n- [x] **Robocasa-GR1**\n- [x] **Robocasa365**\n- [x] **RoboTwin 2.0**\n- [x] **DOMINO**\n- [x] **BEHAVIOR**\n- [x] **Calvin**\n- [ ] **SO101**\n- [ ] **RLBench**\n\n\u003C\u002Fdetails>\n\n---\n\n## 🎒 Quick Start\n\n> **📖 New to StarVLA?** Check out our step-by-step [**Quick Start Guide**](docs\u002FstarVLA_guideline.md), a complete walkthrough from installation to training to evaluation using the LIBERO benchmark.\n\n---\n\n## Benchmark Results\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Results on LIBERO\u003C\u002Fb>\u003C\u002Fsummary>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Fstarvla_LIBERO.png\" alt=\"LIBERO modules\" width=\"84%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Results on SimplerEnv\u003C\u002Fb>\u003C\u002Fsummary>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Fstarvla_simpleEnv.png\" alt=\"SimplerEnv modules\" width=\"95%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Results on RoboCasa GR1\u003C\u002Fb>\u003C\u002Fsummary>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Fstavla_RoboCasa.png\" alt=\"RoboCasa modules\" width=\"94%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Results on 
Calvin_D_D\u003C\u002Fb>\u003C\u002Fsummary>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Fcalvin.png\" alt=\"Calvin_D_D modules\" width=\"84%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\nWe have more results for RoboCasa, RoboTwin 2.0, Behavior-1k, Calvin. See our [🍀 Overleaf](https:\u002F\u002Fwww.overleaf.com\u002Fread\u002Fqqtwrnprctkf#d5bdce), which continuously presents our real-time experimental results.\n\n---\n\n## Model Zoo\n\nSee the full list of released models and checkpoints in [docs\u002Fmodel_zoo.md](docs\u002Fmodel_zoo.md).\n\n---\n\n## Start Building Your VLA Like Lego!\n👇 StarVLA achieves \"Lego-like\" development via the following designs:\n\u003Cdetails>\n\u003Csummary>\u003Cb>1. Smoke test any submodule\u003C\u002Fb>\u003C\u002Fsummary>\n\nStarVLA emphasizes a modular model design. Each major framework file can be run standalone for rapid debugging and smoke-testing your code. For example:\n\n```bash\n# model\npython starVLA\u002Fmodel\u002Fframework\u002FVLM4A\u002FQwenOFT.py --config_yaml starvla_cotrain_oxe.yaml\n# dataloader\npython starVLA\u002Fdataloader\u002Flerobot_datasets.py --config_yaml starvla_cotrain_oxe.yaml\n```\n\nNote: `starVLA\u002Fmodel\u002Fframework\u002FVLM4A\u002Fyourframework.py` is the single external API surface of the model; it should mirror (be structurally isomorphic to) the framework diagram in your paper.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>2. 
Explicit model boundaries\u003C\u002Fb>\u003C\u002Fsummary>\n\nStarVLA follows top-down decomposition and the principle of high cohesion & low coupling.\n\nFor example:\n- Dataloader\n  - Returns a raw, model-agnostic dict only; no model-specific preprocessing (e.g., tokenizer, image encoding).\n  - A single sample should include (add\u002Fremove as needed):\n    - image: list[PIL.Image] | np.ndarray\n    - lang: str\n    - action: np.ndarray[T, action_dim]\n    - state: Optional[np.ndarray[..., state_dim]]\n\nBoth `framework.forward()` and `framework.predict_action()` operate directly on raw inputs, keeping train\u002Ftest boundaries explicit and easy to modify.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>3. Flexible configuration system\u003C\u002Fb>\u003C\u002Fsummary>\n\nStarVLA uses a single global configuration object.\nParameters are passed primarily via extensible dicts, allowing overrides and controlled redundancy.\n\n\u003C\u002Fdetails>\n\n\n\u003C!-- 🧪 *To self-test and iterate on StarVLA's usability, we re-implemented several representative VLA frameworks. We have done a beta test: an internal developer can stand up a new VLA framework in under half a day (less than 3 hours), and a new user can build their first custom VLA framework within a single day. More design insights for each item can be found in *[*assets\u002Fintro_v1.md*](assets\u002Fintro_v1.md)*.* -->\n\n---\n\n## FAQ\n\nSee [docs\u002Ffaq.md](docs\u002Ffaq.md) for common questions on configuration, freezing, learning rates, checkpointing, smaller VLMs, and more.\n\n## Contributing\n\nCommunity contributors are the driving force behind StarVLA's growing ecosystem. We deeply appreciate every PR, bug fix, and piece of feedback from the open-source community; your efforts keep StarVLA evolving rapidly. 
A full, continuously updated contributor list is maintained at [starvla.github.io\u002Fcontributors](https:\u002F\u002Fstarvla.github.io\u002Fcontributors).\n\nThanks to all the people who have contributed to StarVLA:\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FstarVLA\u002FstarVLA\u002Fgraphs\u002Fcontributors\">\n\u003Cimg src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=starVLA\u002FstarVLA&max=100&columns=15\" \u002F>\n\u003C\u002Fa>\n\nSee [docs\u002FCONTRIBUTING.md](docs\u002FCONTRIBUTING.md) for guidelines on reporting bugs, proposing features, and submitting PRs.\n\n### Projects Based on StarVLA\n\n**NeuroVLA**: [*A Brain-like Embodied Intelligence for Fluid and Fast Reflexive Robotics Control*](https:\u002F\u002Fgithub.com\u002Fguoweiyu\u002FNeuroVLA)\n\n**PhysBrain**: [*Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence*](https:\u002F\u002Fzgc-embodyai.github.io\u002FPhysBrain)\n\n**TwinBrainVLA**: [*TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers*](https:\u002F\u002Fgithub.com\u002FZGC-EmbodyAI\u002FTwinBrainVLA)\n\n**LangForce**: [*LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries*](https:\u002F\u002Fgithub.com\u002FZGC-EmbodyAI\u002FLangForce)\n\nExamples:\n```bash\n# Command-line flags override values from the YAML config:\n#   --framework.qwenvl.base_vlm          overrides the framework's base VLM\n#   --framework.action_model.new_module  plugs a new module into the action model\n# Comments are kept above the command because a trailing comment after a\n# backslash would break the line continuation.\naccelerate launch \\\n  --config_file starVLA\u002Fconfig\u002Fdeepseeds\u002Fdeepspeed_zero2.yaml \\\n  --num_processes 8 \\\n  starVLA\u002Ftraining\u002Ftrain_internvla.py \\\n  --config_yaml examples\u002FSimplerEnv\u002Ftrain_files\u002Fstarvla_cotrain_oxe.yaml \\\n  --framework.qwenvl.base_vlm Qwen\u002FQwen2.5-VL-7B-Instruct \\\n  --framework.action_model.new_module ${module_name}\n```\n\n⚠️: `framework.action_model.new_module` only adds to the global config; its 
behavior is defined by your framework.\n\n\n\u003C\u002Fdetails>\n\n\u003Cdetails close>\n\u003Csummary>\u003Cb>Q: Can I freeze the VLM via parameters?\u003C\u002Fb>\u003C\u002Fsummary>\n\nA: Yes. StarVLA uses a regex \u002F name list to control freezing. Example:\n```\n--trainer.freeze_modules \"qwen_vl_interface.model.model.visual,dino_encoder\" \\\n```\nTip: Run `print(your_model)` first to check the relative paths of your modules, then list them as comma-separated values.\n(Implemented in `TrainerUtils.freeze_backbones`.)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails close>\n\u003Csummary>\u003Cb>Q: Can I set different learning rates for different modules?\u003C\u002Fb>\u003C\u002Fsummary>\n\nA: Yes. StarVLA also uses a `name: value` dict to define learning-rate groups. Config example:\n```yaml\ntrainer:\n  learning_rate:\n    base: 1e-05      # other modules\n    qwen_vl_interface: 1.0e-05\n    action_model: 1.0e-04\n```\n(Also referenced in `trainer_tools.build_param_lr_groups`.)\n\u003C\u002Fdetails>\n\n\u003Cdetails close>\n\u003Csummary>\u003Cb>Q: Can I resume training from a checkpoint?\u003C\u002Fb>\u003C\u002Fsummary>\n\nA: Yes, with one caveat. Specify the latest checkpoint path in `config.yaml`, e.g.:\n```yaml\ntrainer:\n  pretrained_checkpoint: path_to_steps_10000.pt\n  reload_modules: \"action_model\"\n```\nAn empty `reload_modules` loads the full model. However, StarVLA does not save the optimizer state. 
Saving it requires a lot of memory and disk space and brings limited benefit.\n\u003C\u002Fdetails>\n\n\n\u003Cdetails id=\"train-smaller-vlm\" close>\n\u003Csummary>\u003Cb>🚀 Train with a smaller VLM\u003C\u002Fb>\u003C\u002Fsummary>\n\n```bash\naccelerate launch \\\n  --config_file starVLA\u002Fconfig\u002Fdeepseeds\u002Fdeepspeed_zero2.yaml \\\n  --main_process_ip $MASTER_ADDR \\\n  --main_process_port $MASTER_PORT \\\n  --machine_rank $SLURM_PROCID \\\n  --num_machines $SLURM_NNODES \\\n  --num_processes=${TOTAL_GPUS} \\\n  starVLA\u002Ftraining\u002Ftrain_starvla.py \\\n  --config_yaml examples\u002FSimplerEnv\u002Ftrain_files\u002Fstarvla_cotrain_oxe.yaml \\\n  --framework.name QwenGR00T \\\n  --framework.qwenvl.base_vlm microsoft\u002FFlorence-2-large \\\n  --run_root_dir ${run_root_dir} \\\n  --run_id ${run_id} \\\n  --wandb_project your_project \\\n  --wandb_entity your_name\n```\n\nNote: To stay compatible with already released checkpoints, we continue to use `--framework.qwenvl`. This parameter will be unified in the next release.\n\n\u003C\u002Fdetails>\n\n\n\n\u003Ca id=\"citation\">\u003C\u002Fa>\n\n## ✍️ Citation & Copyright\n\nStarVLA is released under the MIT License, which permits commercial use, modification, distribution, and private use. Rebases are allowed for forks and feature branches; when rebasing from upstream StarVLA, use descriptive commit messages (e.g., \"chore: rebase from StarVLA\") and keep at least the two latest upstream commits as separate commits. 
See [License](LICENSE) for details.\n\n```bibtex\n@article{ye2026starvla,\n  title={StarVLA-$\\alpha$: Reducing Complexity in Vision-Language-Action Systems},\n  author={Ye, Jinhui and Gao, Ning and Yang, Senqiao and Zheng, Jinliang and Wang, Zixuan and Chen, Yuxin and Chen, Pengguang and Chen, Yilun and Liu, Shu and Jia, Jiaya},\n  journal={arXiv preprint arXiv:2604.11757},\n  year={2026}\n}\n\n@article{community2026starvla,\n  title={StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing},\n  author={Community, StarVLA},\n  journal={arXiv preprint arXiv:2604.05014},\n  year={2026}\n}\n```\n\n## Acknowledgements\nThis project draws inspiration and references from several notable open-source initiatives, including:\n- [LeRobot](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot)\n- [GR00T](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FIsaac-GR00T\u002Ftree\u002Fmain)\n- [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fdeepspeedai\u002FDeepSpeed)\n- [Qwen-VL](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-VL\u002Ftree\u002Fmain)\n- [InternVL](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FInternVL)\n- [ABot-Manipulation](https:\u002F\u002Fgithub.com\u002Famap-cvlab\u002FABot-Manipulation)\n\nThe codebase was originally forked from [InternVLA-M1](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FInternVLA-M1).\n\n## Star History\nHere's how our community has grown over time:\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=starVLA\u002FstarVLA&type=date&legend=bottom-right)](https:\u002F\u002Fwww.star-history.com\u002F#starVLA\u002FstarVLA&type=date&legend=bottom-right)\n\n\n\u003C!-- *Chart updates automatically. 
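Click to interact with the full timeline.* -->\n","StarVLA is a Lego-like codebase for developing vision-language-action models. It provides an open research platform for integrating cutting-edge technologies into generalist robots. Each functional component (model, data, trainer, etc.) follows the principles of intuitive separation and high cohesion with low coupling, which supports plug-and-play design, rapid prototyping, and independent debugging. The project is written in Python and is particularly suited to robotics applications that need to explore and integrate multimodal perception and action capabilities, such as home service robots and industrial automation systems. Thanks to its flexible design, users can switch between branches easily and integrate their own robots or datasets into the platform.",2,"2026-06-11 03:42:21","high_star"]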