[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74287":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},74287,"lingbot-vla","Robbyant\u002Flingbot-vla","Robbyant","A Pragmatic VLA Foundation Model","",null,"Python",1420,146,9,30,0,27,79,283,81,19.5,"Apache License 2.0",false,"main",[26,27],"embodied-ai","vla","2026-06-12 02:03:25","\u003Ch1 align=\"center\">LingBot-VLA: A Pragmatic VLA Foundation Model\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"assets\u002FLingBot-VLA.pdf\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Paper&message=PDF&color=red&logo=arxiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftechnology.robbyant.com\u002Flingbot-vla\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-blue\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Frobbyant\u002Flingbot-vla\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=yellow\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fmodelscope.cn\u002Fcollections\u002FRobbyant\u002FLingBot-VLA\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=%F0%9F%A4%96%20Model&message=ModelScope&color=purple\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Frobbyant\u002Fgm100\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=%F0%9F%A4%97%20GM-100&message=HuggingFace&color=yellow\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache--2.0-green\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002FTeaser.png\" width=\"100%\">\n\u003C\u002Fp>\n\n## 🥳 We are excited to introduce **LingBot-VLA**, a pragmatic Vision-Language-Action foundation model.\n\n**LingBot-VLA** is designed to be **Pragmatic**:\n- **Large-scale Pre-training Data**: 20,000 hours of real-world data from 9 popular dual-arm robot configurations.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fscale_sr.png\" width=\"45%\" style=\"margin: 0 10px;\">\n  \u003Cimg src=\"assets\u002Fscale_ps.png\" width=\"45%\" style=\"margin: 0 10px;\">\n\u003C\u002Fp>\n\n- **Strong Performance**: Outperforms competing models on simulation and real-world benchmarks.\n- **Training Efficiency**: Delivers a 1.5∼2.8× speedup (depending on the underlying VLM base model) over existing VLA-oriented codebases.\n\n## 🚀 News\n- **[2026-04-30]** Codebase update:\n\n  - Add recommended post-training setting with real robot data.\n  - Upgrade to LeRobot v3.0.\n  - Support open-loop evaluation.\n  - Optimize GPU memory usage during training.\n  - Enable Torch Compile for inference.\n- **[2026-01-27]** LingBot-VLA Technical Report is available on arXiv.\n- **[2026-01-27]** Weights and code released!\n\n\n---\n\n\n## 🛠️ Installation\nRequirements\n - Python 3.12.3\n - PyTorch 2.8.0\n - CUDA 12.8\n\n```bash\nconda create -n lingbotvla python=3.12 -y\nconda activate lingbotvla\n\ngit clone https:\u002F\u002Fgithub.com\u002FRobbyant\u002Flingbot-vla.git\ncd lingbot-vla\nbash install.sh\n```\n\n---\n\n## 📦 Model Download\nWe release LingBot-VLA pre-trained weights in two configurations: a depth-free version and a depth-distilled version.\n#### Pretrained Checkpoints for Post-Training with and without depth\n\n| Model Name | Huggingface | ModelScope | Description |\n| :--- | :---: | :---: | :---: |\n| LingBot-VLA-4B &nbsp; | [🤗 lingbot-vla-4b](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-vla-4b) | [🤖 lingbot-vla-4b](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FRobbyant\u002Flingbot-vla-4b) | LingBot-VLA *w\u002Fo* Depth|\n| LingBot-VLA-4B-Depth | [🤗 lingbot-vla-4b-depth](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-vla-4b-depth) | [🤖 lingbot-vla-4b-depth](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FRobbyant\u002Flingbot-vla-4b-depth) | LingBot-VLA *w\u002F* Depth |\n\n```bash\n# Download Pretrained Checkpoints\npython3 scripts\u002Fdownload_hf_model.py --repo_id robbyant\u002Flingbot-vla-4b --local_dir lingbot-vla-4b \n```\n\n> \u003Cdetails>\n> \u003Csummary>⚠️ \u003Cstrong>Note for users who downloaded before 2026\u002F05\u002F01 (click to expand)\u003C\u002Fstrong>\u003C\u002Fsummary>\n> \n> \u003Cbr>\n> \n> If you downloaded `LingBot-VLA-4B` or `LingBot-VLA-4B-Depth` before **2026\u002F05\u002F01**, you may encounter the following error when loading the model:\n> \n> ```\n> draccus.utils.DecodingError: The fields `resize_imgs_with_padding`, `adapt_to_pi_aloha`, `use_delta_joint_actions_aloha`, `proj_width`, `num_steps`, `use_cache`, `attention_implementation`, `freeze_vision_encoder`, `train_expert_only`, `train_state_proj` are not valid for PI0Config\n> ```\n> \n> This is caused by our migration from **LeRobot v2.1** to **v3.0**.  
\n> To fix this, please **re-download the latest checkpoint**, or manually remove the above fields from `config.json` in your local `lingbot-vla-4b\u002F` or `lingbot-vla-4b-depth\u002F` directory.\n> \n> \u003C\u002Fdetails>\n\n\n\u003Cbr>\n\nTo train LingBot with our codebase, weights from [Qwen2.5-VL-3B-Instruct](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-3B-Instruct), [MoGe-2-vitb-normal](https:\u002F\u002Fhuggingface.co\u002FRuicheng\u002Fmoge-2-vitb-normal), and [LingBot-Depth](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-depth-pretrain-vitl-14) are also required.\n\n\n---\n\n## 💻 Post-Training Example\n\n### Data Preparation\n\nPost-training requires three preparation steps. For a complete guide on customizing your own dataset, see the [Custom Data Guide](lingbotvla\u002Fdata\u002Fvla_data\u002FREADME.md).\n\n| Step | Description | Output |\n|------|-------------|--------|\n| 1. Prepare LeRobot Dataset | Convert your demonstration data to [LeRobot v3.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot) format | LeRobot dataset directory |\n| 2. Prepare Robot Config | Define feature mapping (states \u002F actions \u002F images) from raw keys to unified feature space | `configs\u002Frobot_configs\u002F\u003Cdata_name>.yaml` |\n| 3. 
Compute Norm Statistics | Calculate normalization statistics over your dataset | `assets\u002Fnorm_stats\u002F\u003Cname>.json` |\n\n> **Note:** If you already have data in LeRobot v2.1 format, you can use [convert_dataset_v21_to_v30.py](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot\u002Fblob\u002Fv0.4.2\u002Fsrc\u002Flerobot\u002Fdatasets\u002Fv30\u002Fconvert_dataset_v21_to_v30.py) to quickly convert it to v3.0.\n\nBelow we use **RoboTwin 2.0** (5 tasks: \"open_microwave\", \"click_bell\", \"stack_blocks_three\", \"place_shoe\", \"put_object_cabinet\") as an example.\n\n- **Step 1 &mdash; RoboTwin Data**: Follow [RoboTwin2.0 Preparation](experiment\u002Frobotwin\u002FREADME.md) to download and convert.\n- **Step 2 &mdash; Robot Config**: See [`configs\u002Frobot_configs\u002Frobotwin.yaml`](configs\u002Frobot_configs\u002Frobotwin.yaml) for the RoboTwin feature mapping.\n- **Step 3 &mdash; Normalization**: Pre-computed stats are provided at `assets\u002Fnorm_stats\u002Frobotwin_50.json`. 
To recompute for a custom task subset, see the [Custom Data Guide](lingbotvla\u002Fdata\u002Fvla_data\u002FREADME.md).\n\n### Training\n\nWe provide a post-training example of LingBot-VLA on 5 RoboTwin 2.0 tasks (\"open_microwave\", \"click_bell\", \"stack_blocks_three\", \"place_shoe\", \"put_object_cabinet\"):\n\n```bash\n# without depth\nbash train.sh tasks\u002Fvla\u002Ftrain_lingbotvla.py .\u002Fconfigs\u002Fvla\u002Frobotwin_load20000h.yaml \\\n    --data.train_path \u002Fpath\u002Fto\u002Fmixed_robotwin_5tasks \\\n    --data.data_name robotwin \\\n    --data.norm_stats_file assets\u002Fnorm_stats\u002Frobotwin_50.json \\\n    --train.output_dir output\u002F\n\n# with depth\nbash train.sh tasks\u002Fvla\u002Ftrain_lingbotvla.py .\u002Fconfigs\u002Fvla\u002Frobotwin_load20000h_depth.yaml \\\n    --data.train_path \u002Fpath\u002Fto\u002Fmixed_robotwin_5tasks \\\n    --data.data_name robotwin \\\n    --data.norm_stats_file assets\u002Fnorm_stats\u002Frobotwin_50.json \\\n    --train.output_dir output\u002F\n```\n\n### 🤖 Real-Robot Post-Training\nAlso, we provide recommended training configurations specifically tailored for **real-world scenarios**: [`real_load20000h.yaml`](configs\u002Fvla\u002Freal_load20000h.yaml) (w\u002Fo depth) and [`real_load20000h_depth.yaml`](configs\u002Fvla\u002Freal_load20000h_depth.yaml) (w\u002F depth). 
For a detailed explanation of all training configuration parameters (batch size, gradient accumulation, training duration, checkpointing, depth injection, etc.), see the [Training Configuration Guide](configs\u002Fvla\u002FTraining_Config.md).\n\n\n### Evaluation\n\n#### Open-Loop Eval\n\n```bash\nexport QWEN25_PATH=Qwen\u002FQwen2.5-VL-3B-Instruct\npython scripts\u002Fopen_loop_eval.py --model_path path_to_posttraining_ckpt --data_path path_to_validation_data --use_length 50\n# If `--data_path` is omitted, the script defaults to the training dataset specified in the YAML config (`data.train_path`).\n```\n\n\n> **Note:**  \n> For inference, the model path (`path_to_posttraining_ckpt`, located in `train.output_dir\u002Fcheckpoints\u002F*\u002Fhf_ckpt`) must include:\n> - weights in `.safetensors` format\n> - `config.json`\n> - `lingbotvla_cli.yaml`\n\n\n#### Robotwin\n```bash\nexport QWEN25_PATH=path_to_Qwen2.5-VL-3B-Instruct\npython -m deploy.lingbot_vla_policy \\\n --model_path path_to_posttraining_ckpt \\\n --use_compile \\\n --use_length 50 \\\n --port port\n```\n\n\n#### Real-Robot Deployment\n```bash\nexport QWEN25_PATH=path_to_Qwen2.5-VL-3B-Instruct\npython -m deploy.lingbot_vla_policy \\\n --model_path path_to_posttraining_ckpt \\\n --use_compile \\\n --use_length 25\n# You can set --num_denoising_step to 5 if you want to speed up the evaluation.\n```\n\n---\n\n## 🏗️ Efficiency\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002FQwenPI_PaliGemmaPI.png\" width=\"85%\">\n\u003C\u002Fp>\nWe evaluate the training efficiency of our codebase against established baselines for both \u003Cb>Qwen2.5-VL-3B-π\u003C\u002Fb> and \u003Cb>PaliGemma-3B-pt-224-π\u003C\u002Fb> models. The results demonstrate that our codebase\nachieved the fastest training speeds in both model settings. 
The above figures detail the training throughput across configurations of 8, 16, 32, 128, and 256 GPUs, alongside the theoretical linear scaling limit.\n\n> **📢 Note on Throughput Metrics:** \n> All throughput values (e.g., 261 samples\u002Fsec) represent the **total aggregate throughput across all GPUs**, not per-GPU performance. \n> \u003Cbr>\u003Csup>(Updated: Previously mislabeled as per-GPU in earlier versions. We apologize for the confusion.)\u003C\u002Fsup>\n\n---\n\n## 📊 Performance\n\nOur LingBot-VLA achieves state-of-the-art results on real-world and simulation benchmarks:\n- **GM-100 across 3 robot platforms**\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\">Platform\u003C\u002Fth>\n      \u003Cth colspan=\"2\">WALL-OSS\u003C\u002Fth>\n      \u003Cth colspan=\"2\">GR00T N1.6\u003C\u002Fth>\n      \u003Cth colspan=\"2\">π\u003Csub>0.5\u003C\u002Fsub>\u003C\u002Fth>\n      \u003Cth colspan=\"2\">Ours w\u002Fo depth\u003C\u002Fth>\n      \u003Cth colspan=\"2\">Ours w\u002F depth\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>SR\u003C\u002Fth>\u003Cth>PS\u003C\u002Fth>\n      \u003Cth>SR\u003C\u002Fth>\u003Cth>PS\u003C\u002Fth>\n      \u003Cth>SR\u003C\u002Fth>\u003Cth>PS\u003C\u002Fth>\n      \u003Cth>SR\u003C\u002Fth>\u003Cth>PS\u003C\u002Fth>\n      \u003Cth>SR\u003C\u002Fth>\u003Cth>PS\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>Agibot G1\u003C\u002Ftd>\n      \u003Ctd>2.99%\u003C\u002Ftd>\u003Ctd>8.75%\u003C\u002Ftd>\u003Ctd>5.23%\u003C\u002Ftd>\u003Ctd>12.63%\u003C\u002Ftd>\u003Ctd>7.77%\u003C\u002Ftd>\u003Ctd>21.98%\u003C\u002Ftd>\u003Ctd>\u003Cb>12.82%\u003C\u002Fb>\u003C\u002Ftd>\u003Ctd>30.04%\u003C\u002Ftd>\u003Ctd>11.98%\u003C\u002Ftd>\u003Ctd>\u003Cb>30.47%\u003C\u002Fb>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>AgileX\u003C\u002Ftd>\n      
\u003Ctd>2.26%\u003C\u002Ftd>\u003Ctd>8.16%\u003C\u002Ftd>\u003Ctd>3.26%\u003C\u002Ftd>\u003Ctd>10.52%\u003C\u002Ftd>\u003Ctd>17.20%\u003C\u002Ftd>\u003Ctd>34.82%\u003C\u002Ftd>\u003Ctd>15.50%\u003C\u002Ftd>\u003Ctd>36.31%\u003C\u002Ftd>\u003Ctd>\u003Cb>18.93%\u003C\u002Fb>\u003C\u002Ftd>\u003Ctd>\u003Cb>40.36%\u003C\u002Fb>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>Galaxea R1Pro\u003C\u002Ftd>\n      \u003Ctd>6.89%\u003C\u002Ftd>\u003Ctd>14.13%\u003C\u002Ftd>\u003Ctd>14.29%\u003C\u002Ftd>\u003Ctd>24.83%\u003C\u002Ftd>\u003Ctd>14.10%\u003C\u002Ftd>\u003Ctd>26.14%\u003C\u002Ftd>\u003Ctd>18.89%\u003C\u002Ftd>\u003Ctd>34.71%\u003C\u002Ftd>\u003Ctd>\u003Cb>20.98%\u003C\u002Fb>\u003C\u002Ftd>\u003Ctd>\u003Cb>35.40%\u003C\u002Fb>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Cb>Average\u003C\u002Fb>\u003C\u002Ftd>\n      \u003Ctd>4.05%\u003C\u002Ftd>\u003Ctd>10.35%\u003C\u002Ftd>\u003Ctd>7.59%\u003C\u002Ftd>\u003Ctd>15.99%\u003C\u002Ftd>\u003Ctd>13.02%\u003C\u002Ftd>\u003Ctd>27.65%\u003C\u002Ftd>\u003Ctd>15.74%\u003C\u002Ftd>\u003Ctd>33.69%\u003C\u002Ftd>\u003Ctd>\u003Cb>17.30%\u003C\u002Fb>\u003C\u002Ftd>\u003Ctd>\u003Cb>35.41%\u003C\u002Fb>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n- **RoboTwin 2.0 (Clean and Randomized)**\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\" >\u003Cb>Simulation Tasks\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth colspan=\"2\">\u003Cb>&pi;\u003Csub>0.5\u003C\u002Fsub>\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth colspan=\"2\">\u003Cb>Ours w\u002Fo depth\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth colspan=\"2\">\u003Cb>Ours w\u002F depth\u003C\u002Fb>\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>\u003Cb>Clean\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth>\u003Cb>Rand.\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth>\u003Cb>Clean\u003C\u002Fb>\u003C\u002Fth>\n      
\u003Cth>\u003Cb>Rand.\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth>\u003Cb>Clean\u003C\u002Fb>\u003C\u002Fth>\n      \u003Cth>\u003Cb>Rand.\u003C\u002Fb>\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr style=\"border-top: 1px solid #ccc;\"> \u003C!-- \\midrule -->\n      \u003Ctd>\u003Cb>Average SR\u003C\u002Fb>\u003C\u002Ftd>\n      \u003Ctd>82.74%\u003C\u002Ftd>\n      \u003Ctd>76.76%\u003C\u002Ftd>\n      \u003Ctd>86.50%\u003C\u002Ftd>\n      \u003Ctd>85.34%\u003C\u002Ftd>\n      \u003Ctd>88.56%\u003C\u002Ftd>\n      \u003Ctd>86.68%\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n📢 We have released our checkpoints of LingBot-VLA-Posttrain-Robotwin:\n| Model Name | Huggingface | ModelScope | Description |\n| :--- | :---: | :---: | :---: |\n| LingBot-VLA-4B-Posttrain-Robotwin &nbsp; | [🤗 lingbot-vla-4b-posttrain-robotwin](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-vla-4b-posttrain-robotwin) | [🤖 lingbot-vla-4b-posttrain-robotwin](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FRobbyant\u002Flingbot-vla-4b-posttrain-robotwin) | LingBot-VLA-Posttrain-Robotwin *w\u002Fo* Depth|\n| LingBot-VLA-4B-Depth-Posttrain-Robotwin | [🤗 lingbot-vla-4b-depth-posttrain-robotwin](https:\u002F\u002Fhuggingface.co\u002Frobbyant\u002Flingbot-vla-4b-depth-posttrain-robotwin) | [🤖 lingbot-vla-4b-depth-posttrain-robotwin](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FRobbyant\u002Flingbot-vla-4b-depth-posttrain-robotwin) | LingBot-VLA-Posttrain-Robotwin *w\u002F* Depth |\n\n> \u003Cdetails>\n> \u003Csummary>⚠️ \u003Cstrong>Note for users who downloaded before 2026\u002F05\u002F01 (click to expand)\u003C\u002Fstrong>\u003C\u002Fsummary>\n> \n> \u003Cbr>\n> \n> If you downloaded `lingbot-vla-4b-posttrain-robotwin` or `lingbot-vla-4b-depth-posttrain-robotwin` before **2026\u002F05\u002F01**, you  may encounter the following error when loading the model:  \n> \n> ```\n> 
draccus.utils.DecodingError: The fields `resize_imgs_with_padding`, `adapt_to_pi_aloha`, `use_delta_joint_actions_aloha`, `proj_width`, `num_steps`, `use_cache`, `attention_implementation`, `freeze_vision_encoder`, `train_expert_only`, `train_state_proj` are not valid for PI0Config\n> ```\n> \n> To fix this, please re-download the latest checkpoint, or manually remove the above fields from `config.json` and add them to the `train` section of `lingbotvla_cli.yaml` in your local directory.\n> \n> \u003C\u002Fdetails>\n\n\u003Cbr>\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fexp-gm-100.png\" width=\"45%\" style=\"margin: 0 10px;\">\n  \u003Cimg src=\"assets\u002Fexp-robotwin.png\" width=\"45%\" style=\"margin: 0 10px;\">\n\u003C\u002Fp>\n\n---\n\n## 📝 Citation\n\nIf you find our work useful in your research, please consider citing it.\n\n```bibtex\n@article{wu2026pragmatic,\n  title={A Pragmatic VLA Foundation Model},\n  author={Wei Wu and Fan Lu and Yunnan Wang and Shuai Yang and Shi Liu and Fangjing Wang and Shuailei Ma and He Sun and Yong Wang and Zhenqi Qiu and Houlong Xiong and Ziyu Wang and Shuai Zhou and Yiyu Ren and Kejia Zhang and Hui Yu and Jingmei Zhao and Qian Zhu and Ran Cheng and Yong-Lu Li and Yongtao Huang and Xing Zhu and Yujun Shen and Kecheng Zheng},\n  journal={arXiv preprint arXiv:2601.18692v1},\n  year={2026}\n}\n```\n\n---\n\n## 📄 License Agreement\nThis project is licensed under the [Apache-2.0 License](LICENSE).\n\n## 😊 Acknowledgement\nWe would like to express our sincere gratitude to the developers of [VeOmni](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02317), [LeRobot](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot#), and Baidu Cloud for their technical support. This project benefits significantly from their outstanding work and contributions. Baidu Cloud's optimization solutions notably reduced our GPU memory consumption by **29.2%** during model training.","LingBot-VLA is a pragmatic Vision-Language-Action foundation model. It is pre-trained at scale on 20,000 hours of real-world data from 9 popular dual-arm robot configurations, outperforms comparable models, and trains 1.5 to 2.8× faster (depending on the underlying vision-language base model). It is available in depth-free and depth-aware versions and suits scenarios that require efficient handling of complex multimodal tasks, such as robot manipulation and automation systems. It is developed in Python, is compatible with recent PyTorch and CUDA versions, and is easy to install and deploy.",2,"2026-06-11 03:49:49","high_star"]