[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80007":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":11,"stars30d":15,"stars90d":13,"forks30d":13,"starsTrendScore":16,"compositeScore":17,"rankGlobal":8,"rankLanguage":8,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":21,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":13,"starSnapshotCount":13,"syncStatus":14,"lastSyncTime":26,"discoverSource":27},80007,"realtime-vla-flash","dexmal\u002Frealtime-vla-flash","dexmal",null,"Python",81,7,69,0,2,11,8,2.71,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:56","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"docs\u002Fflash\u002Fimg\u002Fflash.png\" alt=\"Realtime-VLA FLASH overview\" width=\"100%\">\u003Cbr>\n\u003C\u002Fdiv>\n\n\n---\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fdexmal.github.io\u002Frealtime-vla-flash\u002F\">\u003Cb>Page\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.13778\">\u003Cb>Paper\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FDexmal\u002FRealtimeVLA-Flash\">\u003Cb>Model\u003C\u002Fb>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\n## News\n\n- [2026\u002F05] 🔥 Realtime-VLA FLASH code is now available.\n\n## Highlights\n\nRealtime-VLA FLASH is the first speculative inference framework for diffusion-based VLAs.\n\n- Speculative inference as fast as 7.8 ms (2 views), enabling over 125 Hz real-time inference.\n- VLM-aligned draft architecture with a deployment-friendly block design.\n- FLASH serving with customized Triton kernels, achieving a 3.04× average task-level speedup.\n\n## Installation\n\nFollow [openpi 
README](README_OPENPI.md):\n\n```bash\ngit clone --recurse-submodules https:\u002F\u002Fgithub.com\u002Fdexmal\u002Frealtime-vla-flash\n# Or if you already cloned the repo:\ngit submodule update --init --recursive\n```\n\nInstall the Python environment with `uv`:\n\n```bash\nGIT_LFS_SKIP_SMUDGE=1 uv sync\nGIT_LFS_SKIP_SMUDGE=1 uv pip install -e .\n```\n\nThe LIBERO client\u002Fevaluation code can run in a separate environment (see the [LIBERO README](examples\u002Flibero\u002FREADME.md)).\n\n## Quick Start\n\nFirst, convert the pretrained pi0 and draft checkpoints into the Triton weight layout.\n\n```bash\nuv run scripts\u002Fspec\u002Ftriton\u002Fconvert_for_triton.py \\\n   --mode base \\\n   --jax-path \u002Fpath\u002Fto\u002Fjax\u002Fcheckpoint \\\n   --output converted\u002Fbase\n\nuv run scripts\u002Fspec\u002Ftriton\u002Fconvert_for_triton.py \\\n   --mode draft \\\n   --draft-ckpt \u002Fpath\u002Fto\u002Fdraft_model.pt \\\n   --output converted\u002Fdraft\n```\n\nThen start the policy server and the LIBERO client.\n\n```bash\nuv run scripts\u002Fspec\u002Fspec_serve_policy.py \\\n  --config pi0_libero \\\n  --base-triton-path converted\u002Fbase \\\n  --draft-triton-path converted\u002Fdraft \\\n  --task-suite-name libero_goal \\\n  --backend triton\n\nuv run scripts\u002Fspec\u002Fspec_client_libero.py \\\n  --task-suite-name libero_goal\n```\n\n## Benchmark\n\nTo measure inference time on your local machine, run:\n\n```bash\nuv run python scripts\u002Fspec\u002Fpi0_benchmark.py\n```\n\n## Train Draft Model\n\n```bash\n  uv run scripts\u002Fspec\u002Fenc_cache.py \\\n    --config pi0_libero \\\n    --checkpoint-dir \u002Fopenpi-assets\u002Fcheckpoints\u002Fpi0_libero_torch \\\n    --task-suite-name libero_goal \\\n    --output-dir \u002Ftmp\u002Fspec_quickstart_train\u002Flibero_goal_cache\n\n  uv run scripts\u002Fspec\u002Fspec_draft_train.py \\\n    --cache-dir \u002Ftmp\u002Fspec_quickstart_train\u002Flibero_goal_cache \\\n    --output 
draft_model_goal_torch.pt\n```\n\nA typical workflow is:\n\n1. Build a prefix-embedding cache with `scripts\u002Fspec\u002Fenc_cache.py`.\n2. Train the draft head with `scripts\u002Fspec\u002Fspec_draft_train.py`.\n3. Serve the FLASH policy with `scripts\u002Fspec\u002Fspec_serve_policy.py`.\n4. Run LIBERO client evaluation or sweeps with `scripts\u002Fspec\u002Fspec_client_libero.py` or `scripts\u002Fspec\u002Fexp\u002Frun_sweep.py`.\n\n## Citation\n\nIf you find this work useful, please cite the paper:\n\n```bibtex\n@article{niu2026realtimevlaflash,\n  title={Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs},\n  author={Niu, Jiahui and Gu, Kefan and Zhao, Yucheng and Liang, Shengwen and Wang, Tiancai and Hu, Xing and Wang, Ying and Li, Huawei},\n  journal={arXiv preprint arXiv:2605.13778},\n  year={2026}\n}\n```\n\n## Acknowledgements\n- [dexmal\u002Frealtime-vla](https:\u002F\u002Fgithub.com\u002Fdexmal\u002Frealtime-vla)\n- [openpi](https:\u002F\u002Fgithub.com\u002FPhysical-Intelligence\u002Fopenpi)\n","realtime-vla-flash is a real-time speculative inference framework for diffusion-based vision-language-action (VLA) models. Its core features include speculative inference in as little as 7.8 ms with two views, supporting real-time inference above 125 Hz; a draft architecture aligned with the vision-language model (VLM); and customized Triton kernels that deliver a 3.04× average task-level speedup. The project suits applications that need high-performance, low-latency visual understanding and decision-making, such as robot control and autonomous driving. It is written in Python and is easy to deploy and extend.","2026-06-11 03:58:53","CREATED_QUERY"]