[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80004":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":15,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},80004,"astraflow","Infini-AI-Lab\u002Fastraflow","Infini-AI-Lab","Dataflow-Oriented Reinforcement Learning for (Multi-)Agentic LLMs","https:\u002F\u002Finfini-ai-lab.github.io\u002Fastraflow\u002Fdocs\u002F",null,"Python",85,12,70,1,0,7,13,3.34,"Apache License 2.0",false,"main",true,[25,26,27,28,29,30],"agentic-ai","llm","llm-rl","mlsys","reinforcement-learning","rl","2026-06-12 02:03:56","\u003Cdiv align=\"center\" id=\"astraflowtop\">\n\u003Cpicture>\n  \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\".\u002Fdocs\u002Fassets\u002Ftitle-dark.svg\">\n  \u003Cimg src=\".\u002Fdocs\u002Fassets\u002Ftitle-light.svg\" alt=\"AstraFlow — Dataflow-Oriented Reinforcement Learning for (Multi-)Agentic LLMs\" width=\"620\">\n\u003C\u002Fpicture>\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.15565-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15565)\n[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fblog-online-9cf.svg)](https:\u002F\u002FInfini-AI-Lab.github.io\u002Fastraflow\u002F)\n[![Docs 
Site](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-github%20pages-blue)](https:\u002F\u002FInfini-AI-Lab.github.io\u002Fastraflow\u002Fdocs\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-green.svg)](.\u002FLICENSE)\n\n\u003C\u002Fdiv>\n\n\u003Chr>\n\nAstraFlow is a **dataflow-oriented** reinforcement learning system designed for better flexibility and scalability.\n\nAstraFlow **natively** supports the following for LLM RL training **without any feature-specific system engineering**:\n\n- **Fully Async Multi-policy collaborative RL**\n- **Elastic heterogeneous cross-region rollouts**\n- **Substitutable rollout and trainer service**\n- **Composable data algorithms**\n\n\u003Cbr>\n\n\u003C!-- ## What can AstraFlow enable? -->\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\".\u002Fdocs\u002Fassets\u002Fraas.gif\" width=\"90%\" alt=\"Elastic RaaS pool of mixed-hardware nodes joining and leaving across regions\">\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\u003Ci>\u003Cb>Elastic Heterogeneous Cross-region Rollouts\u003C\u002Fb>: RaaS instances on mixed hardware and across regions join and leave the rollout pool on demand, with no scheduler- or region-specific code.\u003C\u002Fi>\u003C\u002Fp>\n\n\u003Cbr>\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\".\u002Fdocs\u002Fassets\u002Fastraflow.gif\" width=\"90%\" alt=\"AstraFlow training a multi-policy workflow on an elastic, heterogeneous, cross-region rollout pool\">\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\u003Ci>\u003Cb>Fully Async Multi-policy Collaborative RL Training\u003C\u002Fb>: multiple policies train together, each as an independent trainer with its own data and weight stream.\u003C\u002Fi>\u003C\u002Fp>\n\n\u003C!-- \u003Cp align=\"center\">\u003Ci>AstraFlow training a multi-policy workflow on an elastic, heterogeneous, cross-region rollout pool — all at once, with no feature-specific code.\u003C\u002Fi>\u003C\u002Fp> -->\n\n## News\n- 
**[2026\u002F05]** AstraFlow **v0.1.0** released — first public release of the full system. See the [project website](https:\u002F\u002FInfini-AI-Lab.github.io\u002Fastraflow\u002F).\n- **[2026\u002F05]** AstraFlow paper is on [arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15565).\n\n----\n\n## Getting Started\n\n- [Install AstraFlow](https:\u002F\u002FInfini-AI-Lab.github.io\u002Fastraflow\u002Fdocs\u002Fget-started\u002Finstallation.html)\n- [Quick Start](https:\u002F\u002FInfini-AI-Lab.github.io\u002Fastraflow\u002Fdocs\u002Fget-started\u002Fquickstart.html)\n\n## Recipes\nAstraFlow currently supports the following recipes. Check the [documentation](https:\u002F\u002FInfini-AI-Lab.github.io\u002Fastraflow\u002Fdocs\u002F) for more detailed instructions.\n\n| Recipe | Description |\n|---|---|\n| [`math\u002F`](examples\u002Fmath\u002F) | RLVR math reasoning — Qwen3-1.7B \u002F 8B, M2PO, full and delta-weight transfer |\n| [`math-multi-agent\u002F`](examples\u002Fmath-multi-agent\u002F) | Actor + verifier collaborative math training |\n| [`math-efficient-data\u002F`](examples\u002Fmath-efficient-data\u002F) | Composable data algorithms — GRESO, dynamic sampling, buffer replay |\n| [`code\u002F`](examples\u002Fcode\u002F) | Code-generation RL — Qwen3-8B, M2PO |\n| [`code-multi-agent\u002F`](examples\u002Fcode-multi-agent\u002F) | Codegen + verifier competitive coding |\n| [`search\u002F`](examples\u002Fsearch\u002F) | Search-augmented agent training with local retrieval |\n| [`alfworld\u002F`](examples\u002Falfworld\u002F) | ALFWorld embodied household agent |\n| [`webshop\u002F`](examples\u002Fwebshop\u002F) | WebShop web-navigation shopping agent |\n\n## Roadmap\nNear-term focus:\n\n- [ ] **Offline cluster training** — Support training on offline clusters without internet access.\n- [ ] **All-in-one launcher** — A launcher helper that streamlines bringing up the AstraFlow, RaaS, and trainer services.\n- [ ] **MoE model support** — Extend the training 
backends to Mixture-of-Experts models.\n- [ ] **Terminal-Bench training** — Add a recipe for training agents on Terminal-Bench.\n- [ ] **Megatron backend** — Add Megatron-LM as a training backend.\n- [ ] **vLLM rollout engine** — Support vLLM alongside SGLang as a rollout engine.\n\n## Citation\nIf you find AstraFlow useful in your research, please cite:\n\n```bibtex\n@article{zheng2026astraflow,\n  title   = {AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs},\n  author  = {Zheng, Haizhong and Di, Yizhuo and Wang, Jiahui and Jin, Shuowei and\n             Liu, Xueshen and Wu, Yongji and Mao, Z. Morley and Stoica, Ion and\n             Zhao, Jiawei and Chen, Beidi},\n  journal = {arXiv preprint arXiv:2605.15565},\n  year    = {2026}\n}\n```\n\n## Acknowledgment\nWe drew on the design of, and reused code from, the following projects: [AReaL](https:\u002F\u002Fgithub.com\u002Fareal-project\u002FAReaL), [verl](https:\u002F\u002Fgithub.com\u002Fverl-project\u002Fverl), [AgentBench](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentBench), [ASearcher](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher), and [M2PO](https:\u002F\u002Fgithub.com\u002FInfini-AI-Lab\u002FM2PO).\n","AstraFlow is a dataflow-oriented reinforcement learning system designed for greater flexibility and scalability. Its core features are fully asynchronous multi-policy collaborative RL, elastic heterogeneous cross-region rollouts, substitutable rollout and trainer services, and composable data algorithms, and it supports reinforcement learning training for large language models (LLMs) without feature-specific system engineering. These properties make AstraFlow well suited to large-scale distributed RL scenarios that need to adjust resource allocation dynamically across different hardware and geographic regions, especially when multiple agents work together.",2,"2026-06-11 03:58:53","CREATED_QUERY"]