[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71966":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":36,"readmeContent":37,"aiSummary":38,"trendingCount":16,"starSnapshotCount":16,"syncStatus":39,"lastSyncTime":40,"discoverSource":41},71966,"ART","OpenPipe\u002FART","OpenPipe","Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!","https:\u002F\u002Fart.openpipe.ai",null,"Python",9962,891,66,63,0,56,87,516,168,39.85,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33,34,35],"agent","agentic-ai","grpo","llms","lora","qwen","qwen3","reinforcement-learning","rl","2026-06-12 02:02:56","\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Fart.openpipe.ai\">\u003Cpicture>\n\u003Cimg alt=\"ART logo\" src=\"https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fraw\u002Fmain\u002Fassets\u002FART_logo.png\" width=\"160px\">\n\u003C\u002Fpicture>\u003C\u002Fa>\n\n\u003Cp align=\"center\">\n  \u003Ch1>Agent Reinforcement Trainer\u003C\u002Fh1>\n\u003C\u002Fp>\n\n\u003Cp>\nTrain multi-step agents for real-world tasks using GRPO.\n\u003C\u002Fp>\n\n[![PRs-Welcome][contribute-image]][contribute-url]\n[![PyPI version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fopenpipe-art?color=364fc7)][pypi-url]\n[![Train 
Agent](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002F2048\u002F2048.ipynb)\n\n[![Join Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin%20Discord-5865F2?style=plastic&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002FEceeVdhpxD)\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-orange?style=plastic&logo=gitbook&logoColor=white)](https:\u002F\u002Fart.openpipe.ai)\n\n\u003C\u002Fdiv>\n\n## 🚀 W&B Training: Serverless RL\n\n**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment, and reward function. The result is faster feedback cycles, lower costs, and far less DevOps.\n\n✨ **Key Benefits:**\n\n- **40% lower cost** - Multiplexing on a shared, production-grade inference cluster\n- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs\n- **Zero infra headaches** - Fully managed infrastructure that stays healthy\n- **Instant deployment** - Every checkpoint is instantly available via W&B Inference\n\n```python\n# Before: Hours of GPU setup and infra management\n# RuntimeError: CUDA error: out of memory 😢\n\n# After: Serverless RL with instant feedback\nimport art\nfrom art.serverless.backend import ServerlessBackend\n\nmodel = art.TrainableModel(\n    project=\"voice-agent\",\n    name=\"agent-001\",\n    base_model=\"OpenPipe\u002FQwen3-14B-Instruct\"\n)\n\nbackend = ServerlessBackend(\n    api_key=\"your_wandb_api_key\"\n)\n# register() is a coroutine; top-level await works in notebooks\nawait model.register(backend)\n# Edit and iterate in minutes, not hours!\n```\n\n[📖 Learn more about W&B Training →](https:\u002F\u002Fdocs.wandb.ai\u002Fguides\u002Ftraining)\n\n## ART Overview\n\nART is an open-source RL framework that improves 
agent reliability by allowing LLMs to **learn from experience**. ART provides an ergonomic harness for integrating GRPO into any Python application. For a quick hands-on introduction, run one of the notebooks below. When you're ready to learn more, check out the [docs](https:\u002F\u002Fart.openpipe.ai).\n\n## 📒 Notebooks\n\n| Agent Task          | Example Notebook                                                                                                                       | Description                                         | Comparative Performance                                                                                                                                                                                                     |\n| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| **ART•E [Serverless]**   | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fart-e.ipynb)                       | Qwen3 14B learns to search emails using RULER     | \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fraw\u002Fmain\u002Fassets\u002Fbenchmarks\u002Femail_agent\u002Faccuracy-training-progress.svg\" height=\"72\"> [benchmarks](\u002Fdev\u002Fart-e\u002Fart_e\u002Fevaluate\u002Fdisplay_benchmarks.ipynb)                              |\n| **2048 [Serverless]** | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002F2048\u002F2048.ipynb)                   | Qwen3 14B 
learns to play 2048                     | \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fraw\u002Fmain\u002Fassets\u002Fbenchmarks\u002F2048\u002Faccuracy-training-progress.svg\" height=\"72\"> [benchmarks](\u002Fexamples\u002F2048\u002Fdisplay_benchmarks.ipynb)                                                |\n| **ART•E LangGraph** | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Flanggraph\u002Fart-e-langgraph.ipynb)   | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon]                                                                                                                                                                                                          |\n| **MCP•RL**          | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fmcp-rl\u002Fmcp-rl.ipynb)               | Qwen 2.5 3B masters the NWS MCP server              | [Link coming soon]                                                                                                                                                                                                          |\n| **Temporal Clue**   | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Ftemporal_clue\u002Ftemporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue           | [Link coming soon]                                                                                                                                                                                                          |\n| **Tic Tac Toe**     | [🏋️ Train 
agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Ftic_tac_toe\u002Ftic-tac-toe.ipynb)     | Qwen 2.5 3B learns to play Tic Tac Toe              | \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fraw\u002Fmain\u002Fassets\u002Fbenchmarks\u002Ftic-tac-toe-local\u002Faccuracy-training-progress.svg\" height=\"72\"> [benchmarks](\u002Fexamples\u002Ftic_tac_toe\u002Fdisplay-benchmarks.ipynb)                            |\n| **Codenames**       | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fcodenames\u002FCodenames_RL.ipynb)      | Qwen 2.5 3B learns to play Codenames                | \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fraw\u002Fmain\u002Fassets\u002Fbenchmarks\u002Fcodenames\u002Fwin_rate_over_time.png\" height=\"72\"> [benchmarks](https:\u002F\u002Fgithub.com\u002FOpenPipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fcodenames\u002FCodenames_RL.ipynb) |\n| **AutoRL [RULER]**  | [🏋️ Train agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fauto_rl.ipynb)                     | Train Qwen 2.5 7B to master any task                | [Link coming soon]                                                                                                                                                                                                          |\n| **Distillation (SFT)** | [🏋️ Train model](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fsft\u002Fdistillation.ipynb)         | Distill text-to-SQL from Qwen 3 235B to Qwen 3 30B  | [Link coming soon]                                                                                                                  
                                                                                        |\n| **Summarizer (SFT + RL)** | [🏋️ Train model](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fsft\u002Fsft-rl.ipynb)            | Train a document summarizer with SFT warmup then RL | [Link coming soon]                                                                                                                                                                                                          |\n| **SFT from a dataset** | [🏋️ Train model](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenpipe\u002Fart-notebooks\u002Fblob\u002Fmain\u002Fexamples\u002Fsft\u002Ftrain_from_file.ipynb)      | Fine-tune Qwen 3 30B on text-to-SQL from a dataset  | [Link coming soon]                                                                                                                                                                                                          |\n\n## 📰 ART News\n\nExplore our latest research and updates on building SOTA agents.\n\n- 🗞️ **[ART now integrates seamlessly with LangGraph](https:\u002F\u002Fart.openpipe.ai\u002Fintegrations\u002Flanggraph-integration)** - Train your LangGraph agents with reinforcement learning for smarter multi-step reasoning and improved tool usage.\n- 🗞️ **[MCP•RL: Teach Your Model to Master Any MCP Server](https:\u002F\u002Fx.com\u002Fcorbtt\u002Fstatus\u002F1953171838382817625)** - Automatically train models to effectively use MCP server tools through reinforcement learning.\n- 🗞️ **[AutoRL: Zero-Data Training for Any Task](https:\u002F\u002Fx.com\u002Fmattshumer_\u002Fstatus\u002F1950572449025650733)** - Train custom AI models without labeled data using automatic input generation and RULER evaluation.\n- 🗞️ **[RULER: Easy Mode for RL Rewards](https:\u002F\u002Fopenpipe.ai\u002Fblog\u002Fruler-easy-mode-for-rl-rewards)** is now 
available for automatic reward generation in reinforcement learning.\n- 🗞️ **[ART·E: How We Built an Email Research Agent That Beats o3](https:\u002F\u002Fopenpipe.ai\u002Fblog\u002Fart-e-mail-agent)** demonstrates a Qwen 2.5 14B email agent outperforming OpenAI's o3.\n- 🗞️ **[ART Trainer: A New RL Trainer for Agents](https:\u002F\u002Fopenpipe.ai\u002Fblog\u002Fart-trainer)** enables easy training of LLM-based agents using GRPO.\n\n[📖 See all blog posts →](https:\u002F\u002Fopenpipe.ai\u002Fblog)\n\n## Why ART?\n\n- ART provides convenient wrappers for introducing RL training into **existing applications**. We abstract the training server into a modular service that your code doesn't need to interface with directly.\n- **Train from anywhere.** Run the ART client on your laptop and let the ART server spin up an ephemeral GPU-enabled environment, or run everything on a local GPU.\n- Integrations with hosted platforms like W&B, Langfuse, and OpenPipe provide flexible observability and **simplify debugging**.\n- ART is customizable with **intelligent defaults**. You can tune training parameters and inference engine settings to meet specific needs, or rely on the defaults, which have been optimized for training efficiency and stability.\n\n## Installation\n\nART agents can be trained from any client machine that runs Python. To add ART to an existing project, run:\n\n```bash\npip install openpipe-art\n```\n\n## 🤖 ART•E Agent\n\nCurious about how to use ART for a real-world task? Check out the [ART•E Agent](https:\u002F\u002Fopenpipe.ai\u002Fblog\u002Fart-e-mail-agent) blog post, where we detail how we trained Qwen 2.5 14B to beat o3 at email retrieval!\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fraw\u002Fmain\u002Fassets\u002FART_E_graphs.png\" width=\"700\">\n\n## 🔁 Training Loop Overview\n\nART's functionality is divided into a **client** and a **server**. 
The OpenAI-compatible client is responsible for interfacing between ART and your codebase. Using the client, you can pass messages and get completions from your LLM as it improves. The server runs independently on any machine with a GPU. It abstracts away the complexity of the inference and training portions of the RL loop while allowing for some custom configuration. An outline of the training loop is shown below:\n\n1. **Inference**\n\n   1. Your code uses the ART client to perform an agentic workflow (usually executing several rollouts in parallel to gather data faster).\n   2. Completion requests are routed to the ART server, which runs the model's latest LoRA in vLLM.\n   3. As the agent executes, each `system`, `user`, and `assistant` message is stored in a Trajectory.\n   4. When a rollout finishes, your code assigns a `reward` to its Trajectory, indicating the performance of the LLM.\n\n2. **Training**\n   1. When each rollout has finished, Trajectories are grouped and sent to the server. Inference is blocked while training executes.\n   2. The server trains your model using GRPO, initializing from the latest checkpoint (or an empty LoRA on the first iteration).\n   3. The server saves the newly trained LoRA to a local directory and loads it into vLLM.\n   4. Inference is unblocked and the loop resumes at step 1.\n\nThis training loop runs until a specified number of inference and training iterations have completed.\n\n## 🧩 Supported Models\n\nART should work with most vLLM\u002FHuggingFace-transformers compatible causal language models, or at least the ones supported by [Unsloth](https:\u002F\u002Fdocs.unsloth.ai\u002Fget-started\u002Fall-our-models). Gemma 3 does not appear to be supported for the time being. 
If any other model isn't working for you, please let us know on [Discord](https:\u002F\u002Fdiscord.gg\u002FzbBHRUpwf4) or open an issue on [GitHub](https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fissues)!\n\n## 🤝 Contributing\n\nART is in active development, and contributions are most welcome! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) file for more information.\n\n## 📖 Citation\n\n```bibtex\n@misc{hilton2025art,\n  author = {Brad Hilton and Kyle Corbitt and David Corbitt and Saumya Gandhi and Angky William and Bohdan Kovalevskyi and Andie Jones},\n  title = {ART: Agent Reinforcement Trainer},\n  year = {2025},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart}}\n}\n```\n\n## ⚖️ License\n\nThis repository's source code is available under the [Apache-2.0 License](LICENSE).\n\n## 🙏 Credits\n\nART stands on the shoulders of giants. While we owe many of the ideas and early experiments that led to ART's development to the open source RL community at large, we're especially grateful to the authors of the following projects:\n\n- [Unsloth](https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth)\n- [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n- [trl](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl)\n- [torchtune](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ftorchtune)\n\nFinally, thank you to our partners who've helped us test ART in the wild! 
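We're excited to see what you all build with it.\n\n[pypi-url]: https:\u002F\u002Fpypi.org\u002Fproject\u002Fopenpipe-art\u002F\n[contribute-url]: https:\u002F\u002Fgithub.com\u002Fopenpipe\u002Fart\u002Fblob\u002Fmain\u002FCONTRIBUTING.md\n[contribute-image]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-blue.svg\n","Agent Reinforcement Trainer (ART) is a tool for training multi-step agents to perform real-world tasks using GRPO, and is particularly suited to language models such as Qwen3.6, GPT-OSS, and Llama. Its core functionality uses reinforcement learning to let large language models learn from experience, and it provides a framework that makes it easy to integrate GRPO into any Python application. On the technical side, ART supports the W&B Training service, which automatically manages training and inference infrastructure, cutting costs by 40%, speeding up training by 28%, and removing the hassle of infrastructure management. It also offers instant deployment, making every checkpoint immediately available via W&B Inference. The project is well suited to scenarios that call for fast iterative development, a lighter operations burden, and optimizing AI agent performance in real-world environments.",2,"2026-06-11 03:39:44","high_star"]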