[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72079":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},72079,"slime","THUDM\u002Fslime","THUDM","slime is an LLM post-training framework for RL Scaling.","https:\u002F\u002Fthudm.github.io\u002Fslime",null,"Python",6067,882,19,196,0,60,206,426,180,39.84,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:02:58","# slime\n\n[中文版](.\u002FREADME_zh.md)\n\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-latest-brightgreen.svg?style=flat)](https:\u002F\u002Fthudm.github.io\u002Fslime\u002F)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FTHUDM\u002Fslime)\n\n**slime** is an LLM post-training framework for RL scaling, providing two core capabilities:\n\n1.  **High-Performance Training**: Supports efficient training in various modes by connecting Megatron with SGLang;\n2.  
**Flexible Data Generation**: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.\n\nslime is the RL framework behind [GLM-5.1](https:\u002F\u002Fz.ai\u002Fblog\u002Fglm-5.1), [GLM-5](https:\u002F\u002Fz.ai\u002Fblog\u002Fglm-5), [GLM-4.7](https:\u002F\u002Fz.ai\u002Fblog\u002Fglm-4.7), [GLM-4.6](https:\u002F\u002Fz.ai\u002Fblog\u002Fglm-4.6), and [GLM-4.5](https:\u002F\u002Fz.ai\u002Fblog\u002Fglm-4.5). Apart from models from Z.ai, slime also supports the following models:\n- Qwen series (Qwen3.6, Qwen3.5, Qwen3Next, Qwen3MoE, Qwen3, Qwen2.5);\n- DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);\n- Llama 3.\n\n## Blogs\n\n- Our vision: [slime: An SGLang-Native Post-Training Framework for RL Scaling](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-07-09-slime\u002F).\n- Our ideas on agentic training: [Agent-Oriented Design: An Asynchronous and Decoupled Framework for Agentic RL](https:\u002F\u002Fwww.notion.so\u002FAgent-Oriented-Design-An-Asynchronous-and-Decoupled-Framework-for-Agentic-RL-2278e692d081802cbdd5d37cef76a547)\n- v0.1.0 release note: [v0.1.0: Redefining High-Performance RL Training Frameworks](https:\u002F\u002Fthudm.github.io\u002Fslime\u002Fblogs\u002Frelease_v0.1.0.html)\n\n## Table of Contents\n\n- [Architecture Overview](#architecture-overview)\n- [Quick Start](#quick-start)\n- [Projects Built upon slime](#projects-built-upon-slime)\n- [Arguments Walkthrough](#arguments-walkthrough)\n- [Developer Guide](#developer-guide)\n- [FAQ & Acknowledgements](#faq--acknowledgements)\n\n## Architecture Overview\n\n![arch](.\u002Fimgs\u002Farch.png)\n\n**Module Descriptions**:\n\n- **training (Megatron)**: Responsible for the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module after training.\n- **rollout (SGLang + router)**: Generates new data (including rewards\u002Fverifier outputs) and stores it in the Data Buffer.\n- **data buffer**: 
A bridge module that manages prompt initialization, custom data, and rollout generation methods.\n\n## Quick Start\n\nFor a comprehensive quick start guide covering environment setup, data preparation, training startup, and key code analysis, please refer to:\n- [Quick Start Guide](.\u002Fdocs\u002Fen\u002Fget_started\u002Fquick_start.md)\n\nWe also provide examples for some use cases not covered in the quick start guide; please check [examples](examples\u002F).\n\n## Projects Built upon slime\n\nslime has powered several novel research projects and production systems. Here are some notable examples:\n\n### 🌈 Relax: Asynchronous RL Engine for Omni-Modal Agentic Training\n\n[**Relax**](https:\u002F\u002Fgithub.com\u002Fredai-infra\u002FRelax) (Reinforcement Engine Leveraging Agentic X-modality) is an omni-modal agentic RL framework open-sourced by the RedAI Infra team, built upon the slime infrastructure stack that combines Ray, Megatron-LM, and SGLang. Relax adopts a service-oriented architecture on Ray Serve with Megatron-LM and SGLang as training\u002Finference backends. It uses [TransferQueue](https:\u002F\u002Fgithub.com\u002FAscend\u002FTransferQueue) to fully decouple Actor, Rollout, ActorFwd, Reference, and Advantage computation onto independent GPU clusters, and introduces **DCS (Distributed Checkpoint Service)** — an NCCL-broadcast weight-sync engine that streams updated Actor weights to Rollout\u002FActorFwd\u002FReference asynchronously and overlaps the transfer with the next training step, enabling fully-async training at configurable staleness. Relax supports end-to-end RL for text, vision, and audio (including Qwen3-Omni) and agentic multi-turn rollouts.\n\n### 🦞 OpenClaw-RL: Train a Personalized Clawbot Simply by Talking to It\n\n[**OpenClaw-RL**](https:\u002F\u002Fgithub.com\u002FGen-Verse\u002FOpenClaw-RL) is an RL server for personalized OpenClaw agents. 
It hosts the OpenClaw model and improves it from prior conversations across deployments, while slime's asynchronous RL infrastructure prevents training from interfering with API serving. It supports two automatic optimization methods: GRPO with binary feedback inferred from subsequent states, and on-policy distillation that extracts hindsight hints from later feedback for the current policy.\n\n### ⚛️ P1: Mastering Physics Olympiads with Reinforcement Learning\n\n[**P1**](https:\u002F\u002Fprime-rl.github.io\u002FP1\u002F) is a family of open-source physics reasoning models trained entirely through reinforcement learning. P1 uses slime as its RL post-training framework and introduces a multi-stage RL training algorithm that progressively enhances reasoning ability through adaptive learnability adjustment and stabilization mechanisms. Empowered by this training paradigm, P1 delivers breakthrough performance in open-source physics reasoning.\n\n### 📈 RLVE: Scaling LM RL with Adaptive Verifiable Environments\n\n[**RLVE**](https:\u002F\u002Fgithub.com\u002FZhiyuan-Zeng\u002FRLVE) introduces an approach using verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards, to scale up RL for language models (LMs). With joint training across 400 verifiable environments, RLVE enables each environment to dynamically adapt its problem difficulty distribution to the policy model's capabilities as training progresses.\n\n### ⚡ TritonForge: Agentic RL Training Framework for Kernel Generation\n\n[**TritonForge**](https:\u002F\u002Fgithub.com\u002FRLsys-Foundation\u002FTritonForge) leverages slime's SFT & RL capabilities to train LLMs that automatically generate optimized GPU kernels. 
By using a two-stage training approach—supervised fine-tuning followed by reinforcement learning with multi-turn compilation feedback—TritonForge achieves remarkable results in converting PyTorch operations into high-performance Triton kernels.\n\n### 🚀 APRIL: Accelerating RL Training with Active Partial Rollouts\n\n[**APRIL**](https:\u002F\u002Fgithub.com\u002FRLsys-Foundation\u002FAPRIL) introduces a system-level optimization that seamlessly integrates with slime to accelerate the rollout generation phase in RL training. By intelligently over-provisioning requests and actively managing partial completions, APRIL addresses the long-tail generation bottleneck that typically consumes over 90% of RL training time.\n\n### 🏟️ qqr: Scaling Open-Ended Agents with ArenaRL & MCP\n\n[**qqr**](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002Fqqr) (a.k.a. hilichurl) is a lightweight extension for slime designed to evolve open-ended agents. It implements the **ArenaRL** algorithm to tackle discriminative collapse through tournament-based relative ranking (**e.g., Seeded Single-Elimination, Round-Robin**) and seamlessly integrates the **Model Context Protocol (MCP)**. qqr leverages slime's high-throughput training capabilities to enable scalable, distributed evolution of agents in standardized, decoupled tool environments.\n\nThese projects showcase slime's versatility—from training code-generation models to optimizing RL training systems—making it a powerful foundation for both research and production deployments.\n\n## Arguments Walkthrough\n\nArguments in slime are divided into three categories:\n\n1.  **Megatron arguments**: slime reads all arguments in Megatron. You can configure Megatron by passing arguments like `--tensor-model-parallel-size 2`.\n2.  **SGLang arguments**: All arguments for the installed SGLang are supported. These arguments must be prefixed with `--sglang-`. For example, `--mem-fraction-static` should be passed as `--sglang-mem-fraction-static`.\n3.  
**slime-specific arguments**: Please refer to: [slime\u002Futils\u002Farguments.py](slime\u002Futils\u002Farguments.py)\n\nFor complete usage instructions, please refer to the [Usage Documentation](docs\u002Fen\u002Fget_started\u002Fusage.md).\n\n## Developer Guide\n\n- **Contributions are welcome\\!** If you have suggestions for new features, performance tuning, or feedback on user experience, feel free to submit an Issue or PR 😊\n\n- Use [pre-commit](https:\u002F\u002Fpre-commit.com\u002F) to ensure code style consistency for your commits:\n\n```bash\napt install pre-commit -y\npre-commit install\n\n# run pre-commit to ensure code style consistency\npre-commit run --all-files --show-diff-on-failure --color=always\n```\n\n- For debugging tips, please refer to the [Debugging Guide](docs\u002Fen\u002Fdeveloper_guide\u002Fdebug.md)\n\n## FAQ & Acknowledgements\n\n- For frequently asked questions, please see the [Q\\&A](docs\u002Fen\u002Fget_started\u002Fqa.md)\n- Special thanks to the following projects & communities: SGLang, Megatron-LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch and others.\n- To cite slime, please use:\n\n```bibtex\n@misc{slime_github,\n  author       = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},\n  title        = {slime: An LLM post-training framework for RL Scaling},\n  year         = {2025},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FTHUDM\u002Fslime}},\n  note         = {GitHub repository. Corresponding author: Xin Lv},\n  urldate      = {2025-06-19}\n}\n```\n","slime is a large language model post-training framework for reinforcement learning scaling. It provides two core capabilities: efficient training in multiple modes by connecting Megatron with SGLang, and flexible data generation workflows through custom data generation interfaces and server-based engines. The framework is particularly suited to scenarios that require performance optimization or task-specific customized training of existing language models, such as improving the quality and efficiency of dialogue systems and text generation applications. The project is written in Python, is released under the Apache License 2.0, and has been used in the post-training of several well-known models, including the GLM series.",2,"2026-06-11 03:40:16","high_star"]