[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76179":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":30,"discoverSource":31},76179,"SDAR","ZJU-REAL\u002FSDAR","ZJU-REAL","Official code for \"Self-Distilled Agentic Reinforcement Learning\"","https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15155",null,"Python",231,15,2,1,0,3,55,141,20,79.61,"Apache License 2.0",false,"master",true,[],"2026-06-12 04:01:20","\u003Ch1 align=\"center\">\nSelf-Distilled Agentic Reinforcement Learning\n\u003C\u002Fh1>\n\u003Cdiv align='center' style=\"font-size:18px;\">\n\u003Cp>\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15155\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arxiv%3A2605.15155-blue\" alt=\"Paper\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.15155\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDaily%20Paper-huggingface-yellow\" alt=\"HF Paper\"\u002F>\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\n## 🔥 Overview\n\nWe introduce **SDAR**, a Self-Distilled Agentic Reinforcement learning method.\n\u003Cdiv align=\"center\" style=\"display:flex; justify-content:center; gap:20px; align-items:flex-start;\">\n  \u003Cimg src=\"docs\u002Fsdar\u002Fsdar_teaser.png\" alt=\"motivation\" style=\"width:40%;\">\n  \u003Cimg 
src=\"docs\u002Fsdar\u002Fsdart_method.png\" alt=\"method\" style=\"width:58%;\">\n\u003C\u002Fdiv>\n\n\n\n\nSDAR achieves substantial improvements over the standard RL baseline on ALFWorld, WebShop, and Search-QA.\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"docs\u002Fsdar\u002Fmetric.png\" alt=\"Logo\" style=\"width:80%;\">\n\u003C\u002Fdiv>\n\n## 🗞️ News\n- **`2026-5-15`**: 🔥 We released our paper and code.\n\n## 🛠️ Installation\n\n\n### Python environment\n\n```bash\nconda create -n skillzero python==3.12 -y\nconda activate skillzero\n\npip3 install vllm==0.11.0\n\npip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir\npip install -e .\n```\n\nLog in to Weights & Biases if you use WandB logging (scripts pass `trainer.logger=['console','wandb']` in many cases):\n\n```bash\nexport WANDB_API_KEY=your_key_here\n```\n\n### Install Supported Environments\n\n#### 1. ALFWorld\nInstall with pip:\n```bash\npip3 install gymnasium==0.29.1\npip3 install stable-baselines3==2.6.0\npip3 install alfworld\n```\n\nDownload PDDL & Game files and pre-trained MaskRCNN detector (will be stored in `~\u002F.cache\u002Falfworld\u002F`):\n```bash\nalfworld-download -f\n```\n\n#### 2. 
WebShop\nWebShop requires Python \u003C=3.10, so begin by creating a new environment:\n```bash\nconda create -n verl-webshop python==3.10 -y\nconda activate verl-webshop\n```\n\nInstall WebShop:\n```bash\ncd .\u002Fagent_system\u002Fenvironments\u002Fenv_package\u002Fwebshop\u002Fwebshop\n.\u002Fsetup.sh -d all\n```\n\nAfter WebShop is installed, return to the root directory and install the verl package:\n```bash\ncd repo_root\u002F\npip3 install torch==2.6.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip3 install flash-attn==2.7.4.post1 --no-build-isolation\npip3 install -e .\npip3 install vllm==0.8.2\n# spacy 3.7.2 requires typer\u003C0.10.0,>=0.3.0, but you have typer 0.15.2 which is incompatible.\n# weasel 0.3.4 requires typer\u003C0.10.0,>=0.3.0, but you have typer 0.15.2 which is incompatible.\n```\nThe warnings can be safely ignored.\n\n#### 3. Search\n```bash\ncd .\u002Fagent_system\u002Fenvironments\u002Fenv_package\u002Fsearch\u002Fthird_party\npip install -e .\npip install gym==0.26.2\n```\n\nPrepare dataset (data will be saved at `~\u002Fdata\u002FsearchR1_processed_direct`):\n```bash\ncd repo_root\u002F\npython examples\u002Fdata_preprocess\u002Fpreprocess_search_r1_dataset.py\n```\n\n\nSince faiss-gpu is not available via pip, we set up a separate conda environment for the local retrieval server. Running this server will use around 6GB of GPU memory per GPU, so make sure to account for this in your training run configuration. 
Build Retriever environments:\n```bash\n# Create and activate the retriever environment with Python 3.10\nconda create -n retriever python=3.10 -y\nconda activate retriever\n\n# Install PyTorch (with GPU support) and related libraries\nconda install numpy==1.26.4 # needed to stop incompatible version of numpy from being installed via pip\npip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n\n# Install other Python packages\npip install transformers datasets pyserini huggingface_hub\n\n# Install the GPU version of faiss\nconda install faiss-gpu==1.8.0 -c pytorch -c nvidia -y\n\n# Install the API service framework\npip install uvicorn fastapi\n```\n\nDownload the index:\n```bash\nconda activate retriever\n\nlocal_dir=~\u002Fdata\u002FsearchR1\npython examples\u002Fsearch\u002Fsearchr1_download.py --local_dir $local_dir\ncat $local_dir\u002Fpart_* > $local_dir\u002Fe5_Flat.index\ngzip -d $local_dir\u002Fwiki-18.jsonl.gz\n```\n\nStart the local flat e5 retrieval server: \n```bash\nconda activate retriever\n\n# redirect the output to a file to avoid cluttering the terminal\n# we have observed outputting to the terminal causing spikes in server response times\nbash examples\u002Fsearch\u002Fretriever\u002Fretrieval_launch.sh > retrieval_server.log \n```\n\n\n\n### Training\n\nAll scripts live under `examples\u002F` and assume the repo root as working directory. 
You can run e.g.:\n\n```bash\nbash examples\u002Fsdar_trainer\u002Frun_alfworld_3b.sh\nbash examples\u002Fsdar_trainer\u002Frun_search_3b.sh\nbash examples\u002Fsdar_trainer\u002Frun_webshop_3b.sh\n```\n\n### Merge checkpoints\n\nSee `scripts\u002Fmodel_merger.py` for FSDP\u002FMegatron merge examples using paths under `.\u002Fcheckpoints\u002F...`.\n\n## ⭐️ Citation\n\nIf you find this project useful, please cite our paper.\n\n```bibtex\n@misc{lu2026sdar,\n      title={Self-Distilled Agentic Reinforcement Learning}, \n      author={Zhengxi Lu and Zhiyuan Yao and Zhuowen Han and Zi-Han Wang and Jinyang Wu and Qi Gu and Xunliang Cai and Weiming Lu and Jun Xiao and Yueting Zhuang and Yongliang Shen},\n      year={2026},\n      eprint={2605.15155},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15155}, \n}\n```\n\n## 🤝 Acknowledgement\n\nThis project builds on [verl-agent](https:\u002F\u002Fgithub.com\u002FlangfengQ\u002Fverl-agent), [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl), [ALFWorld](https:\u002F\u002Fgithub.com\u002Falfworld\u002Falfworld), [SkillRL](https:\u002F\u002Fgithub.com\u002Faiming-lab\u002FSkillRL), and [Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1). We thank the authors of those projects.\n","SDAR is a self-distilled agentic reinforcement learning method. By introducing a self-distillation mechanism, the project substantially improves performance on environments such as ALFWorld, WebShop, and Search-QA, with a clear advantage over standard reinforcement learning baselines. Its core features include efficient policy optimization and knowledge transfer; it is implemented in Python and supports rapid deployment and testing across multiple experimental environments. It suits research settings that need to improve agent decision-making efficiency and generalization, particularly in natural language processing and complex task solving.","2026-06-11 03:54:44","CREATED_QUERY"]