[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79880":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":12,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":13,"forks30d":13,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":13,"starSnapshotCount":13,"syncStatus":14,"lastSyncTime":27,"discoverSource":28},79880,"Skill1","AlphaLab-USTC\u002FSkill1","AlphaLab-USTC",null,"Python",145,7,1,0,2,4,42,6,2.71,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:55","# Skill1\n\n**Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning**\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.06130\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.06130-b31b1b?style=for-the-badge&logo=arxiv\" alt=\"arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.06130\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Paper-ffd21e?style=for-the-badge&logo=huggingface\" alt=\"HuggingFace\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-green?style=for-the-badge\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n## TL;DR\n\nLLM agents can be augmented with a **skill library** — a persistent memory of reusable strategies. Using such a library requires three coupled capabilities: **selecting** a relevant skill, **utilizing** it during execution, and **distilling** new skills from experience. 
Prior methods optimize these in isolation with separate reward signals, causing conflicting evolution.\n\n**Skill1** trains a single policy (Qwen2.5-7B) via RL (GRPO) to co-evolve all three capabilities using only one task-outcome reward. Credit assignment is achieved by decomposing the reward into a low-frequency trend (credits selection) and high-frequency variation (credits distillation).\n\n---\n\n## How It Works\n\n```\nTask → [Selection] → [Utilization] → [Distillation] → Skill Library\n         ↑                                                   |\n         └───────────────────────────────────────────────────┘\n```\n\n1. **Skill Selection** — Policy generates a query, retrieves candidates via a frozen encoder, and re-ranks them.\n2. **Skill Utilization** — Policy interacts with the environment conditioned on the selected skill.\n3. **Skill Distillation** — Policy reflects on the trajectory and writes a new reusable skill (strategy + scenario description) into the library.\n\nAll three stages are produced by the same policy and optimized by the same task-outcome signal — no auxiliary models, no hand-crafted rewards.\n\n---\n\n## Quick Start\n\n### 1. Install Base Environment\n\n```bash\nconda create -n skill1 python==3.12 -y\nconda activate skill1\n\npip3 install vllm==0.11.0\npip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir\npip install -e .\n```\n\n### 2. Install Task Environments\n\nALFWorld:\n```bash\nconda env create -f agent-alfworld-env.yaml\n```\n\nWebShop:\n```bash\nconda env create -f agent-webshop-env.yaml\n```\n\n### 3. Download Data\n\nWe download the ALFWorld and WebShop data from the original sources: [alfworld\u002Falfworld](https:\u002F\u002Fgithub.com\u002Falfworld\u002Falfworld) | [princeton-nlp\u002FWebShop](https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FWebShop)\n\n### 4. 
Run Training\n\n```bash\n# ALFWorld\nbash launch_scripts\u002Falfworld\u002Ftrain_alfworld.sh\n\n# WebShop\nbash launch_scripts\u002Fwebshop\u002Ftrain_webshop.sh\n```\n\n---\n\n## Acknowledgments\n\nThis code is built upon several open-source projects. We thank the authors and contributors of: [verl](https:\u002F\u002Fgithub.com\u002Fverl-project\u002Fverl), [verl-agent](https:\u002F\u002Fgithub.com\u002FlangfengQ\u002Fverl-agent\u002Ftree\u002Fmaster), and [LaMer](https:\u002F\u002Fgithub.com\u002Fmlbio-epfl\u002FLaMer).\n\n## Citation\n\nIf you find our work useful, please consider citing our paper:\n\n```bibtex\n@article{shi2026skill1,\n  title={Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning},\n  author={Shi, Yaorui and Chen, Yuxin and Lu, Zhengxi and Miao, Yuchun and Liu, Shugui and Gu, Qi and Cai, Xunliang and Wang, Xiang and Zhang, An},\n  journal={arXiv preprint arXiv:2605.06130},\n  year={2026}\n}\n```\n","Skill1 is a project for the unified evolution of skill-augmented agents via reinforcement learning. It trains a single policy (Qwen2.5-7B) with RL (GRPO) to jointly optimize three core capabilities (skill selection, utilization, and distillation), relying only on the task-outcome reward signal for credit assignment. This avoids the conflicts that arise in prior methods from using separate reward signals. It suits scenarios that require continual learning and the accumulation of reusable strategies, such as complex environment-interaction tasks. The whole process needs no auxiliary models or hand-crafted rewards, enabling efficient skill management and application.","2026-06-11 03:58:22","CREATED_QUERY"]