[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80710":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},80710,"SkillLens","microsoft\u002FSkillLens","microsoft","SkillLens: a framework for studying model-generated agent skills across the full raw experience generation → skill extraction → skill consumption lifecycle. ","https:\u002F\u002Fmicrosoft.github.io\u002FSkillLens\u002F",null,"Python",91,9,44,0,18,47,27,3,"MIT License",false,"main",true,[],"2026-06-12 02:04:05","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"docs\u002Fstatic\u002Fimages\u002Fhero.png\" alt=\"SkillLens — From Raw Experience to Skill Consumption\" width=\"900\"\u002F>\n\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🌐_Project_Page-SkillLens-2563eb?style=for-the-badge)](https:\u002F\u002Fmicrosoft.github.io\u002FSkillLens\u002F)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📄_arXiv-2605.23899-b31b1b?style=for-the-badge)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23899)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow?style=for-the-badge)](LICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10+-3776ab?style=for-the-badge&logo=python&logoColor=white)](https:\u002F\u002Fwww.python.org\u002F)\n\n\u003C\u002Fdiv>\n\n## ✨ Overview\n\n\u003Cp align=\"center\">\n  \u003Cimg 
src=\"docs\u002Fstatic\u002Fimages\u002Foverview.png\" alt=\"SkillLens overview\" width=\"820\"\u002F>\n\u003C\u002Fp>\n\n**Skill***Lens* is a framework for systematically studying *model-generated agent skills* across their full lifecycle: **experience generation → skill extraction → skill consumption**. It is built to answer the core question:\n\n> *What makes model-generated skills actually useful to a target model, and what drives skill utility across the experience → extraction → consumption lifecycle?*\n\nThe framework provides:\n\n- 🧪 **Unified trajectory loading** across five agent benchmarks (SWE-bench, ALFWorld, SpreadsheetBench, BFCL v4, SEAL-0)\n- ⚙️ **Two extraction methods** — `sequential` (single-pass baseline) and `parallel` (per-trajectory mode extraction + hierarchical merge, the primary method in the paper)\n- 🚀 **Unified inference CLI** (`skilllens infer`) that runs any benchmark with or without skill injection\n- 📊 **Reproducible evaluation pipeline** for *Extraction Efficacy* and *Target Evolvability* metrics\n\n\n## 🚀 Quick Start\n\n```bash\n# 1. Clone & install\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSkillLens.git && cd SkillLens\nconda create -n skilllens python=3.10 -y && conda activate skilllens\npip install -e \".[all]\"\n\n# 2. Configure your LLM provider\ncp .env.example .env\n# Edit .env — set OPENAI_API_KEY, or AZURE_OPENAI_ENDPOINT + (AZURE_OPENAI_API_KEY | AZURE_CLIENT_ID)\n\n# 3. 
Pick a benchmark and run the 4-stage pipeline (ALFWorld as example)\nbash scripts\u002Fsetup_alfworld.sh                                  # one-time data setup\n\n# (a) Raw experience generation\npython -m skilllens infer --benchmark alfworld --model gpt-5.4 \\\n    --num-rounds 1 --workers 16\n\n# (b) Schema normalization (raw → unified Trajectory)\npython -m skilllens convert \\\n    --trajectory-dir inference_output\u002Falfworld\u002F\u003Crun-dir> \\\n    --benchmark alfworld --model-name gpt-5.4 \\\n    -o data\u002Fexperience_pool\u002Falfworld\u002Fmy_pool.json\n\n# (c) Skill extraction\npython -m skilllens extract \\\n    -c configs\u002Fexamples\u002Falfworld_parallel.yaml \\\n    -i data\u002Fexperience_pool\u002Falfworld\u002Fmy_pool.json \\\n    -o extraction_output\u002Falfworld_parallel\u002F\n\n# (d) Skill consumption\nSKILL=$(find extraction_output\u002Falfworld_parallel -name skill_set.json | head -1)\npython -m skilllens infer --benchmark alfworld --model gpt-5.4 \\\n    --num-rounds 1 --workers 16 --skill-set \"$SKILL\"\n```\n\nPer-benchmark prerequisites (data downloads, sandboxes, tool servers) live in each benchmark's README — see the [table below](#-benchmarks).\n\n\n## 🧩 Pipeline\n\nSkillLens organizes every experiment as **four stages**. Each stage has a corresponding CLI subcommand.\n\n| Stage | Subcommand | What it does |\n|------|-----------|--------------|\n| **1. Raw experience generation** | `skilllens infer` | Runs the agent on the benchmark and writes raw trajectories. |\n| **2. Schema normalization** | `skilllens convert` | Converts raw runner outputs into the unified `Trajectory` JSON schema. |\n| **3. Skill extraction** | `skilllens extract` | Distills the experience pool into a `skill_set.json` (sequential or parallel method). |\n| **4. Skill consumption** | `skilllens infer --skill-set` | Re-runs the target model on the same benchmark with the extracted skills injected. 
|\n\n\n## 📚 Benchmarks\n\nSkillLens ships integrations for five benchmarks. Each one has its own README with the exact prerequisites and step-by-step commands.\n\n| Benchmark | Domain | Details |\n|-----------|--------|---------|\n| **ALFWorld** | Text-based household navigation | [`skilllens\u002Fbenchmarks\u002Falfworld\u002FREADME.md`](skilllens\u002Fbenchmarks\u002Falfworld\u002FREADME.md) |\n| **BFCL v4** | Multi-turn function calling | [`skilllens\u002Fbenchmarks\u002Fbfcl\u002FREADME.md`](skilllens\u002Fbenchmarks\u002Fbfcl\u002FREADME.md) |\n| **SEAL-0** | Web-research agent (LiteResearcher) | [`skilllens\u002Fbenchmarks\u002Fseal0\u002FREADME.md`](skilllens\u002Fbenchmarks\u002Fseal0\u002FREADME.md) |\n| **SpreadsheetBench** | Excel manipulation in a sandboxed Jupyter kernel | [`skilllens\u002Fbenchmarks\u002Fspreadsheetbench\u002FREADME.md`](skilllens\u002Fbenchmarks\u002Fspreadsheetbench\u002FREADME.md) |\n| **SWE-bench Verified** | GitHub bug fixing inside per-task containers | [`skilllens\u002Fbenchmarks\u002Fswebench\u002FREADME.md`](skilllens\u002Fbenchmarks\u002Fswebench\u002FREADME.md) |\n\nFor all benchmarks, the held-out test split is committed under `data\u002Ftest_pool\u002F\u003Cbenchmark>\u002F`.\n\n\n## ⚙️ Configuration\n\nYAML configs (`configs\u002Fexample.yaml`, `configs\u002Fexamples\u002F*.yaml`) describe each extraction run:\n\n```yaml\nllm:\n  provider: \"azure\"            # openai | azure | vllm | gemini\n  model: \"gpt-5.4\"\n\ninput:\n  path: \"data\u002Fexperience_pool\u002Falfworld\u002Fgpt54_baseline.json\"\n  benchmark: \"alfworld\"\n\nextraction:\n  method: \"parallel\"           # sequential | parallel\n  batch_size: 0                # 0 = all trajectories in one batch\n  merge_group_size: 10\n  max_concurrency: 32\n  max_skills: 1\n  max_skill_chars: 3000\n  include_feedback: true\n  max_modes_per_trajectory: 3\n```\n\nFor Azure: set `AZURE_OPENAI_ENDPOINT` + (`AZURE_OPENAI_API_KEY` or `AZURE_CLIENT_ID` for Managed 
Identity) in `.env`. For per-model endpoint routing, set `AZURE_DEPLOYMENT_MAP` to a JSON dict mapping model name → `{endpoint, api_version}`.\n\n\n## 📄 Citation\n\nIf you find SkillLens useful in your research, please cite:\n\n```bibtex\n@article{huang2026skilllens,\n  title         = {From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills},\n  author        = {Zisu Huang and Jingwen Xu and Yifan Yang and Ziyang Gong and Qihao Yang and Muzhao Tian and Xiaohua Wang and Changze Lv and Xuemei Gao and Qi Dai and Bei Liu and Kai Qiu and Xue Yang and Dongdong Chen and Xiaoqing Zheng and Chong Luo},\n  year          = {2026},\n  journal       = {arXiv preprint arXiv:2605.23899},\n  eprint        = {2605.23899},\n  archivePrefix = {arXiv},\n  url           = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23899}\n}\n```\n\n\n## 🤝 Contributing\n\nThis project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [https:\u002F\u002Fcla.opensource.microsoft.com](https:\u002F\u002Fcla.opensource.microsoft.com).\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F). For more information see the [Code of Conduct FAQ](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## 📬 Support & Security\n\n- **Support:** see [SUPPORT.md](SUPPORT.md)\n- **Reporting security issues:** see [SECURITY.md](SECURITY.md)\n\n## 📜 License\n\nThis project is released under the [MIT License](LICENSE).\n\n## ™️ Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. 
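Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Flegal\u002Fintellectualproperty\u002Ftrademarks\u002Fusage\u002Fgeneral). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.\n","SkillLens is a framework for studying model-generated agent skills across the full lifecycle, from raw experience generation through skill extraction to skill consumption. It supports unified trajectory loading across five agent benchmarks (such as SWE-bench and ALFWorld) and provides two skill extraction methods: sequential and parallel. SkillLens also includes a unified inference command-line interface and a reproducible evaluation pipeline for measuring extraction efficacy and target evolvability. The project suits research settings that need to understand or optimize AI agent capabilities in depth, especially when the focus is on how to make model-generated skills genuinely useful to a target model.",2,"2026-06-11 04:01:43","CREATED_QUERY"]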