[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79088":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":24,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},79088,"SkillOpt","microsoft\u002FSkillOpt","microsoft","SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.",null,"Python",5656,559,32,6,0,487,794,5575,1461,114.24,"MIT License",false,"main",true,[],"2026-06-12 04:01:24","# SkillOpt: Executive Strategy for Self-Evolving Agent Skills\n\n*Train agent skills like you train neural networks — with epochs, (mini-)batchsize, learning rates, and validation gates — but without touching model weights.*\n\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-SkillOpt-8dbb3c)](https:\u002F\u002Fmicrosoft.github.io\u002FSkillOpt\u002F) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b31b1b)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23904) [![Project Video](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Video-Watch%20Demo-ff0000)](https:\u002F\u002Fyoutu.be\u002FJUBMDTCiM0M) [![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-blue.svg)](https:\u002F\u002Fwww.python.org\u002F) [![License: 
MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](LICENSE)\n\n## 🎬 SkillOpt Demo Video\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Feb12d3bc-371c-467f-904d-91b61f339ed7\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002FJUBMDTCiM0M\">\u003Cb>▶ Watch the full demo on YouTube\u003C\u002Fb>\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n## Install\n\n**Requirements:** Python 3.10+\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSkillOpt.git\ncd SkillOpt\npip install -e .\n\n# For ALFWorld benchmark (optional):\npip install -e \".[alfworld]\"\nalfworld-download\n```\n\n### Configure API Credentials\n\n```bash\ncp .env.example .env\n# Edit .env with your API credentials, then:\nsource .env\n```\n\n**Azure OpenAI** (recommended):\n```bash\nexport AZURE_OPENAI_ENDPOINT=\"https:\u002F\u002Fyour-resource.openai.azure.com\u002F\"\n# Option 1: API key auth\nexport AZURE_OPENAI_API_KEY=\"your-key\"\n# Option 2: Azure CLI auth (no API key needed)\nexport AZURE_OPENAI_AUTH_MODE=\"azure_cli\"\n```\n\n> **Note:** `AZURE_OPENAI_ENDPOINT` is always required. Without it, all LLM calls will fail.\n\n**OpenAI** directly:\n```bash\nexport OPENAI_API_KEY=\"sk-...\"\n```\n\n**Anthropic Claude**:\n```bash\nexport ANTHROPIC_API_KEY=\"sk-ant-...\"\n```\n\n**Qwen (local vLLM)**:\n```bash\nexport QWEN_CHAT_BASE_URL=\"http:\u002F\u002Flocalhost:8000\u002Fv1\"\nexport QWEN_CHAT_MODEL=\"Qwen\u002FQwen3.5-4B\"\n```\n\n---\n\n## Data Preparation\n\nSkillOpt expects data in a **split directory** with `train\u002F`, `val\u002F`, `test\u002F` subdirectories, each containing a JSON file (e.g., `items.json`).\n\n```\ndata\u002Fmy_split\u002F\n├── train\u002Fitems.json\n├── val\u002Fitems.json\n└── test\u002Fitems.json\n```\n\nEach JSON file is an array of task items. The required fields depend on the benchmark. 
For example, SearchQA items look like:\n\n```json\n[\n  {\n    \"id\": \"unique_item_id\",\n    \"question\": \"Who wrote the novel ...\",\n    \"context\": \"[DOC] relevant passage text ...\",\n    \"answers\": [\"expected answer\"]\n  }\n]\n```\n\nSee `skillopt\u002Fenvs\u002F\u003Cbenchmark>\u002Fdataloader.py` for the exact format each benchmark expects.\n\n> **Note:** Benchmark datasets are not included in this repository. Prepare your own data following the format above.\n\n### Supported Benchmarks\n\n| Benchmark | Type | Config |\n|---|---|---|\n| SearchQA | QA | `configs\u002Fsearchqa\u002Fdefault.yaml` |\n| ALFWorld | Embodied agent | `configs\u002Falfworld\u002Fdefault.yaml` |\n| DocVQA | Document QA | `configs\u002Fdocvqa\u002Fdefault.yaml` |\n| LiveMathematicianBench | Math | `configs\u002Flivemathematicianbench\u002Fdefault.yaml` |\n| SpreadsheetBench | Code generation | `configs\u002Fspreadsheetbench\u002Fdefault.yaml` |\n| OfficeQA | Tool-augmented QA | `configs\u002Fofficeqa\u002Fdefault.yaml` |\n\n---\n\n## Quick Start\n\n### Training\n\n```bash\n# Minimal example — train on SearchQA:\npython scripts\u002Ftrain.py \\\n    --config configs\u002Fsearchqa\u002Fdefault.yaml \\\n    --split_dir \u002Fpath\u002Fto\u002Fyour\u002Fsearchqa_split \\\n    --azure_openai_endpoint https:\u002F\u002Fyour-resource.openai.azure.com\u002F \\\n    --optimizer_model gpt-5.5 \\\n    --target_model gpt-5.5\n\n# Train on LiveMathematicianBench:\npython scripts\u002Ftrain.py \\\n    --config configs\u002Flivemathematicianbench\u002Fdefault.yaml \\\n    --split_dir \u002Fpath\u002Fto\u002Fyour\u002Flivemath_split \\\n    --azure_openai_endpoint https:\u002F\u002Fyour-resource.openai.azure.com\u002F \\\n    --optimizer_model gpt-5.5 \\\n    --target_model gpt-5.5\n\n# Train on ALFWorld:\npython scripts\u002Ftrain.py \\\n    --config configs\u002Falfworld\u002Fdefault.yaml \\\n    --split_dir \u002Fpath\u002Fto\u002Fyour\u002Falfworld_split \\\n    --azure_openai_endpoint 
https:\u002F\u002Fyour-resource.openai.azure.com\u002F \\\n    --optimizer_model gpt-5.5 \\\n    --target_model gpt-5.5\n```\n\nKey CLI arguments:\n\n| Argument | Description | Example |\n|---|---|---|\n| `--config` | Benchmark config YAML | `configs\u002Fsearchqa\u002Fdefault.yaml` |\n| `--split_dir` | Path to data split directory | `\u002Fpath\u002Fto\u002Fsplit` |\n| `--azure_openai_endpoint` | Azure OpenAI endpoint URL | `https:\u002F\u002Fyour-resource.openai.azure.com\u002F` |\n| `--optimizer_model` | Optimizer model deployment name | `gpt-5.5` |\n| `--target_model` | Target model deployment name | `gpt-5.5` |\n| `--num_epochs` | Number of training epochs | `4` |\n| `--batch_size` | Batch size per step | `40` |\n| `--workers` | Parallel rollout workers | `8` |\n| `--out_root` | Output directory | `outputs\u002Fmy_run` |\n\n### Eval Only\n\nEvaluate a trained skill on specific data splits without training:\n\n```bash\n# Evaluate on test set only:\npython scripts\u002Feval_only.py \\\n  --config configs\u002Fsearchqa\u002Fdefault.yaml \\\n  --skill outputs\u002Fmy_run\u002Fbest_skill.md \\\n  --split valid_unseen \\\n  --split_dir \u002Fpath\u002Fto\u002Fsearchqa_split \\\n  --azure_openai_endpoint https:\u002F\u002Fyour-resource.openai.azure.com\u002F\n\n# Evaluate on all splits (train + val + test):\npython scripts\u002Feval_only.py \\\n  --config configs\u002Fsearchqa\u002Fdefault.yaml \\\n  --skill outputs\u002Fmy_run\u002Fbest_skill.md \\\n  --split all \\\n  --split_dir \u002Fpath\u002Fto\u002Fsearchqa_split \\\n  --azure_openai_endpoint https:\u002F\u002Fyour-resource.openai.azure.com\u002F\n```\n\n| Split | Description |\n|---|---|\n| `valid_unseen` | Test set |\n| `valid_seen` | Validation set |\n| `train` | Training set |\n| `all` | All splits combined (default) |\n\n### Output Structure\n\nEach run writes to a structured output directory:\n\n```\noutputs\u002F\u003Crun_name>\u002F\n├── config.json              # Flattened runtime config\n├── 
history.json             # Per-step training history\n├── runtime_state.json       # Resume checkpoint\n├── best_skill.md            # Best validated skill document\n├── skills\u002Fskill_vXXXX.md   # Skill snapshot per step\n├── steps\u002Fstep_XXXX\u002F        # Per-step artifacts (patches, evals)\n├── slow_update\u002Fepoch_XX\u002F   # Slow update logs\n└── meta_skill\u002Fepoch_XX\u002F    # Meta skill logs\n```\n\nRe-running the same command auto-resumes from the last completed step.\n\n---\n\n## WebUI\n\nLaunch the monitoring dashboard (optional):\n\n```bash\npip install -e \".[webui]\"\npython -m skillopt_webui.app\n```\n\n| Flag | Default | Description |\n|---|---|---|\n| `--port` | 7860 | Server port |\n| `--host` | `0.0.0.0` | Bind address |\n| `--share` | off | Create a public Gradio share link |\n\n```bash\n# With public share link (useful for remote servers)\npython -m skillopt_webui.app --share\n```\n\n---\n\n## Citation\n\n```bibtex\n@misc{yang2026skilloptexecutivestrategyselfevolving,\n      title={SkillOpt: Executive Strategy for Self-Evolving Agent Skills}, \n      author={Yifan Yang and Ziyang Gong and Weiquan Huang and Qihao Yang and Ziwei Zhou and Zisu Huang and Yan Li and Xuemei Gao and Qi Dai and Bei Liu and Kai Qiu and Yuqing Yang and Dongdong Chen and Xue Yang and Chong Luo},\n      year={2026},\n      eprint={2605.23904},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23904}\n}\n```\n\n","SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best-skill documents. Its core functionality is training agent skills with methods borrowed from neural-network training (such as epochs, batch size, and learning rate) without adjusting model weights. The project is written in Python and supports multiple API configurations to suit different use cases. SkillOpt is suited to cases where an LLM's performance on specific tasks needs to improve, such as question answering, document understanding, and mathematical problem solving, and is particularly suitable for researchers and developers who want to enhance an existing model's capabilities without directly modifying its parameters.",2,"2026-06-11 03:57:28","CREATED_QUERY"]