[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80077":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},80077,"SkillOpt","mitkox\u002FSkillOpt","mitkox","SkillOpt with local AI is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.",null,"Python",75,19,1,0,2,12,6,3.9,"MIT License",false,"main",true,[],"2026-06-12 02:03:57","# SkillOpt: local-first skill optimization for agent workflows\n\n*Optimize a skill document \u002F system prompt like you would optimize code: iterate, evaluate, and keep the model weights frozen.*\n\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-blue.svg)](https:\u002F\u002Fwww.python.org\u002F) [![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](LICENSE) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b31b1b)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23904)\n\nThis open-source fork is focused on **local AI workflows**:\n\n- run SkillOpt against any **OpenAI-compatible local server**\n- use the included **DotNetDebug** example for cheap end-to-end smoke tests\n- keep **private configs, outputs, and secrets out of git**\n- document the local path first, while still supporting cloud backends\n\n> 
**Important:** SkillOpt optimizes a **skill document \u002F system prompt**, not model weights. Your model stays frozen; SkillOpt improves the instructions it receives.\n\n## What you can do with this repo\n\n- optimize prompts\u002Fskills for benchmarked agent tasks\n- compare skill revisions with validation-gated training loops\n- run local experiments through `openai_compat`\n- inspect generated skills, histories, patches, and evaluation summaries\n\n## Local-first quick start\n\n### 1) Clone and install\n\n**Requirements:** Python 3.10+\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmitkox\u002FSkillOpt.git\ncd SkillOpt\n\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate\n\npip install -e .\n\n# Optional extras:\npip install -e \".[webui]\"\npip install -e \".[alfworld]\"\n```\n\nIf you install the ALFWorld extra, also download its assets:\n\n```bash\nalfworld-download\n```\n\n### 2) Point SkillOpt at your local model server\n\nCopy the environment template and load it:\n\n```bash\ncp .env.example .env\nset -a\nsource .env\nset +a\n```\n\nThe default local workflow expects an OpenAI-compatible endpoint such as **llama.cpp server**, **vLLM**, **LM Studio**, **Ollama's OpenAI bridge**, or your own local server.\n\n```bash\nexport OPENAI_COMPAT_BASE_URL=\"http:\u002F\u002Flocalhost:8000\u002Fv1\"\nexport OPENAI_COMPAT_API_KEY=\"local\"\n```\n\nThe included local sample config is:\n\n- config: `configs\u002Fdotnetdebug\u002Flocal_mitko.yaml`\n- backend: `openai_compat`\n- default model name: `mitko`\n- sample dataset: `data\u002Fdotnetdebug\u002Ftasks.json`\n- seed skill: `skillopt\u002Fenvs\u002Fdotnetdebug\u002Fskills\u002Finitial.md`\n\nIf your server exposes a different model name, change `model.optimizer` and `model.target` in the config or override them with `--cfg-options`.\n\n### 3) Run the included smoke test\n\nThis is the fastest way to verify the local setup end to end.\n\n```bash\npython scripts\u002Ftrain.py \\\n  --config 
configs\u002Fdotnetdebug\u002Flocal_mitko.yaml \\\n  --cfg-options \\\n    train.num_epochs=1 \\\n    train.batch_size=2 \\\n    gradient.minibatch_size=2 \\\n    gradient.analyst_workers=1 \\\n    env.workers=1 \\\n    env.limit=2 \\\n    optimizer.learning_rate=2 \\\n    env.out_root=outputs\u002Fdotnetdebug_smoke\n```\n\nInspect the main artifact at:\n\n- `outputs\u002Fdotnetdebug_smoke\u002Fbest_skill.md`\n\nOther useful artifacts:\n\n- `outputs\u002Fdotnetdebug_smoke\u002Fhistory.json`\n- `outputs\u002Fdotnetdebug_smoke\u002Fsummary.json` (if present)\n- `outputs\u002Fdotnetdebug_smoke\u002Fsteps\u002F`\n\n### 4) Evaluate a trained skill\n\n```bash\npython scripts\u002Feval_only.py \\\n  --config configs\u002Fdotnetdebug\u002Flocal_mitko.yaml \\\n  --skill outputs\u002Fdotnetdebug_smoke\u002Fbest_skill.md \\\n  --split test \\\n  --cfg-options \\\n    env.limit=2 \\\n    env.workers=1 \\\n    env.out_root=outputs\u002Fdotnetdebug_eval_smoke\n```\n\n## Optional cloud backends\n\nLocal is the default path in this fork, but SkillOpt also supports hosted backends.\n\n### Azure OpenAI\n\n```bash\nexport AZURE_OPENAI_ENDPOINT=\"https:\u002F\u002Fyour-resource.openai.azure.com\u002F\"\n\n# Option 1: API key auth\nexport AZURE_OPENAI_API_KEY=\"your-key\"\n\n# Option 2: Azure CLI auth\nexport AZURE_OPENAI_AUTH_MODE=\"azure_cli\"\n```\n\n### OpenAI\n\n```bash\nexport OPENAI_API_KEY=\"sk-...\"\n```\n\n### Anthropic Claude\n\n```bash\nexport ANTHROPIC_API_KEY=\"sk-ant-...\"\n```\n\n### Qwen-compatible local backend\n\n```bash\nexport QWEN_CHAT_BASE_URL=\"http:\u002F\u002Flocalhost:8000\u002Fv1\"\nexport QWEN_CHAT_MODEL=\"Qwen\u002FQwen3.5-4B\"\n```\n\n## Supported benchmarks\n\n| Benchmark | Type | Config |\n|---|---|---|\n| SearchQA | QA | `configs\u002Fsearchqa\u002Fdefault.yaml` |\n| ALFWorld | Embodied agent | `configs\u002Falfworld\u002Fdefault.yaml` |\n| DocVQA | Document QA | `configs\u002Fdocvqa\u002Fdefault.yaml` |\n| LiveMathematicianBench | Math | 
`configs\u002Flivemathematicianbench\u002Fdefault.yaml` |\n| SpreadsheetBench | Code generation | `configs\u002Fspreadsheetbench\u002Fdefault.yaml` |\n| OfficeQA | Tool-augmented QA | `configs\u002Fofficeqa\u002Fdefault.yaml` |\n| DotNetDebug | C# debugging example | `configs\u002Fdotnetdebug\u002Fdefault.yaml` |\n\n## Data preparation\n\nSkillOpt expects data in a split directory with `train\u002F`, `val\u002F`, and `test\u002F` subdirectories, each containing a JSON file such as `items.json`.\n\n```text\ndata\u002Fmy_split\u002F\n├── train\u002Fitems.json\n├── val\u002Fitems.json\n└── test\u002Fitems.json\n```\n\nEach JSON file is an array of task items. The exact schema depends on the benchmark. For example, SearchQA items look like:\n\n```json\n[\n  {\n    \"id\": \"unique_item_id\",\n    \"question\": \"Who wrote the novel ...\",\n    \"context\": \"[DOC] relevant passage text ...\",\n    \"answers\": [\"expected answer\"]\n  }\n]\n```\n\nSee `skillopt\u002Fenvs\u002F\u003Cbenchmark>\u002Fdataloader.py` for benchmark-specific formats.\n\n> **Note:** Most benchmark datasets are not included in this repository. 
The bundled exception is `data\u002Fdotnetdebug\u002Ftasks.json`, which exists specifically to support a runnable local smoke test.\n\n## Common CLI arguments\n\n| Argument | Description | Example |\n|---|---|---|\n| `--config` | Benchmark config YAML | `configs\u002Fdotnetdebug\u002Flocal_mitko.yaml` |\n| `--split_dir` | Path to data split directory | `\u002Fpath\u002Fto\u002Fsplit` |\n| `--skill` | Skill document to evaluate | `outputs\u002Fmy_run\u002Fbest_skill.md` |\n| `--split` | Split to evaluate | `test` |\n| `--cfg-options` | Inline config overrides | `env.limit=2 env.workers=1` |\n\n## Output structure\n\nEach run writes to a structured output directory:\n\n```text\noutputs\u002F\u003Crun_name>\u002F\n├── config.json             # Flattened runtime config\n├── history.json            # Per-step training history\n├── runtime_state.json      # Resume checkpoint\n├── best_skill.md           # Best validated skill document\n├── skills\u002Fskill_vXXXX.md   # Skill snapshot per step\n├── steps\u002Fstep_XXXX\u002F        # Per-step artifacts\n├── slow_update\u002Fepoch_XX\u002F   # Slow-update logs\n└── meta_skill\u002Fepoch_XX\u002F    # Meta-skill logs\n```\n\nRe-running the same command resumes from the last completed step when possible.\n\n## WebUI\n\nLaunch the optional monitoring dashboard:\n\n```bash\npython -m skillopt_webui.app\n```\n\nCommon flags:\n\n| Flag | Default | Description |\n|---|---|---|\n| `--port` | 7860 | Server port |\n| `--host` | `0.0.0.0` | Bind address |\n| `--share` | off | Create a public Gradio share link |\n\n## Research background\n\nThis repo is grounded in the original SkillOpt research. 
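If you want the paper\u002Fdemo context, see:\n\n- Project page: https:\u002F\u002Fmicrosoft.github.io\u002FSkillOpt\u002F\n- Paper: https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23904\n- Demo video: https:\u002F\u002Fyoutu.be\u002FJUBMDTCiM0M\n\n## Citation\n\n```bibtex\n@article{skillopt2026,\n  title={SKILLOPT: Executive Strategy for Self-Evolving Agent Skills},\n  author={SkillOpt Team},\n  year={2026}\n}\n```\n\n","SkillOpt is a text-space optimization tool that trains reusable natural-language skills for frozen large language model agents through trajectory-driven edits, validation-gated updates, and a deployable best-skill document (best_skill.md). The project is written in Python, works with any OpenAI-compatible local server, and provides the DotNetDebug example for low-cost end-to-end testing. SkillOpt focuses on optimizing the system prompt or skill document while keeping model weights unchanged, which suits scenarios that require iterating on and evaluating natural-language task quality in a local environment, while ensuring that private configs and outputs are not uploaded to the cloud. In addition, the project lets users compare different skill revisions and adjust them through validation-gated training loops, making it a good fit for researchers and developers who want to improve their AI agents' performance while controlling costs.","2026-06-11 03:59:09","CREATED_QUERY"]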