[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81772":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},81772,"skill-up","alibaba\u002Fskill-up","alibaba","A CLI evaluation framework to make your Agent Skill Up.","https:\u002F\u002Falibaba.github.io\u002Fskill-up\u002F",null,"Go",35,6,22,7,0,5,10,13,15,58.84,"Apache License 2.0",false,"main",true,[27,28,29,7,30],"agent-skills","ai","ai-agents","skills","2026-06-12 04:01:35","\u003Cdiv align=\"center\">\n  \u003Cp align=\"center\">\n    \u003Cimg src=\"assets\u002Flogo.png\" alt=\"skill-up logo\" width=\"150\" \u002F>\n  \u003C\u002Fp>\n\n  \u003Ch1>skill-up\u003C\u002Fh1>\n\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falibaba\u002Fskill-up\u002Factions\">\n      \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Falibaba\u002Fskill-up\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\" alt=\"CI\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdeepwiki.com\u002Falibaba\u002Fskill-up\">\n      \u003Cimg src=\"https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg\" alt=\"Ask DeepWiki\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\".\u002F.github\u002Fbadges\u002Fcoverage.json\">\n      \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fraw.githubusercontent.com\u002Falibaba\u002Fskill-up\u002Fbadges\u002F.github\u002Fbadges\u002Fcoverage.json\" alt=\"Coverage\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgo.dev\u002F\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fgo-%3E%3D1.25-blue\" alt=\"Go Version\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"LICENSE\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-green\" alt=\"License\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Falibaba\u002Fskill-up\">\n      \u003Cimg src=\"https:\u002F\u002Fgoreportcard.com\u002Fbadge\u002Fgithub.com\u002Falibaba\u002Fskill-up\" alt=\"Go Report Card\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falibaba\u002Fskill-up\u002Freleases\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Falibaba\u002Fskill-up\" alt=\"Release\" \u002F>\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp align=\"center\">\n    \u003Cb>English\u003C\u002Fb> | \u003Ca href=\".\u002FREADME.zh.md\">中文\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp align=\"center\">\n    📖 \u003Ca href=\"https:\u002F\u002Falibaba.github.io\u002Fskill-up\u002F\">User Manual\u003C\u002Fa> · \u003Ca href=\"https:\u002F\u002Falibaba.github.io\u002Fskill-up\u002Fzh\u002F\">用户手册\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Chr \u002F>\n\u003C\u002Fdiv>\n\n## Overview\n\n**skill-up** is a CLI evaluation framework for Agent Skill developers. 
Declare your eval environment, dependencies, test cases, and grading strategy in `evals\u002Feval.yaml` and `evals\u002Fcases\u002F*.yaml`, then run evaluations locally or in CI to generate structured reports.\n\n> [!WARNING]\n> This project is still in an **early evolution** stage: the code is not yet fully stable, and some CLI commands, configuration fields, and public APIs may still change in future releases. Please review the [CHANGELOG](CHANGELOG.md) and verify compatibility before using it in production.\n\n## Features\n\n- **Declarative Eval Config**: Define evaluation environment, engine, model, and cases through YAML (`eval.yaml` + `cases\u002F*.yaml`).\n- **Multi-Engine Support**: Works with Qoder CLI, Claude Code, and Codex as Agent Engines.\n- **Flexible Judging**: Supports `rule_based`, `script`, and `agent_judge` evaluation strategies.\n- **Structured Reports**: Outputs Anthropic-compatible `grading.json`, `benchmark.json`, `benchmark.md`, plus `result.json`, JUnit XML, and HTML reports.\n- **Anthropic Compatible**: Import `evals.json` via `skill-up import`, or auto-detect with `--auto`.\n- **CI-Ready**: Designed for local development and continuous integration pipelines.\n\n## Why skill-up\n\nThe official [Agent Skills evaluation guide](https:\u002F\u002Fagentskills.io\u002Fskill-creation\u002Fevaluating-skills) describes the right evaluation loop: write realistic cases, run with and without the Skill, grade outputs, aggregate results, and iterate. 
`skill-up` turns that workflow into a reusable CLI:\n\n- Replaces ad hoc run folders with a declarative `eval.yaml` + `cases\u002F*.yaml` format.\n- Automates workspace setup, Skill installation, Agent Engine invocation, judging, and report generation.\n- Supports multiple engines (`claude_code`, `codex`, `qodercli`) instead of tying the workflow to one client.\n- Keeps compatibility with Anthropic-style `evals.json` while adding richer judges, CI-friendly commands, and structured reports.\n\n## Recommended Usage: AI-Assisted with skill-upper\n\nFor the best experience, use **skill-upper** — the Agent Skill shipped in this\nrepository. It lets you ask an AI agent to scaffold, validate, run, and explain\nevals instead of hand-writing every YAML file first.\n\n### 1. Install the `skill-upper` Agent Skill\n\nRecommended: install it with the `skills` CLI:\n\n```bash\n# Codex, global install\nnpx skills add https:\u002F\u002Fgithub.com\u002Falibaba\u002Fskill-up\u002Ftree\u002Fmain\u002Fskills\u002Fskill-upper -g -a codex -y\n\n# Claude Code, global install\nnpx skills add https:\u002F\u002Fgithub.com\u002Falibaba\u002Fskill-up\u002Ftree\u002Fmain\u002Fskills\u002Fskill-upper -g -a claude-code -y\n```\n\nYou do not need to install `skill-up` before installing this Skill.\n`skill-upper` checks whether the `skill-up` command is available when it runs\nand guides the agent through installation if it is missing.\n\n### 2. Add and run evals\n\nOpen the target Skill project in your AI agent. 
The target project should have\nthis shape:\n\n```text\nmy-skill\u002F\n  SKILL.md\n```\n\nThen ask the agent something concrete:\n\n```text\nUse skill-upper to add evals for this Skill.\nAdd this evaluation case:\n- Input: write a hello world program.\n- Evaluation: check that the output contains hello and world.\n\nAfter that run skill-up to validate and run.\n```\n\nThe agent should create files like:\n\n```text\nmy-skill\u002F\n  SKILL.md\n  evals\u002F\n    eval.yaml\n    cases\u002F\n      basic.yaml\nmy-skill-workspace\u002F\n  iteration-1\u002F\n    result.json\n```\n\nWhen `evals\u002Feval.yaml` lives under a directory containing `SKILL.md`,\n`skill-up` automatically installs that local Skill for the run, so you usually\ndo not need to list the Skill path manually in `eval.yaml`.\n\n## Installation\n\nInstall with the script:\n\n```bash\ncurl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002Falibaba\u002Fskill-up\u002Fmain\u002Finstall.sh | bash\n```\n\nThe installer downloads the matching binary from [GitHub Releases](https:\u002F\u002Fgithub.com\u002Falibaba\u002Fskill-up\u002Freleases).\n\nTo build locally from a checkout, install [Go](https:\u002F\u002Fgo.dev\u002Fdl\u002F) 1.25 or later:\n\n```bash\nmake build\n# or\ngo build -o bin\u002Fskill-up .\u002Fcmd\u002Fskill-up\n```\n\n## Quick Start\n\n### 1. Create Eval Config\n\nIn your Skill directory, create `evals\u002Feval.yaml`:\n\n```yaml\nschema_version: v1alpha1\n\nenvironment:\n  type: none\n\nengine:\n  name: claude_code\n\ncases:\n  files:\n    - evals\u002Fcases\u002Fhello-world.yaml\n```\n\nWhen `evals\u002Feval.yaml` lives under a directory that contains `SKILL.md`, skill-up installs the current Skill automatically. The omitted fields use defaults: JSON report output, `timeout_seconds: 300`, `max_turns: 10`, and `parallelism: 1`.\n\nFor the full `eval.yaml` schema, see [Writing Evals](docs\u002Fguide\u002Fwriting-evals.md).\n\n### 2. 
Write an Eval Case\n\nCreate `evals\u002Fcases\u002Fhello-world.yaml`:\n\n```yaml\ninput:\n  prompt: |\n    Please generate a Hello World program\n\nexpect:\n  must_contain:\n    - \"Hello\"\n    - \"World\"\n```\n\nThe case `id` defaults to the filename (`hello-world`). Add a `judge` block only when you need script-based or agent-based grading.\n\n### 3. Validate Config\n\n```bash\nskill-up validate\n```\n\nThis step is optional, but useful before the first run: it checks `eval.yaml` and all referenced case files without starting an Agent Engine.\n\n### 4. Run Evaluation\n\n```bash\nskill-up run\n```\n\nResults are written to `\u003Cskill-name>-workspace\u002Fiteration-1\u002F`.\n\nFor engineering conventions (Conventional Commits, Git hooks, golangci-lint), see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## User config\n\nskill-up auto-loads an optional user-level config that supplies default OpenTelemetry env vars and per-environment runtime kwargs. The embedded defaults are empty; downstream consumers maintain their own config file.\n\n### Discovery chain (lowest to highest precedence)\n\n```\nembed (empty) \u003C user (~\u002F.config\u002Fskill-up\u002Fconfig.yaml) \u003C project ($PWD\u002F.skill-up.yaml) \u003C explicit (--config)\n```\n\n| Source     | Path                                                                                                    |\n| ---------- | ------------------------------------------------------------------------------------------------------- |\n| `embed`    | empty `Config{}` — no vendor defaults baked in                                                          |\n| `user`     | `$SKILL_UP_CONFIG`, else `$XDG_CONFIG_HOME\u002Fskill-up\u002Fconfig.yaml`, else `~\u002F.config\u002Fskill-up\u002Fconfig.yaml` |\n| `project`  | `$PWD\u002F.skill-up.yaml`                                                                                   |\n| `explicit` | `--config \u003Cpath>` (must exist)                                                 
                         |\n\nMissing files at the `user` and `project` layers are silently skipped; a missing `--config` path is a hard error. A corrupt config at any layer also fails the run.\n\n### Quickstart\n\n```bash\nskill-up init                            # writes a template to ~\u002F.config\u002Fskill-up\u002Fconfig.yaml (XDG-aware)\nskill-up init --local                    # writes a template to $PWD\u002F.skill-up.yaml\nskill-up init --print                    # prints the template to stdout\nskill-up init --force                    # overwrite an existing file\nskill-up init --config foo.yaml          # reads foo.yaml, writes it to ~\u002F.config\u002Fskill-up\u002Fconfig.yaml\nskill-up init --config foo.yaml --local  # reads foo.yaml, writes it to $PWD\u002F.skill-up.yaml\n```\n\nWith `--config \u003Cpath>`, `init` reads that file (validating it as a skill-up\nconfig) and writes its raw bytes to the target — comments and formatting are\npreserved. Without `--config`, `init` writes a commented YAML template.\n\n### Schema\n\n```yaml\nschema_version: v1alpha1\nkind: SkillUpConfig\n\ntelemetry:\n  service_name: skill-up                              # OTEL_SERVICE_NAME\n  traces_exporter: otlp                                 # OTEL_TRACES_EXPORTER\n  traces:\n    endpoint: http:\u002F\u002Flocalhost:4317                     # OTEL_EXPORTER_OTLP_TRACES_ENDPOINT (4317 for grpc, 4318\u002Fv1\u002Ftraces for http\u002Fprotobuf)\n    protocol: grpc                                      # OTEL_EXPORTER_OTLP_TRACES_PROTOCOL (grpc | http\u002Fprotobuf); skill-up defaults to grpc\n  resource_attributes:                                  # serialized into OTEL_RESOURCE_ATTRIBUTES\n    deployment.environment: local\n  verbose: false                                        # if true, also enables OTEL_LOG_* payload capture\n\nenv:                                                    # arbitrary defaults, applied only-if-unset\n  OTEL_EXPORTER_OTLP_HEADERS: 
authorization=${OTLP_TOKEN}\n\nruntime_kwargs:                                         # keyed by environment.type\n  opensandbox:\n    base_url: http:\u002F\u002Flocalhost:8080\n    # extensions: '{}'\n```\n\n### Precedence\n\nFor environment variables: any value already set in the process environment wins; the config only fills in missing keys.\n\nFor `runtime_kwargs`: explicit `--runtime-kwarg` on `run` > `eval.yaml` `environment.kwargs` > user-config `runtime_kwargs[environment.type]`.\n\n### Secrets\n\nPrefer `${ENV_VAR}` references inside the config file rather than baking secret literals. The redaction mechanism (`userconfig.Redact`) masks fields tagged `secret:\"true\"` when printing; currently no Config field carries the tag, but the mechanism is in place for future fields.\n\n## Importing `evals.json`\n\nUse `skill-up import` to migrate an Anthropic-compatible `evals.json` into the YAML layout used by this repo:\n\n```bash\nskill-up import .\u002Fevals\u002Fevals.json --output .\u002Fevals\n```\n\n## CLI Overview\n\n| Command                              | Description                                 |\n| ------------------------------------ | ------------------------------------------- |\n| `skill-up run [path]`                | Run evaluation cases and produce reports    |\n| `skill-up validate [path]`           | Validate `eval.yaml` and case files         |\n| `skill-up list-cases [path]`         | List all cases referenced by the config     |\n| `skill-up report \u003Cresult.json>`      | Generate reports from a previous run        |\n| `skill-up import \u003Cevals.json>`       | Import Anthropic `evals.json` to YAML cases |\n| `skill-up debug judge \u003Cinput.json>`  | Debug judge module with a JSON input        |\n| `skill-up debug report \u003Cinput.json>` | Debug report module with a JSON input       |\n\n## License\n\nApache License 2.0 — see [LICENSE](LICENSE).\n","skill-up is a CLI framework for evaluating AI agent skills. Its core features include declaratively defining the evaluation environment, dependencies, test cases, and grading strategy through YAML files; support for multiple engines such as Qoder
CLI, Claude Code, and Codex; and flexible grading mechanisms, including rule-based, script-based, and agent-judged approaches. It also generates structured, Anthropic-compatible reports and is designed for both local development and continuous integration pipelines. It suits scenarios that call for systematic evaluation and continuous improvement of AI agent skills, such as developing new AI assistant capabilities or improving the performance of existing AI services.",2,"2026-06-11 04:06:19","CREATED_QUERY"]