[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75181":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":10,"languages":10,"totalLinesOfCode":10,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},75181,"evals-skills","hamelsmu\u002Fevals-skills","hamelsmu","Skills for AI Evals to complement the course: AI Evals For Engineers & PMs","https:\u002F\u002Fmaven.com\u002Fparlance-labs\u002Fevals?promoCode=evals-info-url",null,1366,142,14,1,0,32,103,42,19.47,"MIT License",false,"main",true,[],"2026-06-12 02:03:33","# Eval Skills for AI Coding Agents\n\nSkills that guide AI coding agents to help you build LLM evaluations.\n\nThese skills guard against common mistakes I've seen helping 50+ companies and teaching students in our [AI Evals course](https:\u002F\u002Fmaven.com\u002Fparlance-labs\u002Fevals?promoCode=evals-info-url). If you're new to evals, see [questions.md](questions.md) for free resources on the fundamentals.\n\n## New to Evals? Start Here\n\nIf you are new to evals, start with the `eval-audit` skill. Give your coding agent these instructions:\n\n> Install the eval skills plugin from https:\u002F\u002Fgithub.com\u002Fhamelsmu\u002Fevals-skills, then run \u002Fevals-skills:eval-audit on my eval pipeline. Investigate each diagnostic area using a separate subagent in parallel, then synthesize the findings into a single report. 
Use other skills in the plugin as recommended by the audit.\n\nThe audit isn't a complete solution, but it will catch common problems we've seen in evals. It will also recommend other skills to use to fix the problems.\n\n## Installation\n\nIn Claude Code, run these two commands:\n\n```bash\n# Step 1: Register the plugin repository\n\u002Fplugin marketplace add hamelsmu\u002Fevals-skills\n\n# Step 2: Install the plugin\n\u002Fplugin install evals-skills@hamelsmu-evals-skills\n```\n\nTo upgrade:\n\n```bash\n\u002Fplugin update evals-skills@hamelsmu-evals-skills\n```\n\nAfter installation, restart Claude Code. The skills will appear as `\u002Fevals-skills:\u003Cskill-name>`.\n\n## Installation (npx skills)\n\nIf you use the open Skills CLI, install from this repo with:\n\n```bash\nnpx skills add https:\u002F\u002Fgithub.com\u002Fhamelsmu\u002Fevals-skills\n```\n\nInstall one skill only:\n\n```bash\nnpx skills add https:\u002F\u002Fgithub.com\u002Fhamelsmu\u002Fevals-skills --skill eval-audit\n```\n\nCheck for updates:\n\n```bash\nnpx skills check\nnpx skills update\n```\n\n## Available Skills\n\n| Skill | What it does |\n|-------|-------------|\n| eval-audit | Audit an eval pipeline and surface problems with prioritized severity |\n| error-analysis | Guide the user through reading traces and categorizing failures |\n| generate-synthetic-data | Create diverse synthetic test inputs using dimension-based tuple generation |\n| write-judge-prompt | Design LLM-as-Judge evaluators for subjective quality criteria |\n| validate-evaluator | Calibrate LLM judges against human labels using data splits, TPR\u002FTNR, and bias correction |\n| evaluate-rag | Evaluate retrieval and generation quality in RAG pipelines |\n| build-review-interface | Build custom annotation interfaces for human trace review |\n\nInvoke a skill with `\u002Fevals-skills:skill-name`, e.g., `\u002Fevals-skills:error-analysis`.\n\n## Write Your Own Skills\n\nThese skills are a starting point and only encode 
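common mistakes that generalize across projects. Skills grounded in your stack, your domain, and your data will outperform them. Start here, then write your own.\n\nThe [meta-skill](meta-skill.md) can help you ground custom skills.\n\n## Beyond These Skills\n\nThese skills handle the parts of eval work that generalize across projects. Much of the process doesn't: production monitoring, CI\u002FCD integration, data analysis, and much more. The [course](https:\u002F\u002Fmaven.com\u002Fparlance-labs\u002Fevals?promoCode=evals-info-url) covers all of it.\n","This project provides a set of skills that guide AI coding agents in building LLM evaluations, with the aim of helping engineers and product managers run AI evals more effectively. Its core capabilities include auditing eval pipelines, error analysis, and synthetic data generation, which together help identify and fix common eval problems. On the technical side, the project supports plugin installation in Claude Code as well as the open Skills CLI, so users can pick the installation method that fits their needs. It suits any scenario that calls for evaluating and improving the quality of AI systems, and is especially friendly to newcomers to AI evals.",2,"2026-06-11 03:52:35","high_star"]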