[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74868":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":16,"starSnapshotCount":16,"syncStatus":38,"lastSyncTime":39,"discoverSource":40},74868,"autoresearch","uditgoenka\u002Fautoresearch","uditgoenka","Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep\u002FDiscard → Repeat forever.","https:\u002F\u002Fudit.co\u002Fprojects\u002Fautoresearch",null,"JavaScript",4907,369,21,1,0,74,138,496,222,109.7,"MIT License",false,"master",true,[27,28,5,29,30,31,32,33,34],"ai","autonomous-agent","claude","claude-code","iteration","karpathy","productivity","skill","2026-06-12 04:01:16","\u003Cdiv align=\"center\">\n\n# Autoresearch\n\n**Turn [Claude Code](https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fclaude-code), [OpenCode](https:\u002F\u002Fopencode.ai), or [OpenAI Codex](https:\u002F\u002Fdevelopers.openai.com\u002Fcodex) into a relentless improvement engine.**\n\nBased on [Karpathy's autoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch) — constraint + mechanical metric + autonomous iteration = compounding gains.\n\n[![Claude Code 
Skill](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FClaude_Code-Skill-blue?logo=anthropic&logoColor=white)](https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fclaude-code)\n[![OpenCode](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpenCode-Skill-purple)](https:\u002F\u002Fopencode.ai)\n[![Codex](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCodex-Skill-green?logo=openai&logoColor=white)](https:\u002F\u002Fdevelopers.openai.com\u002Fcodex)\n[![Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fversion-2.0.03-blue.svg)](https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch\u002Freleases)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green.svg)](LICENSE)\n\n[![Based on](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBased_on-Karpathy's_Autoresearch-orange)](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch)\n[![Follow @iuditg](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FFollow-@iuditg-000000?style=flat&logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002Fintent\u002Ffollow?screen_name=iuditg)\n[![Support](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSupport-PayPal-00457C?style=flat&logo=paypal&logoColor=white)](https:\u002F\u002Fpaypal.me\u002Fuditgoenka)\n\n\u003Cbr>\n\n*\"Set the GOAL → The agent runs the LOOP → You wake up to results\"*\n\n*You don't need AGI. 
You need a goal, a metric, and a loop that never quits.*\n\n**Now supports Claude Code, OpenCode, and OpenAI Codex.**\n\n\u003Cbr>\n\n[How It Works](#how-it-works) · [Commands](#commands) · [Quick Start](#quick-start) · [Guides](guide\u002F) · [FAQ](#faq)\n\n\u003C\u002Fdiv>\n\n---\n\n```\n      PLAN              LOOP             DEBUG              FIX            SECURE            SHIP\n ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐\n │   Goal   │     │  Modify  │     │   Find   │     │   Fix    │     │  STRIDE  │     │  Stage   │\n │  Metric  │────▶│  Verify  │────▶│   Bugs   │────▶│  Errors  │────▶│  OWASP   │────▶│  Deploy  │\n │  Scope   │     │  Keep\u002F   │     │  Trace   │     │  Repair  │     │  Red     │     │ Release  │\n └──────────┘     │  Discard │     └──────────┘     └──────────┘     │  Team    │     └──────────┘\n\u002Fautoresearch:    └──────────┘    \u002Fautoresearch:    \u002Fautoresearch:   └──────────┘    \u002Fautoresearch:\n  plan            \u002Fautoresearch     debug              fix          \u002Fautoresearch:      ship\n                                                                     security\n\n ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐\n │  Probe   │     │ Scenario │     │ Predict  │     │  Learn   │     │  Reason  │\n │ Require- │     │   Edge   │     │ 5-Expert │     │   Docs   │     │  Debate  │\n │  ments   │     │   Cases  │     │  Swarm   │     │   Gen    │     │ Converge │\n └──────────┘     └──────────┘     └──────────┘     └──────────┘     └──────────┘\n\u002Fautoresearch:   \u002Fautoresearch:   \u002Fautoresearch:   \u002Fautoresearch:   \u002Fautoresearch:\n  probe            scenario         predict           learn           reason\n```\n\n---\n\n## Why This Exists\n\n[Karpathy's autoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch) demonstrated that a 630-line Python script could autonomously improve ML 
models overnight — **100 experiments per night** — by following simple principles: one metric, constrained scope, fast verification, automatic rollback, git as memory.\n\n**Claude Autoresearch generalizes these principles to ANY domain.** Not just ML — code, content, marketing, sales, HR, DevOps, or anything with a number you can measure.\n\n---\n\n## How It Works\n\n```\nLOOP (FOREVER or N times):\n  1. Review current state + git history + results log\n  2. Pick the next change (based on what worked, what failed, what's untried)\n  3. Make ONE focused change\n  4. Git commit (before verification)\n  5. Run mechanical verification (tests, benchmarks, scores)\n  6. If improved → keep. If worse → git revert. If crashed → fix or skip.\n  7. Log the result\n  8. Repeat. Never stop until you interrupt (or N iterations complete).\n```\n\nEvery improvement stacks. Every failure auto-reverts. Progress is logged in TSV format.\n\n### The Setup Phase\n\nBefore looping, Claude performs a one-time setup:\n\n1. **Read context** — reads all in-scope files\n2. **Define goal** — extracts or asks for a mechanical metric\n3. **Define scope** — which files can be modified vs read-only\n4. **Establish baseline** — runs verification on current state (iteration #0)\n5. **Confirm and go** — shows setup, then begins the loop\n\n### 8 Critical Rules\n\n| # | Rule |\n|---|------|\n| 1 | **Loop until done** — unbounded: forever. Bounded: N times then summarize |\n| 2 | **Read before write** — understand full context before modifying |\n| 3 | **One change per iteration** — atomic changes. 
If it breaks, you know why |\n| 4 | **Mechanical verification only** — no subjective \"looks good.\" Use metrics |\n| 5 | **Automatic rollback** — failed changes revert instantly |\n| 6 | **Simplicity wins** — equal results + less code = KEEP |\n| 7 | **Git is memory** — experiments committed with `experiment:` prefix, `git revert` preserves failed experiments in history, agent MUST read `git log` + `git diff` before each iteration |\n| 8 | **When stuck, think harder** — re-read, combine near-misses, try radical changes |\n\n---\n\n## Commands\n\n| Command | What it does |\n|---------|--------------|\n| `\u002Fautoresearch` | Run the autonomous iteration loop (unlimited) |\n| `Iterations: N` | Add to inline config to run exactly N iterations then stop |\n| `\u002Fautoresearch:plan` | Interactive wizard: Goal → Scope, Metric, Verify config |\n| `\u002Fautoresearch:security` | Autonomous STRIDE + OWASP + red-team security audit |\n| `\u002Fautoresearch:ship` | Universal shipping workflow (code, content, marketing, sales, research, design) |\n| `\u002Fautoresearch:debug` | Autonomous bug-hunting loop — scientific method + iterative investigation |\n| `\u002Fautoresearch:fix` | Autonomous fix loop — iteratively repair errors until zero remain |\n| `\u002Fautoresearch:scenario` | Scenario-driven use case generator — explore situations, edge cases, derivative scenarios |\n| `\u002Fautoresearch:predict` | Multi-persona prediction: pre-analyze code from 5 expert perspectives before acting |\n| `\u002Fautoresearch:learn` | Autonomous documentation engine — scout codebase, generate\u002Fupdate docs, validate, fix loop |\n| `\u002Fautoresearch:reason` | Adversarial refinement — blind judge panel converges subjective content through isolated multi-agent debate |\n| `\u002Fautoresearch:probe` | Adversarial requirement \u002F assumption interrogation — 8 personas probe user + codebase until net-new constraints saturate, emits ready-to-run autoresearch config |\n| `Guard: 
\u003Ccommand>` | Optional safety net — must pass for changes to be kept |\n\n**All commands use interactive setup when invoked without arguments.** Just type the command — the agent will ask you what you need step by step with smart defaults based on your codebase. Power users can skip the wizard by providing flags inline.\n\n> **OpenCode users:** Commands use underscore naming (`\u002Fautoresearch_debug`, `\u002Fautoresearch_fix`, etc.) instead of colons. See [OpenCode Quick Start](#opencode-quick-start) below.\n>\n> **Codex users:** Invoke the skill via `$autoresearch` mention syntax. Subcommands are keywords: `$autoresearch plan`, `$autoresearch debug`, etc. See [Codex Quick Start](#codex-quick-start) below.\n\n### Quick Decision Guide\n\n| I want to... | Use |\n|--------------|-----|\n| Improve test coverage \u002F reduce bundle size \u002F any metric | `\u002Fautoresearch` (add `Iterations: N` for bounded runs) |\n| Don't know what metric to use | `\u002Fautoresearch:plan` |\n| Run a security audit | `\u002Fautoresearch:security` |\n| Ship a PR \u002F deployment \u002F release | `\u002Fautoresearch:ship` |\n| Optimize without breaking existing tests | Add `Guard: npm test` |\n| Hunt all bugs in a codebase | `\u002Fautoresearch:debug` (add `Iterations: 20` for bounded runs) |\n| Fix all errors (tests, types, lint) | `\u002Fautoresearch:fix` |\n| Debug then auto-fix | `\u002Fautoresearch:debug --fix` |\n| Check if something is ready to ship | `\u002Fautoresearch:ship --checklist-only` |\n| Explore edge cases for a feature | `\u002Fautoresearch:scenario` |\n| Generate test scenarios | `\u002Fautoresearch:scenario --domain software --format test-scenarios` |\n| Stress test a user journey | `\u002Fautoresearch:scenario --depth deep` |\n| I want expert opinions before I start | `\u002Fautoresearch:predict` |\n| Analyze this from multiple angles | `\u002Fautoresearch:predict --chain debug` |\n| Generate docs for a new codebase | `\u002Fautoresearch:learn --mode 
init` |\n| Update existing docs after changes | `\u002Fautoresearch:learn --mode update` |\n| Check if docs are stale | `\u002Fautoresearch:learn --mode check` |\n| Debate an architecture decision | `\u002Fautoresearch:reason --domain software` |\n| Refine a pitch or proposal adversarially | `\u002Fautoresearch:reason --domain business` |\n| Converge on best design then validate | `\u002Fautoresearch:reason --chain predict` |\n| Surface hidden constraints before starting | `\u002Fautoresearch:probe` |\n| Pre-flight a fuzzy goal then loop | `\u002Fautoresearch:probe --chain plan,autoresearch` |\n| Stress-test requirements adversarially | `\u002Fautoresearch:probe --adversarial --depth deep` |\n\n---\n\n## Quick Start\n\n### Claude Code\n\n**Option A — npx install (recommended):**\n\n```bash\nnpx skills add uditgoenka\u002Fautoresearch\n```\n\nThat's it. All 11 commands are available after restarting Claude Code.\n\n**Option B — Plugin install:**\n\nIn Claude Code, run:\n```\n\u002Fplugin marketplace add uditgoenka\u002Fautoresearch\n\u002Fplugin install autoresearch@autoresearch\n```\n\n> **Note:** Start a new Claude Code session after installing. Reference files aren't resolvable in the same session where installation happened — this is a Claude Code platform limitation.\n\n**Updating (no reinstall needed):**\n```\n\u002Fplugin update autoresearch\n```\n\nThat pulls the latest version. Run `\u002Freload-plugins` to activate. 
No need to uninstall or re-clone.\n\n**Option C — Manual copy:**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch.git\n\n# Copy skill + subcommands to your project\ncp -r autoresearch\u002Fclaude-plugin\u002Fskills\u002Fautoresearch .claude\u002Fskills\u002Fautoresearch\ncp -r autoresearch\u002Fclaude-plugin\u002Fcommands\u002Fautoresearch .claude\u002Fcommands\u002Fautoresearch\ncp autoresearch\u002Fclaude-plugin\u002Fcommands\u002Fautoresearch.md .claude\u002Fcommands\u002Fautoresearch.md\n```\n\nOr install globally:\n```bash\ncp -r autoresearch\u002Fclaude-plugin\u002Fskills\u002Fautoresearch ~\u002F.claude\u002Fskills\u002Fautoresearch\ncp -r autoresearch\u002Fclaude-plugin\u002Fcommands\u002Fautoresearch ~\u002F.claude\u002Fcommands\u002Fautoresearch\ncp autoresearch\u002Fclaude-plugin\u002Fcommands\u002Fautoresearch.md ~\u002F.claude\u002Fcommands\u002Fautoresearch.md\n```\n\n> **Note:** The `commands\u002F` directory is required for subcommands (`\u002Fautoresearch:ship`, `\u002Fautoresearch:plan`, `\u002Fautoresearch:security`) to work.\n\n**Option D — Guided installer:**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch.git\ncd autoresearch\n.\u002Fscripts\u002Finstall.sh --claude --global\n```\n\n### OpenCode Quick Start\n\n**Option A — Guided installer (recommended):**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch.git\ncd autoresearch\n.\u002Fscripts\u002Finstall.sh --opencode --global\n```\n\n**Option B — Manual copy:**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch.git\n\n# Copy to your project\ncp -r autoresearch\u002F.opencode\u002Fskills\u002Fautoresearch .opencode\u002Fskills\u002Fautoresearch\ncp autoresearch\u002F.opencode\u002Fcommands\u002Fautoresearch*.md .opencode\u002Fcommands\u002F\ncp autoresearch\u002F.opencode\u002Fagents\u002Fdocs-manager.md .opencode\u002Fagents\u002Fdocs-manager.md\n```\n\nOr 
install globally:\n```bash\ncp -r autoresearch\u002F.opencode\u002Fskills\u002Fautoresearch ~\u002F.config\u002Fopencode\u002Fskills\u002Fautoresearch\ncp autoresearch\u002F.opencode\u002Fcommands\u002Fautoresearch*.md ~\u002F.config\u002Fopencode\u002Fcommands\u002F\ncp autoresearch\u002F.opencode\u002Fagents\u002Fdocs-manager.md ~\u002F.config\u002Fopencode\u002Fagents\u002Fdocs-manager.md\n```\n\n> **OpenCode command names:** Use underscores instead of colons — `\u002Fautoresearch_debug`, `\u002Fautoresearch_fix`, `\u002Fautoresearch_plan`, etc. All 11 commands are available.\n\n### Codex Quick Start\n\n**Option A — Guided installer (recommended):**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch.git\ncd autoresearch\n.\u002Fscripts\u002Finstall.sh --codex --global\n```\n\n**Option B — Manual copy:**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch.git\n\n# Copy to your project\ncp -r autoresearch\u002F.agents\u002Fskills\u002Fautoresearch .codex\u002Fskills\u002Fautoresearch\n```\n\nOr install globally:\n```bash\ncp -r autoresearch\u002F.agents\u002Fskills\u002Fautoresearch ~\u002F.codex\u002Fskills\u002Fautoresearch\n```\n\n> **Codex invocation:** Use `$autoresearch` mention syntax in your prompt. Subcommands are keywords — `$autoresearch plan`, `$autoresearch debug`, `$autoresearch security`, etc. Codex discovers installed skills from `${CODEX_HOME:-~\u002F.codex}\u002Fskills` and project-local `.codex\u002Fskills\u002F` directories.\n\n### 2. Run It\n\n```\n\u002Fautoresearch\nGoal: Increase test coverage from 72% to 90%\nScope: src\u002F**\u002F*.test.ts, src\u002F**\u002F*.ts\nMetric: coverage % (higher is better)\nVerify: npm test -- --coverage | grep \"All files\"\n```\n\n### 3. Walk Away\n\nClaude reads all files, establishes a baseline, and starts iterating — one change at a time. Keep improvements, auto-revert failures, log everything. 
**Never stops until you interrupt** (or N iterations complete).\n\n---\n\n## \u002Fautoresearch:plan — Goal → Config Wizard\n\nThe hardest part isn't the loop — it's defining Scope, Metric, and Verify correctly. `\u002Fautoresearch:plan` converts your plain-language goal into a validated, ready-to-execute configuration.\n\n```\n\u002Fautoresearch:plan\nGoal: Make the API respond faster\n```\n\nThe wizard walks you through 5 steps: capture goal → define scope → define metric → define direction → validate verify command (dry-run). Every gate is mechanical — scope must resolve to files, metric must output a number, verify must pass a dry-run.\n\n---\n\n## \u002Fautoresearch:security — Autonomous Security Audit\n\nRead-only security audit using STRIDE threat modeling, OWASP Top 10 sweeps, and red-team adversarial analysis with 4 hostile personas.\n\n```\n\u002Fautoresearch:security\nIterations: 10\n```\n\n**What it does:** Codebase recon → asset inventory → trust boundaries → STRIDE threat model → attack surface map → autonomous testing loop → structured report.\n\nEvery finding requires **code evidence** (file:line + attack scenario). 
No theoretical fluff.\n\n| Flag | Purpose |\n|------|---------|\n| `--diff` | Only audit files changed since last audit |\n| `--fix` | Auto-fix confirmed Critical\u002FHigh findings |\n| `--fail-on \u003Cseverity>` | Exit non-zero for CI\u002FCD gating |\n\n**Output:** Creates `security\u002F{date}-{slug}\u002F` with 7 structured report files.\n\n---\n\n## \u002Fautoresearch:ship — Universal Shipping Workflow\n\nShip anything through 8 phases: **Identify → Inventory → Checklist → Prepare → Dry-run → Ship → Verify → Log.**\n\n```\n\u002Fautoresearch:ship --auto\n```\n\nAuto-detects what you're shipping (code PR, deployment, blog post, email campaign, sales deck, research paper, design assets) and generates domain-specific checklists — every item mechanically verifiable.\n\n| Flag | Purpose |\n|------|---------|\n| `--dry-run` | Validate everything but don't ship |\n| `--auto` | Auto-approve if checklist passes |\n| `--force` | Skip non-critical items (blockers still enforced) |\n| `--rollback` | Undo last ship action |\n| `--monitor N` | Post-ship monitoring for N minutes |\n| `--type \u003Ctype>` | Override auto-detection |\n| `--checklist-only` | Just check readiness |\n\n**9 supported types:** code-pr, code-release, deployment, content, marketing-email, marketing-campaign, sales, research, design.\n\n---\n\n## \u002Fautoresearch:debug — Autonomous Bug Hunter (v1.3.0)\n\nScientific method meets autoresearch loop. 
Doesn't stop at one bug — iteratively hunts ALL bugs using falsifiable hypotheses, evidence-based investigation, and 7 investigation techniques.\n\n```\n\u002Fautoresearch:debug\nScope: src\u002Fapi\u002F**\u002F*.ts\nSymptom: API returns 500 on POST \u002Fusers\nIterations: 20\n```\n\n**How it works:** Gather symptoms → Recon (map error surface) → Hypothesize (specific, testable) → Test (one experiment per iteration) → Classify (confirmed\u002Fdisproven\u002Finconclusive) → Log → Repeat.\n\nEvery finding requires **code evidence** (file:line + reproduction steps). Every disproven hypothesis is logged — equally valuable. Uses 7 techniques: binary search, differential debugging, minimal reproduction, trace execution, pattern search, working backwards, rubber duck.\n\n| Flag | Purpose |\n|------|---------|\n| `--fix` | After hunting, auto-switch to `\u002Fautoresearch:fix` |\n| `--scope \u003Cglob>` | Limit investigation scope |\n| `--symptom \"\u003Ctext>\"` | Pre-fill symptom |\n| `--severity \u003Clevel>` | Minimum severity to report |\n\n---\n\n## \u002Fautoresearch:fix — Autonomous Error Crusher (v1.3.0)\n\nTakes a broken state and iteratively repairs it until everything passes. ONE fix per iteration. 
Atomic, committed, verified, auto-reverted on failure.\n\n```\n\u002Fautoresearch:fix\n```\n\n**How it works:** Auto-detects what's broken (tests, types, lint, build) → Prioritizes (blockers first) → Fixes ONE thing → Commits → Verifies error count decreased → Guard check (no regressions) → Keep\u002FRevert → Repeat until zero errors.\n\n**Stops automatically when error count hits zero** — even in unbounded mode.\n\n| Flag | Purpose |\n|------|---------|\n| `--target \u003Ccommand>` | Explicit verify command |\n| `--guard \u003Ccommand>` | Safety command that must always pass |\n| `--category \u003Ctype>` | Only fix specific type (test, type, lint, build) |\n| `--from-debug` | Read findings from latest debug session |\n\n**Chain them:** Run `\u002Fautoresearch:debug` with `Iterations: 15`, then `\u002Fautoresearch:fix --from-debug` with `Iterations: 30`\n\n---\n\n## \u002Fautoresearch:learn — Autonomous Documentation Engine\n\nScout codebase → generate docs → validate → fix → repeat. 4 modes: init (create from scratch), update (refresh existing), check (read-only health report), summarize (quick overview).\n\n```\n\u002Fautoresearch:learn --mode init --depth deep\n```\n\nDynamic doc discovery (scans `docs\u002F*.md`), project-type detection, validation-fix loop (max 3 retries), scale-aware scouting, git-diff scoping for updates, selective single-doc update with `--file`. Auto-generates Mermaid architecture diagrams, conditional docs (API reference, testing guide, config guide, changelog), cross-reference links between docs, and dependency documentation. 
Supports `--format` for alternative output formats.\n\n---\n\n## \u002Fautoresearch:predict — Multi-Persona Prediction (v1.7.0)\n\nBefore you debug, fix, or ship — get 5 expert perspectives in 2 minutes.\n\n`\u002Fautoresearch:predict` simulates a team of experts (Architect, Security Analyst, Performance Engineer, Reliability Engineer, Devil's Advocate) who independently analyze your code, debate findings, and reach consensus. Chain the output directly to any other command:\n\n- `\u002Fautoresearch:predict --chain debug` — pre-ranked hypotheses before debugging\n- `\u002Fautoresearch:predict --chain security` — multi-persona red team analysis\n- `\u002Fautoresearch:predict --chain scenario,debug,fix` — full quality pipeline\n\n---\n\n## \u002Fautoresearch:reason — Adversarial Refinement (v1.9.0)\n\nExtends autoresearch to **subjective domains** where no objective metric exists. The blind judge panel IS the fitness function — it's val_bpb for architecture decisions, product strategy, content quality, and design debates.\n\n```\n\u002Fautoresearch:reason\nTask: Should we use event sourcing for our order management system?\nDomain: software\nIterations: 8\n```\n\n**How it works:** Generate-A → Critic attacks (strawman) → Author-B responds → Synthesizer merges → Blind judge panel (randomized labels) picks winner → Winner becomes new A → Repeat until convergence.\n\n**Key invariant:** Every agent is a cold-start fresh invocation — no shared session, no history bleed. 
Judges never see A\u002FB\u002FAB labels, only X\u002FY\u002FZ.\n\n| Flag | Purpose |\n|------|---------|\n| `--iterations N` | Bounded mode — run exactly N rounds |\n| `--judges N` | Judge count (3-7, odd preferred) |\n| `--convergence N` | Consecutive wins to converge (default: 3) |\n| `--mode \u003Cmode>` | convergent (default), creative, debate |\n| `--domain \u003Ctype>` | software, product, business, security, research, content |\n| `--chain \u003Ctargets>` | Chain converged output to any autoresearch command |\n\n**Chain patterns:** `reason → predict` (converge then stress-test), `reason → plan,fix` (converge then implement), `reason → scenario` (converge then explore edge cases).\n\n**Output:** Creates `reason\u002F{date}-{slug}\u002F` with lineage.md, candidates.md, judge-transcripts.md, reason-results.tsv, handoff.json.\n\n---\n\n## \u002Fautoresearch:probe — Adversarial Requirement Interrogation (v1.10.0)\n\nThe requirement-clarification layer for autoresearch. Eight adversarial personas interrogate user and codebase together until net-new constraints per round drop below a threshold (mechanical saturation). 
Output is the 5 autoresearch primitives (Goal\u002FScope\u002FMetric\u002FDirection\u002FVerify) plus a `handoff.json` ready to feed any other autoresearch command.\n\n```\n\u002Fautoresearch:probe\nTopic: Reduce p99 latency below 200ms for \u002Fsearch\n\n# Pre-flight pipeline — probe → plan → loop\n\u002Fautoresearch:probe --chain plan,autoresearch\nTopic: Add multi-tenant isolation to the database layer\n```\n\n**How it works:** Seed Capture → Persona Activation → Codebase Grounding → Round Generation (each persona drafts cold-start questions) → Synthesis (dedupe, cap ≤5) → Answer Capture (single batched `AskUserQuestion`) → Constraint Extraction (7 atom types) → Cross-Check → Saturation Check → Synthesize & Handoff.\n\n**The 8 personas:** Skeptic, Edge-Case Hunter, Scope Sentinel, Ambiguity Detective, Contradiction Finder, Prior-Art Investigator, Success-Criteria Auditor, Constraint Excavator. Each is cold-start within a round — no persona sees others' candidate questions until synthesis. 
`--adversarial` rotates Skeptic + Contradiction Finder + Edge-Case Hunter to the front.\n\n| Flag | Purpose |\n|------|---------|\n| `--depth \u003Clevel>` | shallow (5 rounds), standard (15), deep (30) |\n| `--personas N` | active persona count (3-8, default 6) |\n| `--saturation-threshold N` | net-new atoms threshold (default 2, window K=3) |\n| `--scope \u003Cglob>` | codebase glob for grounding |\n| `--chain \u003Ctargets>` | downstream commands: plan, predict, debug, scenario, reason, fix, ship, learn |\n| `--mode \u003Cmode>` | interactive (default) or autonomous (self-answer with confidence labels) |\n| `--adversarial` | rotate the 3 most adversarial personas to the front |\n| `--iterations N` | hard cap on rounds, overrides `--depth` |\n\n**Mechanical saturation:** probe stops when net-new constraints fall below the threshold for K consecutive rounds — not when it \"feels done.\" Other terminations: `BOUNDED` (Iterations exhausted), `USER_INTERRUPT` (Ctrl+C), `SCOPE_LOCKED` (all atoms classified out-of-scope for 2 rounds).\n\n**Output:** Creates `probe\u002F{date}-{slug}\u002F` with probe-spec.md, constraints.tsv, questions-asked.tsv, contradictions.md, hidden-assumptions.md, autoresearch-config.yml, summary.md, handoff.json.\n\n---\n\n## \u002Fautoresearch:scenario — Scenario Explorer (v1.6.0)\n\nAutonomous scenario exploration engine. 
Takes a seed scenario and iteratively generates situations across 12 dimensions — happy paths, errors, edge cases, abuse, scale, concurrency, temporal, data variation, permissions, integrations, recovery, and state transitions.\n\n```\n\u002Fautoresearch:scenario\nScenario: User attempts to checkout with multiple payment methods\nIterations: 25\n```\n\n**How it works:** Seed analysis → Decompose into 12 dimensions → Generate ONE situation per iteration → Classify (new\u002Fvariant\u002Fduplicate) → Expand edge cases → Log → Repeat until all dimensions explored.\n\nAdaptive setup: provides 4-8 questions based on how much context you give. Just type `\u002Fautoresearch:scenario` with nothing else and it walks you through everything.\n\n| Flag | Purpose |\n|------|---------|\n| `--domain \u003Ctype>` | Domain: software, product, business, security, marketing |\n| `--depth \u003Clevel>` | Depth: shallow (10), standard (25), deep (50+) |\n| `--format \u003Ctype>` | Output: use-cases, user-stories, test-scenarios, threat-scenarios |\n| `--focus \u003Carea>` | Prioritize: edge-cases, failures, security, scale |\n| `--scope \u003Cglob>` | Limit to specific files\u002Ffeatures |\n\n**5 domains supported** with tailored dimension priorities and output formats. **Chain with** `\u002Fautoresearch:debug` to hunt bugs in discovered edge cases, or `\u002Fautoresearch:security` to audit discovered threat scenarios.\n\n---\n\n## Guard — Prevent Regressions (v1.0.4)\n\nWhen optimizing a metric, the loop might break existing behavior. **Guard** is an optional safety net.\n\n```\n\u002Fautoresearch\nGoal: Reduce API response time to under 100ms\nVerify: npm run bench:api | grep \"p95\"\nGuard: npm test\n```\n\n- **Verify** = \"Did the metric improve?\" (the goal)\n- **Guard** = \"Did anything else break?\" (the safety net)\n\nIf the metric improves but the guard fails, Claude reworks the optimization (up to 2 attempts). 
Guard\u002Ftest files are never modified.\n\n> **Credit:** Guard was contributed by [@pronskiy](https:\u002F\u002Fgithub.com\u002Fpronskiy) (JetBrains) in [PR #7](https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch\u002Fpull\u002F7).\n\n---\n\n## Results Tracking\n\nEvery iteration is logged in TSV format:\n\n```tsv\niteration  commit   metric  delta   status    description\n0          a1b2c3d  85.2    0.0     baseline  initial state\n1          b2c3d4e  87.1    +1.9    keep      add tests for auth edge cases\n2          -        86.5    -0.6    discard   refactor test helpers (broke 2 tests)\n3          c3d4e5f  88.3    +1.2    keep      add error handling tests\n```\n\nEvery 10 iterations, Claude prints a progress summary. Bounded loops print a final summary with baseline → current best.\n\n---\n\n## Crash Recovery\n\n| Failure | Response |\n|---------|----------|\n| Syntax error | Fix immediately, don't count as iteration |\n| Runtime error | Attempt fix (max 3 tries), then move on |\n| Resource exhaustion | Revert, try smaller variant |\n| Infinite loop \u002F hang | Kill after timeout, revert |\n| External dependency | Skip, log, try different approach |\n\n---\n\n## Repository Structure\n\n```\nautoresearch\u002F\n├── README.md\n├── COMPARISON.md                                  ← Karpathy's Autoresearch vs Claude Autoresearch\n├── guide\u002F                                         ← Comprehensive guides — one per command + advanced patterns\n├── scripts\u002F\n│   ├── install.sh                                 ← Guided installer (Claude Code + OpenCode + Codex)\n│   ├── sync-opencode.sh                           ← Sync .claude\u002F → .opencode\u002F with adaptations\n│   ├── sync-codex.sh                              ← Sync .claude\u002F → .agents\u002F with Codex adaptations\n│   ├── release.sh                                 ← Release automation\n│   └── release.md                                 ← Release checklist\n├── 
.claude\u002Fskills\u002Fautoresearch\u002F                   ← Claude Code source (canonical)\n│   ├── SKILL.md                                   ← Main skill\n│   └── references\u002F                                ← 13 workflow protocol files\n├── .opencode\u002F                                     ← OpenCode port (generated via sync-opencode.sh)\n│   ├── skills\u002Fautoresearch\u002F                       ← Adapted SKILL.md + references\n│   ├── commands\u002F                                  ← 11 command files (autoresearch_*.md)\n│   └── agents\u002Fdocs-manager.md                     ← Subagent for learn workflow\n├── .agents\u002Fskills\u002Fautoresearch\u002F                   ← Codex port (generated via sync-codex.sh)\n│   ├── SKILL.md                                   ← Adapted SKILL.md + references\n│   ├── references\u002F                                ← 12 workflow protocol files\n│   └── agents\u002Fopenai.yaml                         ← UI metadata for Codex\n├── claude-plugin\u002F                                 ← Distribution package (Claude Code plugin install)\n│   ├── .claude-plugin\u002Fplugin.json                 ← Plugin metadata + version\n│   ├── commands\u002F                                  ← Command registrations\n│   └── skills\u002Fautoresearch\u002F                       ← Skill + references\n└── LICENSE\n```\n\n---\n\n## FAQ\n\n**Q: I don't know what metric to use.**\nA: Run `\u002Fautoresearch:plan` — it analyzes your codebase, suggests metrics, and dry-runs the verify command before you launch.\n\n**Q: Does this work with any project?**\nA: Yes. Any language, framework, or domain. Install via `\u002Fplugin marketplace add uditgoenka\u002Fautoresearch` (Claude Code), `.\u002Fscripts\u002Finstall.sh --opencode --global` (OpenCode), `.\u002Fscripts\u002Finstall.sh --codex --global` (Codex), or manually copy files.\n\n**Q: Does this work with OpenCode?**\nA: Yes, as of v2.0.0. 
Run `.\u002Fscripts\u002Finstall.sh --opencode --global` or manually copy `.opencode\u002F` files. Commands use underscore naming (`\u002Fautoresearch_debug` instead of `\u002Fautoresearch:debug`).\n\n**Q: Does this work with OpenAI Codex?**\nA: Yes, as of v2.0.0. Run `.\u002Fscripts\u002Finstall.sh --codex --global` or copy `.agents\u002Fskills\u002Fautoresearch\u002F` to `${CODEX_HOME:-~\u002F.codex}\u002Fskills\u002Fautoresearch`. Invoke via `$autoresearch` mention syntax in Codex.\n\n**Q: How do I stop the loop?**\nA: `Ctrl+C` or add `Iterations: N` to your inline config to run exactly N iterations. Claude commits before verifying, so your last successful state is always in git.\n\n**Q: Can I use this for non-code tasks?**\nA: Absolutely. Sales emails, marketing copy, HR policies, runbooks — anything with a measurable metric. See [Examples by Domain](guide\u002Fexamples-by-domain.md).\n\n**Q: Does \u002Fautoresearch:security modify my code?**\nA: No. It's read-only — analyzes code and produces a structured report. Use `--fix` to opt into auto-remediation of confirmed Critical\u002FHigh findings.\n\n**Q: Can I use MCP servers?**\nA: Yes. Any MCP server configured in Claude Code is available during the loop for database queries, API calls, analytics, etc. See [Advanced Patterns](guide\u002Fadvanced-patterns.md#using-with-mcp-servers).\n\n**Q: What's the difference between \u002Fautoresearch:predict and \u002Fautoresearch:reason?**\nA: Predict is a one-shot analysis — 5 experts debate your existing code. Reason is an iterative refinement loop — competing candidates are generated, critiqued, synthesized, and blind-judged over multiple rounds until convergence. Use predict for analysis before acting; use reason for decisions where no objective metric exists.\n\n---\n\n## Contributing\n\nContributions welcome! 
See [CONTRIBUTING.md](CONTRIBUTING.md).\n\nAreas of interest: new domain examples, verification script templates, CI\u002FCD integrations, real-world benchmarks. All guides are in the [guide\u002F](guide\u002F) folder.\n\n---\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?repos=uditgoenka%2Fautoresearch&type=timeline&legend=top-left\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=uditgoenka\u002Fautoresearch&type=timeline&theme=dark&legend=bottom-right&v=20260319\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=uditgoenka\u002Fautoresearch&type=timeline&legend=bottom-right&v=20260319\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=uditgoenka\u002Fautoresearch&type=timeline&legend=bottom-right&v=20260319\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\n## Credits\n\n- **[Andrej Karpathy](https:\u002F\u002Fgithub.com\u002Fkarpathy)** — for [autoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch)\n- **[Anthropic](https:\u002F\u002Fanthropic.com)** — for [Claude Code](https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fclaude-code) and the skills system\n- **[OpenCode](https:\u002F\u002Fopencode.ai)** — for the OpenCode terminal agent\n- **[OpenAI](https:\u002F\u002Fopenai.com)** — for [Codex](https:\u002F\u002Fdevelopers.openai.com\u002Fcodex) and the agent skills standard\n\n---\n\n\u003Cdiv align=\"center\">\n\n## About the Author\n\n\u003Ca href=\"https:\u002F\u002Fudit.co\">\n  \u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fuditgoenka\" width=\"80\" style=\"border-radius: 50%;\" alt=\"Udit Goenka\" \u002F>\n\u003C\u002Fa>\n\n**[Udit Goenka](https:\u002F\u002Fudit.co)** — AI Product 
Expert, Founder & Angel Investor\n\nSelf-taught builder who went from a slow internet connection in India to founding multiple companies and helping 700+ startups generate over ~$25m in revenue.\n\n**Building:** [TinyCheque](https:\u002F\u002Ftinycheque.com) (India's first agentic AI venture studio) · [Firstsales.io](https:\u002F\u002Ffirstsales.io) (sales automation)\n\n**Investing:** 38 startups backed, 6 exits. Focused on early-stage AI and SaaS.\n\n**Connect:** [udit.co](https:\u002F\u002Fudit.co) · [@iuditg](https:\u002F\u002Fx.com\u002Fiuditg) · [@uditgoenka](https:\u002F\u002Fgithub.com\u002Fuditgoenka) · [Newsletter](https:\u002F\u002Fudit.co\u002Fblog)\n\n> *\"Autonomy scales when you constrain scope, clarify success, mechanize verification, and let agents optimize tactics while humans optimize strategy.\"*\n\n\u003C\u002Fdiv>\n","The Autoresearch project aims to turn Claude Code, OpenCode, or OpenAI Codex into a continuous-improvement engine that optimizes goal-directed tasks through autonomous iteration. Its core loop modifies code, verifies the change, keeps or discards it, and repeats, so code quality improves step by step without human intervention. The project builds on Karpathy's autoresearch concept and emphasizes the compounding gains from combining constraints, a mechanical metric, and autonomous iteration. It is especially suited to software development scenarios that need continuous optimization and automated testing, and can significantly improve developer productivity and code quality.",2,"2026-06-11 03:51:12","high_star"]