[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80171":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":15,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":16,"rankGlobal":10,"rankLanguage":10,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":15,"starSnapshotCount":15,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},80171,"crucible","Bambushu\u002Fcrucible","Bambushu","Codebase-level adversarial review by a panel of frontier models. A Claude Code skill that runs every file through DeepSeek + Gemini + Kimi + MiniMax in sequence, then has Claude verify the findings against the actual source.","https:\u002F\u002Fgithub.com\u002FBambushu\u002Fcrucible",null,"Python",46,7,51,0,2.71,"MIT License",false,"main",[21,22,23,24,25,26,27,28],"adversarial-review","ai-code-review","claude-code","claude-code-skill","code-review","llm-tools","openrouter","security-audit","2026-06-12 02:03:59","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"assets\u002Fhero.png\" alt=\"Crucible\" width=\"520\" \u002F>\n\n# Crucible\n\n**Codebase-level adversarial review by a panel of frontier models.**\n\nA Claude Code skill that walks your code piece-by-piece and puts every file under simultaneous pressure from a panel of structurally different models, then aggregates the findings into a single severity-ranked report that Claude itself verifies before you see it.\n\n[Install](#install) · [How it works](#how-it-works) · [Cost](#cost) · [Modes](#modes) · [Sample report](#sample-report)\n\n\u003C\u002Fdiv>\n\n---\n\n## What it is\n\n`\u002Fcrucible` is a [Claude 
Code](https:\u002F\u002Fclaude.com\u002Fclaude-code) slash-command skill. You drop the folder into `~\u002F.claude\u002Fskills\u002F`, set one env var, and from inside any project you can run:\n\n```\n\u002Fcrucible                              # review the current branch's diff\n\u002Fcrucible --all                        # review the whole repo\n\u002Fcrucible --paths \"src\u002Fapi\u002F**\u002F*.ts\"    # review a glob\n\u002Fcrucible --diff main...HEAD           # review a specific range\n```\n\nBehind the scenes, Claude:\n\n1. Resolves the file list and prints a pre-flight (files, models, est. cost).\n2. Loads a panel of four current SOTA paid models from OpenRouter, each from a different vendor family (e.g. DeepSeek, Google, Moonshot, MiniMax).\n3. Reviews every in-scope file through the panel: pass 1 finds, pass 2 validates, pass 3 consolidates with severity ranks.\n4. Runs one cross-file architectural meta-pass to catch repeated anti-patterns, missing layers, and coupling smells.\n5. **Verifies the report.** Claude reads every CRITICAL and HIGH finding back against the actual source code, marks them confirmed, refined, disputed, or \"needs human judgment\", and adds up to three findings the panel missed.\n6. Drops a single `report.md` in `.crucible-cache\u002F\u003Crun-id>\u002F`.\n\nFindings are persisted as they land, so a network blip or rate-limit hiccup mid-run is just a `--resume \u003Crun-id>` away.\n\n---\n\n## Why this exists\n\nSingle-model review has correlated blind spots. If GPT misses a vulnerability, the runner-up GPT-class model usually misses it too. A panel of models drawn from genuinely different training runs (DeepSeek vs. Gemini vs. Kimi vs. MiniMax) does not.\n\nBut three OS models converging on the same hallucination is still a hallucination. So Crucible adds a final step: Claude reads the report against the actual code and tells you which findings are real, which are misreads, and what the panel missed. 
That verification phase is what makes the deliverable trustworthy enough to act on without re-reading every file yourself.\n\nCompared to alternatives:\n\n| | Scope | Models | Verification |\n|---|---|---|---|\n| `\u002Frival` (single-file) | One file or short diff | 1 OpenRouter model | None |\n| GitHub Copilot review | Whole PR | One model family | None |\n| Internal personas (e.g. RaadSmid) | Diff or repo | Same Claude, multiple personas | Self-check |\n| **Crucible** | **Diff, glob, or whole repo** | **4 SOTA models, 4 different families** | **Claude reads findings back against source** |\n\nIf you only need a quick second opinion on one file, use `\u002Frival`. Crucible is for when the work matters enough to justify a $0.10–$0.75 audit run.\n\n---\n\n## Install\n\nCrucible is a Claude Code skill. It's a folder you drop into your skills directory.\n\n```bash\n# 1. Clone into your Claude Code skills dir\ngit clone https:\u002F\u002Fgithub.com\u002FBambushu\u002Fcrucible.git ~\u002F.claude\u002Fskills\u002Fcrucible\n\n# 2. Set your OpenRouter API key (https:\u002F\u002Fopenrouter.ai\u002Fkeys)\nexport OPENROUTER_API_KEY=sk-or-...\n# add the line to ~\u002F.zshrc or ~\u002F.bashrc to persist\n\n# 3. Restart Claude Code so it picks up the new skill\n```\n\nVerify it's loaded by typing `\u002F` in any Claude Code session — `crucible` should appear in the slash-command list.\n\n**Requirements:**\n- [Claude Code](https:\u002F\u002Fclaude.com\u002Fclaude-code) (latest)\n- [OpenRouter](https:\u002F\u002Fopenrouter.ai) account with credit (the panel runs paid models; budget ~$0.30 for a typical PR-sized review)\n- Python 3.9+ (used by the bundled orchestrator and report builder)\n- `git` (used for diff scope resolution)\n\n---\n\n## How it works\n\nThe metaphor: a crucible is a vessel that holds material under heat from multiple sources until only what survives the test remains.\n\n```\n                  ╲   ╱                ← model A (e.g. 
DeepSeek)  finds\n                   ╲ ╱                 ← model B (e.g. Gemini)    validates + adds\n              ┌──── ◉ ────┐            ← model C (e.g. Kimi)      consolidates + ranks\n              │   FILE    │            ← model D (e.g. MiniMax)   final pass\n              └───────────┘\n                  ╱   ╲\n```\n\nEach file passes through the chain in order. Each model sees the prior model's findings and either validates, contests, or adds. The last model in the chain emits the consolidated severity-ranked output.\n\nIn `--blind` mode, the same panel runs in parallel and never sees each other's output. Findings that overlap on file + line + topic become \"consensus\" findings ranked higher.\n\nAfter every file is done, one final pass takes the project tree + every per-file finding's title and looks for cross-file architectural issues that no single-file pass could catch. Then Claude does the verification step described above.\n\n---\n\n## Cost\n\nThe default panel runs paid SOTA models. Rough numbers:\n\n| Scope | Files | Calls | Wall clock | Approx cost |\n|---|---|---|---|---|\n| Small diff | 5 | 20 | ~3 min | **$0.01–$0.05** |\n| Typical PR | 20 | 80 | ~10 min | **$0.10–$0.20** |\n| Full feature branch | 50 | 200 | ~25 min | **$0.30–$0.50** |\n| Whole-repo deep audit | 100 | 400 | ~60 min | **$0.50–$0.75** |\n\nPre-flight always shows the estimate before kicking off, and pauses for confirmation if scope crosses any of: > 10 files, > 30 calls, any single file > 2000 lines, or family-diversity warning. Under those thresholds it just runs.\n\nYou can swap to free-tier models (`\u002Frival`'s panel) by deleting `~\u002F.crucible\u002Fmodels.json`. 
Crucible will fall back to the free roster and warn you it's running in degraded mode.\n\n---\n\n## Modes\n\n```\n\u002Fcrucible                              # default: diff, sequential 4-model chain\n\u002Fcrucible --all                        # whole repo (with safe excludes)\n\u002Fcrucible --paths \"src\u002Fapi\u002F**\u002F*.ts\"    # glob\n\u002Fcrucible --diff main...HEAD           # specific git range\n\u002Fcrucible --files src\u002Fauth.ts src\u002Fdb.ts\n\u002Fcrucible --deep                       # deeper sequential chain\n\u002Fcrucible --blind                      # parallel-independent (consensus mode)\n\u002Fcrucible --models 2                   # smaller panel (1–4)\n\u002Fcrucible --no-meta                    # skip cross-file architectural pass\n\u002Fcrucible --include-tests              # don't skip *.test.* \u002F *.spec.*\n\u002Fcrucible --resume \u003Crun-id>            # resume an interrupted run\n\u002Fcrucible --deployment-context \"...\"   # free-text scoping (see below)\n```\n\nCombine freely: `\u002Fcrucible --all --deep --blind` runs the full repo with three models per file independently, then does consensus dedup.\n\n### `--deployment-context`\n\nThis is the highest-leverage flag. Frontier models default to \"this code could run anywhere\" and routinely flag concerns that don't apply to your actual deployment shape — multi-worker auth, multi-region race conditions, public-internet hardening — when the code is a desktop sidecar bound to localhost.\n\n```bash\n\u002Fcrucible --deployment-context \"Desktop Tauri sidecar bound to 127.0.0.1, single-process. 
Multi-worker uvicorn \u002F deployed-service findings are out of scope.\"\n```\n\nIn real runs this is the single biggest false-positive reduction.\n\n---\n\n## Sample report\n\n```markdown\n# Crucible Report — 2026-04-26-1532\n\nScope:    diff main...HEAD\nFiles reviewed:  12\nModels:   deepseek\u002Fdeepseek-v4-pro, google\u002Fgemini-3.1-pro-preview, moonshotai\u002Fkimi-k2.6, minimax\u002Fminimax-m2.7\nMode:     sequential\nDuration: 8m 14s\nTotal findings: 17 (2 critical, 5 high, 7 medium, 3 low)\n\n---\n\n## CRITICAL (2)\n\n### `src\u002Fapi\u002Fauth\u002Fsession.ts:48` — Session token comparison uses ==, vulnerable to timing attack\nModels flagged by:  deepseek-v4-pro, gemini-3.1-pro, kimi-k2.6\nCategory: security\nWhy it matters: An attacker who can measure response timing can recover the session\n                token byte-by-byte. `crypto.timingSafeEqual` is required here.\nFix: Replace `if (token == expected)` with\n     `if (crypto.timingSafeEqual(Buffer.from(token), Buffer.from(expected)))`.\n\n### `src\u002Fdb\u002Fmigrations\u002F0042.sql:1` — DROP COLUMN before backfill on 50M-row table\n...\n\n---\n\n## Verification Pass (Claude)\n\nVerified by:    claude-opus-4-7\nOS findings reviewed:  2 critical + 5 high\n\nConfirmed (5)\n- src\u002Fapi\u002Fauth\u002Fsession.ts:48  →  matched code at line; fix is sound\n- src\u002Fdb\u002Fmigrations\u002F0042.sql:1  →  table size is in fact ~50M rows per ANALYZE...\n\nRefined (1)\n- src\u002Fapi\u002Fupload.ts:112  →  bug is real but severity should be MEDIUM not HIGH\n                            (only triggers under multipart, not the current\n                            ingest path)\n\nDisputed \u002F False Positives (1)\n- src\u002Futils\u002Fredact.ts:23  →  panel flagged unsanitized regex, but the input\n                              already passes through `sanitizeUserInput` at\n                              line 8 of the calling site\n\nAdditional findings caught by Claude verifier (2)\n- 
src\u002Fapi\u002Fauth\u002Fsession.ts:71 — refresh-token rotation missing\n- src\u002Fdb\u002Frepo.ts:88 — missing FOR UPDATE on the read in this transaction\n```\n\n---\n\n## What's in the box\n\n```\ncrucible\u002F\n├── skill.md              # the full Claude Code skill spec (the brain)\n├── review-prompts.md     # the four prompt templates (pass 1\u002F2\u002F3 + meta)\n├── scripts\u002F\n│   ├── crucible-run.sh   # one-shot end-to-end wrapper\n│   ├── orchestrate.py    # per-file dispatch engine, OpenRouter calls\n│   ├── build_report.py   # aggregates per-file findings into report.md\n│   ├── compare-reports.py # diff two runs side-by-side\n│   ├── chunk-file.py     # language-aware splitter for files > 1500 lines\n│   └── discover-premium.sh # populates ~\u002F.crucible\u002Fmodels.json\n├── assets\u002F\n│   └── hero.png\n├── LICENSE\n└── README.md\n```\n\nThe `skill.md` file is the canonical spec. If you want to understand exactly what Crucible does on every phase, read that. The README is the marketing-facing version.\n\n---\n\n## Configuration\n\nTwo files control behaviour outside the flags:\n\n- **`~\u002F.crucible\u002Fmodels.json`** — the model panel. Auto-generated by `scripts\u002Fdiscover-premium.sh` (runs once per ~3 days). To force a refresh: `bash ~\u002F.claude\u002Fskills\u002Fcrucible\u002Fscripts\u002Fdiscover-premium.sh`. To pin specific models, edit the file by hand; Crucible will use whatever is there.\n- **`review-prompts.md`** — the prompt templates. Edit if you want to bias the panel toward, say, performance over security, or toward your specific stack.\n\nRun cache lives in the project under `.crucible-cache\u002F\u003Crun-id>\u002F`. Add it to your `.gitignore` (Crucible auto-adds it on first run).\n\n---\n\n## FAQ\n\n**Why a Claude Code skill instead of a standalone CLI?**\nBecause the verification phase needs Claude. The whole point is that the OS panel finds, and Claude verifies. 
Running it inside Claude Code keeps the verifier and the panel in the same loop with full source access, file-by-file.\n\n**Why OpenRouter instead of calling each model's native API?**\nOne key, one billing, one rate-limit story, and it's the cleanest place to compare frontier models from different vendors as they ship. If your favorite model isn't on OpenRouter you can edit `scripts\u002Forchestrate.py` to add a custom backend, but the default works for ~95% of users.\n\n**How is this different from `\u002Fraadsmid`?**\nRaadSmid spins up four Claude personas with different lenses (security, performance, architecture, user-experience). It's fast and free but every persona is the same model. Crucible spins up four genuinely different models from four different vendor families, then has Claude verify. RaadSmid is a quick second-opinion. Crucible is a deep audit.\n\n**Can I run it in CI?**\nYes. `scripts\u002Fcrucible-run.sh` is a single-command end-to-end wrapper that runs without the LLM-driven orchestration. Pipe its `report.md` into a PR comment, or fail the build on critical findings. The verification phase is skipped in this mode (it requires Claude); you get the panel's raw output.\n\n**What does it cost to run on this repo?**\n$0.18, roughly. Try it.\n\n---\n\n## Contributing\n\nIssues and PRs welcome. Two especially good directions:\n\n- **More vendor families.** The panel is hardcoded to prefer DeepSeek → Google → Moonshot → MiniMax → Qwen → GLM → Llama-4. PRs that add genuinely-different new families (xAI, Mistral, Reka, etc.) and tune the rotation are great.\n- **Auto-chunking.** Files over 1500 lines aren't auto-split yet. The scaffolding is in `scripts\u002Fchunk-file.py`; wiring it into `orchestrate.py` is the next obvious win.\n\n---\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n\n---\n\n\u003Cdiv align=\"center\">\n\nBuilt at [Bambushu](https:\u002F\u002Fgithub.com\u002FBambushu). 
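Inspired by [`\u002Frival`](https:\u002F\u002Fgithub.com\u002FBambushu\u002Frival) (single-file adversarial review) and the realisation that consensus across one model family is not actually consensus.\n\n\u003C\u002Fdiv>\n","Crucible is a codebase-level adversarial review tool that reviews code file by file using a panel of frontier models. Its core functionality uses four advanced models with different architectures (DeepSeek, Gemini, Kimi, and MiniMax) to run two rounds of review on each file, has Claude verify the findings, and finally produces a severity-ranked report. The tool is written in Python and released under the MIT License. Crucible suits scenarios that call for comprehensive code review to ensure quality and security, such as code audits during development or security checks before a release.",2,"2026-06-11 03:59:30","CREATED_QUERY"]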