[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-77418":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},77418,"audit","evilsocket\u002Faudit","evilsocket","An 8-stage vulnerability-discovery agent.",null,"Python",608,89,1,0,4,18,378,16,9.86,"MIT License",false,"main",true,[],"2026-06-12 02:03:43","# audit\n\nAn 8-stage vulnerability-discovery agent, driven by your **Claude Pro \u002F Max\nsubscription** through the official Claude Code Agent SDK. Many narrow agents,\ndeliberate disagreement, and an explicit reachability gate.\n\nMIT-licensed. No API key needed if you already use `claude login`.\n\n## Origin\n\nThis project is a from-scratch reimplementation of the pipeline described in\nCloudflare's [Project Glasswing](https:\u002F\u002Fblog.cloudflare.com\u002Fcyber-frontier-models\u002F)\npost, which tested Anthropic's Mythos preview LLM against Cloudflare's own\ncodebase. The blog argues that real-world vulnerability discovery does **not**\ncome from asking one big model \"find bugs here\" — it comes from:\n\n1. **Many narrow agents** working in parallel on tightly-scoped questions\n   (\"Look for command injection in this specific function, with this trust\n   boundary above it\") rather than one exhaustive agent.\n2. **Deliberate disagreement** — a second agent, on a different model, that\n   tries to *disprove* the first agent's findings.\n3. 
**A reachability trace** as the gating step — most \"is this code buggy?\"\n   findings are noise unless an attacker-controlled input can actually reach\n   the sink from outside the system.\n4. **A feedback loop** so reachable bugs in one place automatically seed\n   hunts for the same pattern elsewhere.\n\nThis repo packages that pipeline into a runnable agent. The Cloudflare post\nshowed the architecture; this codebase ships the prompts, schemas, state\nstore, and orchestrator.\n\n## The 8 stages\n\n![Vulnerability discovery harness — 8 stages](https:\u002F\u002Fraw.githubusercontent.com\u002Fevilsocket\u002Faudit\u002Fmain\u002Fdocs\u002Fpipeline.png)\n\n\u003Csub>Diagram from Cloudflare's [Project Glasswing](https:\u002F\u002Fblog.cloudflare.com\u002Fcyber-frontier-models\u002F) post, reproduced here for reference.\u003C\u002Fsub>\n\n| # | Stage    | Default model | Purpose |\n|---|----------|---------------|---------|\n| 1 | Recon    | Opus 4.7  | Map the repo, emit narrowly-scoped Hunt tasks |\n| 2 | Hunt     | Sonnet 4.6 | One attack class per agent; compile\u002Frun PoCs |\n| 3 | Validate | Opus 4.7  | Adversarial re-read; tries to **disprove** (different model from Hunt) |\n| 4 | Gapfill  | Sonnet 4.6 | Re-queue under-covered areas |\n| 5 | Dedupe   | Sonnet 4.6 | Cluster findings by root cause |\n| 6 | Trace    | Opus 4.7  | Prove attacker-controlled input reaches the sink |\n| 7 | Feedback | Sonnet 4.6 | Turn reachable traces into new Hunt tasks |\n| 8 | Report   | Sonnet 4.6 | Schema-validated structured report |\n\nEach stage is one markdown prompt in `prompts\u002F` + one JSON Schema in\n`schemas\u002F`. The orchestrator passes the schema into the system prompt so\nevery output is shape-stable on the first try.\n\n## Quickstart\n\n```bash\n# 1. Install\npython -m venv .venv && source .venv\u002Fbin\u002Factivate\npip install -e .\n\n# 2. Auth (pick one)\n#    (a) Already logged in via claude login? 
You're done.\n#    (b) Or generate a 1-year OAuth token for CI \u002F non-interactive use:\nclaude setup-token\necho \"CLAUDE_CODE_OAUTH_TOKEN=\u003Cpaste>\" > .env\n\n# 3. Verify\naudit auth-check\n\n# 4. Run\naudit run --repo \u002Fpath\u002Fto\u002Ftarget --run-id my-run\naudit status --run-id my-run\naudit report --run-id my-run --format md > report.md\n```\n\nBy default the agent uses **subscription billing** via your Claude.ai\nlogin — it does **not** call the metered Anthropic API. The on-disk auth\nmodule scrubs `ANTHROPIC_API_KEY` from the environment so it can't\nsilently route around the OAuth flow.\n\n## Using a different model \u002F provider\n\nThe auth module picks one of three modes, in this order:\n\n1. **LLM gateway** (OpenRouter, custom proxy, etc.) — when\n   `ANTHROPIC_BASE_URL` points away from `anthropic.com` AND\n   `ANTHROPIC_AUTH_TOKEN` is set. The gateway env is left intact;\n   only `ANTHROPIC_API_KEY` is scrubbed (it would otherwise outrank the\n   gateway token).\n2. **Subscription OAuth (headless)** — `CLAUDE_CODE_OAUTH_TOKEN` from\n   `claude setup-token`. Best for CI.\n3. **Subscription OAuth (interactive)** — `~\u002F.claude\u002F.credentials.json`\n   from `claude login`. Best for local dev.\n\n### OpenRouter\n\nOpenRouter exposes Claude-compatible Anthropic-API endpoints behind its\nown credit system; that lets you spend OpenRouter credits instead of an\nAnthropic subscription, and gives you access to Sonnet\u002FOpus *and* other\nmodels through the same SDK path. 
See [OpenRouter's Agent SDK guide](https:\u002F\u002Fopenrouter.ai\u002Fdocs\u002Fguides\u002Fcommunity\u002Fanthropic-agent-sdk).\n\n```bash\nexport ANTHROPIC_BASE_URL=\"https:\u002F\u002Fopenrouter.ai\u002Fapi\"\nexport ANTHROPIC_AUTH_TOKEN=\"$OPENROUTER_API_KEY\"\nexport ANTHROPIC_API_KEY=\"\"           # must be explicitly empty \u002F unset\n# optional: pick a model by its OpenRouter slug\nexport ANTHROPIC_MODEL=\"anthropic\u002Fclaude-sonnet-4-6\"\n# or a non-Anthropic model, e.g.:\n#   ANTHROPIC_MODEL=\"openai\u002Fgpt-5\"\n#   ANTHROPIC_MODEL=\"google\u002Fgemini-2.5-pro\"\n#   ANTHROPIC_MODEL=\"qwen\u002Fqwen3-coder-480b\"\n\naudit auth-check                       # confirms \"using LLM gateway at https:\u002F\u002Fopenrouter.ai\u002Fapi\"\naudit run --repo \u002Fpath\u002Fto\u002Ftarget --run-id orun --max-cost-usd 30\n```\n\nCaveats:\n- Per-stage model overrides in `config\u002Fstages.yaml` are model **names**\n  (e.g. `claude-opus-4-7`); OpenRouter accepts slash-prefixed forms like\n  `anthropic\u002Fclaude-opus-4-7`. Edit the YAML if you want different\n  providers per stage. Otherwise `ANTHROPIC_MODEL` forces every stage\n  onto one model.\n- Non-Claude models may not produce schema-compliant JSON as reliably.\n  The runner's schema-validation + repair turn still applies; quality\n  varies by model.\n- Tool-use semantics (Read\u002FGrep\u002FGlob\u002FBash) are part of the Claude Code\n  CLI, not the model — they work as long as the gateway speaks the\n  Anthropic Messages API.\n\n### Other gateways \u002F cloud providers\n\nSame recipe — anything that exposes the Anthropic Messages API at a URL\n+ a bearer token works:\n\n```bash\nexport ANTHROPIC_BASE_URL=\"https:\u002F\u002Fyour-proxy.example.com\"\nexport ANTHROPIC_AUTH_TOKEN=\"$YOUR_TOKEN\"\nunset ANTHROPIC_API_KEY\n```\n\nFor Amazon Bedrock \u002F Google Vertex \u002F Microsoft Foundry, Claude Code has\nfirst-class env-var flags (`CLAUDE_CODE_USE_BEDROCK=1` etc.) that\noutrank everything else. 
See the [Claude Code auth docs](https:\u002F\u002Fcode.claude.com\u002Fdocs\u002Fen\u002Fauthentication).\n\n## Cost containment\n\nA real production codebase can produce 15-50 Hunt tasks and 25+ findings to\nvalidate. At default concurrency this gets expensive. Flags to keep it sane:\n\n```bash\n# --max-concurrency 1    one claude subprocess at a time\n# --max-recon-tasks 15   cap initial Hunt fanout\n# --max-cost-usd 30      abort cleanly if exceeded\naudit run --repo \u002Fpath\u002Fto\u002Ftarget \\\n  --max-concurrency 1 \\\n  --max-recon-tasks 15 \\\n  --max-cost-usd 30\n```\n\nThe budget guard fires between *and* within stages — a per-task check in\nHunt cooperatively aborts rather than running 30 more tasks past the cap.\n\n## Live-target reproduction (optional)\n\nIf the target has a running deployment, point the agents at it. Hunt then\n**reproduces** each finding against the live service instead of compiling\na local PoC, Validate **rejects** findings that don't reproduce, and Trace\n**confirms** reachability with real HTTP round-trips. The static path\nremains available — these flags are opt-in.\n\n```bash\naudit run --repo \u002Fpath\u002Fto\u002Ftarget --run-id live \\\n  --max-concurrency 1 --max-cost-usd 30 \\\n  --target-url http:\u002F\u002Fserver.local:8888 \\\n  --target-creds email=admin@system.com \\\n  --target-creds password=changechangeme\n```\n\nRules the agents follow when `--target-url` is set:\n- Network egress is restricted to that host + `127.0.0.1`. No other external\n  hosts.\n- A finding that doesn't reproduce against the live target is dropped or\n  rejected (depending on stage) — \"no fabrication\".\n- Credentials flow into every relevant stage's user_input as a dict.\n\n## Scope notes (optional)\n\nTargets often have intentionally-loose-by-design surfaces that aren't bugs\n(e.g. plaintext API keys when that's a feature, test-only Mailpit endpoints,\nanonymous-analytics ingest). 
Drop them in a text file and pass it in — the\nnotes are appended verbatim to every stage's user_input, and Recon \u002F Hunt \u002F\nValidate honor exclusions you list.\n\n```bash\naudit run --repo \u002Fpath\u002Fto\u002Ftarget --scope-notes target_scope.md\n```\n\nExample `target_scope.md`:\n\n```markdown\n- Mailpit (port 1025) is test-only; ignore.\n- Plaintext API keys in the database are a required feature.\n- Don't flag rate-limit absence on anonymous \u002Fping endpoints.\n- Only consider critical\u002Fhigh severity.\n```\n\n## Recon mines git history\n\nRecon greps the git history for past security patches\n(`CVE`, `sec:`, `fix.*auth`, `sanitize`, …) — patched files are hardened,\nbut **sibling files with the same idiom often aren't**. Findings get seeded\nagainst the unpatched copies. Adds zero cost on repos without that pattern;\ncatches real cross-component bugs on repos that have it.\n\n## Logic chains\n\nThe pipeline's default is one-attack-class-per-task (the Cloudflare post's\nnarrow-scope rule). Recon can also emit `logic_chain` tasks for high-impact\nmulti-component paths (auth-bypass + IDOR + path-traversal that compose into\nRCE, etc.) — one chain per task, with the `scope_hint` naming the specific\nchain. 
This is the one allowed exception to single-attack-class scoping.\n\n## Layout\n\n```\nprompts\u002F        8 stage prompts (markdown, loaded as system prompts)\nschemas\u002F        9 JSON schemas — every agent output is validated\nconfig\u002F         stages.yaml — model + concurrency + tool allowlist per stage\naudit\u002F          Python package\n  auth.py       OAuth check + ANTHROPIC_API_KEY scrubbing\n  state.py      SQLite DAO (runs, tasks, findings, traces, dedupe, costs)\n  runner.py     claude-agent-sdk wrapper with schema validation + repair turn\n  orchestrator.py pipeline driver\n  stages\u002F       one module per stage\nwork\u002F           per-Hunt-task scratch dirs (sandbox for PoC compile\u002Frun)\nresults\u002F        JSONL artifacts per stage + final report.json\nstate.db        SQLite (gitignored)\n```\n\n## Safety\n\nHunt agents have Bash and run inside per-task scratch dirs. They are **not**\nsandboxed at the OS level. Run the audit inside a disposable VM or container\nwhen you don't trust the target source — a target with malicious build\nscripts could otherwise execute on your host during PoC compilation.\n\nThe agent reads everything you `--add-dir`, including any `.env` or\n`secrets\u002F` directories in the target. Outputs land in `results\u002F\u003Crun-id>\u002F`\nwhich is `.gitignore`d but **not** scrubbed of those reads.\n\n## License\n\n[MIT](LICENSE). Reuse freely. No warranty.\n\n## Acknowledgements\n\n- The pipeline design is from Cloudflare's [Project Glasswing](https:\u002F\u002Fblog.cloudflare.com\u002Fcyber-frontier-models\u002F)\n  blog post. 
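The credit for the architecture goes there.\n- Built on the official [Claude Code Agent SDK](https:\u002F\u002Fcode.claude.com\u002Fdocs\u002Fen\u002Fagent-sdk\u002Foverview).\n","evilsocket\u002Faudit is an eight-stage vulnerability-discovery agent driven by a Claude Pro \u002F Max subscription through the official Claude Code Agent SDK. Its core features, spread across the eight stages, include many specialized agents working in parallel, deliberate disagreement for validation, and an explicit reachability check, all aimed at improving the accuracy and efficiency of vulnerability detection. The tool is written in Python and open-sourced under the MIT license. It suits scenarios that call for an in-depth security audit of a codebase, and is a practical choice for developers or security teams who want to reduce false positives and confirm that vulnerabilities are actually exploitable.",2,"2026-06-11 03:55:25","CREATED_QUERY"]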