[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80905":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":13,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":17,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":22,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":27,"discoverSource":28},80905,"CodeCome","pruiz\u002FCodeCome","pruiz","An Agentic vulnerability research harness for everybody..","",null,"Python",36,1,2,7,0,3,0.9,"GNU General Public License v3.0",false,"master",true,[],"2026-06-12 02:04:08","# CodeCome\n\n\u003Cimg src=\"CodeCome.png\" alt=\"CodeCome Logo\" width=\"300\">\n\n> The harness for building your own Mythos of vulnerability research at home.\n\n[![License: GPL-3.0-or-later OR AGPL-3.0-or-later](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-GPL--3.0--or--later%20OR%20AGPL--3.0--or--later-blue.svg)](#license)\n[![Status: early PoC](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstatus-early%20PoC-orange.svg)](#project-status)\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10%2B-blue.svg)](#prerequisites)\n[![Built on OpenCode](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbuilt%20on-OpenCode-7b3fe4.svg)](https:\u002F\u002Fopencode.ai)\n\n## What is CodeCome?\n\n[CodeCome](https:\u002F\u002Fcodecome.ai) is the harness I built to let an AI agent help me audit source code without losing the trail.\n\nIt turns \"I think there might be a bug here\" into a structured Markdown finding, validates it inside a sandbox, escalates the ones that matter into working 
proof-of-concept exploits, and produces a report you can read, grep, and commit. It is not a scanner and not a pentest tool. Think of it as a **research methodology made executable**: the same six phases, the same artifact shapes, the same evidence rules — every time, for every target.\n\nThe whole audit lives on disk as plain Markdown and YAML. No database, no RAG, no ticketing system, nothing magical. If you can read a directory, you can review a CodeCome audit.\n\n## Screenshots\n\nThese screenshots are sanitized\u002Fredrawn from real CodeCome runs: enough to show the workflow, without leaking target-specific exploit details or credentials. Click any image to open it at full size.\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Ffinding-queue.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Ffinding-queue.svg\" alt=\"CodeCome finding queue\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Finding queue\u003C\u002Fstrong>\u003Cbr>Reviewable hypotheses.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Fagent-workflow.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fagent-workflow.svg\" alt=\"CodeCome agent workflow\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Agent workflow\u003C\u002Fstrong>\u003Cbr>Agentic, but auditable.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Fsandbox-validation.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fsandbox-validation.svg\" alt=\"CodeCome sandbox validation\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Sandbox validation\u003C\u002Fstrong>\u003Cbr>Validation before 
belief.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Fevidence-artifacts.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fevidence-artifacts.svg\" alt=\"CodeCome evidence artifacts\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Evidence artifacts\u003C\u002Fstrong>\u003Cbr>Evidence written to disk.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Fsandbox-script.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fsandbox-script.svg\" alt=\"CodeCome generated validation helper\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Generated helpers\u003C\u002Fstrong>\u003Cbr>Sandbox scripts on demand.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Fexploit-notes-sanitized.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fexploit-notes-sanitized.svg\" alt=\"CodeCome exploit development notes\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Exploit notes\u003C\u002Fstrong>\u003Cbr>Readable PoC writeups.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca href=\"docs\u002Fimages\u002Fscreenshots\u002Fcounter-analysis.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fcounter-analysis.svg\" alt=\"CodeCome counter-analysis task list\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Counter-analysis\u003C\u002Fstrong>\u003Cbr>Try to disprove first.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd width=\"25%\" align=\"center\">\n      \u003Ca 
href=\"docs\u002Fimages\u002Fscreenshots\u002Fexploit-impact.svg\">\n        \u003Cimg src=\"docs\u002Fimages\u002Fscreenshots\u002Fexploit-impact.svg\" alt=\"CodeCome exploit impact summary\" width=\"240\">\n      \u003C\u002Fa>\n      \u003Cbr>\n      \u003Csub>\u003Cstrong>Impact summary\u003C\u002Fstrong>\u003Cbr>Exploited findings with artifacts.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nA recorded run is also planned:\n\n\u003C!-- TODO: replace with real asciinema cast -->\n\u003C!-- [![asciicast](https:\u002F\u002Fasciinema.org\u002Fa\u002FPLACEHOLDER.svg)](https:\u002F\u002Fasciinema.org\u002Fa\u002FPLACEHOLDER) -->\n\n## Prerequisites\n\nCodeCome runs on top of [OpenCode](https:\u002F\u002Fopencode.ai), an open-source AI coding agent.\n\n1. **Install OpenCode** — follow the [installation guide](https:\u002F\u002Fopencode.ai\u002Fdocs\u002F#install).\n2. **Configure a provider** — connect at least one LLM provider with an API key. See [provider setup](https:\u002F\u002Fopencode.ai\u002Fdocs\u002F#configure).\n3. **Python 3.10+** — needed for workspace tooling (`make venv` creates a local virtualenv).\n4. **GNU Make** — drives the workflow.\n5. **Docker** — required for the sandboxed validation environment.\n6. 
**Optional: exploit recording tools** — for Phase 5 visual evidence:\n   - `asciinema` — terminal recordings.\n   - `agg` — renders `.cast` files to GIFs (CodeCome falls back to a Docker container if missing).\n   - `ffmpeg` and `xvfb` (or `xvfb-run`) — for GUI\u002Fbrowser exploits.\n\n`make check` will warn about missing optional tools, but the core workflow runs fine without them.\n\nBefore pointing CodeCome at code you don't fully trust, read [Safety considerations](#safety-considerations).\n\n## Quick start\n\nWhat CodeCome needs from you is simple: **drop a source tree under `src\u002F`**, tell it the project name in `codecome.yml`, and run the phases.\n\nA few things to know up front about `src\u002F`:\n\n- It can be a copied source tree, a git submodule, a checked-out repo, an extracted archive, or a benchmark corpus. CodeCome doesn't care which.\n- The harness **will try to build, test, and run the target inside the sandbox**. That's the point — validation happens against a real build. Phase 1b bootstraps a Docker-based sandbox suited to your stack (Python, C\u002FC++, .NET, PHP, etc.).\n- If your project has unusual build steps, vendored directories, or generated code, you'll want to adjust `audit.scope` (include\u002Fexclude globs) and `audit.focus` (vulnerability classes to prioritize) in `codecome.yml`. The defaults work for most projects, but five minutes spent here pays off.\n- You don't have to do everything at once. Run Phase 1, look at the recon notes, tweak `codecome.yml`, then keep going. 
Phases are designed to be re-runnable.\n\nWhen you're ready:\n\n    make venv                       # set up the local Python virtualenv\n    make check                      # sanity-check the workspace\n    make phase-1                    # recon + sandbox bootstrap\n    make phase-2                    # generate candidate findings\n    make phase-3                    # counter-analysis (dedup \u002F reject)\n    make phase-4 FINDING=CC-0001    # validate one finding\n    make phase-5 FINDING=CC-0001    # build a PoC for a confirmed finding\n    make phase-6                    # generate the report\n\nThere are convenience targets too — `make validate-all`, `make exploit-all`, `make sweep` — but you almost never want to use them on a fresh project. Walk one finding through end-to-end first; you'll learn more from one CC-0001 than from twenty PENDING ones.\n\n## How it works\n\nSix phases. Each one is a `make` target. Each one writes to disk.\n\n1. **Recon (`make phase-1`)** — agent reads `src\u002F`, infers the target type, languages, build model, attack surface, and writes notes under `itemdb\u002Fnotes\u002F`. Also bootstraps a Docker sandbox suited to the stack.\n2. **Hypothesis (`make phase-2`)** — agent writes candidate findings under `itemdb\u002Ffindings\u002FPENDING\u002F`. Each one points at specific code, sources, sinks, and a trust boundary.\n3. **Counter-analysis (`make phase-3`)** — a reviewer pass tries to disprove or deduplicate findings. Weak ones move to `REJECTED\u002F`, repeats to `DUPLICATE\u002F`.\n4. **Validation (`make phase-4 FINDING=CC-XXXX`)** — one finding at a time, in the sandbox. Build the target, write a small PoC, capture evidence, decide CONFIRMED or REJECTED.\n5. **Exploit (`make phase-5 FINDING=CC-XXXX`)** — for confirmed findings worth escalating, build a real PoC that shows concrete impact: code execution, data exfiltration, privilege escalation. Severity gets adjusted based on what you actually demonstrate.\n6. 
**Reporting (`make phase-6`)** — generate a Markdown report grouping exploited and confirmed findings with evidence references.\n\nThe finding lifecycle:\n\n```mermaid\nstateDiagram-v2\n    [*] --> PENDING\n    PENDING --> CONFIRMED : evidence captured\n    PENDING --> REJECTED : disproved\n    PENDING --> DUPLICATE : already filed\n    CONFIRMED --> EXPLOITED : impact demonstrated\n    CONFIRMED --> [*] : not feasible to exploit\n    EXPLOITED --> [*]\n    REJECTED --> [*]\n    DUPLICATE --> [*]\n```\n\nPhases 1–3 are batch operations. Phases 4 and 5 are run **per finding** — that's intentional. One finding at a time keeps evidence traceable and lets you mix model choices, prompt overrides, and rerun loops without polluting the audit.\n\n## Who is this for?\n\n- **Solo security researchers** who want LLM help on source-code audits but refuse to trust an opaque chat session.\n- **Blue and red teamers** doing internal source-code review and looking for a workflow that produces commit-friendly artifacts.\n- **People studying LLM-assisted security work** — the workspace is intentionally simple enough to instrument, fork, or compare across models.\n\nIf you want a one-click vulnerability scanner, this is not it. 
CodeCome is for people who want **the model to help them think**, not to replace the thinking.\n\n## Why I built it\n\nAfter watching too many chat sessions produce confident-sounding \"potential SQL injection\" claims with zero evidence, I wanted a workflow where:\n\n- every claim is a file on disk,\n- every file points at specific lines of code,\n- every finding either has evidence or gets rejected,\n- and the whole thing is reviewable by a human in an afternoon.\n\nCodeCome is the harness I wish I'd had the first time I tried to use an agent for vulnerability research.\n\n## What a finding looks like\n\nHere is what one of those Markdown files actually looks like — trimmed from an example CC-0022 audit (SQL injection in Apps's `user.get` JSON-RPC API):\n\n```markdown\n---\nid: \"CC-0022\"\ntitle: \"SQL injection via unvalidated selectRole option in user.get JSON-RPC API\"\nstatus: \"EXPLOITED\"\nseverity: \"CRITICAL\"\nconfidence: \"CONFIRMED\"\ncategory: \"SQL Injection\"\ncwe:\n  - \"CWE-89\"\nlanguage: \"php\"\ntarget_area: \"JSON-RPC API user.get method\"\nfiles:\n  - \"src\u002Fapp-1.4.1\u002Fui\u002Finclude\u002Fclasses\u002Fapi\u002Fservices\u002FCUser.php\"\nsymbols:\n  - \"CUser::addRelatedObjects()\"\nsources:\n  - \"JSON-RPC options['selectRole'] parameter\"\nsinks:\n  - \"DBselect() at CUser.php:2243-2248\"\ntrust_boundary: \"authenticated API user -> raw SQL SELECT clause\"\nvalidation:\n  status: \"CONFIRMED\"\n  methods: [\"http_exploit\", \"runtime_reproduction\"]\n  evidence_dir: \"itemdb\u002Fevidence\u002FCC-0022\"\nexploitation:\n  status: \"DEMONSTRATED\"\n  severity_before: \"HIGH\"\n  severity_after: \"CRITICAL\"\n  artifacts_dir: \"itemdb\u002Fevidence\u002FCC-0022\u002Fexploits\"\n---\n\n# Summary\n\nThe `user.get` JSON-RPC API accepts a `selectRole` array whose elements are\nconcatenated into a SQL SELECT clause via `implode(',r.', ...)` without any\nallowlist check. 
Authenticated users at the lowest privilege level can inject\narbitrary SQL fragments and extract data from the database.\n\n# Affected code\n\n`CUser::addRelatedObjects()` at `CUser.php:2238-2248` builds raw SQL from\n`$options['selectRole']` after `zbx_array_merge()` skipped input validation.\n\n# Counter-analysis\n\n- `CApiInputValidator` is not used on this code path. Verified by reading\n  `CUser::get()` at line 91.\n- `dbConditionInt()` only sanitises `$userIds`, not the SELECT clause.\n- No framework-level escaping of column lists in `DBselect()`.\n\n# Validation plan\n\nSend a `user.get` JSON-RPC request as a low-privilege user with\n`selectRole: [\"roleid,(SELECT version())\"]` and observe the version string\nreturned inline. Evidence under `itemdb\u002Fevidence\u002FCC-0022\u002F`.\n```\n\nThat single file is the entire interface between the model and you: a hypothesis with enough detail to either disprove it, validate it, or hand it to a developer.\n\n## Workspace layout\n\n    .\n    ├── README.md                # you are here\n    ├── AGENTS.md                # rules the agents follow\n    ├── codecome.yml             # project + audit configuration\n    ├── src\u002F                     # target source code\n    ├── sandbox\u002F                 # Docker-based validation environment\n    ├── itemdb\u002F                  # findings, evidence, notes, reports\n    ├── runs\u002F                    # run summaries and transcripts\n    ├── templates\u002F               # finding, evidence, report templates\n    ├── tools\u002F                   # Python helper scripts\n    ├── prompts\u002F                 # reusable phase prompts\n    ├── docs\u002F                    # deeper documentation\n    └── .opencode\u002F               # agents and skills\n\n`itemdb\u002F` is the heart of an audit. 
Everything important lives there:\n\n- `itemdb\u002Fnotes\u002F` — reconnaissance notes (target profile, attack surface, build model, trust boundaries, …)\n- `itemdb\u002Ffindings\u002FPENDING|CONFIRMED|EXPLOITED|REJECTED|DUPLICATE\u002F` — findings by status.\n- `itemdb\u002Fevidence\u002F\u003Cfinding-id>\u002F` — validation evidence and PoCs.\n- `itemdb\u002Freports\u002F` — generated reports.\n\nAgents live under `.opencode\u002Fagents\u002F`:\n\n- `recon` — Phase 1\n- `auditor` — Phase 2\n- `reviewer` — Phase 3\n- `validator` — Phase 4\n- `exploiter` — Phase 5\n- `reporter` — Phase 6\n\n### `codecome.yml` at a glance\n\nThe shipped defaults work out of the box. The keys you'll most often touch:\n\n- `project.name` — identifies the target in output and reports.\n- `audit.scope` — include\u002Fexclude globs for which files agents inspect.\n- `audit.focus` — vulnerability classes to prioritize.\n- `audit.extra_prompts` — persistent per-phase prompt additions.\n- `agents.\u003Cname>.model` \u002F `.variant` — pin a specific model per phase (see [Model selection](#model-selection-and-rerunning-phases)).\n- `environment` — sandbox paths and scripts.\n- `validation` — confirmation policies, allowed write paths, validation methods.\n\n## Running the workflow\n\nRun phases through `make` targets — they handle readiness gates and agent selection for you.\n\n### Phase 1 — reconnaissance + sandbox bootstrap\n\n    make phase-1\n\nTwo things happen together:\n\n- **1a (recon)** — notes written under `itemdb\u002Fnotes\u002F`.\n- **1b (sandbox bootstrap)** — picks a curated baseline from `templates\u002Fsandboxes\u002F\u003Cid>\u002F`, applies it to `sandbox\u002F` (with marker substitution), validates it, and writes `itemdb\u002Fnotes\u002Fsandbox-plan.md` plus `sandbox\u002FCODECOME-GENERATED.md`.\n\n`sandbox\u002F` is semi-ephemeral; Phase 1b regenerates its contents based on what is in `src\u002F`.\n\nBootstrap helpers:\n\n    make sandbox-list\n    make 
sandbox-detect\n    make sandbox-inspect ID=python\n    make sandbox-bootstrap ID=python\n    make sandbox-validate\n    make sandbox-regenerate\n    make sandbox-status\n\nSandbox runtime helpers (one target per capability):\n\n    make sandbox-setup        # setup.sh or `docker compose build`\n    make sandbox-up           # start\n    make sandbox-check        # sanity\n    make sandbox-build        # build target\n    make sandbox-test         # run target tests\n    make sandbox-down         # stop\n    make sandbox-shell        # open a shell\n    make sandbox-logs         # tail logs\n    make sandbox-clean        # clean runtime artifacts\n    make sandbox-reset        # reset to a known state\n\nSee `docs\u002Fsandbox.md` for the full bootstrap workflow.\n\n### Phase 2 — hypothesis generation\n\n    make phase-2\n\nCreates candidate findings under `itemdb\u002Ffindings\u002FPENDING\u002F`. Gated by the sandbox: blocks if `sandbox\u002F` is missing or if the most recent validation failed. Override with `CODECOME_ALLOW_NO_SANDBOX=1`.\n\n#### Deep sweep (optional)\n\nA Deep Sweep runs the `auditor` agent **once per file**, forcing exhaustive line-by-line analysis. It complements the broad Phase 2 pass.\n\nWhen to use it:\n\n- Phase 1 flagged many score-4\u002F5 files and you want to be sure none were skipped.\n- Phase 2 produced few findings on a large codebase.\n- A specific subsystem deserves focused attention.\n- A confirmed finding suggests related files deserve a second look.\n\nTrade-off: token cost scales linearly with the number of files swept (one full agent session per file). A sweep of 10 high-risk files costs roughly as many tokens as 10 Phase 2 runs. It also produces overlapping findings that Phase 3 has to deduplicate. 
Always preview first with `--dry-run`.\n\nHow it works: the sweep runner reads `itemdb\u002Fnotes\u002Ffile-risk-index.yml` (written by Phase 1), selects all files at score 4 or above (or the files matched by `FILE=`), writes one prompt per file under `tmp\u002Ffile-sweep-prompts\u002F`, then invokes the `auditor` agent once per file in sequence.\n\n    make list-risk-files                     # preview which files would be swept\n    python tools\u002Frun-sweep.py --dry-run      # show selected files and prompts, no agent calls\n    make sweep                               # sweep all files at score 4+\n    make sweep FILE=\"src\u002Fpath\u002Fto\u002Ffile.ext\"   # sweep a specific file\n    make sweep FILE=\"src\u002F**\u002F*.cs\"            # sweep all .cs files under src\u002F\n\nSweep findings overlap with Phase 2 output by design. Phase 3 deduplicates on semantic frontmatter fields (`sources`, `sinks`, `entry_points`, `trust_boundary`, `target_area`), so overlaps are merged gracefully.\n\nSee `docs\u002Ffile-risk-sweeps.md` for the full reference.\n\n### Phase 3 — counter-analysis\n\n    make phase-3\n\nReviews candidate findings. Moves weak findings to `itemdb\u002Ffindings\u002FREJECTED\u002F` and repeats to `itemdb\u002Ffindings\u002FDUPLICATE\u002F`.\n\n### Phase 4 — validation\n\n    make phase-4 FINDING=CC-0001\n\nOne finding at a time. Stores evidence under `itemdb\u002Fevidence\u002F\u003Cfinding-id>\u002F` and moves findings to `CONFIRMED\u002F` or `REJECTED\u002F`.\n\nTo run validation across all PENDING findings:\n\n    make validate-all\n\n### Phase 5 — exploit development\n\n    make phase-5 FINDING=CC-0001\n\nDevelops a working PoC for one confirmed finding. Artifacts go under `itemdb\u002Fevidence\u002F\u003Cfinding-id>\u002Fexploits\u002F`. 
The exploiter may adjust severity based on demonstrated impact, and may move findings to `EXPLOITED\u002F`.\n\nFor all confirmed findings that aren't already marked as not-feasible:\n\n    make exploit-all\n\n### Phase 6 — reporting\n\n    make phase-6\n\nA lightweight local report (no agent involved) is also available:\n\n    make report\n\nThe default report path is `itemdb\u002Freports\u002Freport.md`.\n\n## Customizing phase prompts\n\nExtra instructions can be appended to any phase prompt from three sources, applied in this order (all additive):\n\n1. **`codecome.yml`** — persistent per-phase instructions under `audit.extra_prompts`. Always applied when the phase runs.\n\n       audit:\n         extra_prompts:\n           reconnaissance: |\n             Focus sandbox on ASAN builds.\n             Skip fuzzing harness for now.\n\n2. **`PROMPT_EXTRA_FILE`** — path to a file whose content is appended.\n\n       make phase-1 PROMPT_EXTRA_FILE=my-notes.md\n\n3. **`PROMPT_EXTRA`** — inline text appended directly.\n\n       make phase-1 PROMPT_EXTRA=\"Also try clang for the sandbox setup.\"\n\nAll three can be combined in a single invocation.\n\n## Local helper commands\n\n    make help                                  # show all available commands\n    make check                                 # validate workspace\n    make status                                # show finding status counts\n    make findings                              # list findings\n    make findings STATUS=PENDING               # filter by status\n    make findings-create TITLE=\"Buffer overflow in parser\"\n    make findings-move FINDING=CC-0001 STATUS=CONFIRMED\n    make findings-evidence FINDING=CC-0001     # create evidence dir\n    make next-id                               # next free finding id\n    make frontmatter                           # validate finding frontmatter\n    make index                                 # regenerate finding index\n    make report                          
      # regenerate report\n    make list-risk-files                       # top-scoring risky files from index\n    make itemdb-reset                          # reset local audit artifacts\n    make sandbox-check                         # sandbox sanity\n    make sandbox-shell                         # open sandbox shell\n\n## Starting over\n\nIf you want a completely clean workspace, the safest option is to clone a fresh copy of CodeCome.\n\nIf you only want to clear local audit artifacts without recloning:\n\n    make itemdb-reset\n\nThis removes local notes, findings, evidence, reports, run summaries, and temporary artifacts, then recreates the expected `.gitkeep` files. Don't use it if you want to preserve prior audit work.\n\n## Advanced: wrapper internals\n\nBy default, phase targets use a CodeCome-owned styled wrapper around `opencode run --format json` so assistant output, tool calls, and tool results render with consistent colors and structure. The wrapper pretty-renders `read`, `write`, `edit`, `apply_patch`, `grep`, `glob`, `bash`, `todowrite`, and `skill` tool calls; all others get a generic JSON panel.\n\nThe wrapper also detects bash invocations of `tools\u002Fsandbox-bootstrap.py --format json …` (and `make sandbox-* BOOTSTRAP_ARGS='--format json'` wrappers) and renders them as a structured Sandbox panel with capability tables, validation tier summaries, and color-coded gate badges.\n\nSome models prefer to invoke CLI helpers via the bash tool instead of the OpenCode-native Read\u002FGrep\u002FGlob tools (e.g. `rtk read FILE`, `rtk grep PAT PATH`, `rtk ls`, plain `rg PAT`, `cat FILE`, `head -n N FILE`, `tail -n N FILE`, `find PATH`, `tree`). The wrapper detects those calls and routes their output through the matching styled renderer so the panels look the same regardless of how the agent invoked the operation. 
Pipelines, redirections, and command substitutions are intentionally left for the generic Bash panel.\n\nAll `make` targets that invoke Python tools expect a repo-local virtualenv at `.venv\u002F`. If it is missing or stale, the command will stop with a setup message telling you to run `make venv`.\n\n`make tests` runs the Python test suite under `tests\u002F` and validates finding YAML frontmatter via `tools\u002Fcheck-frontmatter.py`. This catches regressions such as malformed finding metadata that can break helper scripts.\n\n### Reusable prompts\n\nCodeCome ships reusable phase prompts under `prompts\u002F`:\n\n    prompts\u002Fphase-1-recon.md\n    prompts\u002Fphase-2-audit.md\n    prompts\u002Fphase-3-review.md\n    prompts\u002Fphase-4-validate.md\n    prompts\u002Fphase-5-exploit.md\n    prompts\u002Fphase-6-report.md\n    prompts\u002Fsweep.md\n\n### Wrapper environment variables\n\n    CODECOME_USE_WRAPPER=0              # bypass the styled wrapper\n    CODECOME_THINKING=1                 # show model reasoning\u002Fthinking blocks in output\n    CODECOME_THINKING=0                 # hide model reasoning\u002Fthinking blocks\n    CODECOME_RENDER_REASONING=0         # suppress on-screen Thinking panels (independent override)\n    CODECOME_REASONING_MAX_CHARS=4000   # truncate long reasoning blocks\n    CODECOME_SANDBOX_RENDER=0           # disable structured Sandbox panel\n    CODECOME_SANDBOX_VALIDATE_STDERR_LINES=20\n    CODECOME_SANDBOX_FILES_CAP=15\n    CODECOME_BOOTSTRAP_MAX_RETRIES=3    # agent remediation budget during bootstrap\n    CODECOME_BOOTSTRAP_DRY_RUN=1        # force --dry-run on sandbox apply\u002Fregenerate\n    CODECOME_BASH_SHIM_RENDER=0         # disable rtk\u002Fcat\u002Fhead\u002Ftail\u002Frg\u002Fls\u002Ffind\u002Ftree routing\n    CODECOME_BASH_SHIM_LS_STRIP_LONG_FORMAT=0\n    OPENCODE_ARGS='...'                 
# extra flags for opencode run (forwarded directly when CODECOME_USE_WRAPPER=0; in wrapper mode only --model, --variant and --thinking are used)\n    CODECOME_MODEL=\u003Cid>                 # pin model per phase, e.g. anthropic\u002Fclaude-opus-4-7\n    CODECOME_MODEL_VARIANT=\u003Cv>          # pin model variant, e.g. high, max\n\n### Model resolution and thinking display\n\nThe wrapper resolves the effective model in this order:\n\n1. `OPENCODE_ARGS` (`--model …` \u002F `--variant …`)\n2. env (`CODECOME_MODEL`, `CODECOME_MODEL_VARIANT`)\n3. `codecome.yml` (`agents.\u003Cname>.model` \u002F `.variant`)\n4. the model used in your most recent OpenCode session for this project (best-effort, read from OpenCode's local DB)\n5. unknown\n\nThe chosen value is shown in the phase header banner along with its source.\n\nPer-provider thinking-display defaults:\n\n- `anthropic\u002F*` → off. Claude already interleaves thinking with normal `text` blocks via OpenCode's interleaved-thinking beta header, so `Assistant` panels already show the model's working.\n- `openai\u002F*`, `xai\u002F*`, `github-copilot\u002F*`, `groq\u002F*`, `cerebras\u002F*`, `google\u002F*`, `google-vertex\u002F*` → on.\n- Anything else (unknown \u002F future provider) → on. Cheaper to over-surface than under-surface in vulnerability research.\n\nOverride precedence: `CODECOME_THINKING` env > per-provider default. `CODECOME_RENDER_REASONING=0` acts as an independent escape hatch that suppresses rendering even when thinking is enabled. 
Some providers bill reasoning tokens; set `CODECOME_THINKING=0` per phase to opt out without losing the styled wrapper.\n\nPrint the full resolution table for any agent without launching a phase:\n\n    make show-model\n    make show-model AGENT=auditor\n\nThe wrapper currently targets OpenCode 1.14.39 or newer.\n\n### Manual invocation\n\nIf you prefer direct `opencode run` commands instead of `make` targets:\n\n    opencode run --agent recon \"$(cat prompts\u002Fphase-1-recon.md)\"\n    opencode run --agent auditor \"$(cat prompts\u002Fphase-2-audit.md)\"\n    opencode run --agent reviewer \"$(cat prompts\u002Fphase-3-review.md)\"\n    opencode run --agent validator \"$(sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts\u002Fphase-4-validate.md)\"\n    opencode run --agent exploiter \"$(sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts\u002Fphase-5-exploit.md)\"\n    opencode run --agent reporter \"$(cat prompts\u002Fphase-6-report.md)\"\n\n`make report` is a lightweight local summary generator. Use `make phase-6` when you want the full AI-written report flow.\n\nDirect manual `opencode run` usage remains unchanged. The styled wrapper is only used by `make phase-*` targets.\n\n## Design principles\n\n### Findings are artifacts\n\nEvery relevant issue must be written as a Markdown file. The model should not leave important security claims only in chat history or run transcripts.\n\n### Hypotheses are not confirmed bugs\n\nA plausible vulnerability is first a hypothesis. Confirmation requires evidence.\n\n### Impact must be demonstrated\n\nConfirmed vulnerabilities should have their real-world impact demonstrated through exploit development whenever feasible. Without this, developers may dismiss findings as theoretical or low-impact.\n\n### Counter-analysis is mandatory\n\nEvery finding includes an attempt to disprove it. 
The reviewer looks for unreachable code paths, input validation, authorization checks, framework-level protections, false assumptions, duplicate reports, and missing exploitability conditions.\n\n### Validation is sandboxed\n\nThe validator and exploiter may freely experiment inside `sandbox\u002F`, but should not modify target source code unless explicitly instructed.\n\n### The core is target-agnostic\n\nCodeCome adapts to whatever sits under `src\u002F`. Target-specific behavior lives in skills, adapters, notes, or config — not in the core workflow.\n\n## Model selection and rerunning phases\n\nThe model you pick has a real effect on the output. Some patterns I've found useful:\n\n- **Different models see different bugs.** Running Phase 2 with two different models on the same codebase usually produces two partially overlapping sets of findings. Phase 3 deduplicates them on semantic frontmatter fields, so it's safe to combine the runs.\n- **Different models for different phases.** Reasoning-heavy models (Opus, GPT reasoning variants, Gemini Pro reasoning) tend to do better on Phase 2 (audit) and Phase 5 (exploit). Fast workhorses are often enough for Phase 3 (counter-analysis) and Phase 6 (reporting). Pin per phase with `agents.\u003Cname>.model` in `codecome.yml`, or pass `CODECOME_MODEL=…` on the command line.\n- **Rerunning a phase with a second model.** You can re-run Phase 2 with another model and end up with extra findings — they go into `PENDING\u002F` alongside the existing ones. Phase 3 then folds duplicates into `DUPLICATE\u002F`. Same trick works for Phase 4: if model A can't reproduce a finding, model B sometimes can with a different sandbox approach.\n- **Deep sweep as a feedback amplifier.** `make sweep` runs the auditor once per high-risk file. It produces noisier output and burns more tokens, but it catches bugs that broad Phase 2 sometimes misses because the model gets the whole file in a single focused context window. 
Use it on a subset (`FILE=\"src\u002Fauth\u002F**\"`) to keep cost contained.\n- **Use the resolution banner.** Every wrapped phase prints which model it actually picked and where the value came from. If a run feels off, that banner is the first place to look.\n\nThe right combination depends on your provider mix, your token budget, and your target. Experiment.\n\n## Safety considerations\n\n> ⚠️ **Disclaimer — read this before pointing CodeCome at code you did not write.**\n\nCodeCome operates by feeding target source code (under `src\u002F`) to an LLM agent\nthat has powerful tools at its disposal: it reads and writes files in the\nworkspace, executes commands in a sandbox, builds and runs the target, and can\nfetch resources from the network. Treating unknown source code as data is not\nsafe by default.\n\nThe risks worth knowing about:\n\n- **Prompt injection from the target.** Comments, docstrings, README files,\n  test fixtures, log strings, commit messages, filenames, and even crafted\n  binary blobs inside `src\u002F` can contain instructions aimed at the agent\n  (\"ignore previous instructions…\", \"exfiltrate $HOME\u002F.ssh\u002F…\", etc.). The\n  agent reads these as input, not as instructions, but LLMs are still\n  susceptible.\n- **Supply-chain hazards in the sandbox.** Phase 1b will try to build and run\n  the target. A malicious build script (`setup.py`, `package.json` lifecycle\n  hooks, `Makefile`, `Dockerfile`, `configure`, …) executes inside the\n  sandbox container with whatever permissions Docker gives it.\n- **Resource exhaustion and side effects.** Adversarial code may try to\n  consume CPU, disk, or network from the validation phase.\n- **Exfiltration via network.** If the sandbox or your host can reach the\n  internet, an injected agent or a malicious build step can attempt to send\n  data out.\n\n**Recommended precautions:**\n\n1. 
**Run the whole workspace inside an isolation boundary** when auditing\n   untrusted sources — a disposable VM (e.g. Multipass, Vagrant, UTM,\n   Proxmox), a dedicated container, or a remote throwaway host. Do not run\n   CodeCome on a machine that holds credentials, SSH keys, browser\n   profiles, or production access you cannot afford to lose.\n2. **Treat `src\u002F` as untrusted.** Do not run anything from `src\u002F` directly\n   on your host. CodeCome funnels execution through `sandbox\u002F`, but the\n   `make` runner itself, the agent, and any helper scripts still execute on\n   the host.\n3. **Restrict network egress** from the sandbox (and ideally from the\n   outer VM) to only what you need for builds and package installs.\n4. **Use a fresh API key with low spend limits** for the LLM provider so\n   prompt-injected runaway loops cannot rack up an unbounded bill.\n5. **Review what the agent writes** under `itemdb\u002F`, `sandbox\u002F`, and\n   `tmp\u002F` before trusting any of it. Findings, evidence, and reports are\n   all attacker-influenced when the target is untrusted.\n6. **Avoid `make exploit-all` \u002F `make validate-all` on untrusted targets**\n   until you have walked at least one finding through manually and\n   confirmed the sandbox behaves the way you expect.\n\nCodeCome's sandbox is a containment aid, not a security boundary against a\ndetermined attacker. If you would not be willing to run `docker build` and\n`.\u002Frun-tests.sh` from the target's repo on the host, you should not run\nCodeCome against it on the host either.\n\n## Project status\n\nThis is early-stage software. 
Honestly:\n\n**What works well today:**\n\n- Markdown findings with structured YAML frontmatter — stable format, no surprise schema changes.\n- File-based item database — no DB, no RAG, easy to grep, easy to commit.\n- Per-phase make targets with readiness gates.\n- Docker-based sandbox bootstrap for common stacks (Python, C\u002FC++, .NET, PHP, IaC, …).\n- Styled wrapper output with per-tool renderers.\n- Per-finding evidence directories and an exploit subdirectory layout for Phase 5.\n\n**What's still rough or missing:**\n\n- One agent at a time. No parallel validation, no parallel auditing.\n- One validation worker at a time. `make validate-all` is sequential.\n- Docker is the only first-class sandbox runtime today. Remote sandboxes and disposable VMs are future work.\n- Phase 2 and the deep sweep produce overlapping findings that Phase 3 has to clean up — this works, but it can be wasteful on tokens.\n- Provider coverage for the `--thinking` flag is hand-maintained.\n- No CI. Quality gate is `make tests` run locally.\n\nPatches, issues, and feedback are all welcome.\n\n## Documentation\n\n| Doc | What's in it |\n|-----|--------------|\n| [`docs\u002Ftarget-setup.md`](docs\u002Ftarget-setup.md) | Supported target layouts: copied source trees, submodules, archives, benchmark corpora |\n| [`docs\u002Fworkflow.md`](docs\u002Fworkflow.md) | Full phase-by-phase workflow reference |\n| [`docs\u002Fsandbox.md`](docs\u002Fsandbox.md) | Sandbox usage, boundaries, evidence capture, validation environment notes |\n| [`docs\u002Ffile-risk-sweeps.md`](docs\u002Ffile-risk-sweeps.md) | File risk index format and deep sweep reference |\n| [`docs\u002Fdevelopment.md`](docs\u002Fdevelopment.md) | Repository conventions, helper tools, contributor workflow |\n\n## Contributing\n\nIssues, ideas, and pull requests are welcome — see [`CONTRIBUTING.md`](CONTRIBUTING.md). 
If something feels rough, that's probably because it is; please tell me about it.\n\n## Authors\n\n- **Pablo Ruiz García** — Project Lead  \n  Architecture, engineering, implementation, and the person who turns vague ideas into working code.\n\n- **Alejandro Ramos** — Product Lead  \n  Product direction, use cases, requirements, and official provider of impossible requests that somehow keep becoming roadmap items.\n\n## Contributors\n\nContributions are welcome. Pull requests are expected, encouraged, and appreciated.\n\n## License\n\nCodeCome is dual-licensed under your choice of:\n\n- GNU General Public License version 3 or later (`GPL-3.0-or-later`), or\n- GNU Affero General Public License version 3 or later (`AGPL-3.0-or-later`).\n\nSPDX expression: `GPL-3.0-or-later OR AGPL-3.0-or-later`.\n\nThe files under `templates\u002Fsandboxes\u002F` are an exception: they are licensed under the **MIT License** so they can be copied into user workspaces without imposing copyleft obligations on those user projects.\n\nSee `LICENSE`, `AGPL-LICENSE`, `templates\u002Fsandboxes\u002FLICENSE`, and `NOTICE`. Contributions are accepted under the terms described in `CONTRIBUTING.md`.\n\nCopyright (C) 2025-2026 Pablo Ruiz García \u003Cpablo.ruiz@gmail.com>.\n","CodeCome is an agentic tool for vulnerability research, intended to help users build their own vulnerability research setup at home. The project is written in Python and supports Python 3.10 and later. Its core function is to turn vague security hypotheses into structured Markdown findings, validate them in a sandbox environment, and then produce working proof-of-concept exploits and reports. CodeCome's design emphasizes transparency and auditability: the entire audit is stored as plain text (Markdown and YAML), with no dependency on a database or other complex systems. The tool is particularly suited to researchers or teams who need to audit source code for security, and it offers a repeatable, verifiable research methodology.","2026-06-11 04:02:47","CREATED_QUERY"]