[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81835":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":13,"stars30d":13,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":27,"discoverSource":28},81835,"patchwatch","originsec\u002Fpatchwatch","originsec","A local tool for ingesting Windows Patch Tuesday CVEs, diffing patched binaries with Ghidriff and surfacing LLM-generated security analysis through a browser UI","",null,"Rust",38,10,28,1,0,2,6,3.12,"Apache License 2.0",false,"main",[],"2026-06-12 02:04:20","# PatchWatch\n\nA local tool for ingesting Windows Patch Tuesday CVEs, diffing patched binaries with [Ghidra](https:\u002F\u002Fghidra-sre.org\u002F) (via [ghidriff](https:\u002F\u002Fgithub.com\u002Fclearbluejar\u002Fghidriff)), and surfacing LLM-generated security analysis through a browser UI.\n\nPatchWatch wires together a few public data sources and an LLM into a single workflow:\n\n- **[Microsoft Security Update Guide](https:\u002F\u002Fmsrc.microsoft.com\u002Fupdate-guide\u002F)** (SUG) — CVE metadata, affected products, CVSS, exploited status.\n- **Microsoft Support pages + Update Catalog** — enumerate which files a given KB ships.\n- **[Winbindex](https:\u002F\u002Fwinbindex.m417z.com\u002F)** — locate pre-patch and post-patch binary versions.\n- **[ghidriff](https:\u002F\u002Fgithub.com\u002Fclearbluejar\u002Fghidriff)** (Ghidra under the hood) — produce structured function-level diffs.\n- **Anthropic API** — triage, 
synthesize, and deep-analyze the diff in three LLM stages.\n\nIt is designed to run **locally** against your own data store. Nothing is uploaded except prompts sent to the LLM provider you configure.\n\n## Prerequisites\n\nYou'll need the following:\n\n1. **Rust toolchain** (stable, 2024 edition).\n   - Install via [rustup.rs](https:\u002F\u002Frustup.rs\u002F).\n   - On Windows you also need the **MSVC build tools \u002F Windows SDK** so `rustc` can link. The simplest option is to install Visual Studio 2022 Community with the *Desktop development with C++* workload, which pulls in the Windows 11 SDK and `link.exe`. The [`rustup-init` installer](https:\u002F\u002Frustup.rs\u002F) will prompt you for this on first run if it's missing.\n\n2. **Docker Desktop** with the **WSL 2 backend** enabled.\n   - Used to run ghidriff in a container so you don't have to install Ghidra\u002FJava\u002FPython locally.\n   - Make sure `docker run hello-world` succeeds from PowerShell before continuing.\n   - If you'd rather run ghidriff natively, see the *Local ghidriff* section below.\n\n3. **ghidriff** — a Ghidra-based binary differ.\n   - Project page and setup instructions: \u003Chttps:\u002F\u002Fgithub.com\u002Fclearbluejar\u002Fghidriff>\n   - For the Docker workflow, see step 2 (*Build the local ghidriff Docker image*) in the Quickstart.\n\n4. **An Anthropic API key.** PatchWatch is currently hardcoded to the Anthropic Messages API. Set the model in `config.yaml`.\n\n## Quickstart\n\n```powershell\n# 1. Clone and enter the project\ngit clone https:\u002F\u002Fgithub.com\u002Foriginsec\u002Fpatchwatch.git\ncd patchwatch\n\n# 2. Build the local ghidriff Docker image.\n#    The upstream :latest image ships Ghidra 11.3.1 but pyghidra 3.x requires\n#    Ghidra 12.0+. The Dockerfile here pins pyghidra to the last 11.3-compatible\n#    release. 
See https:\u002F\u002Fgithub.com\u002Fclearbluejar\u002Fghidriff\u002Fissues\u002F134\ndocker build -f Dockerfile.ghidriff -t ghidriff-fixed:latest .\n\n# 3. Copy and edit the env file. Set your Anthropic API key and a random CSRF\n#    secret (any reasonably long random string).\ncopy .env.example .env\nnotepad .env\n\n# 4. Copy the config file. Steps 6 and 7 pass it via --config; the defaults\n#    are fine for a first run, so editing it is optional.\ncopy crates\\patchwatch\\config.example.yaml crates\\patchwatch\\config.yaml\n\n# 5. Build\ncargo build --release\n\n# 6. Ingest the most recent Patch Tuesday release. No LLM calls happen here\n#    unless a CVE has CVSS >= 9.0 or is marked exploited.\n.\\target\\release\\patchwatch.exe --config crates\\patchwatch\\config.yaml poll --n 1\n\n# 7. Start the local web UI at http:\u002F\u002F127.0.0.1:8765\n.\\target\\release\\patchwatch.exe --config crates\\patchwatch\\config.yaml web\n```\n\n`cargo run` also works if you'd rather build and run in a single step:\n\n```powershell\ncargo run --release -- --config crates\\patchwatch\\config.yaml poll --n 1\ncargo run --release -- --config crates\\patchwatch\\config.yaml web\n```\n\nFrom the web UI, click any ingested CVE and hit **Analyze** to run the full diff + LLM pipeline. Results render inline as soon as each stage completes.\n\nYou can also run the pipeline from the CLI:\n\n```powershell\n# Ingest a specific release instead of the most recent\npatchwatch poll --release 2025-Apr\n\n# Run the full analysis pipeline on an already-ingested CVE\npatchwatch analyze CVE-2025-26633\n\n# Force a specific binary instead of using triage rankings\npatchwatch analyze CVE-2025-26633 --binary mscms.dll\n```\n\n## Configuration\n\n`crates\u002Fpatchwatch\u002Fconfig.example.yaml` is the canonical example. 
The fields most worth knowing about:\n\n```yaml\nllm:\n  model_primary: \"claude-sonnet-4-6\"\n  model_fallback: \"claude-haiku-4-5-20251001\"\n  api_key_env: \"ANTHROPIC_API_KEY\"   # env var name that holds the key\n  triage_top_n: 5\n  max_diff_candidates: 5\n\ndiff_engine:\n  mode: docker\n  image: \"ghidriff-fixed:latest\"\n  volume_root: \"~\u002Fpatchwatch\u002Fghidriff\"\n\nstorage:\n  base_dir: \"~\u002Fpatchwatch\"          # SQLite DB, binary cache, reports\n\nweb:\n  bind_addr: \"127.0.0.1:8765\"\n  csrf_secret_env: \"PATCHWATCH_CSRF_SECRET\"\n  allow_non_loopback: false         # set true only behind a reverse proxy\n```\n\nTildes (`~`) in paths are expanded to `%USERPROFILE%` on Windows.\n\n### Local ghidriff (no Docker)\n\nIf you'd rather install ghidriff and Ghidra locally, swap the `diff_engine` block for:\n\n```yaml\ndiff_engine:\n  mode: local\n  ghidriff_bin: \"ghidriff\"                    # or absolute path\n  ghidra_install_dir: \"C:\u002Fpath\u002Fto\u002Fghidra\"     # used as GHIDRA_INSTALL_DIR\n  output_dir: \"~\u002Fpatchwatch\u002Fghidriff\"\n```\n\nFollow the ghidriff [installation instructions](https:\u002F\u002Fgithub.com\u002Fclearbluejar\u002Fghidriff#installation) to get `ghidriff` on `PATH` and a working Ghidra install.\n\n## Data Flow\n\nSee [docs\u002Fdataflow.md](docs\u002Fdataflow.md) for the full Mermaid diagram.\n\n## How It Works\n\n### KB Enumeration\n\nBefore any LLM work, PatchWatch enumerates which files the patch touches. Two tiers are tried in order:\n\n**Tier 1 — Support page CSV** (`support.microsoft.com\u002Fhelp\u002F\u003CKB>`)\n\nThe KB article page is fetched and scraped for a \"file information\" download link (anchor text must contain `\"file information\"`, excluding SSU and hash links). The link is a `go.microsoft.com\u002Ffwlink\u002F` redirector that resolves to a CSV on `download.microsoft.com`. 
The CSV is a multi-section file: each section is preceded by a banner row encoding the architecture (`x64-based`, `arm64-based`, `x86-based`), followed by a header row and data rows. Each data row becomes a `KbFile { filename, version, arch, file_size, date_stamp }`.\n\n**Tier 2 — Update Catalog MSU** (fallback when no CSV link exists)\n\nThe Microsoft Update Catalog is searched for the KB number. The x64 result is selected, its `.msu` URL is resolved via `DownloadDialog.aspx`, and the MSU is downloaded and expanded in two passes with `expand.exe` (MSU -> CAB -> extracted files). `.manifest` XML files inside the CAB are parsed for `\u003CassemblyIdentity>` (version + arch) and `\u003Cfile name>` entries. Only the x64 MSU is fetched, so arm64 entries are absent. Results are deduplicated by `(filename, arch, version)`.\n\nThe file list is stored in the DB after first enumeration and reused on subsequent ingests of the same KB.\n\n### LLM Analysis Pipeline\n\n#### Stage 1 — Triage\n\n**Trigger:** Poll ingest, gated on `CVSS base_score >= 9.0 OR exploited == \"yes\"`. Below that threshold the KB file list is still stored but no LLM call is made. When `analyze` is run directly on a CVE with no existing triage in the DB, triage runs on-demand regardless of score.\n\n**Input:** CVE title, description, CWE + the full list of changed binaries from KB enumeration (filename, architectures, version).\n\n**Output:** `Vec\u003CRanking>` — every file in the patch ranked by probability of containing the CVE fix, with a confidence score (0–1) and reasoning string. Stored in DB. 
Used to prioritize which binaries get downloaded and diffed; candidates are sorted descending by confidence and capped at `llm.max_diff_candidates`.\n\nTriage is idempotent: if the CVE's SUG revision number hasn't changed since the last ingest, existing rankings are reused.\n\n#### Interlude — Winbindex + ghidriff\n\nFor each top-ranked binary, the orchestrator fetches Winbindex to locate pre-patch and post-patch versions matching the KB, downloads both, and runs **ghidriff**. The resulting JSON is parsed into two representations:\n\n- **`DiffSummary`** — compact, name-only view: lists of added\u002Fdeleted\u002Fmodified function names, per-function similarity ratios (`0.0` = completely rewritten, `1.0` = identical), whether changes are code-level vs. address-relocation-only, and added\u002Fdeleted strings. Passed to Stage 2.\n- **`DiffIndex`** — full code view: pre-patch and post-patch decompiled C code for every modified function. Passed to Stage 3.\n\n#### Stage 2 — Synthesis\n\n**Trigger:** User-initiated analyze job, after ghidriff completes.\n\n**Input:** CVE metadata + all diffed binaries, each with its Stage 1 confidence\u002Freasoning and its `DiffSummary` (function names, ratios, change types). No decompiled code.\n\n**Output:** `SynthesisResult`\n- `per_binary` — per-binary security relevance assessment with confidence and reasoning\n- `primary_binaries` — the subset of binaries that contain security-relevant changes; these proceed to Stage 3\n- `ranked_functions` — up to 50 functions most likely to contain the fix (code-changed only, ordered by score), tagged with the binary they belong to\n- `overall_summary` — consolidated narrative of what the patch does\n\n#### Stage 3 — Deep Analysis\n\n**Trigger:** Runs for each binary in `primary_binaries` from Stage 2.\n\n**Function selection:** Takes the top-N functions from Stage 2's `ranked_functions` for this binary. 
Remaining slots are filled with any code-changed functions not already selected, sorted ascending by ratio (most heavily modified first). Functions with only address\u002Frefcount changes are excluded from the fallback pool.\n\n**Input:** CVE metadata + before\u002Fafter decompiled C code for each selected function (from `DiffIndex`).\n\n**Output:** `DeepAnalysisResult`\n- `findings` — one `FunctionFinding` per function: relevance score, explanation of what changed and why it relates to the CVE, key changed lines as `old_snippet` \u002F `new_snippet`\n- `patch_summary` — consolidated description of what this binary's patch does in the context of the CVE\n\nResults are stored in the DB and rendered in the web UI report view. The orchestrator also writes `report.md` and `report.json` to `\u003Cstorage_dir>\u002Freports\u002F\u003Ccve_id>\u002F`.\n\n### LLM Calls Summary\n\n| Stage | Trigger | Input | Output |\n|---|---|---|---|\n| **Triage** | Poll ingest (score >= 9 or exploited), or on-demand via analyze | CVE description + all KB file names with arch\u002Fversion | `Vec\u003CRanking>`: confidence + reasoning per file |\n| **Synthesis** | User-triggered analyze, after ghidriff | CVE + diff summaries (function names, ratios, change types) for all diffed binaries | Primary binaries, ranked functions (top 50, code-changed only), overall summary |\n| **Deep analysis** | After synthesis, per primary binary | CVE + full decompiled before\u002Fafter code for selected functions | Per-function: relevance score, change explanation, old\u002Fnew snippets; patch summary |\n\n## Architecture Notes\n\n- **Idempotent poll**: KB file enumeration is cached in the DB. CVEs are skipped at triage if the SUG revision number hasn't changed.\n- **Serial analysis jobs**: `AnalyzeService` processes one job at a time via a channel. Ghidra analysis is CPU-heavy, so parallelism isn't a win.\n- **Binary download cache**: Winbindex downloads are stored on disk by SHA256 hash. 
Re-running analyze on the same CVE skips downloads.\n- **CSRF**: Double-submit cookie (HMAC-SHA256). Set `PATCHWATCH_CSRF_SECRET` before `patchwatch web`.\n- **Non-loopback bind**: Blocked by default. Set `web.allow_non_loopback: true` in config to expose on LAN (do this only behind a reverse proxy that adds auth).\n\n## Troubleshooting \u002F Setup Verification\n\nWhen the full pipeline misbehaves, `patchwatch validate` exposes each external\ndependency as a standalone smoke test so you can isolate which stage is broken\nwithout polluting the real DB:\n\n| Subcommand | What it verifies |\n|---|---|\n| `validate sug` | SUG API is reachable. Lists 2026 releases. |\n| `validate kb-csv \u003CKB>` | Tier 1 KB enumeration: support page scrape + CSV download + parser. Example: `patchwatch validate kb-csv KB5036893`. |\n| `validate kb-msu \u003CKB> --cache-dir \u003Cdir>` | Tier 2 KB enumeration: Update Catalog scrape + MSU download + `expand.exe` extraction + manifest parser. Needs `expand.exe` on `PATH` (it ships with Windows). |\n| `validate winbindex \u003Cfilename> \u003CKB>` | Winbindex query + patched\u002Fprevious pair selection + binary download to the on-disk cache. Example: `patchwatch validate winbindex mscms.dll KB5036893`. |\n| `validate ghidra \u003Cbinary>` | Runs ghidriff on the binary against itself. Verifies the Docker image (or local install) is wired up correctly and exits 0. |\n| `validate dry-run \u003CCVE>` | Full ingest + analyze end-to-end against an **in-memory** SQLite DB. Useful for exercising the complete pipeline without touching `patchwatch.db`. |\n\nAll `validate` subcommands honor the same `--config` flag as the top-level CLI.\n\n## Security and Privacy Notes\n\n- PatchWatch sends CVE descriptions, file names, diff summaries (Stage 2), and decompiled function bodies (Stage 3) to the Anthropic API, authenticated with the key held in the environment variable named by `llm.api_key_env`. 
All decompiled code originates from Microsoft-shipped Windows binaries that are already publicly downloadable, but be aware of where the prompts are going.\n- The Anthropic API key and CSRF secret are loaded from environment variables (or a local `.env` file, which is gitignored). Never commit either.\n- The web UI binds to loopback by default and uses CSRF double-submit cookies. It has no authentication — don't expose it to a network you don't control.\n\n## Contributing\n\nIssues and PRs welcome. This is a research tool, not a product — expect rough edges and breaking changes between versions.\n\n---\n\n## License\n\nApache 2.0 — see [LICENSE](.\u002FLICENSE) and [NOTICE](.\u002FNOTICE)\n\nBuilt by [Origin](https:\u002F\u002Foriginhq.com) for security research and red team operations.\n","PatchWatch is a local tool for analyzing the CVEs fixed in Windows monthly security updates (Patch Tuesday). It uses Ghidra to diff binaries before and after patching and presents LLM-based security analysis through a browser interface. Its core features include retrieving vulnerability information from public data sources such as the Microsoft Security Update Guide, generating structured function-level diffs with Ghidriff, and combining them with the Anthropic API for multi-stage security analysis. The tool suits scenarios that call for in-depth security assessment of Windows system patches, such as internal enterprise security audits or research institutions. PatchWatch is designed to run entirely locally so that sensitive data is not leaked; only the necessary prompts are sent to the configured language model provider.","2026-06-11 04:06:54","CREATED_QUERY"]