[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82032":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},82032,"vault","vaultmcp\u002Fvault","vaultmcp","MCP prompt-injection scanning proxy — runtime security for MCP tool responses",null,"Solidity",605,277,5,0,340,11.33,"MIT License",false,"main",[],"2026-06-12 02:04:22","# Vault — MCP Prompt Injection Firewall\n\n**Site:** [vaultmcp.io](https:\u002F\u002Fvaultmcp.io)  **·**  **X:** [@vaultmcp](https:\u002F\u002Fx.com\u002Fvaultmcp)  **·**  **Repo:** [github.com\u002Fvaultmcp\u002Fvault](https:\u002F\u002Fgithub.com\u002Fvaultmcp\u002Fvault)\n\nVault is a production prompt-injection firewall for MCP. It intercepts every tool response before your agent reads it and scans through three layers of detection.\n\n---\n\n## Requirements\n\nVault requires an LLM for Layer 3 in production. Three options:\n\n- **Anthropic** (`claude-haiku-4-5-20251001`, recommended) — set `ANTHROPIC_API_KEY`\n- **OpenAI-compatible** (`gpt-4o-mini`, or self-hosted via vLLM\u002Fllama.cpp) — set `OPENAI_API_KEY`\n- **Ollama** (local, air-gapped) — set `OLLAMA_HOST=http:\u002F\u002Flocalhost:11434`\n\nWithout any of the above, Vault runs in offline mode (L1+L2 only). 
Offline mode has documented limitations — see [LIMITATIONS §11](packages\u002FLIMITATIONS.md).\n\n### Offline operation with Ollama\n\nOllama lets you run L3 locally with no cloud dependency — useful for air-gapped environments, development without an API key, and cost-sensitive pipelines.\n\n```bash\n# 1. Install Ollama and pull a model (one-time)\ncurl -fsSL https:\u002F\u002Follama.ai\u002Finstall.sh | sh\nollama pull llama3.2:3b\n\n# 2. Run Vault with Ollama as the L3 backend\nexport OLLAMA_HOST=http:\u002F\u002Flocalhost:11434\nnpx @aimcpvault\u002Fmcp-proxy -- npx -y @modelcontextprotocol\u002Fserver-filesystem \u002Fdata\n```\n\nTo use a different model or a remote Ollama instance:\n\n```bash\nexport VAULT_LAYER3_PROVIDER=ollama\nexport VAULT_LAYER3_MODEL=mistral:7b\nexport VAULT_LAYER3_BASE_URL=http:\u002F\u002Fgpu-box:11434\n```\n\n**Caveats:**\n- Measured TPR numbers in this README are for Anthropic Haiku. Local 3B models vary; run the eval harness before relying on them in production. Anecdotally, `llama3.2:3b` has missed subtle role-hijack and multi-turn-setup attacks in smoke tests where Haiku catches them.\n- Reachability is verified on the first scan, not at startup. If Ollama isn't running, Vault logs `vault: layer-3 failed (...)` per request and falls back to L1+L2.\n- **Latency:** the default L3 timeout is 15s, sized to absorb the cold-start latency of local 3B models on CPU. If you're running on slower hardware or larger models, increase it further via `VAULT_LAYER3_TIMEOUT_MS=30000`. The first scan after Ollama loads the model can take much longer than steady-state — subsequent calls typically complete in 1–2s.\n- **Security:** the default URL is `http:\u002F\u002Flocalhost:11434`. If you set `VAULT_LAYER3_BASE_URL=http:\u002F\u002F0.0.0.0:11434` or point to a remote host, tool response content (which may include sensitive data) is sent over the network to that host. 
Only point to a trusted, local-network Ollama instance.\n\n---\n\n## On-chain reputation inspector\n\n`vault inspect` reads your Claude Desktop MCP config and reports the on-chain reputation Vault has accumulated for each server you have configured.\n\n```bash\nnpx @aimcpvault\u002Fmcp-proxy inspect\n# vault inspect (3 MCP servers)\n#   config:   \u002FUsers\u002Fyou\u002FLibrary\u002FApplication Support\u002FClaude\u002Fclaude_desktop_config.json\n#   network:  Sepolia (testnet)\n#   contract: 0x3A977E4D8BA43367cc41BB4695feFF4615fec189\n#\n#   TRUSTED     filesystem [stdio:npx:@modelcontextprotocol\u002Fserver-filesystem]\n#               score=0.980 scans=412 blocks=2 maliciousRate=0.5%\n#   NEW         git        [stdio:uvx:mcp-server-git]\n#               score=1.000 scans=0 blocks=0 maliciousRate=0.0%\n#   UNTRUSTED   sketchy    [stdio:npx:some-untrusted-pkg]\n#               score=0.400 scans=83 blocks=14 maliciousRate=16.9%\n```\n\n**Trust thresholds (transparent, no magic numbers):**\n- `TRUSTED`   score ≥ 0.95 AND totalScans ≥ 100\n- `UNTRUSTED` maliciousRate ≥ 0.10\n- `CAUTION`   totalScans ≥ 10 AND maliciousRate ≥ 0.01\n- `NEW`       totalScans \u003C 10\n\n**Flags:**\n- `--config \u003Cpath>` override the Claude Desktop config path\n- `--rpc \u003Curl>` and `--contract \u003Caddr>` point at a different network\u002Fdeployment\n- `--json` one JSON record per server (suitable for CI \u002F scripts)\n- `--strict` exit code 1 if any server is UNTRUSTED — useful in CI checks\n\nThe reputation contract today is on **Base Sepolia (testnet)**. 
Mainnet deployment is pending; the inspector will continue to default to Sepolia until mainnet is live, with a yellow warning printed on each run.\n\n---\n\n## Local scan history (opt-in)\n\nVault can persist every scan to a local SQLite database (`~\u002F.vault\u002Fscans.db`) so you can review what was blocked, search by tool\u002Fserver\u002Fverdict, and view a local dashboard.\n\n**Off by default.** Enable with `VAULT_PERSIST=1`:\n\n```bash\nVAULT_PERSIST=1 npx @aimcpvault\u002Fmcp-proxy -- npx -y @modelcontextprotocol\u002Fserver-filesystem \u002Fdata\n# vault: persisting scan history to \u002FUsers\u002Fyou\u002F.vault\u002Fscans.db (set VAULT_PERSIST=0 to disable)\n```\n\n**Query from the CLI:**\n\n```bash\nnpx @aimcpvault\u002Fmcp-proxy history                          # last 50\nnpx @aimcpvault\u002Fmcp-proxy history --verdict malicious      # only blocks\nnpx @aimcpvault\u002Fmcp-proxy history --since 7d --json        # last week, JSON\nnpx @aimcpvault\u002Fmcp-proxy history --server stdio:npx:@modelcontextprotocol\u002Fserver-filesystem\n```\n\n**Browse the dashboard:**\n\n```bash\nnpx @aimcpvault\u002Fmcp-proxy dashboard\n# vault dashboard: http:\u002F\u002F127.0.0.1:9876\n```\n\nDark-themed single-page dashboard with verdict cards, per-day stacked bars (30 days), top tools, and a recent-scans table. Auto-refreshes every 5s (configurable via `--refresh \u003Csec>`).\n\n**Privacy guarantees:**\n- Off by default — no data is written unless `VAULT_PERSIST=1` is set.\n- All data stays local. The dashboard and history reader both serve `~\u002F.vault\u002Fscans.db` directly; nothing is uploaded.\n- Content previews are passed through a regex redactor before storage (API keys, GitHub\u002FAWS tokens, JWTs, emails, SSN-shaped strings, credit-card-shaped numbers, generic `password:` \u002F `token:` headers). This is a best-effort filter; if you need stronger guarantees, leave persistence off.\n- 30-day rolling retention. 
Override with `VAULT_RETENTION_DAYS=\u003Cn>` (set to `0` to disable purging).\n- Override the DB path with `VAULT_PERSIST_PATH=\u002Fpath\u002Fto\u002Fscans.db`.\n\n---\n\n```bash\n# Wrap any MCP server — zero config change to your agent\nnpx @aimcpvault\u002Fmcp-proxy -- npx -y @modelcontextprotocol\u002Fserver-filesystem \u002Fpath\u002Fto\u002Fdata\n```\n\n---\n\n## Measured performance\n\nAll numbers below are reproducible from the harness committed in this repo. Each is footnoted with the commit and dataset it was measured on.\n\n| metric | value | source |\n|---|---|---|\n| TPR — v2 holdout (L3 enabled, 80 attacks) | **100%** (80 \u002F 80) · 95.5%+ lower bound at 95% CI | [^holdout-l3] |\n| TPR — L1+L2 only, no API key | materially lower (L3 is required for production) | [^holdout-degraded] |\n| False positive rate (benign flagged) | **0.0%** (0 \u002F 100) | [^holdout-l3] |\n| L1 latency (p50 \u002F p99) | 0.03 ms \u002F 0.53 ms | [^holdout-l3] |\n| L2 latency (p50 \u002F p99) | 11.05 ms \u002F 69.75 ms | [^holdout-l3] |\n| L3 latency (p50 \u002F p99) | 1541 ms \u002F 3499 ms | [^holdout-l3] |\n| Sustained throughput (single proxy) | 100 req\u002Fs, 0 errors over 30,000 requests | [^load] |\n| Steady-state memory (RSS) | 135–180 MB after warmup, no leak observed | [^load] |\n| 500 KB response scan time | ~25 s (embedder-bound, see LIMITATIONS §14) | [^edge] |\n\n[^holdout-l3]: `packages\u002Feval\u002Fresults\u002Feval-clean-baseline-v2-L3-enabled-2026-05-21.md` — one-shot eval against v2 holdout constructed after detection code was frozen at commit `8d230e4`. Dataset: `packages\u002Feval\u002Fdatasets\u002Fholdout-v2-novel\u002F` (50 attacks) + `packages\u002Feval\u002Fdatasets\u002Fholdout-v2-paraphrase\u002F` (30 attacks) + `packages\u002Feval\u002Fdatasets\u002Fbenign-v2\u002F` (100 entries). L3 enabled (Anthropic Haiku 4.5). 
No tuning occurred between holdout construction and eval run.\n[^holdout-degraded]: Without `ANTHROPIC_API_KEY`, Vault runs L1+L2 only. On the v2 novel holdout (attacks designed to sit outside L2's detection range) TPR is near 0%; on paraphrase attacks it is ~7%. L3 is the detection backbone for out-of-distribution attacks. See `packages\u002FLIMITATIONS.md` §11 for offline-mode limitations.\n[^load]: `packages\u002Feval\u002Fload\u002Freport.md` — 100 req\u002Fs × 300 s × single stdio proxy instance × ~200-byte stub-MCP responses, L1+L2 only. Past 100 req\u002Fs the cliff is not yet measured.\n[^edge]: `packages\u002Fproxy\u002Ftest\u002Fedge-cases.test.ts` scenario 4. The latency is bounded by the L2 embedder iterating ~140 streaming chunks; L1 stays sub-millisecond at any size. See LIMITATIONS §14.\n\n---\n\n## What Vault catches\n\nWith L3 enabled (Anthropic API key set), Vault catches 100% of attacks in our 80-entry public structural-generalization eval, at 0% false positive rate on 100 benign documents. Detection methodology and datasets are public — see `\u002Fpackages\u002Feval\u002F`.\n\nBreakdown by category (80 attacks, holdout-v2 structural-generalization eval, 2026-05-21):\n\n| Attack category | Caught \u002F Total | TPR |\n|---|---|---|\n| exfiltration | 60 \u002F 60 | 100.0% |\n| instruction_override | 13 \u002F 13 | 100.0% |\n| multi_turn_setup | 6 \u002F 6 | 100.0% |\n| encoded_payload | 1 \u002F 1 | 100.0% |\n\nNumbers are from `packages\u002Feval\u002Fresults\u002Feval-clean-baseline-v2-L3-enabled-2026-05-21.md` with L3 enabled (Anthropic Haiku 4.5). Dataset: `packages\u002Feval\u002Fdatasets\u002Fholdout-v2-novel\u002F` (50 attacks) and `packages\u002Feval\u002Fdatasets\u002Fholdout-v2-paraphrase\u002F` (30 attacks), verified non-overlapping with the detection corpus. 
See [Measured performance](#measured-performance) for latency and throughput context.\n\n---\n\n## What Vault does NOT catch\n\n- **User-initiated jailbreaks** — out of scope by design. Vault sits between the agent and the upstream MCP server, not between the user and the agent. Jailbreaks typed directly by the user are the model provider's responsibility, not a proxy's.\n- **Genuinely novel injection patterns** — attacks that paraphrase far from all known examples (L2 cosine distance > 0.50 and no L3 key set) evade detection. The corpus covers published 2022–2024 attack literature; novel post-cutoff techniques may not be represented. See [`packages\u002FLIMITATIONS.md`](packages\u002FLIMITATIONS.md) §3.\n- **Protocol-encoded data without L3** — MCP tools that return Pub\u002FSub messages, SQS payloads, binary blobs, or encrypted content will produce false positives when L3 is disabled (~40% FP rate on that traffic class). See LIMITATIONS §11.\n- **Multi-turn attacks split across sessions** — Vault scans each MCP response independently. A sleeper directive established in session A and activated in session B is outside Vault's detection window. See LIMITATIONS §10.\n- **Image\u002Faudio embedded instructions** — Vault is text-only. Binary content in tool responses (images, audio files, PDFs) is forwarded to the agent unscanned. If your MCP server returns image data or vision-model inputs, Vault provides no protection for instructions hidden in that content.\n\nFor the complete gap taxonomy: [`packages\u002FLIMITATIONS.md`](packages\u002FLIMITATIONS.md) — 15 sections covering multilingual, semantic, structural, normalization, self-referential, protocol-encoding, and scale gaps. 
[`packages\u002FSECURITY_MODEL.md`](packages\u002FSECURITY_MODEL.md) maps each gap to the threat it leaves open.\n\n---\n\n## Offline mode (no API key)\n\nWhen `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are both unset, Vault runs in offline mode:\n\n- Layer 0 (decoder) and Layer 1 (heuristics) operate normally\n- Layer 2 (embedding similarity) operates normally\n- Layer 3 (LLM judge) is disabled\n\n**Suitable for:**\n- Development environments and local testing\n- CI\u002FCD pipelines where the agent isn't running production traffic\n- Air-gapped or offline deployments\n\n**NOT suitable for:**\n- Production MCP servers returning Pub\u002FSub, SQS, or webhook payloads\n- Tools that return binary content (images, encrypted blobs)\n- Any deployment where false positive rate above 5% is unacceptable\n\nSee [LIMITATIONS §11](packages\u002FLIMITATIONS.md) for specific FP measurements and rationale.\n\n---\n\n## Cost transparency\n\nLayer 3 (LLM judge) calls Anthropic Haiku 4.5 on responses that fall in L2's uncertain zone. Estimates:\n\n| scenario | L3 call rate | estimated cost per request |\n|---|---|---|\n| Eval holdout (adversarial dataset) | ~91% | ~$0.0005 |\n| Real MCP traffic (expected) | ~20–40% | ~$0.0001–0.0002 |\n| 100 benign req\u002Fhr (steady state) | 20–40% | ~$0.002–0.004\u002Fhr total |\n\n- **~$0.0005 per uncertain-zone request** (Haiku 4.5 input + small output at current pricing)\n- **~$0.04\u002Fhr at 100 req\u002Fhr in production** (20–40% L3 call rate on real traffic)\n- **$0 without API key** — L1+L2 only, TPR drops to near 0% on novel out-of-distribution attacks and FP rate on protocol-encoded traffic rises to ~40%. See [Offline mode](#offline-mode-no-api-key) above. The eval harness refuses to run in this mode unless `--allow-degraded` is passed explicitly.\n\nThe high L3 rate in our eval (91%) reflects an adversarial dataset — nearly every entry is an attack that lands in L2's uncertain zone by design. 
Real MCP traffic (mostly clean tool output) has a much lower L3 call rate.\n\nSet `ANTHROPIC_API_KEY` before running. Without it, Vault emits a `WARNING: Layer 3 unavailable — degraded mode` message on startup and falls back to L1+L2 only.\n\n---\n\n## Why\n\nEvery MCP tool response is a direct write into your agent's context window. A malicious file, a compromised API, or a poisoned search result can redirect your agent mid-task — exfiltrating secrets, overwriting files, or pivoting to other tools. MCP has no transport-level protection against this class. Vault adds a layered detection pass; whether that pass is sufficient for your threat model depends on the gaps documented in [`packages\u002FLIMITATIONS.md`](packages\u002FLIMITATIONS.md).\n\n## How it works\n\nVault sits between your agent host and the upstream MCP server, intercepting every `tools\u002Fcall` response and running it through a layered detection pipeline:\n\n```\nAgent ──► Vault Proxy ──► MCP Server\n              │\n              ▼\n         ┌─────────────────────────────────────────────┐\n         │ L1 Regex (\u003C1ms)      heuristics + unicode    │\n         │ L2 Embeddings (~8ms) bge-small cosine sim    │\n         │ L3 LLM judge (~1s)   haiku-4.5 disambiguation│\n         └─────────────────────────────────────────────┘\n              │\n              ▼  clean → forward   malicious → block\u002Fwarn\n```\n\n**Layer 1 — Heuristics** (p50 0.02 ms, p99 0.53 ms[^holdout-l3]): Regex patterns for English instruction overrides, unicode tag smuggling (U+E0000–U+E007F), bidi-control characters (U+202A–U+202E, U+2066–U+2069), zero-width character density, HTML comment injection, long HTML-entity runs, and markdown link anchors. High-confidence matches short-circuit — no L2\u002FL3 cost. 
L1 is English-only; see LIMITATIONS §1.\n\n**Layer 2 — Embeddings** (p50 8.29 ms, p99 53.06 ms[^holdout-l3]): Cosine similarity against a curated corpus of 31 attack categories using `bge-small-en-v1.5` (runs entirely on-device, no network call, ~30 MB WASM). Matches within distance 0.35 block; borderline cases escalate to L3. The corpus is intentionally public; adversaries who paraphrase past distance 0.35 evade L2 — that class is for L3.\n\n**Layer 3 — LLM Judge** (~1 s when invoked): Claude Haiku 4.5 (or GPT-4o-mini with OpenAI key) resolves ambiguous cases. Only runs when L2 is uncertain — typically \u003C5% of requests. Requires `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`. **Without a key, L3 is disabled and TPR drops to the L1+L2 number reported above.**\n\n### Capability Firewall\n\nBeyond injection detection, Vault tracks taint: tool responses that entered the agent's context are marked, and if those tokens later appear in calls to sensitive tools (network requests, file writes, shell execution), the call is gated.\n\n```bash\nVAULT_CAPABILITY=1  # enable taint tracking + gate\nVAULT_CAPABILITY_MODE=block  # or: warn\n```\n\n### Manifest Verification\n\nVault fingerprints the tool manifest of every MCP server on first connect and alerts on any subsequent drift — new tools, changed schemas, version bumps. Supply-chain protection against a compromised server silently adding a `delete_all` tool.\n\n```bash\nVAULT_MANIFEST_CHECK=on   # default — warn on drift\nVAULT_MANIFEST_CHECK=strict  # treat drift as error\n```\n\n### Optional: on-chain attestation (opt-in)\n\nWhen an operator opts in (`VAULT_ATTEST=1` + a funded hot wallet), scan verdicts can be attested on-chain via [EAS](https:\u002F\u002Fattest.sh) on Base. Each MCP server then accumulates a public, append-only reputation score any agent can query before connecting. 
Off by default — no chain dependency for using the proxy.\n\n> A continuous attestation feed and public reputation registry are planned for **v0.3**. The proxy supports opt-in attestation today for operators who want it; the always-on hosted attester ships alongside v0.3 once we have real install traffic to back the numbers.\n\n---\n\n## Install\n\n```bash\n# npm \u002F npx (no install required)\nnpx @aimcpvault\u002Fmcp-proxy -- \u003Cyour MCP server command>\n\n# or install globally\nnpm install -g @aimcpvault\u002Fmcp-proxy\nmcp-proxy -- npx -y @modelcontextprotocol\u002Fserver-filesystem \u002Fdata\n```\n\n### Claude Code \u002F Claude Desktop integration\n\nThe fastest way is `vault init` — it auto-detects your existing MCP configs and wraps each server with the proxy:\n\n```bash\nnpx @aimcpvault\u002Fmcp-proxy && vault-init\n```\n\n`vault init` shows a diff preview before writing, backs up the original config to `\u003Cconfig>.vault-backup`, and is idempotent (re-running skips already-wrapped servers). 
To revert: `vault-init unwrap`.\n\nOr wrap by hand:\n\n```json\n\u002F\u002F ~\u002F.claude\u002Fmcp_settings.json (or claude_desktop_config.json)\n{\n  \"mcpServers\": {\n    \"filesystem\": {\n      \"command\": \"npx\",\n      \"args\": [\n        \"@aimcpvault\u002Fmcp-proxy\",\n        \"--\",\n        \"npx\", \"-y\", \"@modelcontextprotocol\u002Fserver-filesystem\", \"\u002Fpath\"\n      ]\n    }\n  }\n}\n```\n\n### Check a server's reputation\n\nThe `vault-check` binary ships inside the proxy package — no separate install needed:\n\n```bash\n# already installed if you have @aimcpvault\u002Fmcp-proxy\nnpx --package=@aimcpvault\u002Fmcp-proxy@next vault-check stdio:npx:@modelcontextprotocol\u002Fserver-filesystem\nnpx --package=@aimcpvault\u002Fmcp-proxy@next vault-check --all     # scores every server in your MCP config(s)\nnpx --package=@aimcpvault\u002Fmcp-proxy@next vault-check --json | jq .\n```\n\nA standalone `vault-check` binary (Homebrew tap, curl-piped install script) is planned for **v0.3** once we ship signed releases on GitHub.\n\nReputation comes from EAS attestations on Base, aggregated across every Vault deployment that scans the same server. Score range 0–1000 (higher = safer). The opt-in attestation path is documented above under \"Optional: on-chain attestation\"; a continuous public feed lands with v0.3.\n\n### Add a reputation badge to your MCP server's README\n\nIf you maintain an MCP server, embed its live Vault reputation score as a badge — same shape as shields.io. The badge updates automatically as new attestations land on-chain.\n\n```markdown\n![Vault Score](https:\u002F\u002Fvaultmcp.io\u002Fbadge\u002Fyour-server-name.svg)\n```\n\nReplace `your-server-name` with the identifier Vault uses for your server — typically the package name (e.g. `@modelcontextprotocol\u002Fserver-filesystem`) for npm-launched servers, or the full URL for HTTP\u002FSSE servers. 
Servers with no attestations yet render as a neutral \"unranked\" badge.\n\n### Public reputation API\n\nCORS-open, no-auth read endpoints backed by the same on-chain data:\n\n| Endpoint | Returns |\n|---|---|\n| `GET https:\u002F\u002Fvaultmcp.io\u002Fapi\u002Fscore\u002F:server` | Single server's score, scans, blocks, basescan link |\n| `GET https:\u002F\u002Fvaultmcp.io\u002Fapi\u002Fleaderboard?n=10` | Top N servers by scan count |\n| `GET https:\u002F\u002Fvaultmcp.io\u002Fapi\u002Fthreats\u002Frecent?n=20` | Recent ThreatRecord attestations |\n| `GET https:\u002F\u002Fvaultmcp.io\u002Fbadge\u002F:server.svg` | SVG reputation badge |\n\nAll endpoints accept an optional `?network=base|base-sepolia` query parameter. Cache headers are set; expect 60s edge cache.\n\n### HTTP\u002FSSE mode\n\n```bash\n# Proxy a remote MCP server over HTTP\nnpx @aimcpvault\u002Fmcp-proxy --transport http \\\n  --upstream https:\u002F\u002Fmcp.example.com\u002Fv1 \\\n  --port 8800\n```\n\n---\n\n## Configuration\n\nAll configuration is via environment variables:\n\n```bash\n# Detection\nVAULT_MODE=block              # block (default) | warn | log\nVAULT_LAYER2_THRESHOLD=0.35   # cosine distance cutoff for L2\nVAULT_LAYER3_PROVIDER=anthropic  # anthropic | openai | custom\nVAULT_LAYER3_MODEL=claude-haiku-4-5-20251001  # model override\nVAULT_LAYER3_TIMEOUT_MS=5000  # judge call timeout\n\n# Keys (BYO — Vault never stores or forwards them)\nANTHROPIC_API_KEY=sk-ant-...\nOPENAI_API_KEY=sk-...\n\n# Capability firewall\nVAULT_CAPABILITY=1\nVAULT_CAPABILITY_MODE=block\nVAULT_TAINT_MIN_OVERLAP=32\nVAULT_SENSITIVE_TOOL_PATTERNS=^my_custom_sensitive_tool\n\n# Manifest verification\nVAULT_MANIFEST_CHECK=on       # on | off | strict\n\n# Audit log\nVAULT_AUDIT_LOG=\u002Fvar\u002Flog\u002Fvault-mcp.jsonl\n\n# On-chain attestation (opt-in, off by default — full hosted feed lands in v0.3)\nVAULT_ATTEST=1\nVAULT_ATTESTER_PRIVATE_KEY=0x...  
# fund with ~0.05 ETH on Base\nVAULT_EAS_ADDRESS=0x4200000000000000000000000000000000000021\nVAULT_SCAN_RECEIPT_SCHEMA=0x...   # register via packages\u002Fcontracts\nVAULT_THREAT_RECORD_SCHEMA=0x...\n\n# Telemetry (default on when URL set)\nVAULT_TELEMETRY=1\nVAULT_TELEMETRY_URL=https:\u002F\u002Fyour-collector.example.com\u002Fingest\nVAULT_TELEMETRY=0  # opt out\n```\n\n---\n\n## Audit log\n\nEvery verdict is written to an append-only JSONL file:\n\n```bash\nVAULT_AUDIT_LOG=\u002Fvar\u002Flog\u002Fvault-mcp.jsonl\n\n# View with the built-in CLI\nnpx vault-audit \u002Fvar\u002Flog\u002Fvault-mcp.jsonl\nnpx vault-audit --type detection --verdict malicious\nnpx vault-audit --tool read_file --since 1h\nnpx vault-audit --raw | jq .\n```\n\n---\n\n## Detection modes\n\n| Mode    | Behavior                                                    |\n|---------|-------------------------------------------------------------|\n| `block` | Malicious content is replaced with an error response (default) |\n| `warn`  | Original content is returned with a warning prepended       |\n| `log`   | Content passes through; detection recorded in audit log only |\n\n---\n\n## Privacy\n\nVault never sends raw content anywhere. The telemetry pipeline transmits only SHA-256 hashes of content and arguments, verdict labels, latency measurements, and pattern names. 
Raw text never leaves the proxy process.\n\nSee [PRIVACY.md](packages\u002Fproxy\u002FPRIVACY.md) for the full data inventory.\n\nTo disable telemetry entirely: `VAULT_TELEMETRY=0`.\n\n---\n\n## Project layout\n\n```\npackages\u002F\n  proxy\u002F          # @aimcpvault\u002Fmcp-proxy — the core proxy (this is what you install)\n  corpus\u002F         # @vaultmcp\u002Fcorpus — curated attack\u002Fclean embedding corpus\n  contracts\u002F      # @vaultmcp\u002Fcontracts — VaultReputation.sol + EAS schema registration\n  collector\u002F      # @vaultmcp\u002Fcollector — telemetry ingest + aggregation server\n  eval\u002F           # @vaultmcp\u002Feval — detection benchmarks vs. competitors\n  demo-site\u002F      # Next.js demo + live threat feed\n```\n\n---\n\n## Eval methodology — how to reproduce\n\nThe [Measured performance](#measured-performance) numbers above come from the harness and datasets committed in this repo. To reproduce:\n\n```bash\n# Clone and install\ngit clone https:\u002F\u002Fgithub.com\u002Fvaultmcp\u002Fvault.git\ncd vault\npnpm install\n\n# Run the eval (L1+L2 only, no API key required)\npnpm --filter @vaultmcp\u002Feval run eval -- --set both\n\n# To include L3, export an API key first\nexport ANTHROPIC_API_KEY=sk-ant-...\npnpm --filter @vaultmcp\u002Feval run eval -- --set both\n```\n\nOutputs land in `packages\u002Feval\u002Fresults\u002Feval-\u003Ctimestamp>.{md,json}`. Each run prints TPR, FPR, per-category breakdown, per-layer attribution, latency percentiles, and the worst false negatives and false positives by severity.\n\nThe holdout dataset lives at `packages\u002Feval\u002Fdatasets\u002Fholdout-attacks\u002F` — 188 entries across `published-papers`, `garak-probes`, `blog-pocs`, `owasp-llm`, `encoded-payloads`, `multi-turn`, and `roleplay-jailbreak`. The benign dataset (110 entries) lives at `packages\u002Feval\u002Fdatasets\u002Fbenign\u002F`. 
Both have `MANIFEST.md` files describing provenance.\n\n**Operators are encouraged to author their own attacks and submit pull requests.** Add a new file under `packages\u002Feval\u002Fdatasets\u002Fholdout-attacks\u002F` following the existing JSON schema, update the `MANIFEST.md`, and open a PR. The harness picks new files up automatically.\n\nThe self red-team (`packages\u002Feval\u002Fred-team\u002F`) is the other half of the honest-eval story: 38 hand-crafted bypass attempts, 9 of which still pass L1+L2 after our P2 fixes. They are categorized in [`packages\u002FLIMITATIONS.md`](packages\u002FLIMITATIONS.md).\n\n---\n\n## Development\n\n```bash\n# Prerequisites: Node 20+, pnpm 9+, Foundry 1.7+ (for contracts only)\n\n# Install\npnpm install\n\n# Run proxy in dev mode (wraps the MCP filesystem server)\npnpm --filter @aimcpvault\u002Fmcp-proxy dev -- npx -y @modelcontextprotocol\u002Fserver-filesystem \u002Ftmp\n\n# Run all tests\npnpm -r test\n\n# Typecheck\npnpm -r typecheck\n\n# Build all packages\npnpm -r build\n\n# Run eval benchmarks\npnpm --filter @vaultmcp\u002Feval run eval\n\n# Contracts (requires Foundry)\ncd packages\u002Fcontracts\nforge test\nforge build\n```\n\n---\n\n## Security\n\nVault is a defense-in-depth layer, not a complete solution. No regex or embedding model catches every attack — adversaries can craft payloads that evade any single detection strategy. Measured detection on our public v2 holdout is **100% TPR \u002F 0.0% FPR** (80 attacks, L3 enabled, 95.5%+ lower bound at 95% CI). Without L3, TPR is near 0% on novel out-of-distribution attacks — L3 is the detection backbone. The layered approach raises the bar; operators should treat Vault as one layer in a broader security posture.\n\nFor details:\n- [`packages\u002FLIMITATIONS.md`](packages\u002FLIMITATIONS.md) — measured gaps, red-team evidence, and which mitigations are planned vs. 
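accepted.\n- [`packages\u002FSECURITY_MODEL.md`](packages\u002FSECURITY_MODEL.md) — threats Vault defends against, threats it does not, assumptions, and what an attacker still has to do.\n- [`SECURITY.md`](SECURITY.md) — vulnerability reporting policy.\n\nTo report a vulnerability: open a [GitHub Security Advisory](..\u002F..\u002Fsecurity\u002Fadvisories\u002Fnew) (preferred) or email the maintainer directly. We aim to triage within 48 hours and ship a patch within 7 days of a confirmed critical.\n\n---\n\n## License\n\nMIT\n","Vault is a runtime security proxy for MCP (Model Context Protocol) tool responses. It intercepts and scans every tool response through three detection layers to prevent prompt-injection attacks. Its key technical features include support for several large language models as the backend for the third detection layer (Anthropic, OpenAI-compatible models, and Ollama) and the ability to run the first two layers without an internet connection. Vault is a good fit for scenarios that need stronger MCP tool security, such as sensitive data processing, internal enterprise applications, or any environment with high security requirements. Vault also provides an on-chain reputation inspector that helps users assess how trustworthy their configured servers are.",2,"2026-06-01 03:57:05","CREATED_QUERY"]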