[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81813":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":12,"forks30d":12,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":44,"readmeContent":45,"aiSummary":46,"trendingCount":12,"starSnapshotCount":12,"syncStatus":15,"lastSyncTime":47,"discoverSource":48},81813,"pluck","hunhee98\u002Fpluck","hunhee98","MCP-native code retrieval for AI agents — 84-88% fewer read tokens, BM25F + semantic search, AST chunks, session dedup",null,"Rust",37,0,34,1,2,3,6,45.8,"MIT License",false,"main",true,[24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43],"ai-agents","bm25","claude-code","cli","code-intelligence","code-search","codex","context-window","developer-tools","embeddings","llm","mcp","mcp-server","rag","ripgrep","rust","semantic-search","tantivy","token-optimization","tree-sitter","2026-06-12 04:01:35","\u003Cp align=\"right\">\n  \u003Ca href=\"README.ko.md\">한국어로 보기\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Ch2 align=\"center\">\n  \u003C!-- GPT IMAGE PROMPT: A sleek, modern logo for 'pluck', an AI code search tool. The logo should feature a stylized bird or feather motif, or a fast-moving abstract shape, with a clean tech-focused aesthetic. Use vibrant green and dark blue tones. Transparent background. 
-->\n  \u003Cimg width=\"30%\" alt=\"pluck logo\" src=\"assets\u002Fimages\u002Fpluck_logo.png\">\u003Cbr\u002F>\n  The MCP-native Code Retrieval Engine for AI Agents\u003Cbr\u002F>\n  \u003Csub>84-88 % fewer tokens on code reads · 71 % shorter CI logs · 0.07 ms warm search — every number gated by \u003Ca href=\"benchmarks\u002Fbaseline.json\">\u003Ccode>benchmarks\u002Fbaseline.json\u003C\u002Fcode>\u003C\u002Fa>\u003C\u002Fsub>\n\u003C\u002Fh2>\n\n\u003Cdiv align=\"center\">\n  \u003Ch2>\n    \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fpluck-mcp\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fpluck-mcp?color=%23007ec6&label=crates.io\" alt=\"Crates.io version\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhunhee98\u002Fpluck\u002Fblob\u002Fmain\u002FLICENSE\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green\" alt=\"License - MIT\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fwww.rust-lang.org\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Frust-1.75%2B-orange.svg\" alt=\"Rust\">\u003C\u002Fa>\n  \u003C\u002Fh2>\n\n[Quickstart](#quickstart) •\n[Why Pluck?](#why-pluck) •\n[MCP Tools](#mcp-tools) •\n[CLI](#standalone-cli-no-agent) •\n[Benchmarks](#performance--token-savings)\n\n\u003C\u002Fdiv>\n\n**pluck** is a local Rust daemon that replaces `cat` and `grep` as the default way AI agents read and search code. It exposes symbol-aware code reading and search to agents over the Model Context Protocol (MCP). 
Smart outlines cut eligible code-read tokens by **84-88 %**, CI logs compress by **71 %**, and warm search stays sub-millisecond — with a `--raw` fallback on every tool so the agent never loses capability by defaulting to pluck.\n\n```\nWithout pluck:  ls → grep → cat file1 → cat file2 → cat file3 → ...\nWith pluck:     pluck.plan \"fix auth-token expiry\"  → 3-5 next-call recommendations\n                pluck.search \"auth flow\"            → ranked chunks, BM25 + semantic\n                pluck.peek validate_token           → signature + callees only\n                pluck.symbol validate_token         → just that function's body\n                pluck.impact validate_token         → every caller, depth-capped\n                pluck.deps src\u002Fauth\u002Flogin.ts        → forward\u002Freverse import graph\n                pluck.digest \u003C cargo-build.log      → 71 % shorter, errors intact\n```\n\n## Quickstart\n\nPluck is designed to be the default retrieval tool for your AI coding agents.\n\n### 1. 
Paste the Agent Setup Prompt\n\n```text\nInstall and configure pluck for this repo.\n\nInstall \u002F register:\n- If pluck is missing, install pluck-mcp and pluck-cli with Cargo or Homebrew.\n- Register pluckd as an MCP server named `pluck` for this repository.\n- Prefer project-scoped config when the agent supports it.\n- Preserve existing MCP servers, rules, hooks, permissions, and comments.\n\nTo make pluck the first retrieval choice:\n- Use the strongest official mechanism this agent supports.\n- Claude Code: use project MCP config or\n  `claude mcp add --transport stdio --scope project pluck -- pluckd --repo \u003Crepo-root>`;\n  approve the `pluck` MCP server in settings\u002Fpermissions when needed; add\n  durable instructions in `CLAUDE.md`; use PreToolUse hooks or permission\n  deny rules only when they can safely redirect repo-local cat\u002Fgrep\u002Frg\u002Fread\n  retrieval to mcp__pluck__* tools.\n- Codex: use `codex mcp add pluck -- pluckd --repo \u003Crepo-root>` or\n  `~\u002F.codex\u002Fconfig.toml` \u002F trusted project `.codex\u002Fconfig.toml`; add the\n  pluck-first retrieval policy to `AGENTS.md`.\n- Cursor: use project `.cursor\u002Fmcp.json`; add an always-applied Project Rule\n  in `.cursor\u002Frules\u002F` or an `AGENTS.md` instruction for the workspace.\n\nPluck-first retrieval policy:\n- Use mcp__pluck__read before cat, head, tail, sed -n, or built-in file reads\n  for files inside the indexed repo.\n- Use mcp__pluck__search for conceptual code lookup.\n- Use mcp__pluck__grep before grep or rg for exact strings, regexes, TODOs,\n  and all textual matches inside the repo.\n- Use mcp__pluck__peek, mcp__pluck__symbol, mcp__pluck__expand,\n  mcp__pluck__impact, and mcp__pluck__deps for symbol shape, bodies, call\n  chains, refactor impact, and import relationships.\n- Use mcp__pluck__digest before pasting long cargo, npm, pytest, or GitHub\n  Actions logs into context.\n- Fall back to Bash or built-in reads only for binary files, paths 
outside the\n  repo, byte-exact shell pipelines, unsupported formats, or when pluck is\n  unavailable.\n\nVerify:\n- Restart or reload the agent if MCP changes require it.\n- Confirm the `pluck` MCP server is connected and mcp__pluck__* tools exist.\n- Run one repo code-search\u002Fread task and confirm the agent calls mcp__pluck__*\n  before Bash, grep\u002Frg, cat, or built-in file reads.\n- Show the files changed and the verification result.\n```\n\nFor an expanded version with safety checks and fallback instructions, use the\n[full agent install prompt](docs\u002FAGENT_INSTALL.md).\n\n### 2. Or set it up manually\n\n```bash\n# Daemon + standalone CLI from crates.io\ncargo install pluck-mcp pluck-cli\n\n# Or via Homebrew tap\nbrew tap hunhee98\u002Fpluck && brew install pluck\n```\n\n**Claude Code**\n```bash\npluck init --target claude --mode aggressive  # MCP + permissions + Bash retrieval block\n```\n*(Alternatively, you can manually enable it via `\u002Fplugin marketplace add hunhee98\u002Fpluck`)*\n\n**Codex**\n```bash\npluck init --target codex --mode strong  # MCP + AGENTS.md pluck-first policy\n```\n\n**Cursor**\n```bash\npluck init --target cursor --mode strong  # MCP + always-apply Cursor rule\n```\n\n## Why pluck?\n\nWhen AI agents use standard `cat` and `grep` to explore a codebase, they waste massive amounts of context window tokens. Re-reading the same file chunk, scrolling past unrelated functions, and re-paying tokens for identical imports on every read adds up to thousands of wasted tokens per session.\n\npluck solves this by providing an **agent-facing layer** for code search. Its core principle: **every retrieval call an agent makes should default to pluck.** Bash is only the fallback when pluck legitimately can't help (e.g., binary files, paths outside the repo).\n\n- **Smart Outline (`pluck.read`)**: Instead of dumping a 1,000-line file, it returns a token-efficient outline of signatures with tiny helper bodies inline. 
The agent can then fetch only the larger function bodies it needs.\n- **Session Dedup**: If an agent searches for \"auth\" and later searches for \"token\", any overlapping code chunks are replaced with a 1-token placeholder (`[already-shown: ...]`). The bytes are already in the agent's context; repeating them is pure waste.\n- **Lossless Default**: Stripping comments or dropping types hurts the agent's decision-making. pluck keeps the original bytes intact and makes lossy modes strictly opt-in.\n- **100% Capability Guarantee**: Every pluck tool has a `--raw` fallback that behaves exactly like `cat` or `grep` byte-for-byte.\n\n## How it works\n\npluck chunks files at the Abstract Syntax Tree (AST) level using Tree-sitter. When an agent queries, pluck ranks these chunks using a hybrid of keyword matching (BM25F over symbol\u002Fsignature\u002Fcontent) and semantic similarity (a static `model2vec`-style lookup, [`potion-code-16M`](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-code-16M), ~60 MB on disk — no transformer inference at runtime). Search expands natural-language BM25 queries with embedding-nearest terms from the indexed repo, then runs a two-stage cascade: BM25F first widens the candidate pool, embeddings rerank that pool, and a smaller semantic-rescue pass catches concept queries with weak lexical overlap. The RRF blend is picked continuously from the query embedding against natural-language and code centroids, so agents can search by concept (\"payment flow\") without losing precision on exact symbols.\n\n\u003C!-- GPT IMAGE PROMPT: A clean, modern architectural diagram showing how 'pluck' works. It should show 'Source files' going into 'Tree-sitter AST chunking', splitting into 'tantivy BM25F index' and 'static model2vec embedding (potion-code-16M)'. These feed into an 'in-RAM index' managed by 'pluckd MCP daemon'. 
An 'Agent query' comes in, goes through 'BM25 + semantic RRF', 'Noise cutoff', 'Session dedup', and finally returns a 'Ranked snippet'. Use a dark theme with neon green and blue accents. -->\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fimages\u002Farchitecture_overview.png\" alt=\"Architecture Overview\">\n\u003C\u002Fp>\n\n### Session dedup in action\n\n\u003C!-- GPT IMAGE PROMPT: A sequence diagram illustrating 'Session dedup'. An 'Agent' asks 'pluckd' for \"auth token\". pluckd returns Chunk A (420 tokens) and Chunk B (380 tokens). Later, the Agent asks for \"session expiry\". pluckd realizes Chunk A was already sent, so it returns a 1-token placeholder [already-shown: chunk A] along with a new Chunk C (340 tokens). The diagram should visually highlight the \"Saved 419 tokens\" aspect. -->\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fimages\u002Fsession_dedup_flow.png\" alt=\"Session Dedup Flow\">\n\u003C\u002Fp>\n\n## MCP Tools\n\nAgents call specific tools depending on what they need. 
Bash is the fallback, not the default.\n\n| Tool (wire name) | Replaces | Use when |\n|------------------|----------|----------|\n| `mcp__pluck__read` | `cat` | Read a code file (smart outline by default; `raw: true` for byte-exact) |\n| `mcp__pluck__grep` | `grep` \u002F `rg` | Keyword search (all ripgrep flags wrapped) |\n| `mcp__pluck__search` | — | Ranked-chunk search (BM25 + semantic RRF) |\n| `mcp__pluck__symbol` | `cat` + scroll | Read just that function\u002Fclass |\n| `mcp__pluck__peek` | — | Signature + direct callees only |\n| `mcp__pluck__expand` | many `cat`s | Symbol + callees up to N hops |\n| `mcp__pluck__impact` | grep + read each caller | Reverse call graph — \"who calls this symbol?\" |\n| `mcp__pluck__deps` | grep imports + read each file | File-level import graph — \"what does this file depend on \u002F who imports it?\" |\n| `mcp__pluck__digest` | piping `cargo build`\u002F`pytest`\u002FCI logs to `cat` | Compress verbose tool output (errors \u002F panics kept verbatim, progress lines collapsed) |\n| `mcp__pluck__plan` | speculative `search`\u002F`read` loop | Given a free-form task, recommend the next 3-5 retrieval calls + confidence indicator |\n\n## Standalone CLI (no agent)\n\nYou can also use pluck directly in your terminal:\n\n```bash\npluck index .\npluck search \"auth flow\" --repo .\npluck read src\u002Fauth\u002Flogin.ts        # smart outline\npluck read src\u002Fauth\u002Flogin.ts --raw  # byte-equivalent cat\n```\n\n## Performance & Token Savings\n\nEvery number on this page cites a frozen baseline row or a measured scenario. No projected \u002F aspirational percentages.\n\n\u003C!-- GPT IMAGE PROMPT: A bar chart comparing token usage per session between \"bash (rg+cat)\" and \"pluck\". The \"bash\" bar should be high and colored red or gray. The \"pluck\" bar should be lower and colored bright green. The title should be \"Tokens per session\". Clean, modern UI style. 
-->\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fimages\u002Ftoken_savings_chart.png\" alt=\"Token Savings Chart\">\n\u003C\u002Fp>\n\n### Gated engine metrics\n\nThese are the invariants in [`benchmarks\u002Fbaseline.json`](benchmarks\u002Fbaseline.json). Every commit that touches engine-core runs `scripts\u002Fregression-gate.py` and the gate fails the build if any of them drift past tolerance.\n\n| Metric | Value | Source row in `baseline.json` |\n|--------|-------|-------------------------------|\n| Chunker p50 (medium repo, 500 lines) | **1.05 ms** | `chunker_medium_ms_p50` |\n| Indexer throughput (medium, 500 files) | **2 747 files\u002Fs** | `indexer_files_per_sec_medium` |\n| Warm search p50 (medium) | **0.07 ms** | `warm_search_p50_ms_medium` |\n| File save → searchable p50 | **171 ms** | `freshness_p50_ms_medium` |\n| Session-dedup savings (5-query bench) | **23 %** | `session_dedup_session_savings_pct` |\n| `pluck.digest` log compression (median of 6 fixtures) | **71 %** | `digest_savings_pct` |\n\n### Eligible read-token savings\n\n`pluck.read` outline mode is where pluck stops agents from paying the `cat` tax: instead of dumping every line, it returns the file's symbol map, inlines tiny helper bodies, and lets the agent fetch larger bodies on demand.\n\n| Read workload | `cat` tokens | `pluck.read` tokens | Savings |\n|---------------|-------------:|--------------------:|--------:|\n| medium realistic (5 fns, ~120 lines) | 929 | 116 | **88 %** |\n| large realistic (25 fns, ~600 lines) | 4 549 | 556 | **88 %** |\n| xl realistic (100 fns, ~2 400 lines) | 18 124 | 2 320 | **87 %** |\n| class (1 class + 50 methods) | 8 608 | 1 302 | **85 %** |\n\nTiny files and `raw` reads are control cases: they are expected to show little or no savings because byte-exact fallback is the point.\n\n### Measured single-scenario token reduction\n\n`fix-auth-token-expiry`: same JIRA-style task, bash workflow (`rg -l` + several `cat`s) vs pluck workflow (`search` 
+ `read` + `symbol`). Both runners arrive at the same fix:\n\n| Runner | Tokens spent | Source |\n|--------|--------------|--------|\n| bash (`rg + cat`) | 1 248 | [`fix-auth-token-expiry-1778750775.json`](benchmarks\u002Fresults\u002Ffix-auth-token-expiry-1778750775.json) |\n| **pluck** (`search + read + symbol`) | **931 (−25 %)** | same file |\n\nBroader LLM-in-the-loop measurements across `fix` \u002F `refactor` \u002F `explore` \u002F `search` \u002F `review` scenarios are roadmapped as v0.8.0 work. We'll publish those numbers when they exist, not before.\n\n### Feature Comparison\n\n| Capability | `cat` + `grep` \u002F `rg` | Other code-search tools | **pluck** |\n|------------|----------------------|-------------------------|-----------|\n| Hybrid BM25 + semantic ranking | ✗ | typically ✓ | ✓ |\n| AST-level chunks | ✗ | typically ✓ | ✓ |\n| Persistent daemon (MCP stdio) | — | ✗ (cold CLI per call) | **✓** |\n| Persistent on-disk index (mmap) | — | usually ✗ | ✗ — roadmapped (v0.7.0) |\n| Incremental reindex (file watcher) | — | usually ✗ | **✓ — 171 ms p50** |\n| **Session-scoped dedup** | — | ✗ | **✓ — 23 % savings on bench** |\n| **`--raw` cat\u002Fgrep byte parity** | — | ✗ | **✓** |\n| **Lossless default, lossy opt-in** | — | varies | **✓** |\n| `peek` (signature + direct callees) | ✗ | ✗ | **✓** |\n| Single-file outline (`pluck.read`) | ✗ | ✗ | **✓** |\n| Multi-hop `expand` (call graph) | ✗ | ✗ | **✓** |\n| Reverse call graph (`impact`) | ✗ | ✗ | **✓** |\n| File-level import graph (`deps`) | ✗ | ✗ | **✓** |\n| Build \u002F CI \u002F test log compression (`digest`) | ✗ | ✗ | **✓ — 71 % median** |\n| Exploration recommender (`plan`) | ✗ | ✗ | **✓** |\n\n## Roadmap\n\nVersioning details live in [`docs\u002FVERSIONING.md`](docs\u002FVERSIONING.md).\nMaintainer release flow lives in\n[`docs\u002FMAINTAINER_LOOP.md`](docs\u002FMAINTAINER_LOOP.md).\n\n- **v0.2.0 — shipped**: First crates.io publish, MCP tools, session dedup,\n  smart outline, and expanded 
surface — `digest`, `impact`, `deps`, `plan`.\n- **v0.3.0 — shipped**: Natural-language recall — 100-query suite across\n  tokio \u002F django \u002F next.js, query expansion, two-stage cascade, continuous\n  hybrid weighting, NDCG@10 measurement, and symbol\u002Fpath component ranking.\n- **v0.4.0 — active train**: Java + repo-format coverage — Java, HTML,\n  prompt-first agent install, TSX grammar fixes, CSS\u002FSCSS, Markdown\u002FMDX,\n  YAML\u002FJSON\u002FTOML, Dockerfile, and Shell landed; fixtures\u002Fgate hardening remain.\n- **v0.5.0**: Systems + JVM tier — C, C++, Kotlin, SQL, Terraform\u002FHCL.\n- **v0.6.0**: App-framework tier — Ruby, PHP, Swift, Vue, Svelte, Astro,\n  OpenAPI \u002F GraphQL.\n- **v0.7.0**: Scale + persistence — mmap index, schema versioning,\n  incremental embedding re-encode, memory\u002Fdisk caps.\n- **v0.8.0**: Adoption + observability — adoption counter, tool-description\n  A\u002FB harness, LLM-in-loop bench, multilingual tool descriptions.\n- **v0.9.0**: Workflow intelligence + ecosystem — JSON output, `diff`,\n  `history`, `profile`, Aider \u002F OpenHands \u002F Cursor \u002F Cline \u002F Continue.\n- **v1.0.0**: Stable default retrieval layer — stable MCP\u002FCLI contracts,\n  benchmark dashboard, release checklist, config migration, supply-chain review.\n\n## License\n\nMIT - See [LICENSE](LICENSE) for details.\n","pluck is a local code retrieval engine designed for AI agents. It uses the Model Context Protocol (MCP) to provide symbol-aware code reading and search. The project combines BM25F with semantic search over abstract syntax tree (AST) chunks and supports session dedup, cutting the tokens needed for code reads by 84-88% and shortening CI logs by 71%. It suits scenarios that call for more efficient code retrieval, lower resource consumption, and faster development workflows, particularly when managing and maintaining large codebases. The project is written in Rust for high performance and low latency.","2026-06-11 04:06:48","CREATED_QUERY"]