[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81857":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":13,"stars30d":13,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":14,"rankGlobal":10,"rankLanguage":10,"license":15,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":16,"hasPages":18,"topics":19,"createdAt":10,"pushedAt":10,"updatedAt":20,"readmeContent":21,"aiSummary":22,"trendingCount":13,"starSnapshotCount":13,"syncStatus":23,"lastSyncTime":24,"discoverSource":25},81857,"ferrocache","nickleodoen\u002Fferrocache","nickleodoen","A Distributed Semantic Cache Service for LLM applications - Multi-node, MCP-compatible, Written in Rust!","",null,"Rust",24,0,40,"MIT License",false,"main",true,[],"2026-06-12 04:01:35","# FerroCache\n\n**A Distributed Semantic Cache Service for LLM Applications**\n\nFerroCache is a standalone service that sits in front of your LLM calls and returns cached responses for semantically similar queries. Because it's a compiled Rust binary with an HTTP API, **any language can use it** — Python, Go, Node.js, Java, Ruby, anything that can make an HTTP request. LLM API calls are expensive; semantically similar queries should reuse cached answers instead of paying for a new completion. 
Unlike GPTCache, FerroCache is a service, not an in-process library — deploy it once, share the cache across your entire fleet, and the cache survives application restarts.\n\n[Documentation](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache) · [PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Fferrocache) · [Docker](https:\u002F\u002Fgithub.com\u002Fnickleodoen\u002Fferrocache\u002Fpkgs\u002Fcontainer\u002Fferrocache) · [Changelog](CHANGELOG.md) · [Contributing](CONTRIBUTING.md)\n\n---\n\n## Features\n\n**Cache core**\n- [x] Semantic similarity search via HNSW (approximate nearest neighbor)\n- [x] Exact-match pre-filter — verbatim queries return in \u003C0.4ms\n- [x] Configurable cosine similarity threshold (default: 0.92)\n- [x] Embedding-model agnostic — bring your own vectors\n- [x] Per-entry TTL with background expiry reaper\n- [x] LRU eviction with configurable max entries per namespace\n- [x] `DELETE \u002Fentry\u002F:uuid` — targeted cache invalidation\n- [x] `POST \u002Fadmin\u002Finvalidate` — semantic radius invalidation\n\n**Namespace isolation**\n- [x] Model namespace partitioning — vectors from different models never compare\n- [x] Tenant isolation via `cache_scope` — one cache, many tenants\n- [x] Conversation scoping with two-level fallback (conversation → global)\n- [x] Auto-TTL on conversation namespaces\n\n**Durability & operations**\n- [x] Write-ahead log (WAL) with fsync — survives process crashes\n- [x] Atomic snapshots with WAL compaction\n- [x] Group-commit WAL batching — 2,600+ inserts\u002Fsec at concurrency 50\n- [x] Prometheus `\u002Fmetrics` endpoint\n- [x] Grafana dashboard (docker-compose overlay)\n- [x] `\u002Fadmin\u002Fentry-stats` — per-namespace access analytics\n\n**Distribution**\n- [x] Multi-node cluster via consistent hashing + chitchat gossip\n- [x] Synchronous write replication (configurable replication factor)\n- [x] Phi accrual failure detection (Cassandra-style)\n- [x] Automatic ring reassignment on node failure 
(zero data movement)\n- [x] Read repair — stale nodes heal through traffic\n\n**Security**\n- [x] Bearer token auth on HTTP API — opt-in via `FERROCACHE_AUTH_TOKEN`\n- [x] Mutual TLS between cluster nodes — opt-in via `cluster.tls.enabled`\n- [x] Constant-time token comparison (timing-attack safe)\n\n**Integrations**\n- [x] Python client (zero dependencies, stdlib only)\n- [x] OpenAI SDK drop-in wrapper (`wrap_openai`)\n- [x] Anthropic SDK drop-in wrapper (`wrap_anthropic`)\n- [x] LangChain cache backend (`FerrocacheCache`)\n- [x] LlamaIndex LLM wrapper (`FerrocacheLLM`)\n- [x] MCP server for Claude Desktop \u002F Claude Code\n- [x] Any language via HTTP — Go, Node.js, Java, Ruby, etc.\n\n---\n\n## Quick Install\n\n```bash\n# Docker (recommended)\ndocker run -p 3000:3000 ghcr.io\u002Fnickleodoen\u002Fferrocache:latest\n```\n\n```bash\n# Python client (extras are quoted so shells like zsh do not glob the brackets)\npip install ferrocache\npip install \"ferrocache[openai]\"  # + OpenAI middleware\npip install \"ferrocache[all]\"     # everything\n```\n\n```bash\n# Build from source (Rust required)\ngit clone https:\u002F\u002Fgithub.com\u002Fnickleodoen\u002Fferrocache\ncd ferrocache && cargo build --release\n.\u002Ftarget\u002Frelease\u002Fferrocache\n```\n\n\u003Cdetails>\n\u003Csummary>▶ \u003Cstrong>Example Usage\u003C\u002Fstrong> (click to expand)\u003C\u002Fsummary>\n\n**Example 1 — Python (universal pattern, no framework):**\n\n```python\nfrom ferrocache import FerrocacheClient\nimport openai\n\nclient = FerrocacheClient(\"http:\u002F\u002Flocalhost:3000\")\nyour_openai = openai.OpenAI()\n\ndef ask(question: str, embedding: list[float]) -> str:\n    # Check cache first\n    hit = client.query(embedding=embedding, threshold=0.92, model_id=\"gpt-4o-mini::1536\")\n    if hit[\"hit\"]:\n        return hit[\"response\"]  # no LLM call needed\n\n    # Cache miss — call the LLM\n    answer = your_openai.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": question}],\n    
).choices[0].message.content\n\n    client.insert(\n        embedding=embedding,\n        response=answer,\n        query_text=question,\n        model_id=\"gpt-4o-mini::1536\",\n    )\n    return answer\n```\n\n**Example 2 — Drop-in OpenAI wrapper (one line change):**\n\n```python\nfrom openai import OpenAI\nfrom ferrocache.middleware import wrap_openai\n\nclient = wrap_openai(OpenAI())  # that's it\n\nresponse = client.chat.completions.create(\n    model=\"gpt-4o-mini\",\n    messages=[{\"role\": \"user\", \"content\": \"What is the refund policy?\"}],\n)\nprint(response._ferrocache_hit)  # True on cache hit\n```\n\n**Example 3 — Tenant isolation (multi-tenant SaaS):**\n\n```python\n# Different tenants never share cache entries.\n# client is the FerrocacheClient from Example 1; emb is a query embedding\n# and answer is the LLM response being cached.\nclient.insert(\n    embedding=emb,\n    response=answer,\n    query_text=\"...\",\n    model_id=\"gpt-4o-mini::1536\",\n    cache_scope=\"tenant_abc\",\n)\nresult = client.query(embedding=emb, threshold=0.92, model_id=\"gpt-4o-mini::1536\", cache_scope=\"tenant_abc\")  # hits\nresult = client.query(embedding=emb, threshold=0.92, model_id=\"gpt-4o-mini::1536\", cache_scope=\"tenant_xyz\")  # miss\n```\n\n→ Full documentation with examples for all integrations: [nickleodoen.github.io\u002Fferrocache](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache)\n\n\u003C\u002Fdetails>\n\n---\n\n## Architecture\n\n![FerroCache Architecture](docs\u002Fassets\u002Farchitecture.png)\n\n- Your app hits any node. Queries route to the correct shard via consistent hashing on the embedding vector. Writes replicate synchronously to N nodes.\n- Nodes discover each other via gossip (chitchat). Ring membership updates propagate in ~2 seconds. No Zookeeper, no etcd, no coordinator.\n- Node failures are detected by phi accrual (Cassandra-style). 
Failed nodes' ring arcs fold to their replica neighbor automatically.\n\n---\n\n## Benchmarks\n\n**The right comparison for FerroCache is Redis, not GPTCache.**\n\nGPTCache is a Python library — it runs inside your process and its \"latency\" is a function call, not a network call. FerroCache is a service — like Redis, it has a network boundary by design, which is what lets it be shared across your entire fleet.\n\n**FerroCache performance** (Apple M4 Pro, release build):\n\n| Operation | p50 | p95 | p99 |\n|---|---|---|---|\n| Query hit (HTTP round-trip) | 0.44ms | 0.51ms | 0.54ms |\n| Query miss (HTTP round-trip) | 0.42ms | 0.48ms | 0.50ms |\n| Insert (includes WAL fsync) | 7.95ms | 8.36ms | 8.71ms |\n| Exact-match pre-filter | 0.38ms | — | — |\n\nInsert throughput: **2,600+ ops\u002Fsec at concurrency 50** (group-commit WAL).\n\n**Feature comparison vs GPTCache:**\n\n| | FerroCache | GPTCache |\n|---|---|---|\n| Architecture | Service (HTTP) | Library (in-process) |\n| Multi-node cluster | ✅ | ❌ |\n| Shared across fleet | ✅ | ❌ (per-process) |\n| WAL durability | ✅ (fsync) | ❌ (in-memory) |\n| Survives app restart | ✅ | ❌ |\n| Tenant isolation | ✅ `cache_scope` | ❌ |\n| Conversation scoping | ✅ | ❌ |\n| Exact-match pre-filter | ✅ | ❌ |\n| TTL per entry | ✅ | ⚠️ partial |\n| LRU eviction | ✅ | ✅ |\n| Any language client | ✅ | ❌ (Python only) |\n| Prometheus metrics | ✅ | ❌ |\n| Memory (data path) | ~7 MB \u002F 50 entries | ~7 MB \u002F 50 entries |\n\n> GPTCache query p50 is 0.082ms because it's an in-process function call. FerroCache query p50 is 0.44ms because it's an HTTP request — the same reason Redis is \"slower\" than a Python dict.\n\n---\n\n## Configuration\n\nAll keys default to single-node mode. Override via `ferrocache.toml` in the working directory or `FERROCACHE_*` env vars (env wins). 
Nested keys use `__` as a section separator; lists are comma-separated.\n\n**Core**\n\n| Key | Type | Default | Env var |\n|---|---|---|---|\n| `port` | u16 | `3000` | `FERROCACHE_PORT` |\n| `node_id` | string? | random UUID | `FERROCACHE_NODE_ID` |\n| `wal_path` | string | `.\u002Fferrocache.wal` | `FERROCACHE_WAL_PATH` |\n\n**HNSW**\n\n| Key | Type | Default | Env var |\n|---|---|---|---|\n| `hnsw.max_nb_connection` | usize | `16` | `FERROCACHE_HNSW__MAX_NB_CONNECTION` |\n| `hnsw.ef_construction` | usize | `200` | `FERROCACHE_HNSW__EF_CONSTRUCTION` |\n| `hnsw.ef_search` | usize | `32` | `FERROCACHE_HNSW__EF_SEARCH` |\n| `hnsw.default_threshold` | f32 | `0.92` | `FERROCACHE_HNSW__DEFAULT_THRESHOLD` |\n| `hnsw.max_entries_per_namespace` | usize? | `None` (unlimited) | `FERROCACHE_HNSW__MAX_ENTRIES_PER_NAMESPACE` |\n\n**Eviction & TTL**\n\n| Key | Type | Default | Env var |\n|---|---|---|---|\n| `expire_scan_interval_secs` | u64 | `60` | `FERROCACHE_EXPIRE_SCAN_INTERVAL_SECS` |\n| `conversation_ttl_seconds` | u64? | `None` | `FERROCACHE_CONVERSATION_TTL_SECONDS` |\n\n**Cluster**\n\n| Key | Type | Default | Env var |\n|---|---|---|---|\n| `cluster.enabled` | bool | `false` | `FERROCACHE_CLUSTER__ENABLED` |\n| `cluster.seed_nodes` | list\u003Cstring> | `[]` | `FERROCACHE_CLUSTER__SEED_NODES` |\n| `cluster.replication_factor` | usize | `2` | `FERROCACHE_CLUSTER__REPLICATION_FACTOR` |\n| `cluster.read_repair_enabled` | bool | `true` | `FERROCACHE_CLUSTER__READ_REPAIR_ENABLED` |\n| `cluster.dead_node_removal_enabled` | bool | `true` | `FERROCACHE_CLUSTER__DEAD_NODE_REMOVAL_ENABLED` |\n\n**Security**\n\n| Key | Type | Default | Env var |\n|---|---|---|---|\n| `auth_token` | string? 
| `None` (auth off) | `FERROCACHE_AUTH_TOKEN` |\n| `cluster.tls.enabled` | bool | `false` | `FERROCACHE_CLUSTER__TLS__ENABLED` |\n\n**Performance**\n\n| Key | Type | Default | Env var |\n|---|---|---|---|\n| `wal_batch_size` | usize | `256` | `FERROCACHE_WAL_BATCH_SIZE` |\n| `wal_batch_timeout_ms` | u64 | `1` | `FERROCACHE_WAL_BATCH_TIMEOUT_MS` |\n\nFull reference at [Getting Started Docs](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache\u002Fgetting-started\u002Fconfiguration).\n\n---\n\n## Production Cluster\n\nRun a 3-node cluster when you need the cache to survive a node failure without any application-side changes.\n\n```bash\ndocker compose up -d --build\nsleep 5\n.\u002Ftests\u002Fcluster_integration.sh   # 44 assertions over the live cluster\ndocker compose down -v\n```\n\nExternal ports `3001`\u002F`3002`\u002F`3003` map to the three nodes. An insert sent to any node is replicated to `replication_factor` owners along the ring; a query sent to any node is forwarded to the owning shard.\n\n---\n\n## SDK Integrations\n\n**OpenAI** — drop-in wrapper proxies attribute access; only the chat-completion method is intercepted.\n\n```python\nfrom openai import OpenAI\nfrom ferrocache.middleware import wrap_openai\n\nclient = wrap_openai(OpenAI(), cache_scope=\"tenant_abc\")\nresp = client.chat.completions.create(\n    model=\"gpt-4o-mini\",\n    messages=[{\"role\": \"user\", \"content\": \"What is the capital of France?\"}],\n)\nprint(resp.choices[0].message.content, resp._ferrocache_hit)\n```\n\n→ [OpenAI Integration Docs](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache\u002Fintegrations\u002Fopenai\u002F)\n\n**Anthropic** — same pattern for the Anthropic SDK.\n\n```python\nfrom anthropic import Anthropic\nfrom ferrocache.middleware import wrap_anthropic\n\nclient = wrap_anthropic(Anthropic())\nresp = client.messages.create(\n    model=\"claude-haiku-4-5\",\n    max_tokens=512,\n    messages=[{\"role\": \"user\", \"content\": \"Briefly: what is 
HNSW?\"}],\n)\n```\n\n→ [Anthropic Integration Docs](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache\u002Fintegrations\u002Fanthropic\u002F)\n\n**LangChain** — register as the global LLM cache.\n\n```python\nfrom langchain.globals import set_llm_cache\nfrom ferrocache.langchain import FerrocacheCache\n\nset_llm_cache(FerrocacheCache(cache_scope=\"tenant_abc\"))\n```\n\n→ [LangChain Integration Docs](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache\u002Fintegrations\u002Flangchain\u002F)\n\n**LlamaIndex** — wrap any LlamaIndex-compatible LLM.\n\n```python\nfrom llama_index.llms.openai import OpenAI\nfrom ferrocache.llamaindex import FerrocacheLLM\n\nllm = FerrocacheLLM(inner=OpenAI(model=\"gpt-4o-mini\"), cache_scope=\"tenant_abc\")\n```\n\n→ [LlamaIndex Integration Docs](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache\u002Fintegrations\u002Fllamaindex\u002F)\n\n**MCP server** — exposes semantic caching as tools for Claude Desktop \u002F Claude Code.\n\n```bash\npip install -r clients\u002Fpython\u002Fmcp_requirements.txt\npython3 -m ferrocache.mcp_server      # speaks JSON-RPC over stdio\n```\n\n→ [MCP Server Docs](https:\u002F\u002Fnickleodoen.github.io\u002Fferrocache\u002Fintegrations\u002Fmcp\u002F)\n\n---\n\n## Contributing\n\nFerroCache is actively developed and welcomes contributions. Three areas where contributions would have the most impact:\n\n**1. Embedding model integrations**\nFerroCache is embedding-agnostic by design — the client computes the vector. But most users want a default that just works. Adding first-class support for Voyage AI, Cohere, and local Ollama models to the Python client's auto-embed path would lower the barrier to adoption significantly.\nGood first issue: add `ferrocache[voyage]` extra with a Voyage AI embed_fn.\n\n**2. Async Python client**\nThe Python client and all middleware wrappers are synchronous. Modern Python LLM applications are async-native (LangChain LCEL, the async Anthropic client, FastAPI). 
An `AsyncFerrocacheClient` built on `httpx.AsyncClient` would unblock this entire class of users.\nGood first issue: implement `AsyncFerrocacheClient` mirroring the sync client's API.\n\n**3. Load testing and real-world benchmarks**\nThe current benchmarks run on synthetic FAQ workloads. Real-world hit rate data on production query distributions (MS MARCO, customer support logs, coding assistant queries) would help users calibrate their threshold and make the project more credible to evaluators.\nGood first issue: publish a benchmark notebook using the MS MARCO dataset.\n\n→ See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, code style, and the PR process.\n→ Open issues are labeled [`good first issue`](https:\u002F\u002Fgithub.com\u002Fnickleodoen\u002Fferrocache\u002Fissues).\n\n---\n\n## Security\n\n```bash\n# Bearer token auth on the public HTTP API\nexport FERROCACHE_AUTH_TOKEN=\"$(openssl rand -hex 32)\"\n\n# Mutual TLS between cluster nodes\nexport FERROCACHE_CLUSTER__TLS__ENABLED=true\n```\n\nWith auth on, `\u002Fhealth` and `\u002Fmetrics` stay open; all data routes require `Authorization: Bearer \u003Ctoken>`. With mTLS on, FerroCache binds a second listener on `internal_port` (default `port + 1000`) requiring a client cert chained to the cluster CA. Public-port TLS is expected to be terminated by a reverse proxy. 
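See [docs\u002Fsecurity.md](docs\u002Fsecurity.md) for the full threat model.\n\n---\n\n## Development\n\n```bash\ncargo test                        # unit tests (~222 pass)\ncargo clippy --all-targets -- -D warnings\nmake cluster-test                 # docker compose + integration script (44 assertions)\nmake benchmark-vs-gptcache        # FerroCache vs GPTCache\n```\n\nCI runs `check`\u002F`test`\u002F`clippy`\u002F`fmt` plus the docker-compose cluster integration on every push (`.github\u002Fworkflows\u002Fci.yml`).\n","FerroCache is a distributed semantic cache service designed for large language model applications. It is written in Rust, supports multi-node deployment, and is compatible with the MCP protocol. Core features include semantic similarity search via HNSW, an exact-match pre-filter, a configurable cosine similarity threshold, and TTL expiry with LRU eviction. FerroCache also provides namespace isolation, write-ahead log persistence, and synchronous write replication to improve data safety and system reliability. The service is especially suited to applications that call LLM APIs frequently, such as chatbots and customer-service assistants, where it can lower API costs and improve response times.",2,"2026-06-11 04:06:59","CREATED_QUERY"]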