[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81001":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":12,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":10,"rankLanguage":10,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":17,"hasPages":17,"topics":19,"createdAt":10,"pushedAt":10,"updatedAt":20,"readmeContent":21,"aiSummary":22,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":23,"discoverSource":24},81001,"zerikai_memory","KikeVen\u002Fzerikai_memory","KikeVen","A standalone local-only Python MCP server that gives any IDE persistent, workspace-isolated memory. works with any IDE supporting MCP servers","https:\u002F\u002Fzerikai.com\u002Fabout.html",null,"Python",30,2,0,1.43,"MIT License",false,"main",[],"2026-06-12 02:04:09","# Zerikai Memory\n\nA standalone local-only Python MCP server that gives any IDE persistent, workspace-isolated memory. Combines ChromaDB (local vector store), Ollama (free local summarisation), and DeepSeek (cloud synthesis) with automatic cost-aware routing.\n\n![memory](zerikai_memory.png)\n\n---\n\n## Table of Contents\n\n1. [What Is Zerikai Memory?](#what-is-zerikai-memory)\n2. [How It Works](#how-it-works)\n3. [Cost Savings Explained](#cost-savings-explained)\n4. [Project Structure](#project-structure)\n5. [Prerequisites](#prerequisites)\n6. **[Installation](#installation)** *\n7. [IDE Registration](#ide-registration)\n8. [Workspace Setup (per project)](#workspace-setup-per-project)\n9. [Day-to-Day Usage](#day-to-day-usage)\n10. [Auto-Routing Reference](#auto-routing-reference)\n11. [Project Brief Structure](#project-brief-structure)\n12. [MCP Tools Reference](#mcp-tools-reference)\n13. 
[Monitoring & Logs](#monitoring--logs)\n14. [Auxiliary Scripts](#auxiliary-scripts)\n15. [Embedding-Docstring Skill](#embedding-docstring-skill)\n16. [DeepSeek KV Cache Optimisation](#deepseek-kv-cache-optimisation)\n17. [Security & Data Privacy](#security--data-privacy)\n18. [Troubleshooting](#troubleshooting)\n19. [Notice](#notice)\n\n---\n\n## What Is Zerikai Memory?\n\nZerikai Memory is a **local MCP (Model Context Protocol) server** that provides persistent, workspace-isolated memory to your IDE's AI assistant. It solves a core problem with AI-assisted development: every new chat session starts cold, forcing you to re-explain your project context, decisions, and conventions, which wastes tokens and time.\n\nBy storing compressed, semantically searchable summaries of your codebase and architectural decisions, Zerikai Memory enables your IDE's AI to:\n\n- **Recall decisions** made in previous sessions instantly.\n- **Understand your codebase** without raw file dumps into the chat window.\n- **Share context across IDEs**: work in VS Code, then switch to Cursor with no re-explanation.\n- **Dramatically reduce API costs** through smart local\u002Fcloud routing and DeepSeek KV caching.\n\nThe server runs **entirely on your local machine**. Each IDE connects via STDIO to its own server process, with direct filesystem access for workspace scanning.\n\n---\n\nMost recent updates are listed in the changelog below. Click to expand:\n\n\u003Cdetails>\n\n**\u003Csummary>New Updates 2026-05-27\u003C\u002Fsummary>**\n\n- **IDE Agent Rules:** New [`agent_rules\u002Fide_agent_rules.md`](agent_rules\u002Fide_agent_rules.md) with two behavioral rules — *Universal-Brain First* (query memory before raw file searches) and *Source Discipline* (always surface `file:line` citations with confidence scores). Includes per-IDE setup instructions for VS Code Copilot, pi.dev, Google Antigravity, and Claude Desktop. 
Applied rules prevent agents from skipping memory or fabricating answers.\n- **README:** New `### IDE Agent Rules` subsection under IDE Registration summarising the rules and linking to the file. `agent_rules\u002F` added to the project structure tree.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n**\u003Csummary>Updates 2026-05-17\u003C\u002Fsummary>**\n\n- **Inline source citations:** Replaced the `## Sources` Markdown table with plain-text `#file:line (distance)` citations. Renders in every IDE without broken tables; clickable in VS Code Copilot. Tested against Copilot, Claude Desktop, Antigravity, and pi with documented agent behavior differences.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n**\u003Csummary>Updates 2026-05-13\u003C\u002Fsummary>**\n\n- **Lexical re-ranking in `query_memory`:** New hybrid search step: after semantic retrieval, results are reordered by keyword overlap in entity names and docstrings. Solves false positives where functions with shared vocabulary (e.g. \"tree-sitter\", \"extract\") crowd out the correct match. `ENABLE_LEXICAL_RERANK=true` activates it; `LEXICAL_RERANK_WEIGHT` (default 0.05) controls boost per keyword hit. Pure reorder, nothing dropped. Default off.\n- **Agent-aware tool descriptions:** All 15 MCP tool docstrings reviewed and tuned for AI agent consumption (Copilot, Claude Desktop, Antigravity). Agents now receive priority directives, anti-pattern hints, and \"when not to use\" guidance directly in the tool schema, reducing trial-and-error probing.\n- **`save_to_memory` docstring rewritten:** Leads with use-case semantics (*\"Manually save an architectural decision, fact, or technical note\"*) instead of implementation details. 
Adds an explicit routing hint: *\"it's not for code files, `scan_workspace` handles those.\"*\n- **Priority directives now explicit:** `get_brief` says *\"Use this FIRST on any new workspace.\"* `query_memory` says *\"Use this BEFORE reasoning from priors.\"* `list_memory` warns *\"not to answer code questions, use `query_memory`.\"* `resolve_workspace` identifies itself as *\"a helper tool for agents that don't have filesystem context.\"*\n- **Irreversible operations flagged:** `merge_workspaces` and `purge_usage_data` both carry a *\"Cannot be undone\"* warning visible to the agent before execution.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n**\u003Csummary>Updates 2026-05-12\u003C\u002Fsummary>**\n\n- **Parallel brief synthesis:** All 9 brief sections now fire simultaneously via `asyncio.gather`. Brief generation dropped *from ~90 seconds to ~20-30 seconds*.\n- **Skip bare `.py` files:** New `SKIP_BARE_PY_FILES` toggle in `.env`. Skips `.py` files with no functions or classes (`admin.py`, `urls.py`, `settings.py`) to avoid DeepSeek calls on boilerplate. Default off.\n- **HTML comment indexing:** `_extract_html` now captures `\u003C!-- -->` comments as docstrings for the elements that follow. 
Comments are searchable and appear in inline source citations.\n- **Embedding-docstring skill:** Updated to cover HTML comments in addition to *Python*, *JavaScript*, and *TypeScript* docstrings.\n- **Brief timing corrected:** Status messages updated from \"about 90 seconds\" to \"about 20 seconds.\"\n- **Primary Conventions prompt tightened:** Briefs no longer include filler sections like Naming Conventions or Testing infrastructure.\n- **`use_cloud` default:** `synthesize_deep_brief` now defaults to cloud mode.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n**\u003Csummary>Updates 2026-05-11\u003C\u002Fsummary>**\n\n- **Inline source citations:** Every `query_memory` response prepends inline `#file:line (distance)` citations — plain text, cross-agent compatible, clickable in VS Code Copilot.\n- **Full docstrings embedded:** `_clean_docstring` no longer truncates to the first sentence; the LLM sees complete function descriptions for richer answers.\n- **`show_sources` toggle:** Callers can enable or disable inline source citations per query; citations are on by default.\n- **Fire-and-forget brief synthesis:** `scan_workspace` returns immediately and the brief generates in the background, so the call no longer hits MCP timeouts.\n- **Tighter distance threshold:** Default `QUERY_DISTANCE_THRESHOLD=1.0` in `.env`, reducing false positives.\n- **Embedding-docstring skill:** A companion skill (`embedding-docstring\u002FSKILL.md`) that audits docstrings for embedding quality: technology naming, routing documentation, guarantees, and size limits.\n\n\u003C\u002Fdetails>\n\n---\n\n## How It Works\n\n```\nYour IDE  ──►  MCP Server (main.py)  ──►  ChromaDB (.brain\u002Fvector_db\u002F)\n                     │                         ↑ semantic retrieval\n                     ├── Ollama (local)    ─── Used in Hybrid & Local modes\n                     └── DeepSeek (cloud)  ─── Used in Hybrid & Cloud modes\n```\n\nWhen you ask your AI assistant a question:\n\n1. The MCP server receives the query.\n2. 
It performs a **vector search** against ChromaDB to retrieve the most relevant entities (function signatures, docstrings, file summaries) from your codebase, with an optional lexical re-rank if `ENABLE_LEXICAL_RERANK=true`.\n3. In `hybrid` mode, the auto-router decides whether to send the query to **Ollama** (local, free) or **DeepSeek** (cloud, billed). `cloud` mode uses DeepSeek for everything and `local` mode uses Ollama for everything.\n4. The synthesised answer is returned to your IDE, enriched with workspace context and without bloating your chat window.\n\nYou never call MCP tools directly. You speak to your AI assistant in natural language and it calls the tools on your behalf.\n\n### Lexical Re-ranking\n\nWhen `ENABLE_LEXICAL_RERANK=true` in `.env`, a lightweight hybrid step runs between semantic retrieval and LLM synthesis. Pure vector search can rank unrelated code that shares vocabulary above the exact function or class you asked about. The re-ranker boosts results where query keywords appear in entity names or docstrings, without dropping anything.\n\n```\nQuery → embed → ChromaDB top-N → lexical re-rank → LLM synthesis\n                                      ↑\n                           Weighted boost on keyword hits in\n                           entity name + docstring text.\n                           Nothing is dropped — pure reorder.\n```\n\n1. **Semantic retrieval** — ChromaDB returns the top-N results by L2 distance.\n2. **Distance threshold** — Results above `QUERY_DISTANCE_THRESHOLD` are dropped.\n3. **Lexical re-rank** — Survivors are re-scored as `(1\u002Fdist) + (keyword_hits × LEXICAL_RERANK_WEIGHT)`. A function named `extract_entities` with a matching docstring outranks a generic file summary that happens to be semantically close.\n4. **LLM synthesis** — The reordered context is passed to DeepSeek or Ollama for the final answer.\n\n### Workspace Identity\n\nYou do not specify your project name or path in chat. Your IDE automatically attaches metadata about your currently active workspace to every message. 
The server maintains a **Workspace Registry** (SQLite) that maps each workspace folder to a persistent UUID and a human-friendly display name.\n\nThe AI assistant can resolve any workspace identifier (UUID, short UUID, or display name), so you never pass raw file paths for routine queries.\n\n---\n\n## Cost Savings Explained\n\nDeepSeek is invoked in three places: query synthesis (when auto-routed for long or architectural queries), brief synthesis (9 section calls totalling ~\\$0.003 per full regeneration), and file scanning in cloud mode (~\\$0.000167 per file). In hybrid mode, routine queries and file scans run on Ollama at \\$0.\n\n- The Project Brief is a fixed prefix across queries, so DeepSeek caches it at \\$0.0028\u002FM tokens (hit) vs \\$0.14\u002FM (miss): 50x cheaper after the first query.\n- Code files are parsed locally by tree-sitter at zero API cost, regardless of mode.\n- All IDEs share the same `.brain\u002F` directory, so context saved in one is instantly available in another with no re-explanation cost.\n- Every `query_memory` response includes inline `#file:line` citations with the entity name and L2 distance. 
This metadata is already stored during scanning at no extra API cost.\n\n---\n\n## Project Structure\n\n```\nzerikai_memory\u002F\n├── .brain\u002F                       # Created on first run: do NOT commit\n│   ├── server.log                # Rotating log file (5 MB cap, 2 backups)\n│   ├── zerikai.db                # SQLite: Workspace Registry & token tracking\n│   ├── vector_db\u002F                # ChromaDB: one sub-collection per workspace\n│   └── contexts\u002F                 # Per-workspace project briefs (.md files)\n├── .env                          # API keys: never commit\n├── .memignore                    # Files\u002Fdirs excluded from memory indexing\n├── agent_rules\u002F                  # IDE agent behavior rules (universal-brain usage)\n├── code_indexer.py               # Deterministic tree-sitter extraction logic\n├── config.py                     # Configuration & routing thresholds\n├── drop_memory.py                # Cleanup utility (registry + vectors + files)\n├── main.py                       # MCP server entry point\n├── requirements.txt\n└── skill\u002F                        # Companion skills (embedding-docstring, etc.)\n```\n\n---\n\n## Prerequisites\n\n| Dependency | Purpose | Link |\n|---|---|---|\n| Python 3.11+ | Runtime | [python.org](https:\u002F\u002Fpython.org) |\n| Ollama | Free local summarisation (hybrid\u002Flocal modes) | [ollama.com](https:\u002F\u002Follama.com) |\n| DeepSeek API key | Cloud synthesis (hybrid\u002Fcloud modes) | [platform.deepseek.com](https:\u002F\u002Fplatform.deepseek.com) |\n\n---\n\n## Installation\n\n### Step 1: Clone and create the virtual environment\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FKikeVen\u002Fzerikai_memory.git\ncd zerikai_memory\n\n# Create the virtual environment (Python 3.11+; use python3 on macOS \u002F Linux if python is not found)\npython -m venv venv\n# Windows: the py launcher can pick a specific version, e.g. py -3.13 -m venv venv\n\n# Windows\n.\\venv\\Scripts\\activate\n\n# macOS \u002F Linux\nsource venv\u002Fbin\u002Factivate\n\npip install -r requirements.txt\n```\n\n### Step 2: Configure `.env`\n\nChoose a memory mode by setting 
`MEMORY_MODE` in your `.env` file.\n\n| Mode | LLM Strategy | Best For |\n|---|---|---|\n| `cloud` | DeepSeek for everything | **Recommended. Low cost, no Ollama needed, best brief quality.** |\n| `hybrid` | Ollama (scans\u002Froutine) + DeepSeek (architecture\u002Fbriefs) | Privacy-sensitive users who want free local lookups |\n| `local` | Ollama for everything | 100% privacy & \\$0 cost, lower quality |\n\n**Recommendation:** Start with `cloud`. You only need a DeepSeek API key: no Ollama installation, no GPU requirements, no local model management. DeepSeek v4-flash is cheap (\\$0.14\u002FM input tokens) and brief synthesis runs at ~\\$0.003 per full regeneration.\n\n**Get a DeepSeek API key at [platform.deepseek.com](https:\u002F\u002Fplatform.deepseek.com), then add it to `.env`:**\n\n\u003Cdetails>\n\n**\u003Csummary>Expand to view .env\u003C\u002Fsummary>**\n\n```.env\nDEEPSEEK_API_KEY=your_deepseek_key_here\n\n# Memory Mode controls which LLM is used for operations:\n# - \"cloud\": Use DeepSeek for all operations (scan, brief, queries) - highest quality, tracked usage\n# - \"hybrid\": Use Ollama for file scanning, DeepSeek for briefs and escalated queries\n# - \"local\": Use Ollama for everything (free, but lower quality briefs)\nMEMORY_MODE=cloud\n\n# Enable token tracking and cost reporting (SQLite database at .brain\u002Ftoken_usage.db)\n# Set to \"false\" to disable tracking\nENABLE_TOKEN_TRACKING=true\n\n# Enable deepseek-v4-pro for complex architectural queries (design, architecture, tradeoffs)\n# v4-pro is ~3x more expensive than v4-flash (currently $0.435\u002FM vs $0.14\u002FM input)\n# After May 31 2026, v4-pro will be ~12x more expensive ($1.74\u002FM vs $0.14\u002FM)\n# Recommended: keep this \"false\" unless you need maximum reasoning capability\nENABLE_DEEPSEEK_PRO=false\n\n# Semantic search relevance cutoff for query_memory (L2 distance).\n# Lower = stricter. 
Watch \"best dist=X.XX\" in server.log to calibrate.\n# Typical: \u003C0.8 strong match, 0.8-1.5 related, >1.5 noise.\nQUERY_DISTANCE_THRESHOLD=1.0\n\n# When true, .py files that produce zero tree-sitter entities (no functions or\n# classes found) are skipped during scanning instead of being sent to DeepSeek for\n# LLM summarisation. Saves API calls on files like admin.py, urls.py, settings.py,\n# wsgi.py that have only variable assignments and module-level code.\n# Default: false (existing behaviour — all such files are LLM-summarised).\n# Set to \"true\" to skip them.\nSKIP_BARE_PY_FILES=true\n\n# Enable lexical re-ranking in query_memory.\n# When true, results passing the distance threshold are reordered by a\n# weighted combination of semantic distance and keyword overlap in entity\n# name and docstring text. Nothing is dropped — pure reorder.\n# Default: false (existing pure-semantic behaviour preserved).\nENABLE_LEXICAL_RERANK=true\n\n# Weight applied per keyword hit during lexical re-ranking.\n# The 1\u002Fdist spread across the valid-hit band (0.85–0.98) is ~0.156.\n# Keep this value below that spread to avoid keyword hits overriding\n# a genuinely closer semantic result.\n# Recommended starting point: 0.05 (one hit = +0.05, two hits = +0.10).\nLEXICAL_RERANK_WEIGHT=0.05\n```\n\n\u003C\u002Fdetails>\n\n### Step 3: Pull a local Ollama model (hybrid\u002Flocal mode only)\n\n**Note:** `OLLAMA_HOST` is optional. If your system has `OLLAMA_HOST=0.0.0.0` set (common on server installs), the server corrects it to `http:\u002F\u002F127.0.0.1:11434` for client connections.\n\nDownload and install [Ollama](https:\u002F\u002Follama.com) for your OS, then pull a model (for example, `ollama pull llama3.2`).\n\n> Only required for `MEMORY_MODE=hybrid` or `MEMORY_MODE=local`. 
Not needed for cloud mode.\n\n### Step 4: Verify the installation\n\nOpen a terminal in the `zerikai_memory` folder (virtual environment activated) and run:\n\n```bash\npython -c \"from main import scan_workspace, query_memory; print('OK')\"\n```\n\nYou should see the server startup banner followed by `OK`.\n\n---\n\n## IDE Registration\n\nThe server starts **once** when the IDE loads and stays running. Tool calls are messages to that process; there is no restart per call.\n\n### VS Code (Copilot \u002F Cline)\n\n1. Press `Ctrl+Shift+P` → **MCP: Add Local Server**\n2. Choose **STDIO**\n3. Set the command to: `C:\\path\\to\\zerikai_memory\\venv\\Scripts\\python.exe C:\\path\\to\\zerikai_memory\\main.py`\n\n> Replace `C:\\path\\to\\zerikai_memory` with the actual absolute path. Inside JSON config files on Windows (such as the Antigravity and Claude Desktop examples), every backslash must be doubled (`\\\\`) to produce valid JSON.\n\n**To see other registrations, expand the sections below:**\n\n\u003Cdetails>\n\n**\u003Csummary>Google Antigravity\u003C\u002Fsummary>**\n\nEdit `mcp_config.json` directly:\n\n```json\n\"universal-brain\": {\n  \"command\": \"C:\\\\path\\\\to\\\\zerikai_memory\\\\venv\\\\Scripts\\\\python.exe\",\n  \"args\": [\n    \"C:\\\\path\\\\to\\\\zerikai_memory\\\\main.py\"\n  ],\n  \"disabled\": false\n}\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n**\u003Csummary>Cursor\u003C\u002Fsummary>**\n\nAdd to `.cursor\u002Fmcp.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"universal-brain\": {\n      \"command\": \"\u002Fpath\u002Fto\u002Fzerikai_memory\u002Fvenv\u002Fbin\u002Fpython\",\n      \"args\": [\"\u002Fpath\u002Fto\u002Fzerikai_memory\u002Fmain.py\"]\n    }\n  }\n}\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n**\u003Csummary>Claude Desktop\u003C\u002Fsummary>**\n\n- **Windows:** `%APPDATA%\\Claude\\claude_desktop_config.json`\n- **macOS:** `~\u002FLibrary\u002FApplication Support\u002FClaude\u002Fclaude_desktop_config.json`\n\n```json\n{\n  \"mcpServers\": {\n    \"universal-brain\": {\n      \"command\": 
\"C:\\\\path\\\\to\\\\zerikai_memory\\\\venv\\\\Scripts\\\\python.exe\",\n      \"args\": [\n        \"C:\\\\path\\\\to\\\\zerikai_memory\\\\main.py\"\n      ]\n    }\n  }\n}\n```\n\n\u003C\u002Fdetails>\n\n**On macOS\u002FLinux, use forward slashes:** `\u002Fpath\u002Fto\u002Fzerikai_memory\u002Fvenv\u002Fbin\u002Fpython`\n\n### IDE Agent Rules\n\nAfter registering the MCP server, configure your IDE's agent instructions so the agent uses `universal-brain` correctly. The full guide lives at [`agent_rules\u002Fide_agent_rules.md`](agent_rules\u002Fide_agent_rules.md) with setup steps for VS Code Copilot, pi.dev, Google Antigravity, and Claude Desktop.\n\n**What the rules enforce:**\n\n- **Universal-Brain First** — Before any codebase exploration, the agent queries `universal-brain` via `query_memory`. Raw file searches are only used when memory has no relevant context, and the agent must state that it escalated.\n- **Source Discipline** — Every answer surfaces the full source citation (`file.py:line`) and confidence score from `universal-brain`. If memory has no answer, the agent says so instead of fabricating.\n\n> Apply these rules **after** registering the MCP server — they configure how the agent behaves once connected, not how it connects.\n\n---\n\n## Workspace Setup (per project)\n\n### 1. Set up the `.memignore` file\n\nWorks like `.gitignore`: one pattern per line. `scan_workspace` reads this file and skips matching paths.\n\nEach project should have its own `.memignore` in its root directory. 
Forgetting to configure it before the first scan is the most common reason to use `drop_memory.py` and start fresh.\n\n\u003Cdetails>\n\n**\u003Csummary>Sample .memignore\u003C\u002Fsummary>**\n\n```gitignore\n# Directories (trailing slash required)\n.git\u002F\nnode_modules\u002F\nvenv\u002F\n__pycache__\u002F\n.brain\u002F\ndist\u002F\nbuild\u002F\n\n# File\u002FFolder patterns\n**\u002Ftest\u002F\n**\u002Ftests\u002F\n.env\n*.log\n*.lock\n*.pyc\n```\n\n\u003C\u002Fdetails>\n\n### 2. Register a new project\n\nIn a new chat session in your IDE, with the `universal-brain` MCP server enabled, ask:\n\n```\n\"Set up memory for this project\"\n```\n\nThe assistant calls `init_workspace`, which registers the folder and creates a pending brief file at:\n\n```\n.brain\u002Fcontexts\u002F\u003Cworkspace_id>.md\n```\n\n> `init_workspace` is idempotent: running it multiple times is safe and returns the existing registration.\n\n### 3. Scan and index the workspace\n\nTell your assistant:\n\n```\n\"Scan and index the workspace.\"\n```\n\nThe assistant calls `scan_workspace`. This triggers the **Post-Scan Auto-Briefing**:\n\n1. Walks the directory (respecting `.memignore`). Supported code files are parsed into deterministic `tree-sitter` entities, while other text files get compressed summaries.\n2. Performs **iterative synthesis**: queries memory for up to 75 relevant `tree-sitter` nodes\u002Fsummaries per section across 9 project brief sections.\n3. Uses the auto-router (DeepSeek in Hybrid\u002FCloud modes) to synthesise a complete, accurate **Project Brief**.\n4. Saves the brief to `.brain\u002Fcontexts\u002F\u003Cworkspace_id>.md` and locks it to protect your DeepSeek KV cache prefix.\n\n> **Cache Stability Policy:** Normal daily scans do **not** regenerate the project brief. The brief is only generated on the first scan or when explicitly forced.\n\n### 4. 
Force a brief refresh (when needed)\n\nIf you make a major architectural pivot, tell your assistant:\n\n```\n\"Rescan the workspace and force a refresh of the project brief.\"\n```\n\nThe assistant calls `scan_workspace(force_refresh_brief=True)`.\n\n---\n\n## Day-to-Day Usage\n\n### Scan\n\n```\n\"Scan the workspace.\"\n```\n\n`scan_workspace` is **idempotent and self-cleaning**:\n\n- Uses deterministic hashing to overwrite existing file records (no duplicates).\n- Automatically **purges stale memories** for files deleted or added to `.memignore` since the last scan.\n- Does **not** regenerate the project brief, preserving your KV cache.\n\n### Common natural-language commands\n\n| You say | What happens |\n|---|---|\n| *\"Remember that we're using Redis for session caching\"* | `save_to_memory` is called |\n| *\"What did we decide about auth?\"* | `query_memory` → Ollama in `hybrid` mode (local, free) |\n| *\"Refactor the data layer, what are our constraints?\"* | `query_memory` → DeepSeek (auto-escalated) |\n| *\"List what's in memory for this project\"* | `list_memory` |\n| *\"What projects do you know about?\"* | `list_workspaces` |\n| *\"Show me the project brief.\"* | `get_brief` → displays `.brain\u002Fcontexts\u002F\u003Cid>.md` |\n\n### Retrieve the project brief\n\n```\n\"Show me the project brief.\"\n```\n\nThe assistant calls `get_brief`, which reads the `.md` file from `.brain\u002Fcontexts\u002F` and displays its content. If no brief exists, it suggests running `init_workspace` and `scan_workspace` first.\n\n---\n\n## Auto-Routing Reference\n\nIn `hybrid` mode, routing is fully automatic based on query characteristics. 
You can override it per query with `use_cloud`. (`cloud` mode uses DeepSeek for everything; `local` mode uses Ollama for everything.)\n\n| Condition | Engine | Cost |\n|---|---|---|\n| Short, specific query | Ollama | Free |\n| Query ≥ 40 words | DeepSeek v4-flash | \\$0.0028\u002FM cached tokens |\n| Contains architectural keywords (`refactor`, `architect`, `design`, `audit`…) | DeepSeek v4-pro (when `ENABLE_DEEPSEEK_PRO=true`) | \\$0.003625\u002FM cached tokens (75% off until May 31, 2026) |\n| `use_cloud=True` (explicit override) | DeepSeek | Billed |\n| `use_cloud=False` (explicit override) | Ollama | Free |\n\n---\n\n## Project Brief Structure\n\nEach workspace gets an auto-generated project brief optimised for DeepSeek KV caching. The brief is 1,000–1,200 tokens, the sweet spot for cache stability and accuracy.\n\nThe brief is synthesised using **DeepSeek v4-flash** (or Ollama in local mode), retrieving 15 semantic search results per section for accuracy.\n\n### 9-Section Structure\n\n| # | Section | Content |\n|---|---|---|\n| 1 | **Overview** | High-level summary of type, purpose, and domain |\n| 2 | **Technical Stack** | Backend, Database, API integrations, Libraries |\n| 3 | **Core Architecture** | Frontend, Backend, Data\u002FProcessing layers |\n| 4 | **Primary Conventions** | Code, docs, error handling, and schema rules |\n| 5 | **Purpose** | Business problem solved and core objectives |\n| 6 | **Key Files & Directories** | Entry points and routers with specific purposes |\n| 7 | **Development & Testing** | Setup, running, testing, and deployment instructions |\n| 8 | **Data Flow & Request Lifecycle** | Request trace from entry point to data layer |\n| 9 | **Future Roadmap** | Planned features, improvements, and TODOs from code |\n\n**Benefits:**\n\n- 1,000–1,200 tokens → optimal cache stability.\n- 10× cost savings via DeepSeek cache hits (identical prefix across queries).\n- Semantic search friendly → accurate context retrieval.\n- Human-readable → can be manually reviewed and edited.\n\n---\n\n## MCP Tools Reference\n\nYou never call these tools directly; your AI assistant calls them based on 
your natural-language instructions. This reference is for understanding what the server can do.\n\n### Workspace Management\n\n| Tool | Description |\n|---|---|\n| `init_workspace` | Registers a project folder, assigns a UUID, and creates a pending brief file. Idempotent; safe to run multiple times. |\n| `list_workspaces` | Lists all known workspaces that have a brief or stored memories. |\n| `resolve_workspace` | Resolves a workspace identifier (UUID, short-UUID, or display name) to its filesystem path. |\n| `merge_workspaces` | Consolidates duplicate workspace IDs into one. **Irreversible.** |\n| `debug_workspace_id` | Diagnostic tool; shows what workspace ID would be generated from a given path. |\n\n### Memory & Briefs\n\n| Tool | Description |\n|---|---|\n| `scan_workspace` | Walks the directory, respects `.memignore`, and saves all readable text files to persistent memory. Idempotent and self-cleaning. |\n| `save_to_memory` | Manually saves an architectural decision, fact, or technical note with an optional category tag. |\n| `list_memory` | Lists stored memories for a workspace, optionally filtered by category. |\n| `query_memory` | Retrieves relevant context via vector search and synthesises an answer via Ollama or DeepSeek (auto-routed). Returns inline `#file:line (distance)` citations — plain text that renders in every IDE, clickable in VS Code Copilot. Citations are on by default; set `show_sources=False` to omit them. |\n| `get_brief` | Retrieves the current project brief from `.brain\u002Fcontexts\u002F`. |\n| `update_brief` | Manually updates the markdown content of a project brief. |\n\n### Usage & Diagnostics\n\n| Tool | Description |\n|---|---|\n| `get_token_usage` | Returns DeepSeek API token usage and cost statistics. |\n| `get_cost_report` | Generates a cost breakdown by operation type. |\n| `get_cache_stats` | Shows cache hit\u002Fmiss rates by operation type. |\n| `purge_usage_data` | Deletes historical token tracking records. **Irreversible.** 
|\n\n---\n\n## Monitoring & Logs\n\nAll server activity is written to **`.brain\u002Fserver.log`** with a 5 MB rotating cap and 2 rolling backups.\n\n### What is logged\n\n| Event | Level |\n|---|---|\n| Server startup (DB path, model, mode) | `INFO` |\n| Memory saved (workspace, category, preview) | `INFO` |\n| Auto-route decision (reason) | `INFO` |\n| DeepSeek cache hit \u002F miss stats | `INFO` |\n| `scan_workspace`: each file saved or skipped | `INFO` \u002F `DEBUG` |\n| Any tool failure | `ERROR` |\n\n### Live tail\n\n```powershell\n# Windows PowerShell\nGet-Content .brain\\server.log -Wait -Tail 30\n```\n\n```bash\n# macOS \u002F Linux\ntail -f .brain\u002Fserver.log\n```\n\n### Filter errors only\n\n```powershell\n# Windows PowerShell\nSelect-String -Path .brain\\server.log -Pattern \"ERROR\"\n```\n\n```bash\n# macOS \u002F Linux\ngrep \"ERROR\" .brain\u002Fserver.log\n```\n\n---\n\n## Auxiliary Scripts\n\n### `drop_memory.py`: Wipe a workspace\n\nUse this when you need to completely reset the AI's memory for a specific project, for example, if you forgot to configure `.memignore` before the first scan and the AI indexed a large `logs\u002F` directory.\n\nThe script deletes:\n\n- The ChromaDB vector collection for the workspace.\n- The associated `.brain\u002Fcontexts\u002F\u003Cworkspace_id>.md` brief file.\n- The workspace registry entry in `zerikai.db`.\n\n**Usage:**\n\n```bash\n# Windows\n.\\venv\\Scripts\\python.exe drop_memory.py \"Workspace Name\"\n# or by UUID\n.\\venv\\Scripts\\python.exe drop_memory.py workspace-uuid\n\n# macOS \u002F Linux\nvenv\u002Fbin\u002Fpython drop_memory.py \"Workspace Name\"\n```\n\nFind workspace names and IDs with `list_workspaces` or by listing `.brain\u002Fcontexts\u002F`.\n\nAfter wiping, fix your `.memignore`, then re-run `init_workspace` and `scan_workspace`.\n\n---\n\n## Embedding-Docstring Skill\n\nThe embedding-docstring skill (`embedding-docstring\u002FSKILL.md`) is a companion skill that helps maintain 
docstring quality across any codebase. It audits functions, methods, and classes for embedding-optimized docstrings that are rich, dense, and keyword-accurate, so semantic search retrieves them correctly.\n\n### What it checks\n\n- **Technology names**: If the code imports `redis`, the docstring should say \"Redis\", not \"key-value store\". The embedding matches words, not concepts.\n- **Routing \u002F branches**: Decision logic must be documented, e.g. \"Uses tree-sitter for code files, falls back to LLM summarization.\"\n- **Guarantees**: Idempotency, atomicity, ordering, or \"no guarantees\" stated explicitly.\n- **Side effects**: What the function writes, calls, or mutates beyond its return value.\n- **Size limit**: The prose body above `Args:`\u002F`Returns:` is capped at 4 lines or 400 characters, whichever is shorter.\n\n### How to use it\n\nIn any workspace, tell your assistant:\n\n```\naudit docstrings in api_handler.py using the embedding-docstring skill\n```\n\nor for a single function:\n\n```\noptimize the docstring for authenticate_user for vector search\n```\n\nThe skill reads the source, applies the checklist, flags violations with line numbers, and proposes before\u002Fafter diffs for approval. It works with Python, JavaScript, TypeScript, HTML comments, and any language with docstring conventions.\n\n### Why it exists\n\nDocstrings that are too short, too vague, or missing technology names starve semantic search. The LLM can only synthesize from what's embedded. The skill ensures every docstring carries enough keyword density to be findable.\n\n---\n\n## DeepSeek KV Cache Optimisation\n\nCaching is **automatic**; no flags are required.\n\nThe server structures every API call to maximise the cache hit rate:\n\n- **System message** = fixed role instruction + stable project brief. This prefix is identical on every call for the same workspace → cached after the first call of a session.\n- **User message** = retrieved context snippets + query. 
This changes on every call, so it is never cached (by design).\n\nWith a well-populated 600-token project brief, your largest token block is billed at **\\$0.0028\u002FM tokens** (cache hit) instead of **\\$0.14\u002FM** (cache miss) at v4-flash pricing: a **50× saving** on every query after the first.\n\n> **Cache Protection:** Do not force-refresh the project brief during normal development. The brief is intentionally locked after the first scan to keep the system message prefix identical and cache hits active.\n\n---\n\n## Security & Data Privacy\n\nAll memory data and API keys stay on your machine.\n\n### `.gitignore` requirements\n\n```gitignore\n# Contains DEEPSEEK_API_KEY\n.env\n# Contains the local vector DB and project briefs\n.brain\u002F\n```\n\n> **Warning:** Never commit your `.brain\u002F` folder or `.env` file to version control.\n\n### Key security properties\n\n- **Workspace isolation:** Each project gets its own ChromaDB sub-collection, separate SQLite records, and a separate brief file. Queries for one workspace never return data from another.\n- **Deterministic hashing:** File records use deterministic IDs, so re-scanning overwrites existing records rather than duplicating them.\n- **Local-first by default:** In `hybrid` and `local` modes, the majority of operations never leave your machine.\n\n---\n\n## Troubleshooting\n\n### The server fails to start\n\n**Symptoms:** The IDE shows an MCP connection error, and no startup banner appears in the logs.\n\n**Check:**\n\n1. The virtual environment is activated and `pip install -r requirements.txt` completed without errors.\n2. `.env` exists with a valid `DEEPSEEK_API_KEY` (even in `local` mode, the file must exist).\n3. The Python version is 3.11+: `python --version`\n4. The path to `main.py` in the IDE config is absolute and uses the correct separators for your OS.\n\n---\n\n### Ollama not responding\n\n**Symptoms:** `hybrid` or `local` mode queries fail or time out.\n\n**Check:**\n\n1. 
Ollama is running: open `http:\u002F\u002F127.0.0.1:11434` in a browser; you should see a response.\n2. The model has been pulled: `ollama list` should show `llama3.2` (or your configured model).\n3. If your system has `OLLAMA_HOST=0.0.0.0` set as an environment variable, the server corrects this automatically. If issues persist, unset it or set it explicitly to `http:\u002F\u002F127.0.0.1:11434`.\n\n---\n\n### Memory is stale or contains irrelevant files\n\n**Symptoms:** The AI recalls information from deleted files or ignored directories.\n\n**Solution:** `scan_workspace` is self-cleaning: run it again and it automatically purges stale records. If the problem is structural (the wrong files were indexed from the start), use `drop_memory.py` to wipe the workspace, then re-index with a corrected `.memignore`.\n\n---\n\n### DeepSeek cache hit rate is low\n\n**Symptoms:** `get_cache_stats` shows a high miss rate; costs are higher than expected.\n\n**Causes:**\n\n- The project brief was recently force-refreshed, resetting the cache prefix.\n- `ENABLE_DEEPSEEK_PRO=true` is set: v4-pro and v4-flash have separate caches.\n- The first query of a new session is always a miss (the cache warms on subsequent calls).\n\n**Fix:** Avoid force-refreshing the brief unless you have made a major architectural change. Keep `ENABLE_DEEPSEEK_PRO=false` unless you specifically need pro-level reasoning.\n\n---\n\n### Duplicate workspace IDs\n\n**Symptoms:** `list_workspaces` shows the same project registered under multiple UUIDs (common when the project is opened from different paths, e.g., absolute vs. relative).\n\n**Fix:**\n\n```\n\"Merge workspaces \u003Csource-uuid> into \u003Ctarget-uuid>\"\n```\n\nThe assistant calls `merge_workspaces`. This is **irreversible**; the source workspace is deleted after the merge.\n\nVisit [zerikai.com](http:\u002F\u002Fzerikai.com) for more.\n\n---\n\n## Notice\n\n> This project is provided \"as-is\" for personal use\u002Freference. 
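I am not accepting pull requests or code contributions at this time. AI-generated PRs will be closed, and the users who submit them may be blocked.\n\n---\n\n## License\n\nMIT\n","Zerikai Memory is a standalone, local Python MCP server that provides persistent, workspace-isolated memory to any IDE that supports MCP servers. By combining ChromaDB (local vector store), Ollama (free local summarisation), and DeepSeek (cloud synthesis), it performs automatic cost-aware routing. This lets the AI assistant quickly recall earlier decisions, understand the codebase, and keep context consistent when switching between IDEs, while significantly reducing API costs. The project suits developers who want to strengthen AI assistance inside their IDE, especially those who want to stop re-explaining project background and to cut cloud resource costs.","2026-06-11 04:03:08","CREATED_QUERY"]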