[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84202":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":13,"forks30d":13,"starsTrendScore":17,"compositeScore":13,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":10,"trendingCount":13,"starSnapshotCount":13,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},84202,"llm-wiki-memory","ctxr-dev\u002Fllm-wiki-memory","ctxr-dev","Local, git-versioned memory for AI coding agents. No RAG, no Docker, no external service. Capture, compile, recall over a local LLM wiki with on-device embeddings and an MCP server.","",null,"JavaScript",75,0,55,10,20,40,"MIT License",false,"main",true,[23,24,25,26,27,28,29,30,31,32],"agent-memory","andrej-karpathy","andrej-karpathy-llm-wiki","andrej-karpathy-skills","claude-code","embeddings","llm-wiki","memory","rag","semantic-search","2026-06-12 02:04:38","\u003Cdiv align=\"center\">\n\n![LLM Wiki Memory](docs\u002Fassets\u002Fbanner.svg)\n\n### Persistent local memory for AI coding agents. Your agent remembers every session, learns from its mistakes, and gets smarter the longer you work with it.\n\nClaude Code, Cursor, Codex, and every other MCP client forget everything when a session ends. LLM Wiki Memory fixes that: it captures your conversations, compiles them into durable project knowledge and lessons your agent applies next time, and recalls the right context through a local MCP server. 
Memory lives on your machine as plain Markdown in an [LLM wiki](https:\u002F\u002Fgithub.com\u002Fctxr-dev\u002Fskill-llm-wiki) versioned in git, searched with local embeddings, and consolidated offline while you sleep.\n\n\u003Csamp>\u003Cb>No RAG stack. No vector database. No Docker. No cloud. Install with one prompt and your agent never starts from zero again.\u003C\u002Fb>\u003C\u002Fsamp>\n\n\u003Cbr\u002F>\n\u003Cbr\u002F>\n\n[![tests](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTESTS-925_PASSING-0D0D14?style=for-the-badge&labelColor=5EFFC0)](#testing)\n[![node](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNODE-%E2%89%A5_20-0D0D14?style=for-the-badge&logo=nodedotjs&logoColor=0D0D14&labelColor=5EF6FF)](https:\u002F\u002Fnodejs.org)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-0D0D14?style=for-the-badge&labelColor=FCEE0A)](LICENSE)\n[![MCP](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMCP-STDIO_SERVER-0D0D14?style=for-the-badge&logo=anthropic&logoColor=0D0D14&labelColor=5EF6FF)](https:\u002F\u002Fmodelcontextprotocol.io)\n\n[![recall](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRECALL-BGE_ON--DEVICE-0D0D14?style=for-the-badge&labelColor=FCEE0A)](https:\u002F\u002Fhuggingface.co\u002FXenova\u002Fbge-large-en-v1.5)\n[![infra](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FZERO_INFRA-NO_DOCKER_%C2%B7_NO_RAG-0D0D14?style=for-the-badge&labelColor=FF003C)](#why-a-wiki-instead-of-rag)\n[![built on](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBUILT_ON-%40ctxr%2Fskill--llm--wiki-0D0D14?style=for-the-badge&labelColor=5EF6FF)](https:\u002F\u002Fgithub.com\u002Fctxr-dev\u002Fskill-llm-wiki)\n[![github stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fctxr-dev\u002Fllm-wiki-memory?style=for-the-badge&logo=github&logoColor=0D0D14&color=0D0D14&labelColor=FCEE0A&label=STARS)](https:\u002F\u002Fgithub.com\u002Fctxr-dev\u002Fllm-wiki-memory\u002Fstargazers)\n\n\u003C\u002Fdiv>\n\n## 
Install\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nPaste this one-liner into your AI coding agent (copy button on the right) — it covers both a **fresh install** and an **update** of an existing one. The full procedure lives in [`AI-INSTALL-PROMPT.md`](AI-INSTALL-PROMPT.md); the agent fetches and follows it:\n\n```text\nSet up llm-wiki-memory in this project: fetch https:\u002F\u002Fraw.githubusercontent.com\u002Fctxr-dev\u002Fllm-wiki-memory\u002Fmain\u002FAI-INSTALL-PROMPT.md and follow it EXACTLY (it covers fresh install and update; if already installed, the same file is local at @.llm-wiki-memory\u002Fsrc\u002FAI-INSTALL-PROMPT.md).\n```\n\nOr run it yourself — fresh install:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fctxr-dev\u002Fllm-wiki-memory .\u002F.llm-wiki-memory\u002Fsrc\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fbootstrap.sh                    # add --commit-memory to commit the wiki\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fbootstrap.sh --schedule daily   # optional: hourly cron \u002F launchd\n```\n\nUpdate an existing install:\n\n```bash\ngit -C .llm-wiki-memory\u002Fsrc fetch origin\n# Runbooks you have NOT applied yet — READ THESE FIRST, oldest → newest:\ngit -C .llm-wiki-memory\u002Fsrc diff --name-only HEAD origin\u002Fmain -- docs\u002Freleases | grep 'update-prompt\\.md$' | sort\ngit -C .llm-wiki-memory\u002Fsrc merge --ff-only origin\u002Fmain\n( cd .llm-wiki-memory\u002Fsrc && npm install --no-audit --no-fund )\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fbootstrap.sh   # idempotent; runbooks may add one-shot steps + verification\n```\n\nThe bootstrap is **idempotent** — re-running preserves your edits to `.env` and your rule files.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>What bootstrap does (8 steps)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n1. Installs dependencies in `.\u002F.llm-wiki-memory\u002Fsrc`.\n2. 
Auto-detects the LLM provider: `claude` CLI → `codex` CLI → `ANTHROPIC_API_KEY` → `OPENAI_API_KEY` → `MEMORY_LLM_BASE_URL` → ollama at `:11434` → `mock` (with a stderr warning).\n3. Writes `.\u002F.llm-wiki-memory\u002Fsettings\u002F.env` (preserves your edits on re-run).\n4. Merges hooks into `.claude\u002Fsettings.json` and the stdio server into `.mcp.json`.\n5. Renders vendor-neutral configs into `.agents\u002F` and discipline rules into `.agents\u002Frules\u002F`, `.claude\u002Fskills\u002F`, `.claude\u002Frules\u002F`, `.cursor\u002Frules\u002F`.\n6. Materialises the hosted wiki at `.\u002F.llm-wiki-memory\u002Fwiki` (with the layout template that declares `consolidate: refine | none` per category) and validates it.\n7. Adds `\u002F.llm-wiki-memory` to `.gitignore` (`--commit-memory` commits the wiki instead).\n8. Optionally installs the hourly compile + consolidate cron via a wrapper script (`--schedule daily`).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Register with a non-Claude client\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n```bash\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fscripts\u002Fmcp-config.sh cursor          # .cursor\u002Fmcp.json\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fscripts\u002Fmcp-config.sh codex           # ~\u002F.codex\u002Fconfig.toml\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fscripts\u002Fmcp-config.sh claude-desktop  # claude_desktop_config.json\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fscripts\u002Fmcp-config.sh all\n```\n\n\u003C\u002Fdetails>\n\n## Highlights\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n![01](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F01-ZERO_INFRA-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Everything lives in a local `.llm-wiki-memory\u002F` folder.** No vector DB, no container, no API service to run.\n\n![02](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F02-GIT_VERSIONED-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Every memory is a markdown leaf with full history**, maintained 
by [`@ctxr\u002Fskill-llm-wiki`](https:\u002F\u002Fgithub.com\u002Fctxr-dev\u002Fskill-llm-wiki). Every change commits itself to the wiki's own repo with what, when, and why in the message (one commit per save, flush, compile, or consolidate run), so `git log` alone explains how your memory evolved. Disable via `wiki.autoCommit`; your project repo is never touched.\n\n![03](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F03-WRITE_GATED-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Self-improvement lessons save only with explicit user consent.** Three layers of enforcement: discipline instructions, a Claude Code hook enabled by default (disable via `gate.claudeHookEnabled`), and an airtight MCP server-side gate (covers Cursor, Codex, generic clients).\n\n![04](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F04-FAULT_TOLERANT-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Long sessions are chunked and distilled in pieces** (header-aware → paragraph fallback → hard cut), so a 100K-char transcript never single-passes its way into a CLI timeout. Failed runs persist a full-body stash + structured audit; one `cli.mjs redistill` retries with no data loss.\n\n![05](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F05-PROVIDER_CHAIN-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**A YAML-declared provider chain** (anthropic API → openai API → claude CLI → codex CLI → cursor CLI) and per-provider model fallback lists let the pipeline fall through to the next entry automatically when a model is deprecated or a CLI is missing, without inlining model names in code.\n\n![06](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F06-OFFLINE_CONSOLIDATION-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**An hourly cron + a search-driven orchestrator** deduplicate near-identical leaves, archive stale entries, and optionally rewrite bodies via the same LLM the rest of the pipeline uses. 
Never hard-deletes; always reversible.\n\n![07](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F07-SELF_HEALING-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Health is judged per entity, not per run**: every cron tick keeps a slim attempt entry (last `consolidate.attemptsKeep` runs) plus a full sharded log under `state\u002Flogs\u002F\u003Cyyyy>\u002F\u003Cmm>\u002F` for deep diagnosis. A failure that resolves on a later tick stays silent; an entity still failing after `consolidate.escalateAfterAttempts` consecutive runs (or one error signature recurring across many entities) escalates into a redacted skeleton issue report at `issues\u002F\u003Cyyyy>\u002F\u003Cmm>\u002F\u003Cdd>\u002F\u003Csignature>.\u003Cversion>.md` that your next session surfaces and offers to investigate — ready to copy upstream or turn into a fix PR.\n\n![08](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F08-LOCAL_RECALL-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Transformer embeddings rank queries on-device** (default `Xenova\u002Fbge-large-en-v1.5`). One setting swaps in a lighter model — or falls back to a lexical scorer with no model download.\n\n![09](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F09-LAYOUT_DECLARED-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Every category declares its consolidation eligibility** in `\u003Cwiki>\u002F.layout\u002Flayout.yaml` (`consolidate: refine | none`). No magic defaults — author intent is always in plain view.\n\n![10](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F10-ONE_PROMPT_INSTALL-0D0D14?style=flat-square&labelColor=FCEE0A)\n\n**Paste one prompt into your agent or run one script.** Idempotent.\n\n## Why a wiki instead of RAG\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nRAG memory stacks are powerful but heavy: a vector database, a container, an embedding service, ongoing ops. 
For small and medium projects that overhead is rarely worth it, yet you still want the agent to remember everything and improve itself across sessions.\n\n`llm-wiki-memory` gives you that loop with a local hosted wiki as the substrate. Every category stays a nested tree (never a flat pile of files): non-daily categories nest by the metadata facets you search by; daily by date; an additional `subject` axis scatters leaves by what they're about. Git history and validation come free, and the tree stays readable by humans. Recall runs on local embeddings — nothing leaves your machine.\n\n## How it works\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n```mermaid\n%%{init: {\"theme\":\"base\",\"flowchart\":{\"curve\":\"linear\"},\"themeVariables\":{\"lineColor\":\"#00B8C4\",\"primaryColor\":\"#0D0D14\",\"primaryTextColor\":\"#FCEE0A\",\"primaryBorderColor\":\"#FCEE0A\",\"secondaryColor\":\"#16161E\",\"tertiaryColor\":\"#16161E\",\"clusterBkg\":\"#16161E\",\"clusterBorder\":\"#00B8C4\",\"edgeLabelBackground\":\"#0D0D14\",\"textColor\":\"#00B8C4\"}}}%%\nflowchart TD\n    S[AI session]\n    S -- \"pre\u002Fpost-compact, session-end hooks\" --> FL[flush: extract typed atoms]\n    S -- \"ExitPlanMode hook\" --> PL[plans tree]\n    FL --> DA[daily tree]\n\n    DA -- \"hourly cron-job + session-start hook\" --> CMP[compile: promote daily atoms]\n    CMP --> KSI[knowledge + self_improvement trees]\n    CMP -. supersedes daily source .-> DA\n\n    KSI -- \"hourly cron-job + skill rule\" --> CN[consolidate: search-driven refinement]\n    CN --> MG[dedup + LLM merge near-duplicates]\n    CN --> RF[staleness + LLM semantic refresh]\n    CN --> HK[orphan \u002F compress \u002F GC \u002F index]\n    MG --> KSI\n    RF --> KSI\n    HK --> KSI\n\n    CR[hourly cron tick] --> LG[state\u002F.consolidate-attempts.log]\n    LG -. 
\"cron-health surfaces unresolved errors\" .-> S\n\n    AG[Agent recall calls] --> EM[embed.mjs: local embeddings]\n    EM --> KSI\n    AG --> PL\n```\n\n**The loop in one sentence:** session hooks capture typed atoms into `daily\u002F`; the hourly cron promotes them into `knowledge\u002F` and `self_improvement\u002F` (compile) and then refines those trees over time (consolidate); every recall hits the same embedding index; every cron attempt logs its outcome so the next session can surface unresolved failures.\n\n## Capture pipeline — chunked & recoverable\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nThe flush worker (PostCompact \u002F SessionEnd hooks) chunks oversized transcripts and runs each chunk through a provider\u002Fmodel chain. A clean \"nothing durable\" verdict writes **no leaf at all** (the breadcrumb log keeps visibility); a partial or total failure preserves the full body to a stash so `cli.mjs redistill` can re-attempt later with no data loss.\n\n```mermaid\n%%{init: {\"theme\":\"base\",\"flowchart\":{\"curve\":\"linear\"},\"themeVariables\":{\"lineColor\":\"#00B8C4\",\"primaryColor\":\"#0D0D14\",\"primaryTextColor\":\"#FCEE0A\",\"primaryBorderColor\":\"#FCEE0A\",\"secondaryColor\":\"#16161E\",\"tertiaryColor\":\"#16161E\",\"clusterBkg\":\"#16161E\",\"clusterBorder\":\"#00B8C4\",\"edgeLabelBackground\":\"#0D0D14\",\"textColor\":\"#00B8C4\"}}}%%\nflowchart TD\n    SRC[\"source.body\u003Cbr\u002F>(redacted, ≤MAX_CHARS)\"]\n    SRC --> CK{\"size > chunk\u003Cbr\u002F>threshold?\"}\n    CK -- no --> SP[single-pass distill]\n    CK -- yes --> CH[\"chunk by:\u003Cbr\u002F>1. ### User\u002FAssistant headers\u003Cbr\u002F>2. paragraph breaks\u003Cbr\u002F>3. 
hard cut (last resort)\"]\n    CH --> MAP[\"map: distill each chunk\u003Cbr\u002F>via provider chain\"]\n    MAP --> RED[\"reduce: LLM merge atoms\u003Cbr\u002F>(depth-capped, deterministic fallback)\"]\n    SP --> WR[\"write daily leaf\u003Cbr\u002F>+ audit frontmatter\"]\n    RED --> WR\n    MAP -.->|any chunk failed| STASH[\"state\u002Ffailed-distill-*.json\u003Cbr\u002F>(full body + audit)\"]\n    MAP -.->|all chunks failed| RAW[\"raw-fallback leaf\u003Cbr\u002F>(FULL body, fenced as UNTRUSTED)\"]\n    STASH -.->|\"cli.mjs redistill\"| CH\n```\n\nThe audit fields recorded on every leaf — `chunks_total`, `chunks_succeeded`, `failed_chunks`, `provider_chain_tried`, `final_provider` — make every distillation reproducible from frontmatter alone. Redistilled leaves carry `redistilled_from`, `redistill_attempts`, and `original_outcome`.\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Failure mode | What used to happen | What happens now |\n| --- | --- | --- |\n| One CLI call exceeds 120 s | Whole session lost; last 8 K tail preserved in a non-recoverable leaf | Each chunk has its own budget; failed chunk(s) stashed for retry |\n| Model deprecated mid-run | Hard fail (the `claude-sonnet-4-X` string was inlined in code) | Provider's model list iterates to the next entry; if exhausted, chain moves to next provider |\n| `claude` \u002F `codex` CLI not installed | Hard fail | Chain transparently fast-fails to the next provider |\n| Distillation produced no atoms | \"nothing-durable\" marker file written | **No leaf written.** Breadcrumb log only |\n| Redistill races a live worker | Both writers raced → one silently overwrote the other; stash deleted | Per-session lock → `ESESSIONBUSY`; stash preserved |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n## Memory write-gate (read-freely, write-gated)\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nSelf-improvement lessons are **propose-then-confirm**: the agent NEVER calls `save_lesson` (or 
`save_to_dataset(dataset=\"self_improvement\", ...)` \u002F `write_memory(datasetId=\"self_improvement\", ...)`) on its own. It proposes the save in chat, waits for an explicit user yes in the same turn, then calls the tool with `userRequested: true`. The server refuses gated writes without the flag.\n\nThree enforcement layers, defence-in-depth:\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Layer | Where | What it does | Why |\n| --- | --- | --- | --- |\n| **Instructions (probabilistic)** | MCP `initialize` + rule files in `.agents\u002Frules\u002F`, `.claude\u002Frules\u002F`, `.cursor\u002Frules\u002F` | Tells the model the rule, the wording to propose, and the consent contract. | Reaches *every* MCP client (Claude Code, Cursor, Codex, generic). Not airtight on its own — the model could still ignore it — which is why the next two layers exist. |\n| **Claude Code hook (deterministic, Claude Code only)** | `PreToolUse` hook on the three gated writers; enabled by default, `gate.claudeHookEnabled: false` makes it a no-op | Inspects the latest user turn for explicit save phrases. Matches → `allow`. No match → `ask` (Claude Code prompts the user yes\u002Fno). Also denies direct `Write`\u002F`Edit` to `~\u002F.claude\u002Fprojects\u002F\u003Cworkspace>\u002Fmemory\u002F`. | Stops a mis-instructed model BEFORE the call leaves the client. Adds a one-click user gate when needed. |\n| **MCP server-side gate (deterministic, every client)** | `save_lesson` \u002F `save_to_dataset` \u002F `write_memory` handlers in the local stdio MCP server | Refuses calls without `userRequested: true`. Also refuses when `path:` lands the write under `self_improvement\u002F...` from a non-gated `dataset:` claim (closes the path-bypass). | The airtight bottom layer. Works for Cursor, Codex, Claude Desktop, generic MCP clients — they don't have hooks, so the server is the only deterministic checkpoint. 
|\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n**Reconciliation:** layers are independent and additive. Any one of them can refuse a save. The model can NOT bypass them: it can't suppress the discipline (sent at `initialize`), can't disable the Claude Code hook from inside a tool call, and can't forge the `userRequested` flag (the only legitimate-bypass path is the internal `withSystemMaintenance` async frame that consolidate uses for its own bookkeeping — entered only by the orchestrator's own code, never by a client request body).\n\nKnowledge, plans, investigations, daily, and tracker-issue writes are **not** gated — their routing rules apply directly. Set `gate.selfImprovementEnabled: false` in `settings.yaml` to disable the server-side check as an operator escape hatch (the other two layers still apply). Set `gate.claudeHookEnabled: false` to disable the Claude Code hook the same way: it exits with no decision and the normal permission flow applies.\n\n## Consolidate (offline refinement)\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nThe `consolidate` orchestrator runs hourly via the cron (chained after `compile`) and at session end via a hook-less skill rule. 
It walks the **layout-declared** `consolidate: refine` categories and refines each leaf against its similarity cluster.\n\n### Where local-embedding runs · where LLM runs · why each pass exists\n\n```mermaid\n%%{init: {\"theme\":\"base\",\"flowchart\":{\"curve\":\"linear\"},\"themeVariables\":{\"lineColor\":\"#00B8C4\",\"primaryColor\":\"#0D0D14\",\"primaryTextColor\":\"#FCEE0A\",\"primaryBorderColor\":\"#FCEE0A\",\"secondaryColor\":\"#16161E\",\"tertiaryColor\":\"#16161E\",\"clusterBkg\":\"#16161E\",\"clusterBorder\":\"#00B8C4\",\"edgeLabelBackground\":\"#0D0D14\",\"textColor\":\"#00B8C4\"}}}%%\nflowchart TD\n    A[\"active leaves in layout-declared\u003Cbr\u002F>'consolidate: refine' categories\"]\n    A --> B{\"per-leaf loop\"}\n\n    B -- \"(1) LOCAL EMBEDDING\u003Cbr\u002F>searchMemoryFiltered + cosine\" --> C[\"similarity cluster\u003Cbr\u002F>top-K above scoreThreshold\"]\n\n    C --> D[\"dedupe-by-sha256\u003Cbr\u002F>(deterministic, exact body hash)\"]\n    C --> E[\"dedupe-by-lesson-key\u003Cbr\u002F>(deterministic, error_pattern key)\"]\n    C --> F[\"dedupe-by-cosine\u003Cbr\u002F>(deterministic, cosine ≥ 0.97)\"]\n    D --> G[\" (keeper, loser) candidates \"]\n    E --> G\n    F --> G\n\n    G -- \"(2) LLM CALL (if provider available)\" --> H[\"llm-merge-near-duplicates\u003Cbr\u002F>LLM rewrites keeper body from both inputs\"]\n    G -- \"LLM disabled \u002F unreachable\" --> I[\"archive loser as-is (deterministic fallback)\"]\n    H --> I\n\n    I --> J{more leaves?}\n    J -- yes --> B\n    J -- no --> K[\"corpus passes\"]\n\n    K --> L[\"staleness-flag\u003Cbr\u002F>(deterministic, atom-type + age)\"]\n    L -- \"(3) LLM CALL (if provider available)\" --> M[\"llm-semantic-refresh\u003Cbr\u002F>LLM: keep \u002F rewrite \u002F archive\"]\n    L -- \"LLM disabled\" --> N[\"leave flag for next run\"]\n\n    M --> O[\"prune-orphan-leaves\u003Cbr\u002F>(deterministic, no inbound link + old)\"]\n    N --> O\n    O --> 
P[\"compress-archived\u003Cbr\u002F>(deterministic, preserves sha256)\"]\n    P --> Q[\"housekeeping\u003Cbr\u002F>prune-empty-ancestors \u002F gc-embeddings \u002F index-rebuild\"]\n```\n\n**(1) Local embedding** lights up only inside the per-leaf cluster lookup. The bge model runs on-device; nothing leaves your machine to find which leaves are similar. Cosine similarity (a pure math op) then ranks the cluster — also local.\n\n**(2) LLM call · merge near-duplicates** runs once per `(keeper, loser)` pair found by any of the three dedup passes — but only when an LLM provider is reachable. The LLM sees both bodies + frontmatter, decides whether to merge them into one fresher body or leave the keeper as-is. If the provider is missing or the call fails, consolidate falls back to \"archive the loser unchanged\" so the run never blocks.\n\n**(3) LLM call · semantic refresh** runs once per stale-flagged leaf, capped at `consolidate.refreshMaxPerRun`. The LLM sees the leaf + its current cluster context and chooses keep \u002F rewrite \u002F archive. The deterministic staleness-flag pass nominates candidates; the LLM only acts when it can.\n\nWhy each pass:\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Pass | Why it exists |\n| --- | --- |\n| `dedupe-by-sha256` | Same file content was written twice (race between compile runs, manual re-save). Cheapest dedup. |\n| `dedupe-by-lesson-key` | Same failure pattern logged with different wording. Catches semantic duplicates the byte hash misses. |\n| `dedupe-by-cosine` | Near-paraphrases that drifted across edits. The cosine-against-cluster check is the safety net for \"we already said this\". |\n| `llm-merge-near-duplicates` | When two leaves overlap, the keeper shouldn't just survive — it should be the synthesis of both. The LLM produces that synthesis from the structured pair. |\n| `staleness-flag` | Long-untouched leaves are candidates for review. The flag is the deterministic gate to a more expensive LLM revisit. 
|\n| `llm-semantic-refresh` | A bug-root-cause may be fixed; a feedback-rule may be reversed. The LLM judges current relevance against fresh context and updates the leaf accordingly. |\n| `prune-orphan-leaves` | Leaves with no inbound link and no recall hits in a year contribute noise to recall. Archive (reversibly). |\n| `compress-archived` | An archived body sitting in git forever is dead weight; truncate to the gist + footer pointing at the original sha256 in frontmatter. |\n| `prune-empty-ancestors` \u002F `gc-embeddings` \u002F `index-rebuild` | Structural hygiene. Empty dirs, orphan embedding-cache entries, ancestor `index.md` regens. |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n### Keeping knowledge accurate as your code drifts\n\nA memory store that only ever GROWS becomes a graveyard. Bug root-causes get fixed permanently. Feedback rules get reversed. Pattern-gotchas survive an API rename and start pointing at functions that no longer exist. Without a way to revisit aged knowledge, recall starts surfacing leaves that contradict the current codebase — and your agent confidently gives advice that was correct two quarters ago.\n\n`consolidate`'s answer is a deliberate two-step pipeline. 
The cheap deterministic step nominates candidates; the expensive LLM step judges them.\n\n```mermaid\n%%{init: {\"theme\":\"base\",\"flowchart\":{\"curve\":\"linear\"},\"themeVariables\":{\"lineColor\":\"#00B8C4\",\"primaryColor\":\"#0D0D14\",\"primaryTextColor\":\"#FCEE0A\",\"primaryBorderColor\":\"#FCEE0A\",\"secondaryColor\":\"#16161E\",\"tertiaryColor\":\"#16161E\",\"clusterBkg\":\"#16161E\",\"clusterBorder\":\"#00B8C4\",\"edgeLabelBackground\":\"#0D0D14\",\"textColor\":\"#00B8C4\"}}}%%\nflowchart TD\n    A[\"all active leaves in\u003Cbr\u002F>refine-eligible categories\"]\n    A --> B{\"atom_type eligible\u003Cbr\u002F>(self-improvement-lesson \u002F bug-root-cause \u002F\u003Cbr\u002F>feedback-rule \u002F pattern-gotcha)\u003Cbr\u002F>AND last_recalled_at &gt; N months?\"}\n    B -- \"no\" --> SKIP[\"leave as-is\"]\n    B -- \"yes (deterministic, ~1ms\u002Fleaf)\" --> F[\"memory.stale = true\u003Cbr\u002F>(reversible — clears on next recall)\"]\n    F --> CAP{\"first N of stale-flagged\u003Cbr\u002F>(sorted by last_recalled_at desc;\u003Cbr\u002F>cap = consolidate.refreshMaxPerRun)\"}\n    CAP -- \"overflow\" --> CARRY[\"carries to next hourly tick\"]\n    CAP -- \"within cap\" --> LLM[\"LLM reads leaf body\u003Cbr\u002F>+ current similarity cluster\u003Cbr\u002F>(local embeddings provide the cluster)\"]\n    LLM --> K[\"keep: still relevant\u003Cbr\u002F>→ clear stale flag\"]\n    LLM --> R[\"rewrite: rule still applies,\u003Cbr\u002F>specifics drifted\u003Cbr\u002F>→ replace body, stamp last_refreshed_at\"]\n    LLM --> AR[\"archive: obsolete\u003Cbr\u002F>→ status:archived, reversible via enable_document\"]\n    LLM --> FB[\"fallback (provider unreachable\u003Cbr\u002F>or schema invalid after retries)\u003Cbr\u002F>→ leave flag, retry next tick\"]\n```\n\n**Step 1 — staleness-flag (deterministic).** Pure file-metadata rule: atom_type in the eligible set + `max(last_recalled_at, frontmatter.updated)` older than `consolidate.staleAfterMonths` (default 6). 
No LLM, no body inspection — just a flag. It also flips OFF: a single recall hit on a previously-stale leaf clears the flag on the next run, so freshly-relevant content un-flags itself automatically.\n\n**Step 2 — llm-semantic-refresh (LLM, capped, runs on the stale-flagged subset only).** For each candidate, the LLM sees the leaf's body, its frontmatter, and a small bundle of *currently-active* leaves on the same topic (the similarity cluster — pulled via local embeddings, no network). Each candidate ends in one of four outcomes, three LLM verdicts plus a no-op fallback:\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Verdict | What happens | When the LLM picks this |\n| --- | --- | --- |\n| **keep** | `memory.stale` cleared; body untouched. | The content is still factually correct; the staleness flag was a false positive (low recall ≠ low relevance). The reset means the next 6-month window restarts cleanly. |\n| **rewrite** | Body replaced with the LLM's synthesis; `memory.last_refreshed_at` stamped; `memory.stale` cleared. | The rule still applies but specifics drifted — file paths renamed, library upgraded, API moved, dependency replaced. The lesson survives; the references update. |\n| **archive** | `disableDocument` — `memory.status: archived`, `memory.consolidated_at` stamped. File stays on disk + in git for recovery. | The bug got fixed permanently. The convention was reversed. The gotcha became obsolete after a refactor. Reversible at any time via `enable_document`. |\n| **fallback** | The flag persists; the next hourly cron tick retries. | The LLM provider is unreachable, the response didn't satisfy the schema after `consolidate.llmMaxRetries` attempts, or the model hallucinated the leaf id. Bias is always toward NOT destroying content. |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n**Why an LLM, and not a deterministic rule?** The flag is structural (\"when was this leaf last touched?\"); the verdict is semantic (\"is what this leaf SAYS still true?\"). 
No deterministic rule can read a `bug-root-cause` body and decide whether the bug was fixed in v1.4.2; no rule can tell that a `pattern-gotcha` about an `apply` factory still applies after a team-wide migration to `def resource(...)` smart constructors. Reading the leaf body **in current context** and producing a *trinary* decision (keep \u002F rewrite \u002F archive) is exactly the kind of judgment an LLM does well — and exactly what a deterministic policy can't reach without becoming either too aggressive (\"archive everything aged\" — loses live knowledge) or too timid (\"never touch anything\" — the wiki ages into noise).\n\n**Why capped per run?** `consolidate.refreshMaxPerRun` (default 25) bounds the LLM call budget per hourly tick. A corpus with 100 stale-flagged leaves makes 25 calls this hour, 25 the next, and so on — steady progress without billing surprises. Recently-recalled leaves are processed first (they're more likely to be load-bearing in active work), so the budget always lands on the highest-leverage candidates.\n\n**Why opt-out exists.** Set `consolidate.llmPassesEnabled: false` in `settings.yaml` to keep the deterministic flag but skip the LLM verdict. The flag still gets set; nothing acts on it. Useful for cost-sensitive setups, sealed environments, or running consolidate purely for dedup + housekeeping. You can flip it back on later — the flags accumulated in the meantime become this run's working set.\n\n**Net effect on the wiki's shape.**\n- Recall keeps finding **correct, current** advice instead of two-year-old reruns.\n- Leaf count plateaus instead of growing forever (archives count toward \"compressed\", not \"live\").\n- Knowledge that's still right is left alone (`keep`); knowledge that drifted is updated in place (`rewrite`); knowledge that's obsolete moves out of the active set (`archive`) but stays recoverable.\n- Every change is reversible — the wiki is its own git repo, and `consolidate` uses `disableDocument` exclusively. 
There is no `deleteDocument` path inside the orchestrator; the user is the only one who can hard-delete, and only via the explicit MCP tool.\n- The next hourly tick reads the now-cleaner corpus, so the cluster quality for dedup + refresh **compounds**: less noise to dedup against, sharper similarity scores, fewer false positives, more confident verdicts.\n\n### Layout decides which trees are eligible\n\nEvery category in `\u003Cwiki>\u002F.layout\u002Flayout.yaml` must say `consolidate: refine` or `consolidate: none` — **no defaults applied**. `consolidate: none` categories (plans, investigations, daily by default — owned by other lifecycles) are never walked by per-leaf passes. The orchestrator refuses to run with a clear error envelope if any category lacks the field.\n\n### Pass parameters at a glance\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Pass | Knob (default) |\n| --- | --- |\n| `dedupe-by-cosine` | `consolidate.cosineThreshold` (`0.97`; `0.995` on lexical fallback) |\n| `dedupe-*` cluster scope | `consolidate.clusterTopK` (`12`) + `consolidate.clusterScoreThreshold` (`0.75`) |\n| `staleness-flag` window | `consolidate.staleAfterMonths` (`6`) |\n| `llm-semantic-refresh` cap | `consolidate.refreshMaxPerRun` (`25`) |\n| `prune-orphan-leaves` TTL | `consolidate.orphanTtlDays` (`365`) |\n| `compress-archived` body cap \u002F age | `consolidate.archiveBodyMax` (`1200`) \u002F `consolidate.archiveAgeDays` (`30`) |\n| LLM passes on\u002Foff + retry | `consolidate.llmPassesEnabled` (`on`) \u002F `consolidate.llmMaxRetries` (`2`) |\n| Throttle | `consolidate.intervalDays` (`1`) |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n### Self-healing operation\n\nEach hourly cron tick runs `cli.mjs cron-job`. 
Logging is two-tier: a slim attempt entry (timings, exit codes, totals, a pointer to the full log) appends to `state\u002F.consolidate-attempts.log` (last `consolidate.attemptsKeep` runs), and the complete record of the run — redacted stdout\u002Fstderr plus the full per-entity consolidate report — lands at `state\u002Flogs\u002F\u003Cyyyy>\u002F\u003Cmm>\u002Fcron-\u003Cts>.json`, pruned after `consolidate.fullLogRetentionDays`. The internal `--if-due` throttle bounds the heavy lifting to once per `consolidate.intervalDays`. When daily docs are pending but no LLM provider is reachable, `compile` exits `69` (`EX_UNAVAILABLE`): the tick records a FAILED attempt (so `cron-health` flips `healthy:false` immediately and self-clears on the next good tick) while consolidate's deterministic passes still run. The scheduled job's PATH is baked by bootstrap (your login PATH plus well-known CLI install dirs), and provider spawns append the same dirs at runtime, so launchd\u002Fcron's minimal PATH can no longer hide the provider CLIs.\n\nHealth is judged per ENTITY across runs, not per tick: a failure that a later tick resolves stays silent, while an entity still failing after `consolidate.escalateAfterAttempts` consecutive attempts — or one error signature recurring across several distinct entities, which smells like a code bug — escalates. Provider availability itself is tracked the same way: persistent provider-unavailable compile aborts and consolidate LLM-skips accrue as the synthetic entities `system:compile-llm-providers` \u002F `system:consolidate-llm-providers` and escalate after the same threshold; the first healthy tick resolves the episode. Escalation deterministically writes a redacted skeleton issue report to `issues\u002F\u003Cyyyy>\u002F\u003Cmm>\u002F\u003Cdd>\u002F\u003Csignature>.\u003Cversion>.md` (episodes version on recurrence; resolution flips `status: resolved` in place, files are never auto-pruned). 
The SessionStart hook (`cli.mjs cron-health` for hook-less agents) surfaces open escalations with a one-line summary and the newest report path, and offers to investigate; copy the report to the [llm-wiki-memory issues](https:\u002F\u002Fgithub.com\u002Fctxr-dev\u002Fllm-wiki-memory\u002Fissues) or use it to draft a fix PR.\n\n### Determinism\n\nDeterministic passes produce byte-identical state across two runs on the same wiki + frozen clock. LLM passes are reproducible via `MEMORY_LLM_MOCK_FILE` \u002F `MEMORY_LLM_MOCK_RESPONSE` for tests. Locking is shared with `compile.mjs`, so they never race; the cron-job wrapper sequences them.\n\nNever hard-deletes — every archival uses `disableDocument` (status flip), recoverable via `enable_document`.\n\n## Works with your agent\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| MCP client | Hooks (Claude Code only) | MCP tools | Write-gate enforcement |\n| --- | :---: | :---: | --- |\n| **Claude Code** | ✅ session-start \u002F pre-compact \u002F post-compact \u002F session-end \u002F exit-plan-mode \u002F pre-tool-use | ✅ | instructions + hook + server (full three-layer) |\n| **Cursor** | ✗ | ✅ | instructions + server |\n| **Codex \u002F OpenAI** | ✗ | ✅ | instructions + server |\n| **Claude Desktop** | ✗ | ✅ | instructions + server |\n| **Any MCP client** | ✗ | ✅ | instructions + server |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\nHook-driven auto-capture is Claude Code only; every other client gets the same MCP tools + the same discipline. 
Hook-less clients invoke `cli.mjs cron-health` at session start (per the rule rendered into `.agents\u002Frules\u002F`) to surface unresolved cron failures.\n\nThe **LLM provider** that extracts typed atoms during capture \u002F compile \u002F consolidate is set in `.llm-wiki-memory\u002Fsettings\u002F.env` and is independent of the client:\n\n[![claude](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fclaude_CLI-✓-0D0D14?style=flat-square&labelColor=D97757)](#) [![codex](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcodex_CLI-✓-0D0D14?style=flat-square&labelColor=555555)](#) [![cursor](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcursor--agent_CLI-✓-0D0D14?style=flat-square&labelColor=777777)](#) [![anthropic](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fanthropic_API-✓-0D0D14?style=flat-square&labelColor=D97757)](#) [![openai](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fopenai_API-✓-0D0D14?style=flat-square&labelColor=6B57C9)](#) [![openai-compatible](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fopenai--compatible-✓-0D0D14?style=flat-square&labelColor=2EA043)](#) [![mock](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmock-test--only-0D0D14?style=flat-square&labelColor=888888)](#)\n\n`openai-compatible` covers ollama, vLLM, lm-studio, llama.cpp server, and litellm proxies — point `MEMORY_LLM_BASE_URL` at a local endpoint and `OPENAI_API_KEY` becomes optional on loopback \u002F RFC1918. The provider is auto-detected at install; explicit `--provider` or a user-edited `settings\u002Fsettings.yaml` chain always wins.\n\n**Provider chain + model fallback** are declared in `.\u002F.llm-wiki-memory\u002Fsettings\u002Fsettings.yaml` (materialised by bootstrap). Each API provider has a `models: [...]` list tried newest-first on `model_not_found` \u002F `404` errors; the cross-provider `chain: [...]` advances on timeout \u002F unavailable. 
CLI providers (claude \u002F codex \u002F cursor) defer to whatever their binary is logged into — model names live ONLY in YAML, never in code.\n\n## MCP tools\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Tool | Purpose |\n| --- | --- |\n| `recall_lessons` | Recall self-improvement lessons before a task (fall-back ladder drops `error_pattern`, then `language`, then `task_type`). |\n| `search_memory` | Cross-category embedding search with metadata pre-filtering. |\n| `save_lesson` | **Write-gated.** Persist a lesson after explicit user yes (requires `userRequested: true`). |\n| `save_to_dataset` | Upsert a plan, investigation, knowledge artefact, or other category by name. Write-gated when `dataset=\"self_improvement\"`. |\n| `write_memory` | Create a memory leaf, optionally superseding an existing one. Write-gated when `datasetId=\"self_improvement\"`. |\n| `consolidate_memory` | Run the deterministic + LLM consolidation passes. System-maintenance; not write-gated. |\n| `disable_document` \u002F `enable_document` \u002F `delete_document` | Archive (reversible) or remove a leaf. |\n| `audit_memory` | Surface duplicate keys, missing metadata, and cleanup candidates. |\n| `list_datasets`, `get_memory_config`, `reload_provider`, `reload_layout` | Inspect categories, config, LLM provider, and force-refresh caches. |\n| `validate_layout`, `validate_topology`, `test_path_compiler` | Layout + topology + placement-compiler sanity checks. |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n## Configuration\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nSettings live in **two** files in `.\u002F.llm-wiki-memory\u002Fsettings\u002F`:\n\n- **`.env`** — secrets, provider switches, deployment paths, workspace identity, test seams. Things that genuinely need shell precedence. 
See [`templates\u002Fenv.example`](templates\u002Fenv.example).\n- **`settings.yaml`** — every other knob, nested by concern: `consolidate`, `flush`, `hook`, `embed`, `recall`, `compile`, `gc`, `gate`, `providers`, `crossCuttingAreas`. See [`templates\u002Fsettings.yaml`](templates\u002Fsettings.yaml).\n\nThe `.env` file's strict subset overrides the YAML where it overlaps (e.g. `MEMORY_LLM_PROVIDER` collapses the YAML chain). **As of the [2026-06-03 v2 release](docs\u002Freleases\u002F2026\u002F06\u002F03\u002Fv2\u002Fupdate-prompt.md)**, every `MEMORY_*` env var that's NOT on the strict allow-list is a silent no-op — application config moved into `settings.yaml`. The runbook covers the migration.\n\nStrict-subset `.env` keys:\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Key | Default | Meaning |\n| --- | --- | --- |\n| `ANTHROPIC_API_KEY` \u002F `OPENAI_API_KEY` | (unset) | Provider API keys (only needed for API providers). |\n| `MEMORY_LLM_PROVIDER` | auto | `claude` \u002F `codex` \u002F `cursor` \u002F `anthropic` \u002F `openai` \u002F `openai-compatible` \u002F `mock`. When set, collapses the YAML chain to this one provider. |\n| `MEMORY_LLM_MODEL` | (unset) | Provider-agnostic model override; prepends to the head provider's models list. |\n| `MEMORY_LLM_BASE_URL` | (unset) | OpenAI-compatible local endpoint (ollama, vLLM, lm-studio, llama.cpp, litellm). |\n| `MEMORY_LLM_TIMEOUT_MS` | `120000` | Per-call CLI\u002FAPI timeout. |\n| `MEMORY_DATA_DIR` \u002F `LLM_WIKI_MEMORY_ROOT` \u002F `MEMORY_EMBED_CACHE` \u002F `MEMORY_SETTINGS_PATH` | derived | Deployment paths. |\n| `MEMORY_DEFAULT_PROJECT_MODULE` | basename(workspace) | Workspace identity (scopes recall). |\n| `MEMORY_LLM_MOCK_*` | (unset) | Test seams for the mock provider. |\n| `MEMORY_MCP_SERVER_NAME` | `llm-wiki-memory` | MCP server name advertised at initialize. 
|\n![](docs\u002Fassets\u002Fline-thin.svg)\n\nHighlights from `settings.yaml`:\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Section.key | Default | Meaning |\n| --- | --- | --- |\n| `flush.chunkTargetK` | `5` | Target chunk count for map-reduce distillation. |\n| `flush.chunkParallelism` | `1` | Concurrent chunks distilled at once. |\n| `flush.reduceMaxChars` | `30000` | Reduce-step input cap (tree-recurse above this; depth cap 16). |\n| `flush.reduceModelPromote` | `true` | Use one-tier-stronger model for the reduce step. |\n| `embed.model` | `Xenova\u002Fbge-large-en-v1.5` | Embedding model — see the model comparison below. |\n| `embed.backend` | `transformers` | `transformers` (on-device bge) or `lexical` (no model download). |\n| `gate.selfImprovementEnabled` | `true` | Operator escape hatch for the server-side write-gate. |\n| `gate.claudeHookEnabled` | `true` | Enable or disable the Claude Code PreToolUse write-gate hook (no-op when false). |\n| `consolidate.intervalDays` | `1` | Throttle for `consolidate --if-due`. |\n| `consolidate.llmPassesEnabled` | `true` | Disable to run deterministic-only consolidation. |\n| `consolidate.attemptsKeep` | `50` | Slim cron attempt entries kept in `state\u002F.consolidate-attempts.log`. |\n| `consolidate.fullLogRetentionDays` | `90` | Days before sharded full run logs (`state\u002Flogs\u002Fyyyy\u002Fmm\u002F`) are pruned. |\n| `consolidate.escalateAfterAttempts` | `3` | Consecutive per-entity failures before an escalation issue report is written. |\n| `wiki.autoCommit` | `true` | Auto-commit every wiki change to the wiki's own git repo (one commit per logical operation). |\n| `consolidate.cosineThreshold` | `0.97` | Dedup threshold (auto-bumped to `0.995` on the lexical fallback). |\n| `recall.touchEnabled` | `true` | Whether `searchMemoryFiltered` stamps `last_recalled_at` on hits. |\n| `providers.chain` | `[]` → auto-detect | Cross-provider fallback chain. 
|\n| `providers.\u003Capi-provider>.models` | (template ships) | Per-provider model fallback list (newest-first). |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Full schema\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nSee [`templates\u002Fsettings.yaml`](templates\u002Fsettings.yaml) for the complete annotated set with every knob in each of the nine config sections plus the top-level `crossCuttingAreas` list.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Choosing an embedding model\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nRecall ranks queries with an on-device [transformers.js](https:\u002F\u002Fgithub.com\u002Fxenova\u002Ftransformers.js) model, set by `embed.model` in `settings.yaml`. The default `Xenova\u002Fbge-large-en-v1.5` gives the best routing quality; lighter models trade some accuracy for a much smaller download. Sizes below are the **quantized** ONNX weights transformers.js downloads by default (full-precision is ≈ 4× larger), lightest first:\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Model | Dim | Download | Notes |\n| --- | :---: | :---: | --- |\n| `Xenova\u002Fall-MiniLM-L6-v2` | 384 | ~25 MB | Smallest and fastest. Modest retrieval quality. |\n| `Xenova\u002Fbge-small-en-v1.5` | 384 | ~35 MB | Strong quality for a small download. |\n| `Xenova\u002Fbge-base-en-v1.5` | 768 | ~110 MB | Noticeably better routing than `small`. |\n| `Xenova\u002Fbge-large-en-v1.5` | 1024 | ~340 MB | **Default.** Best routing quality. |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\nSet a lighter model in `settings.yaml`:\n\n```yaml\nembed:\n  model: Xenova\u002Fbge-small-en-v1.5\n```\n\nChanging the model invalidates the embedding cache automatically. Stay within the MiniLM \u002F BGE \u002F GTE \u002F mxbai families: they're mean-pooled with no query prefix, which is how this engine embeds. 
Prefix-based models (e5, nomic) underperform here because the engine doesn't add the `query:` \u002F `search_document:` prefixes they expect.\n\n\u003C\u002Fdetails>\n\n## Manual commands\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n```bash\ncd .llm-wiki-memory\u002Fsrc\n\n# Inspect what consolidate WOULD do (no mutations).\nnode scripts\u002Fcli.mjs consolidate --dry-run --force --json | jq\n\n# Run consolidate for real (bypass the daily throttle).\nnode scripts\u002Fcli.mjs consolidate --force --json | jq '.totals'\n\n# Full cron-job (compile + consolidate + attempt log entry).\nnode scripts\u002Fcli.mjs cron-job\n\n# Inspect cron health (what SessionStart shows you on a failure).\nnode scripts\u002Fcli.mjs cron-health | jq\n\n# Inspect the per-run report + the attempt log history.\ncat ..\u002Fstate\u002F.consolidate.json | jq\ncat ..\u002Fstate\u002F.consolidate-attempts.log | jq -s 'reverse | .[:5]'\n\n# The classic ops trio.\nnode scripts\u002Fcli.mjs init       # materialise or repair the wiki shell\nnode scripts\u002Fcli.mjs validate   # skill-llm-wiki validate\nnode scripts\u002Fcli.mjs heal       # classify state and name the next command\n\n# Recall \u002F search from the terminal.\nnode scripts\u002Fcli.mjs recall \"\u003Cquery>\"\nnode scripts\u002Fcli.mjs search \"\u003Cquery>\"\n\n# Resolved paths + LLM provider + skill location.\nnode scripts\u002Fcli.mjs where\n\n# Recover a failed distillation. 
Reads either the stash (from a recent\n# failure) or the in-leaf raw fallback (for older leaves with no stash).\nnode scripts\u002Fcli.mjs redistill --leaf \u003Cpath>      # one daily leaf\nnode scripts\u002Fcli.mjs redistill --session \u003Cid>     # newest stash for a session\nnode scripts\u002Fcli.mjs redistill --all              # every pending stash\n```\n\nSchedule the hourly cron (or remove it):\n\n```bash\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fbootstrap.sh --schedule daily   # cron on Linux, launchd on macOS, hourly\n.\u002F.llm-wiki-memory\u002Fsrc\u002Fbootstrap.sh --schedule off     # remove\n```\n\nThe cron entry calls a generated wrapper (`state\u002Fcron-daily.sh`) — safe across workspaces whose paths contain single-quotes, percents, or spaces.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Architecture (responsibility matrix)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n![](docs\u002Fassets\u002Fline-thin.svg)\n\n| Path | Role |\n| --- | --- |\n| `scripts\u002Flib\u002Fwiki-store.mjs` | Storage seam: every document is a wiki leaf. Drives the skill for index-rebuild \u002F validate \u002F heal \u002F rebuild. Hosts the recall-touch instrumentation and `getConsolidateLayout()` reader. |\n| `scripts\u002Flib\u002Fembed.mjs` | Transformer embeddings, cosine, content-hash cache (lexical fallback). The only retrieval engine. |\n| `scripts\u002Flib\u002Frecall.mjs` | `recall_lessons` ladder, `search_memory`, `save_lesson`. |\n| `scripts\u002Flib\u002Fllm.mjs` | LLM provider dispatch (claude \u002F codex \u002F anthropic \u002F openai \u002F openai-compatible \u002F mock) + `health()` probe + `isLocalEndpoint` heuristic. |\n| `scripts\u002Flib\u002Fllm-callJSON.mjs` | Prompt-file + variable-interpolation + zod-schema-validated LLM JSON-call wrapper. Used by compile + consolidate. |\n| `scripts\u002Flib\u002Fmaintenance-tag.mjs` | AsyncLocalStorage-backed `withSystemMaintenance` frame for the server-side gate exemption. 
|\n| `scripts\u002Flib\u002Fdiscipline.mjs` | Single source of the memory discipline (MCP `instructions` + the SessionStart context). |\n| `scripts\u002Flib\u002Flayout-validator.mjs` | Zod schema for `\u003Cwiki>\u002F.layout\u002Flayout.yaml`. |\n| `scripts\u002Flib\u002Fwiki-cli.mjs` | Wrapper around the `skill-llm-wiki` bin (bottom-up `index-rebuild-one`). |\n| `scripts\u002Fconsolidate.mjs` | Search-driven AutoDream consolidation orchestrator. |\n| `scripts\u002Fcron-job.mjs` | Hourly cron entry point + structured attempt log + `cronHealth`. |\n| `scripts\u002Fcompile.mjs` | LLM-driven daily → knowledge \u002F self_improvement promotion. |\n| `scripts\u002Fhooks\u002F*` | Claude Code lifecycle hooks (capture, gate, plan-sync, embed-gc, session-start). |\n| `mcp-server\u002Findex.mjs` | Local stdio MCP server. |\n| `templates\u002F`, `bootstrap.sh`, `scripts\u002Fmcp-config.sh` | Install and multi-client registration. |\n![](docs\u002Fassets\u002Fline-thin.svg)\n\nFull per-concern responsibility split (this package vs the underlying engine) and known smells: [**ARCHITECTURE.md**](ARCHITECTURE.md).\n\n\u003C\u002Fdetails>\n\n## Testing\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n```bash\nnpm test           # unit suite\nnpm run test:e2e   # full lifecycle against the real skill-llm-wiki CLI (LLM stubbed)\n```\n\n**925 tests** in total. 
The unit suite covers the chunker (header\u002Fparagraph\u002Fhard-cut boundaries, surrogate-safe cuts), the provider+model chain (model-not-found iteration, cross-provider fallback, provenance accumulation), the map-reduce flow (depth cap, shrink check, partial-failure stash, in-leaf recovery), the redistill CLI, the wiki auto-commit layer (batching, repo-safety probe, injection guards), the entity-level self-healing pipeline (escalations, episode-versioned issue reports, log retention, provider-availability tracking: compile's EX_UNAVAILABLE exit, synthetic `system:` entities, the hybrid cron PATH builder), word-boundary truncation, the facet vocabulary collector, and the LLM-only cosine merge band. The e2e suite builds a wiki from scratch in a temp directory and asserts genesis, daily capture, lesson + knowledge + plan + investigation absorption, compile promotion + dedup, recall, tree-growth integrity, and idempotency — against the real `skill-llm-wiki` CLI with mocked LLM responses.\n\n## Requirements\n![](docs\u002Fassets\u002Fline-bold.svg)\n\nNode 20 or newer, and git. No Docker, no Python. The embedding model downloads on first recall (set `embed.backend: lexical` in `settings.yaml` to skip it entirely).\n\n## License\n![](docs\u002Fassets\u002Fline-bold.svg)\n\n[MIT](LICENSE)\n",2,"2026-06-11 04:12:34","CREATED_QUERY"]