[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1382":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":42,"readmeContent":43,"aiSummary":44,"trendingCount":16,"starSnapshotCount":16,"syncStatus":45,"lastSyncTime":46,"discoverSource":47},1382,"synthadoc","axoviq-ai\u002Fsynthadoc","axoviq-ai","Synthadoc: An open-source LLM knowledge compilation engine that turns raw documents into structured, local-first wikis. A transparent, human-readable alternative to traditional RAG, which can be self-managed and self-improved without the use of any tools.","",null,"Python",449,44,13,1,0,86,133,175,258,94.96,"GNU Affero General Public License v3.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41],"agent-skills","agentic-ai","cli-tool","domain-adaptation","enterprise","enterprise-solutions","knowledge-graph","local-llm","obsidian-md","obsidian-plugin","personal-knowledge-management","pkm","rag-alternative","synthetic","synthetic-data","2026-06-12 04:00:09","# Synthadoc\n\n```\n      .-+###############+-.\n    .##                   ##.\n   ##    .----.   .----.    
##\n  ##    \u002F######\\ \u002F######\\    ##\n  ##    |######| |######|    ##\n  ##    | [SD] | | wiki |    ##\n  ##    |######| |######|    ##\n  ##    \\######\u002F \\######\u002F    ##\n   ##    '----'   '----'    ##\n    '##                   ##'\n      '-+###############+-'\n\n       S Y N T H A D O C\n    Community Edition  v0.3.0\n  ────────────────────────────────\n  Domain-agnostic LLM wiki engine\n```\n\n[![CI](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Factions\u002Fworkflows\u002Fci.yml)\n[![Coverage](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson?url=https%3A%2F%2Fraw.githubusercontent.com%2Faxoviq-ai%2Fsynthadoc%2Fmain%2Fdocs%2Fbadges.json&query=%24.coverage&label=Coverage&suffix=%25&color=brightgreen)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Factions\u002Fworkflows\u002Fci.yml)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-AGPL--3.0-blue.svg)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Fblob\u002Fmain\u002FLICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.11%2B-yellow.svg)](https:\u002F\u002Fwww.python.org\u002F)\n[![Skills](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson?url=https%3A%2F%2Fraw.githubusercontent.com%2Faxoviq-ai%2Fsynthadoc%2Fmain%2Fdocs%2Fbadges.json&query=%24.skills&label=Skills&color=purple)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Ftree\u002Fmain\u002Fsynthadoc\u002Fskills)\n[![Hooks](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson?url=https%3A%2F%2Fraw.githubusercontent.com%2Faxoviq-ai%2Fsynthadoc%2Fmain%2Fdocs%2Fbadges.json&query=%24.hooks&label=Hook%20events&color=teal)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Ftree\u002Fmain\u002Fhooks)\n[![CLI](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fd
ynamic\u002Fjson?url=https%3A%2F%2Fraw.githubusercontent.com%2Faxoviq-ai%2Fsynthadoc%2Fmain%2Fdocs%2Fbadges.json&query=%24.cli_commands&label=CLI%20commands&color=darkblue)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc)\n[![Obsidian](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson?url=https%3A%2F%2Fraw.githubusercontent.com%2Faxoviq-ai%2Fsynthadoc%2Fmain%2Fdocs%2Fbadges.json&query=%24.obsidian_commands&label=Obsidian%20commands&color=blueviolet)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc\u002Ftree\u002Fmain\u002Fobsidian-plugin)\n[![Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCommunity%20Edition-v0.3.0-orange.svg)](https:\u002F\u002Fgithub.com\u002Faxoviq-ai\u002Fsynthadoc)\n\n**Document version: v0.3.0**\n\n**Engineered for solo users and enterprises alike, providing a domain-specific knowledge base that scales seamlessly while maintaining accuracy through autonomous self-optimization.**\n\n> Built for individuals, small teams, and large organizations who need a knowledge base that stays accurate as documents accumulate.\n\nSynthadoc reads your raw source documents — PDFs, spreadsheets, PPTs, web pages, images, videos, Word files, TXTs — and uses an LLM to synthesize them into a persistent, structured wiki. Cross-references are built automatically, contradictions are detected and surfaced, orphan pages are flagged, and every answer cites its sources. 
Outputs are stored as local Markdown files, ensuring seamless integration and autonomous management within [Obsidian](https:\u002F\u002Fobsidian.md) or any wiki-compliant ecosystem.\n\n---\n\n## Who Is It For?\n\nSynthadoc scales from a single researcher to a company-wide knowledge platform:\n\n\n| Team size               | Typical use case                                                                                                                                                                                                                                                                                    |\n| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| **Solo \u002F 1–2 people**  | Personal research wiki, freelance knowledge base, indie hacker documentation - run it free on Gemini Flash or a local Ollama model with zero ongoing cost                                                                                                                                           |\n| **Small team (3–20)**  | Centralized internal knowledge base for startups and departments that aggregates diverse individual data sources into a unified, high-integrity wiki. The system automatically resolves contradictions and scales autonomously, ensuring organizational intelligence grows in tandem with your team |\n| **Medium \u002F enterprise** | Compliance-sensitive knowledge bases that must stay local; per-department wikis on separate ports; audit trail for every ingest and cost event; hook system for CI\u002FCD integration; OpenTelemetry for ops dashboards                                                                                 |\n\nNo cloud account. No vendor lock-in. 
The wiki is plain Markdown — open it in any editor, back it up with git, sync it with any cloud drive.\n\n---\n\n## Inspiration and Vision\n\n> *\"The LLM should be able to maintain a wiki for you.\"*\n> — Andrej Karpathy, [LLM Wiki gist](https:\u002F\u002Fgist.github.com\u002Fkarpathy\u002F442a6bf555914893e9891c11519de94f)\n\nMost knowledge-management tools retrieve and summarize at query time. Synthadoc inverts this: it **compiles knowledge at ingest time**. Every new source enriches and cross-links the entire corpus, not just appends a new chunk. The wiki is the artifact — readable, editable, and browsable without any tool running.\n\n**Long-term alignment:**\n\n\n| Direction                | How Synthadoc moves there                                                                                                                                                                                     |\n| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Agent orchestration      | Orchestrator dispatches parallel IngestAgent, QueryAgent, LintAgent sub-agents with cost guards and retry backoff                                                                                             |\n| Sub-agent skills\u002Fplugins | Featuring a 3-tier lazy-load capability system, the platform allows for the injection of custom skills and hooks via a plug-and-play interface, ensuring core stability is never compromised during extension |\n| LLM wiki vs. 
RAG         | Pre-compiled structured knowledge beats query-time synthesis for contradiction detection, graph traversal, and offline access                                                                                 |\n| CLI \u002F HTTP               | A unified interface via CLI and RESTful endpoints, the system streamlines full-spectrum integration: from data ingestion and querying to automated linting, security auditing, and job orchestration          |\n| Local-first              | All data stays on your machine; localhost-only network binding; no cloud dependency except the LLM API itself                                                                                                 |\n| Provider choice          | LLM backends including free-tier Gemini and Groq, paid Anthropic\u002FOpenAI\u002FDeepSeek\u002FMiniMax, local Ollama, and coding-tool CLI providers (Claude Code, Opencode) — no API key required if you already have a subscription |\n\n---\n\n## Problems Addressed\n\n### 1. RAG conflates contradictions; Synthadoc surfaces them\n\nWhen two sources disagree, vector search returns both and the LLM silently blends them. Synthadoc detects the conflict during ingest, flags the page with `status: contradicted`, preserves both claims with citations, and either auto-resolves (if confidence ≥ threshold) or queues the conflict for human review.\n\n### 2. Knowledge fragments; Synthadoc links it\n\nRAG chunks are isolated. Synthadoc builds `[[wikilinks]]` between related pages during every ingest pass. The resulting graph is visible in Obsidian's Graph view and queryable with Dataview.\n\n### 3. Orphan knowledge has no address; Synthadoc finds it\n\nPages that exist but are referenced by nothing are surfaced by the lint system, with ready-to-paste index entries so you can quickly integrate them.\n\n### 4. 
Re-synthesis is expensive; Synthadoc caches it\n\nA 3-layer cache (embedding, LLM response, provider prompt cache) means repeated lint runs on unchanged pages cost near-zero tokens.\n\n### 5. Knowledge is locked in tools; Synthadoc escapes it\n\nEvery page is a plain Markdown file with YAML frontmatter. No proprietary format. Open the folder in any editor, put it in git, sync it with any cloud drive.\n\n### 6. Wiki structure decays as content grows; Synthadoc regenerates it\n\nAs the wiki accumulates pages, the `index.md` table of contents, domain scope (`purpose.md`), and LLM behaviour guidelines (`AGENTS.md`) can drift out of sync with actual content. The `scaffold` command re-generates all three from the current wiki state using the LLM — creating category-aware index entries, refreshed scope boundaries, and updated terminology guidelines — without touching pages already linked in the index. Run it once after initial install to get a rich scaffold, then schedule it weekly as the wiki grows.\n\n### Business values\n\n\n| Value                 | How                                                                                 |\n| --------------------- | ----------------------------------------------------------------------------------- |\n| **Faster onboarding** | New team members query the wiki instead of digging through documents                |\n| **Audit trail**       | Every ingest recorded in `audit.db` with source hash, token count, and timestamp    |\n| **Cost control**      | Configurable soft-warn and hard-gate thresholds; 3-layer cache reduces repeat spend |\n| **Compliance**        | Local-first — source documents and compiled knowledge never leave your machine     |\n| **Extensibility**     | Hooks fire on every event; custom skills load without a server restart              |\n\n---\n\n## Why Synthadoc?\n\n### Competitive advantages\n\n\n| Capability                   | Synthadoc                                                      | Typical RAG | 
NotebookLM | Notion AI |\n| ---------------------------- | -------------------------------------------------------------- | ----------- | ---------- | --------- |\n| Ingest-time synthesis        | **Yes**                                                        | No          | Partial    | No        |\n| Contradiction detection      | **Yes**                                                        | No          | No         | No        |\n| Orphan page detection        | **Yes**                                                        | No          | No         | No        |\n| Persistent wikilink graph    | **Yes**                                                        | No          | No         | No        |\n| Local-first (no cloud data)  | **Yes**                                                        | Varies      | No         | No        |\n| Custom skill plugins         | **Yes**                                                        | Limited     | No         | No        |\n| Obsidian integration         | **Yes**                                                        | No          | No         | No        |\n| Cost guard + audit trail     | **Yes**                                                        | No          | No         | No        |\n| Hook \u002F CI integration        | **Yes** (2 events)                                             | No          | No         | No        |\n| Offline browsable artifact   | **Yes**                                                        | No          | No         | No        |\n| Multi-wiki isolation         | **Yes**                                                        | No          | No         | No        |\n| Web search → wiki pages     | **Yes**                                                        | No          | No         | No        |\n| Multiple LLMs support       | **Yes** (Gemini, Groq, MiniMax, DeepSeek, Anthropic, OpenAI, Ollama) | No          | No         | No        |\n| Auto wiki overview page      | 
**Yes**                                                        | No          | No         | No        |\n| Resumable job queue + retry  | **Yes**                                                        | No          | No         | No        |\n| Query decomposition          | **Yes** (parallel sub-queries)                                 | No          | No         | No        |\n| Knowledge gap detection      | **Yes**                                                        | No          | No         | No        |\n| Web search decomposition     | **Yes** (parallel Tavily)                                      | No          | No         | No        |\n| Semantic re-ranking (vector) | **Yes** (optional fastembed)                                   | Varies      | No         | No        |\n| Scaffold automation          | **Yes**                                                        | No          | No         | No        |\n| Coding tool as LLM provider  | **Yes** (Claude Code, Opencode — no API key)                   | No          | No         | No        |\n| YouTube transcript ingest    | **Yes** (standard + Shorts, no API key, timestamped)           | No          | No         | No        |\n| Multilingual \u002F CJK queries   | **Yes** (Chinese, Japanese, Korean — no false gaps)            | Limited     | No         | No        |\n\n### Key differentiators vs. RAG\n\nRAG chunks documents and retrieves them at query time. 
Synthadoc **compiles** knowledge: every new source is synthesized into the existing wiki graph at ingest time.\n\n- **Contradictions are caught, not blended.** When two sources disagree, Synthadoc flags the page — RAG silently averages both claims.\n- **Knowledge is linked, not scattered.** `[[wikilinks]]` connect related pages into a navigable graph visible in Obsidian and queryable with Dataview.\n- **The artifact outlives the tool.** Close the server, open the wiki folder in any Markdown editor — the knowledge is all there, human-readable, no proprietary format.\n- **Cost-efficient at scale.** Two-step ingest with cached analysis means repeated ingest of similar sources costs near-zero tokens. Three cache layers stack for lint and query too.\n- **Ingest is durable, not fragile.** Every ingest request becomes a queued job with automatic retry and a persistent audit record. Batch a hundred documents and resume after a crash — no work is lost.\n\n---\n\n## Architecture\n\n![Synthadoc Architecture](docs\u002Fpng\u002Farchitecture.png)\n\nFor full architecture details, data models, API reference, and plugin development guide see **[docs\u002Fdesign.md](docs\u002Fdesign.md)**.\n\n---\n\n## What's Included\n\nSee [docs\u002Fdesign.md — Appendix A: Release Feature Index](docs\u002Fdesign.md#appendix-a--release-feature-index) for a full feature list by version.\n\n---\n\n## Installation\n\n### Prerequisites\n\n\n| Requirement    | Version | Notes                               |\n| -------------- | ------- | ----------------------------------- |\n| Python         | 3.11+   |                                     |\n| Node.js        | 18+     | Obsidian plugin build only          |\n| Git            | any     |                                     |\n| LLM API key    | —      | At least one required — unless using Claude Code or Opencode (see below) |\n| Tavily API key | —      | Optional — web search feature only |\n\n**LLM API key — at least one required** (unless using 
Claude Code or Opencode — see the last two rows below):\n\n\n| Provider         | Free tier                                     | Vision          | Get key                                                       |\n| ---------------- | --------------------------------------------- | --------------- | ------------------------------------------------------------- |\n| **Gemini Flash** | Yes — 15 RPM \u002F 1M tokens\u002Fday, no credit card | Yes             | [aistudio.google.com](https:\u002F\u002Faistudio.google.com\u002Fapp\u002Fapikey) |\n| Groq             | Yes — rate-limited                           | No              | [console.groq.com](https:\u002F\u002Fconsole.groq.com\u002Fkeys)             |\n| Ollama           | Yes — runs locally, no key                   | Model-dependent | [ollama.com](https:\u002F\u002Follama.com)                              |\n| MiniMax          | No — pay-per-token                           | Yes             | [platform.minimax.io](https:\u002F\u002Fplatform.minimax.io\u002F)           |\n| DeepSeek         | No — pay-per-token (very cheap text rates)   | No              | [platform.deepseek.com](https:\u002F\u002Fplatform.deepseek.com\u002Fapi_keys) |\n| Anthropic        | No                                            | Yes             | [console.anthropic.com](https:\u002F\u002Fconsole.anthropic.com\u002F)       |\n| OpenAI           | No                                            | Yes             | [platform.openai.com](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys)   |\n| **Claude Code**  | Included with subscription — no API key      | No              | Set `provider = \"claude-code\"` in config.toml                 |\n| **Opencode**     | Included with subscription — no API key      | No              | Set `provider = \"opencode\"` in config.toml                    |\n\n**Tavily API key (optional — enables web search):**\nGet a free key at [tavily.com](https:\u002F\u002Ftavily.com). 
Without it, web search jobs will fail but all other features work normally.\n\n---\n\n### Step 1 — Clone and install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fpaulmchen\u002Fsynthadoc.git\ncd synthadoc\npip3 install -e \".[dev]\"\n```\n\n### Step 2 — Run the Python test suite\n\nValidate that the Python engine builds and all tests pass before proceeding:\n\n```bash\npytest --ignore=tests\u002Fperformance\u002F -q\n```\n\nExpected: all tests pass, 0 failures. If any fail, check the error output before continuing.\n\nPerformance benchmarks (optional — Linux\u002FmacOS, measures SLOs):\n\n```bash\npytest tests\u002Fperformance\u002F -v --benchmark-disable\n```\n\n### Step 3 — Build and test the Obsidian plugin\n\n```bash\ncd obsidian-plugin\nnpm install\nnpm run build    # produces main.js\nnpm test         # runs Vitest unit tests\n```\n\n### Step 4 — Set your API keys\n\n**At least one LLM API key is required** — unless you use Claude Code or Opencode as your provider, in which case no separate API key is needed (see [Coding tool CLI providers](docs\u002Fdesign.md#coding-tool-cli-providers--no-api-key-needed)).\n\nSynthadoc defaults to **Gemini Flash** as the LLM provider — it's free, requires no\ncredit card, and offers 1 million tokens per day. 
Get a key at\n**aistudio.google.com\u002Fapp\u002Fapikey** (click \"Create API key\").\n\nWeb search uses **Tavily** (`TAVILY_API_KEY`) — optional, only needed for\n`synthadoc ingest \"search for: …\"` jobs.\n\n```bash\n# macOS \u002F Linux — add to ~\u002F.bashrc or ~\u002F.zshrc to persist\nexport GEMINI_API_KEY=AIza…          # default — free tier, 1M tokens\u002Fday\nexport GROQ_API_KEY=gsk_…            # alternative free tier — 100K tokens\u002Fday\nexport ANTHROPIC_API_KEY=sk-ant-…    # paid — highest quality\nexport MINIMAX_API_KEY=…             # paid — text rates (image support)\nexport DEEPSEEK_API_KEY=…            # paid — text rates (no image support)\nexport TAVILY_API_KEY=tvly-…         # web search (optional)\n\n# Windows cmd — current session only\nset GEMINI_API_KEY=AIza…\nset GROQ_API_KEY=gsk_…\nset ANTHROPIC_API_KEY=sk-ant-…\nset MINIMAX_API_KEY=…\nset DEEPSEEK_API_KEY=…\nset TAVILY_API_KEY=tvly-…\n\n# Windows cmd — permanent (open a new cmd window after running)\nsetx GEMINI_API_KEY AIza…\nsetx GROQ_API_KEY gsk_…\nsetx ANTHROPIC_API_KEY sk-ant-…\nsetx MINIMAX_API_KEY …\nsetx DEEPSEEK_API_KEY sk-…\nsetx TAVILY_API_KEY tvly-…\n```\n\nTo switch provider, edit `[agents]` in `\u003Cwiki-root>\u002F.synthadoc\u002Fconfig.toml` and restart\n`synthadoc serve`. See [Appendix — Switching LLM providers](docs\u002Fuser-quick-start-guide.md#appendix-c--switching-llm-providers) for step-by-step instructions.\n\n### Step 5 — Verify\n\n```bash\nsynthadoc --version\n```\n\n### Step 6 — Install a demo wiki, then start the engine\n\nA **wiki** is a self-contained, structured knowledge base — a folder of Markdown pages linked by topic, maintained and cross-referenced automatically by Synthadoc. Think of it as a living document that grows smarter with every source you feed it: each ingest pass adds new pages, updates existing ones, and flags contradictions. 
For your own work, you can build and grow a domain-specific wiki — whether that's market research, a technical knowledge base, or a team handbook — and query it in plain English or other languages at any time.\n\nA wiki must be installed before the engine can serve it. The fastest way to get started is the **History of Computing** demo, which ships with 13 pre-built pages and sample source files — no LLM API key required to browse it.\n\n**Install the demo wiki:**\n\n```bash\n# Linux \u002F macOS\nsynthadoc install history-of-computing --target ~\u002Fwikis --demo\n\n# Windows (cmd.exe)\nsynthadoc install history-of-computing --target %USERPROFILE%\\wikis --demo\n```\n\n**Then start the engine:**\n\n```bash\n# Foreground — keeps the terminal; logs stream to the console\nsynthadoc serve -w history-of-computing\n\n# Background — releases the terminal; logs go to the wiki log file\nsynthadoc serve -w history-of-computing --background\n```\n\nThe server binds to `http:\u002F\u002F127.0.0.1:7070` by default (port is set in `\u003Cwiki-root>\u002F.synthadoc\u002Fconfig.toml`). Leave it running while you work — the Obsidian plugin, CLI ingest commands, and query commands all talk to it.\n\nTo stop a background server:\n\n```bash\n# Linux \u002F macOS\nkill \u003CPID>\n\n# Windows (cmd)\ntaskkill \u002FPID \u003CPID> \u002FF\n```\n\nThe PID is printed when the background server starts and saved to `\u003Cwiki-root>\u002F.synthadoc\u002Fserver.pid`.\n\n---\n\n## Quick-Start Guide\n\nThe **History of Computing** demo includes 13 pre-built pages, raw source files covering clean-merge, contradiction, and orphan scenarios, and a full walkthrough of key Synthadoc features.\n\n**Full step-by-step walkthrough: [docs\u002Fuser-quick-start-guide.md](docs\u002Fuser-quick-start-guide.md)**\n\nThe guide covers:\n\n1. Verify the demo server started (banner, health check)\n2. Install Dataview in Obsidian\n3. Install the Synthadoc plugin and open the vault\n4. 
Review wiki structure and key files (index, purpose, AGENTS.md, dashboard)\n5. Query the pre-built wiki — including knowledge gap detection\n6. Batch ingest all demo source files\n7. Resolve a contradiction\n8. Fix an orphan page\n9. Web search ingestion with automatic decomposition\n10. Ingest a YouTube video\n11. Enrich the wiki with scaffold (regenerate\u002Fupdate index, purpose, AGENTS.md)\n12. Audit features (token cost, history, events)\n13. Schedule recurring operations\n\n---\n\n## Creating Your Own Wiki\n\nUnlike the demo (which ships with pre-built pages), your own wiki starts from a domain description and grows as you feed it sources. Three commands are all you need to get started:\n\n```bash\nsynthadoc install market-condition-canada --target ~\u002Fwikis --domain \"Market conditions and trends in Canada\"\nsynthadoc use market-condition-canada   # set as the default wiki — no -w needed from here on\nsynthadoc serve\n```\n\n`--domain` is a free-text description of the subject area — the LLM uses it to generate four domain-aware starter files via scaffold:\n\n\n| File                | Purpose                                                                     |\n| ------------------- | --------------------------------------------------------------------------- |\n| `wiki\u002Findex.md`     | Table of contents — domain-relevant categories with `[[wikilinks]]`        |\n| `wiki\u002Fpurpose.md`   | Scope declaration — tells the ingest agent what belongs and what to ignore |\n| `AGENTS.md`         | LLM behaviour guidelines — tone, terminology, and synthesis style          |\n| `wiki\u002Fdashboard.md` | Live Dataview dashboard — orphan pages, contradictions, page count         |\n\nOpen the wiki folder in Obsidian as a new vault and install both the Dataview and Synthadoc plugins (required once per wiki). 
The Quick-Start Guide covers this setup in detail — see [docs\u002Fuser-quick-start-guide.md](docs\u002Fuser-quick-start-guide.md).\n\n**Recommended growth loop:**\n\n**1. Seed with web searches** — pull in real content for the topics you care about:\n\n```bash\nsynthadoc ingest \"search for: Economy, employment and labour market analysis in Toronto GTA\"\nsynthadoc ingest \"search for: Bank of Canada interest rate outlook 2025\"\nsynthadoc jobs list   # watch progress\n```\n\nEach search fans out into up to 20 parallel URL ingest jobs. Query decomposition and web search decomposition (see below) make broad topics yield much richer results than a single search.\n\n**2. Lint and query** — check for contradictions and verify the wiki answers your questions:\n\n```bash\nsynthadoc lint run\nsynthadoc lint report\nsynthadoc query \"What are the current employment trends in the Toronto GTA?\"\n```\n\n**3. Re-run scaffold** — after pages accumulate, scaffold regenerates a richer index that reflects actual content. Pages already linked in `index.md` are never overwritten:\n\n```bash\nsynthadoc scaffold\n```\n\n**4. Schedule recurring updates** — keep the wiki fresh automatically:\n\n```bash\nsynthadoc schedule add --op \"ingest\" --source \"search for: Toronto GTA economic indicators latest\" --cron \"0 2 * * *\"\nsynthadoc schedule add --op \"scaffold\" --cron \"0 4 * * 0\"\n```\n\n### How decomposition works\n\nBoth `query` and web search `ingest` automatically split complex inputs into focused parallel sub-tasks — a compound question becomes multiple BM25 retrievals merged before synthesis; a broad search topic becomes multiple focused Tavily keyword searches whose results are merged and deduplicated. 
Both fall back gracefully if the LLM decomposition call fails.\n\nSee [docs\u002Fdesign.md — Query decomposition and web search decomposition](docs\u002Fdesign.md#query-decomposition) for the full design.\n\n### Semantic re-ranking (vector search)\n\nBM25 keyword search is the default. Optional vector re-ranking (`BAAI\u002Fbge-small-en-v1.5` cosine similarity) improves recall on conceptually related queries — enable it by installing `fastembed` and setting `[search] vector = true` in config. The ~130 MB model is downloaded once; BM25 stays active as fallback.\n\nSee [docs\u002Fdesign.md — Semantic re-ranking](docs\u002Fdesign.md#semantic-re-ranking) for configuration options and performance notes.\n\n### Knowledge gap workflow\n\nWhen a query returns thin or empty results, the wiki doesn't yet cover the topic. Fill the gap with a targeted web search ingest, wait for jobs, then re-query. Each ingest cycle makes the wiki denser — future queries need the web less.\n\nSee [docs\u002Fdesign.md — Knowledge gap workflow](docs\u002Fdesign.md#knowledge-gap-workflow) for the full pattern.\n\nSee [docs\u002Fdesign.md](docs\u002Fdesign.md) for a full description of how ingest, contradiction detection, and orphan tracking work under the hood.\n\n---\n\n## Configuration\n\nYou do not need to configure anything to run the demo. The demo wiki ships with its own settings and sensible built-in defaults cover everything else. Set your API key env var, run `synthadoc serve`, and go.\n\nFor the full configuration reference — layer precedence, global vs. 
per-project config, all keys and defaults — see [Appendix E — Configuration](docs\u002Fuser-quick-start-guide.md#appendix-e--configuration) in the Quick-Start Guide, or [docs\u002Fdesign.md — Configuration](docs\u002Fdesign.md#configuration) for the complete technical reference.\n\n---\n\n## Command Reference by Use Case\n\n### Setting up a wiki\n\n```bash\n# Create a new empty wiki (LLM scaffold runs automatically if API key is set)\nsynthadoc install my-wiki --target ~\u002Fwikis --domain \"Machine Learning\"\n\n# Create a wiki on a specific port (useful when running multiple wikis)\nsynthadoc install my-wiki --target ~\u002Fwikis --domain \"Machine Learning\" --port 7071\n\n# Install the demo (includes pre-built pages and raw sources — no LLM call needed)\nsynthadoc install history-of-computing --target ~\u002Fwikis --demo\n\n# List available demo templates\nsynthadoc demo list\n```\n\n### Switching the active wiki\n\n```bash\n# Set a wiki as the default so -w is not required for any subsequent command\nsynthadoc use my-wiki\n\n# Check which wiki is currently active\nsynthadoc use\n\n# Clear the saved default (revert to requiring -w on every command)\nsynthadoc use --clear\n```\n\n### Refreshing wiki scaffold\n\nAfter install, you can re-run the LLM scaffold at any time to regenerate domain-specific content (index categories, AGENTS.md guidelines, purpose.md scope). 
Pages already linked in `index.md` are protected and preserved.\n\n```bash\n# Regenerate scaffold for an existing wiki\nsynthadoc scaffold -w my-wiki\n\n# Schedule weekly refresh (runs every Sunday at 4 AM)\nsynthadoc schedule add --op \"scaffold\" --cron \"0 4 * * 0\" -w my-wiki\n```\n\n`config.toml` and `dashboard.md` are never modified by `scaffold`.\n\n### Running the server\n\n```bash\n# Start HTTP API + job worker (foreground — terminal stays attached)\nsynthadoc serve -w my-wiki\n\n# Detach to background — banner shown, then shell is released\n# All logs go to \u003Cwiki>\u002F.synthadoc\u002Flogs\u002Fsynthadoc.log\nsynthadoc serve -w my-wiki --background\n\n# Custom port\nsynthadoc serve -w my-wiki --port 7071\n\n# Verbose debug logging to console\nsynthadoc serve -w my-wiki --verbose\n```\n\n### Ingesting sources\n\n```bash\n# Single file or URL\nsynthadoc ingest report.pdf -w my-wiki\nsynthadoc ingest https:\u002F\u002Fexample.com\u002Farticle -w my-wiki\n\n# Entire folder (parallel, up to max_parallel_ingest at a time)\nsynthadoc ingest --batch raw_sources\u002F -w my-wiki\n\n# Manifest file — ingest a curated list of sources in one shot.\n# sources.txt: one entry per line; each line is either an absolute file path\n# (PDF, DOCX, PPTX, MD, …) or a URL. 
Blank lines and # comments are ignored.\n# Each entry becomes a separate job in the queue, processed sequentially.\n#\n# Example sources.txt:\n#   \u002Fhome\u002Fuser\u002Fdocs\u002Fresearch-paper.pdf\n#   \u002Fhome\u002Fuser\u002Fslides\u002Fkeynote.pptx\n#   https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FAlan_Turing\n#   # this line is ignored\nsynthadoc ingest --file sources.txt -w my-wiki\n\n# Force re-ingest (bypass deduplication and cache)\nsynthadoc ingest --force report.pdf -w my-wiki\n\n# Web search — triggers a Tavily search, then ingests each result URL as a child job.\n# Prefix the query with any recognised intent: \"search for:\", \"find on the web:\",\n# \"look up:\", or \"web search:\"  (prefix is stripped before the search is sent)\n# Requires TAVILY_API_KEY to be set.\n#\n# Note: web search content is NOT saved to raw_sources\u002F. The flow is direct:\n#   query → Tavily → URLs → each URL fetched → wiki pages written\n# raw_sources\u002F is for user-provided local files (PDF, DOCX, PPTX, etc.) only.\n# The wiki pages themselves are the persistent output of a web search.\nsynthadoc ingest \"search for: Bank of Canada interest rate decisions 2024\" -w my-wiki\nsynthadoc ingest \"find on the web: unemployment trends Ontario Q1 2025\" -w my-wiki\n\n# Limit how many URLs are enqueued (default 20, overrides [web_search] max_results)\nsynthadoc ingest \"search for: quantum computing basics\" --max-results 5 -w my-wiki\n\n# Multiple web searches at once via a manifest file\n# web-searches.txt:\n#   search for: Bank of Canada interest rate decisions 2024\n#   find on the web: unemployment trends Ontario Q1 2025\n#   look up: Toronto housing market affordability index\nsynthadoc ingest --file web-searches.txt -w my-wiki\n\n# YouTube video — transcript extracted automatically, no API key needed.\n# The video must have captions (auto-generated or manual).\n# Check: open the video on YouTube → ... 
→ Show transcript.\nsynthadoc ingest \"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=O5nskjZ_GoI\" -w my-wiki\nsynthadoc ingest \"https:\u002F\u002Fyoutu.be\u002FO5nskjZ_GoI\" -w my-wiki\n\n# YouTube URLs returned by web search are also routed automatically:\n# if Tavily returns a YouTube URL, the transcript is ingested instead of the page HTML.\nsynthadoc ingest \"search for: history of computing lecture\" -w my-wiki\n```\n\nEach YouTube wiki page opens with an **executive summary** — what the video is about,\nthe main topics covered, and the key takeaway — followed by the full timestamped transcript\nfor precise citation.\n\n### Querying\n\n```bash\n# Ask a question — answer cites wiki pages\nsynthadoc query \"What is Moore's Law?\" -w my-wiki\n\n# Save the answer as a new wiki page\nsynthadoc query \"What is Moore's Law?\" --save -w my-wiki\n```\n\n### Linting\n\n```bash\n# Run a full lint pass (enqueues job)\nsynthadoc lint run -w my-wiki\n\n# Only contradictions\nsynthadoc lint run --scope contradictions -w my-wiki\n\n# Auto-apply high-confidence resolutions\nsynthadoc lint run --auto-resolve -w my-wiki\n\n# Instant report (reads wiki files directly, no server needed)\nsynthadoc lint report -w my-wiki\n```\n\n### Monitoring jobs\n\n```bash\n# List all jobs (most recent first)\nsynthadoc jobs list -w my-wiki\n\n# Filter by status\nsynthadoc jobs list --status pending -w my-wiki\nsynthadoc jobs list --status failed -w my-wiki\nsynthadoc jobs list --status dead -w my-wiki\n\n# Single job detail\nsynthadoc jobs status \u003Cjob-id> -w my-wiki\n\n# Retry a dead job\nsynthadoc jobs retry \u003Cjob-id> -w my-wiki\n\n# Cancel all pending jobs at once (e.g. 
after a bad batch ingest)\nsynthadoc jobs cancel -w my-wiki        # prompts for confirmation\nsynthadoc jobs cancel --yes -w my-wiki  # skip confirmation\n\n# Remove old records\nsynthadoc jobs purge --older-than 30 -w my-wiki\n```\n\n### Inspecting ingest results\n\n```bash\n# Preview how a source will be analysed without writing pages\nsynthadoc ingest report.pdf --analyse-only -w my-wiki\n# → {\"entities\": [...], \"tags\": [...], \"summary\": \"...\"}\n```\n\n### Audit trail\n\n```bash\n# Ingest history: timestamp, source file, wiki page, tokens, cost\nsynthadoc audit history -w my-wiki            # last 50 records\nsynthadoc audit history -n 100 -w my-wiki     # last 100 records\nsynthadoc audit history --json -w my-wiki     # raw JSON for scripting\n\n# Token usage and cost: totals + daily breakdown (per-model USD cost is tracked from v0.2.0)\nsynthadoc audit cost -w my-wiki               # last 30 days\nsynthadoc audit cost --days 7 -w my-wiki      # last 7 days\n\n# Audit events: contradictions found, auto-resolutions, cost gate triggers\nsynthadoc audit events -w my-wiki             # last 100 events\nsynthadoc audit events --json -w my-wiki      # raw JSON for scripting\n```\n\n### Scheduling recurring jobs\n\n```bash\n# Register a nightly ingest\nsynthadoc schedule add --op \"ingest --batch raw_sources\u002F\" --cron \"0 2 * * *\" -w my-wiki\n\n# Weekly lint\nsynthadoc schedule add --op \"lint\" --cron \"0 3 * * 0\" -w my-wiki\n\n# List scheduled jobs\nsynthadoc schedule list -w my-wiki\n\n# Remove a scheduled job\nsynthadoc schedule remove \u003Cid> -w my-wiki\n```\n\n### Removing a wiki\n\nStop the server for that wiki before uninstalling — the serve process must not be running\nwhen the directory is deleted.\n\n```bash\n# Stop the background server (PID is in \u003Cwiki-root>\u002F.synthadoc\u002Fserver.pid)\nkill $(cat ~\u002Fwikis\u002Fmy-wiki\u002F.synthadoc\u002Fserver.pid)          # Linux \u002F macOS\ntaskkill \u002FPID \u003Cpid> \u002FF                             
         # Windows\n\n# Then uninstall — two-step confirmation required, no --yes escape\nsynthadoc uninstall my-wiki\n```\n\nFor Obsidian plugin commands see [Appendix A — Obsidian Plugin Command Reference](docs\u002Fuser-quick-start-guide.md#appendix-a--obsidian-plugin-commands) in the Quick-Start Guide.\n\n---\n\n## Administrative Reference\n\n### Health and status\n\n```bash\n# Wiki statistics: pages, queue depth, cache hit rate\nsynthadoc status -w my-wiki\n\n# Liveness probe (useful in scripts and monitoring)\n# Port is per-wiki — check [server] port in \u003Cwiki-root>\u002F.synthadoc\u002Fconfig.toml\n# Default is 7070; each additional wiki uses its own port (7071, 7072, …)\ncurl http:\u002F\u002F127.0.0.1:7070\u002Fhealth\n```\n\nExpected `status` output:\n\n```\nWiki:         \u002Fhome\u002Fuser\u002Fwikis\u002Fmy-wiki\nPages:        34\nJobs pending: 0\nJobs total:   12\n```\n\n### Logs\n\nSynthadoc writes three log artefacts per wiki:\n\n\n| File            | Location                          | Format                  | Use                                                                 |\n| --------------- | --------------------------------- | ----------------------- | ------------------------------------------------------------------- |\n| `log.md`        | `\u003Cwiki-root>\u002Flog.md`              | Human-readable Markdown | Read inside Obsidian; shows every ingest, contradiction, lint event |\n| `synthadoc.log` | `\u003Cwiki-root>\u002F.synthadoc\u002Flogs\u002F`    | JSON lines (rotating)   | Structured debug\u002Fops log; grep or pipe to jq                        |\n| `audit.db`      | `\u003Cwiki-root>\u002F.synthadoc\u002Faudit.db` | SQLite (append-only)    | Source hashes, cost records, job history                            |\n\n**Tailing the JSON log:**\n\n```bash\n# Tail and pretty-print with jq\ntail -f .synthadoc\u002Flogs\u002Fsynthadoc.log | jq .\n\n# Filter to errors only\ntail -f .synthadoc\u002Flogs\u002Fsynthadoc.log | jq 
'select(.level == \"ERROR\")'\n\n# Filter to a specific job\n# job_id is present only on records logged in job context (ingest\u002Flint workers)\ntail -f .synthadoc\u002Flogs\u002Fsynthadoc.log | jq 'select(.job_id == \"abc123\")'\n```\n\n**Log rotation:** When `synthadoc.log` reaches `max_file_mb`, it is renamed to `synthadoc.log.1`; the previous `.1` becomes `.2`; files beyond `backup_count` are deleted. Total disk usage ≈ `max_file_mb × (backup_count + 1)`.\n\n**Changing the log level:** Edit `[logs] level` in `.synthadoc\u002Fconfig.toml` and restart `synthadoc serve`, or pass `--verbose` to get `DEBUG` for one session without editing the config.\n\n### Audit trail\n\n```bash\nsynthadoc audit history -w my-wiki          # table: timestamp, source file, wiki page, tokens, cost\nsynthadoc audit history -n 100 -w my-wiki   # last 100 records (default 50)\nsynthadoc audit history --json -w my-wiki   # raw JSON for scripting\n\nsynthadoc audit cost -w my-wiki             # total tokens + daily breakdown, last 30 days\nsynthadoc audit cost --days 7 -w my-wiki    # weekly view\nsynthadoc audit cost --json -w my-wiki      # {total_tokens, total_cost_usd, daily: [...]}\n\nsynthadoc audit events -w my-wiki           # table: timestamp, job_id, event type, metadata\nsynthadoc audit events --json -w my-wiki    # raw JSON\n```\n\n> **Note:** Per-model cost tracking is live from v0.2.0 — pricing tables cover all 7 API providers. 
Token counts and USD cost are recorded for every ingest and query operation in `audit.db`.\n\n### Cache management\n\n```bash\n# Remove all cached LLM responses\n# Output: \"Cache cleared: N entries removed.\"\nsynthadoc cache clear -w my-wiki\n```\n\nCache invalidation happens automatically when:\n\n- A source file's SHA-256 hash changes (content changed)\n- `CACHE_VERSION` is bumped in `core\u002Fcache.py` (after prompt template edits)\n- `--force` is passed to ingest\n\n### OpenTelemetry integration\n\nBy default, traces and metrics are written to `\u003Cwiki-root>\u002F.synthadoc\u002Flogs\u002Ftraces.jsonl`. To send to any OTLP backend (Jaeger, Grafana Tempo, Honeycomb, Datadog):\n\n```toml\n# ~\u002F.synthadoc\u002Fconfig.toml\n[observability]\nexporter      = \"otlp\"\notlp_endpoint = \"http:\u002F\u002Flocalhost:4317\"\n```\n\n### Debugging\n\n```bash\n# Start server with DEBUG console logging\nsynthadoc serve -w my-wiki --verbose\n\n# Check for configuration problems\nsynthadoc status -w my-wiki     # prints pre-flight warnings\n\n# View recent job failures\nsynthadoc jobs list --status failed -w my-wiki\nsynthadoc jobs status \u003Cjob-id> -w my-wiki    # shows error message + traceback\n\n# Force a re-ingest to rule out cache issues\nsynthadoc ingest --force problem.pdf -w my-wiki\n```\n\n---\n\n## Understanding Logs and the Audit Trail\n\nSynthadoc writes three log artefacts per wiki: `log.md` (human-readable Markdown, open in Obsidian), `synthadoc.log` (JSON lines, rotate-by-size, grep with `jq`), and `audit.db` (append-only SQLite — source hashes, cost records, job history).\n\nFor the full field reference, log levels, rotation config, OTel integration, and audit query examples see [docs\u002Fdesign.md — Logs and Audit Trail](docs\u002Fdesign.md#logs-and-audit-trail).\n\n---\n\n## Customization\n\n### Custom skills (new file formats)\n\nSubclass `BaseSkill` (Apache-2.0 — no AGPL obligation on your skill code), drop the file in 
`\u003Cwiki-root>\u002Fskills\u002F` or `~\u002F.synthadoc\u002Fskills\u002F`, and Synthadoc hot-loads it on the next ingest. Skills can match by file extension or by intent prefix (any Unicode text is supported, including Chinese\u002FJapanese\u002FArabic prefixes).\n\n### Custom LLM providers\n\nSubclass `LLMProvider` from `synthadoc\u002Fproviders\u002Fbase.py` (Apache-2.0) and place it in `~\u002F.synthadoc\u002Fproviders\u002F` or the wiki `providers\u002F` directory.\n\n### Hooks\n\nHooks are shell commands (any language) that fire on `on_ingest_complete` and `on_lint_complete`. Each hook receives a JSON context on stdin. Set `blocking = true` to gate the operation on the hook's exit code.\n\n### Cache\n\nThere are three cache layers (embedding, LLM response, provider prompt cache). The cache is invalidated automatically when a source file's SHA-256 hash changes. Force a fresh call with `--force` or wipe all responses with `synthadoc cache clear -w my-wiki`.\n\n### Per-wiki AGENTS.md\n\nEdit `\u003Cwiki-root>\u002FAGENTS.md` to give the LLM domain-specific instructions: terminology, page naming conventions, what to cross-reference. This file is the highest-priority instruction source for every agent run against this wiki.\n\nFor full examples, API signatures, and intent-dispatch config see [docs\u002Fdesign.md — Customization](docs\u002Fdesign.md#customization).\n\n---\n\n## Links\n\n- Design document: [docs\u002Fdesign.md](docs\u002Fdesign.md)\n- Quick-Start Guide: [docs\u002Fuser-quick-start-guide.md](docs\u002Fuser-quick-start-guide.md)\n- Contributing: [CONTRIBUTING.md](CONTRIBUTING.md)\n- Issues: [GitHub Issues](..\u002F..\u002Fissues)\n","Synthadoc is an open-source LLM knowledge compilation engine that turns raw documents into structured, local-first wikis. Its core features include using an LLM to process source documents in many formats (such as PDF, spreadsheets, and PPT), automatically generating cross-references, detecting and flagging contradictions and orphan pages, and ensuring that every answer carries source attribution. Technically, Synthadoc offers a transparent, human-readable knowledge-graph alternative that can be self-managed and self-improved without relying on other tools. It suits individual users, small teams, and large enterprises that need to build or maintain a highly accurate domain-specific knowledge base.",2,"2026-06-11 02:43:22","CREATED_QUERY"]