[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74209":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},74209,"OpenKB","VectifyAI\u002FOpenKB","VectifyAI","OpenKB: Open LLM Knowledge Base","https:\u002F\u002Fpageindex.ai",null,"Python",2039,227,7,14,0,23,71,290,69,29.07,"Apache License 2.0",false,"main",[26,27,28,29,30,31],"agents","ai","knowledge-base","llm","rag","retrieval","2026-06-12 02:03:23","\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Fopenkb.ai\">\n  \u003Cimg src=\"https:\u002F\u002Fdocs.pageindex.ai\u002Fimages\u002Fopenkb.png\" alt=\"OpenKB (by PageIndex)\" \u002F>\n\u003C\u002Fa>\n\n# OpenKB — Open LLM Knowledge Base\n\n\u003Cp align=\"center\">\u003Ci>Scale to long documents&nbsp; • &nbsp;Reasoning-based retrieval&nbsp; • &nbsp;Native multi-modality&nbsp; • &nbsp;No Vector DB\u003C\u002Fi>\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n---\n\n# 📑 What is OpenKB\n\n**OpenKB (Open Knowledge Base)** is an open-source system (in CLI) that compiles raw documents into a structured, interlinked wiki-style knowledge base using LLMs, powered by [**PageIndex**](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex) for vectorless long document retrieval.\n\nThe idea is based on a [concept](https:\u002F\u002Fx.com\u002Fkarpathy\u002Fstatus\u002F2039805659525644595) described by Andrej Karpathy: LLMs generate summaries, concept 
pages, and cross-references, all maintained automatically. Knowledge compounds over time instead of being re-derived on every query.\n\n### Why not traditional RAG?\n\nTraditional RAG rediscovers knowledge from scratch on every query. Nothing accumulates. OpenKB compiles knowledge once into a persistent wiki, then keeps it current. Cross-references already exist. Contradictions are flagged. Synthesis reflects everything consumed.\n\n### Features\n\n- **Broad format support** — PDF, Word, Markdown, PowerPoint, HTML, Excel, text, and more via markitdown\n- **Scale to long documents** — Long and complex documents are handled via [PageIndex](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex) tree indexing, enabling accurate, vectorless long-context retrieval\n- **Native multi-modality** — Retrieves and understands figures, tables, and images, not just text\n- **Compiled Wiki** — LLM manages and compiles your documents into summaries, concept pages, and cross-links, all kept in sync\n- **Query** — Ask questions (one-off) against your wiki. The LLM navigates your compiled knowledge to answer\n- **Interactive Chat** — Multi-turn conversations with persisted sessions you can resume across runs\n- **Lint** — Health checks find contradictions, gaps, orphans, and stale content\n- **Watch mode** — Drop files into `raw\u002F`, wiki updates automatically\n- **Obsidian compatible** — Wiki is plain `.md` files with `[[wikilinks]]`. 
Open in Obsidian for graph view and browsing\n\n# 🚀 Getting Started\n\n### Install\n\n```bash\npip install openkb\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Ci>Other install options\u003C\u002Fi>\u003C\u002Fsummary>\n\n- **Latest from GitHub:**\n\n  ```bash\n  pip install git+https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FOpenKB.git\n  ```\n\n- **Install from source** (editable, for development):\n\n  ```bash\n  git clone https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FOpenKB.git\n  cd OpenKB\n  pip install -e .\n  ```\n\n\u003C\u002Fdetails>\n\n### Quick Start\n\n```bash\n# 1. Create a directory for your knowledge base\nmkdir my-kb && cd my-kb\n\n# 2. Initialize the knowledge base\nopenkb init\n\n# 3. Add documents\nopenkb add paper.pdf\nopenkb add ~\u002Fpapers\u002F                            # Add a whole directory\nopenkb add https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.11420     # Or fetch from a URL\n\n# 4. Ask a question\nopenkb query \"What are the main findings?\"\n\n# 5. Or chat interactively\nopenkb chat\n```\n\n### Set up your LLM\n\nOpenKB comes with [multi-LLM support](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders) (e.g., OpenAI, Claude, Gemini) via [LiteLLM](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm) (pinned to a [safe version](https:\u002F\u002Fdocs.litellm.ai\u002Fblog\u002Fsecurity-update-march-2026)).\n\nSet your model during `openkb init`, or in [`.openkb\u002Fconfig.yaml`](#configuration), using `provider\u002Fmodel` LiteLLM format (like `anthropic\u002Fclaude-sonnet-4-6`). 
OpenAI models can omit the prefix (like `gpt-5.4`).\n\nCreate a `.env` file with your LLM API key:\n\n```bash\nLLM_API_KEY=your_llm_api_key\n```\n\n# 🧩 How OpenKB Works\n\n### Architecture\n\n```\nraw\u002F                              You drop files here\n │\n ├─ Short docs ──→ markitdown ──→ LLM reads full text\n │                                     │\n ├─ Long PDFs ──→ PageIndex ────→ LLM reads document trees\n │                                     │\n │                                     ▼\n │                         Wiki Compilation (using LLM)\n │                                     │\n ▼                                     ▼\nwiki\u002F\n ├── index.md            Knowledge base overview\n ├── log.md              Operations timeline\n ├── AGENTS.md           Wiki schema (LLM instructions)\n ├── sources\u002F            Full-text conversions\n ├── summaries\u002F          Per-document summaries\n ├── concepts\u002F           Cross-document synthesis ← the good stuff\n ├── explorations\u002F       Saved query results\n └── reports\u002F            Lint reports\n```\n\n### Short vs. Long Document Handling\n\n| | Short documents | Long documents (PDF ≥ 20 pages) |\n|---|---|---|\n| **Convert** | markitdown → Markdown | PageIndex → tree index + summaries |\n| **Images** | Extracted inline (pymupdf) | Extracted by PageIndex |\n| **LLM reads** | Full text | Document trees |\n| **Result** | summary + concepts | summary + concepts |\n\nShort docs are read in full by the LLM. Long PDFs are indexed by PageIndex into a hierarchical tree with summaries. The LLM reads the tree instead of the full text, enabling better retrieval from long documents.\n\n### Knowledge Compilation\n\nWhen you add a document, the LLM:\n\n1. Generates a **summary** page\n2. Reads existing **concept** pages\n3. Creates or updates concepts with cross-document synthesis\n4. Updates the **index** and **log**\n\nA single source might touch 10-15 wiki pages. 
Knowledge accumulates: each document enriches the existing wiki rather than sitting in isolation.\n\n# ⚙️ Usage\n\n### Commands\n\n| Command | Description |\n|---|---|\n| `openkb init` | Initialize a new knowledge base (interactive) |\n| \u003Ccode>openkb&nbsp;add&nbsp;&lt;file_or_dir_or_URL&gt;\u003C\u002Fcode> | Add documents and compile to wiki. URL ingest auto-detects PDF (saved as `.pdf` → PageIndex \u002F markitdown) vs HTML (trafilatura main-content extract → `.md`) |\n| \u003Ccode>openkb&nbsp;remove&nbsp;&lt;doc&gt;\u003C\u002Fcode> | Remove a document and clean up its wiki pages, images, registry, and PageIndex state (use `--dry-run` to preview, `--keep-raw` \u002F `--keep-empty-concepts` to retain artifacts) |\n| \u003Ccode>openkb&nbsp;query&nbsp;\"question\"\u003C\u002Fcode> | Ask a question over the knowledge base (use `--save` to save the answer to `wiki\u002Fexplorations\u002F`) |\n| `openkb chat` | Start an interactive multi-turn chat (use `--resume`, `--list`, `--delete` to manage sessions) |\n| `openkb watch` | Watch `raw\u002F` and auto-compile new files |\n| `openkb lint` | Run structural + knowledge health checks |\n| `openkb list` | List indexed documents and concepts |\n| `openkb status` | Show knowledge base stats |\n| \u003Ccode>openkb&nbsp;feedback&nbsp;[\"msg\"]\u003C\u002Fcode> | File feedback by opening a prefilled GitHub issue (use `--type bug\u002Ffeature\u002Fquestion` to tag the issue) |\n\n\u003C!-- | `openkb lint --fix` | Auto-fix what it can | -->\n\n### Interactive Chat\n\n`openkb chat` opens an interactive chat session over your wiki knowledge base. 
Unlike the one-shot `openkb query`, each turn carries the conversation history, so you can dig into a topic without re-typing context.\n\n```bash\nopenkb chat                       # start a new session\nopenkb chat --resume              # resume the most recent session\nopenkb chat --resume 20260411     # resume by id (unique prefix works)\nopenkb chat --list                # list all sessions\nopenkb chat --delete \u003Cid>         # delete a session\n```\n\nInside a chat, type `\u002F` to access slash commands (Tab to complete):\n\n- `\u002Fhelp` — list available commands\n- `\u002Fstatus` — show knowledge base status\n- `\u002Flist` — list all documents\n- `\u002Fadd \u003Cpath>` — add a document or directory without leaving the chat\n- `\u002Fsave [name]` — export the transcript to `wiki\u002Fexplorations\u002F`\n- `\u002Fclear` — start a fresh session (the current one stays on disk)\n- `\u002Flint` — run knowledge base lint\n- `\u002Fexit` — exit (Ctrl-D also works)\n\n### Configuration\n\nSettings are initialized by `openkb init`, and stored in `.openkb\u002Fconfig.yaml`:\n\n```yaml\nmodel: gpt-5.4                   # LLM model (any LiteLLM-supported provider)\nlanguage: en                     # Wiki output language\npageindex_threshold: 20          # PDF pages threshold for PageIndex\n```\n\nModel names use `provider\u002Fmodel` LiteLLM [format](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders) (OpenAI models can omit the prefix):\n\n| Provider | Model example |\n|---|---|\n| OpenAI | `gpt-5.4` |\n| Anthropic | `anthropic\u002Fclaude-sonnet-4-6` |\n| Gemini | `gemini\u002Fgemini-3.1-pro-preview` |\n\n### PageIndex Integration\n\nLong documents are challenging for LLMs due to context limits, context rot, and summarization loss.\n[PageIndex](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex) solves this with vectorless, reasoning-based retrieval — building a hierarchical tree index that lets LLMs reason over the index for context-aware 
retrieval.\n\nPageIndex runs locally by default using the [open-source version](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex), with no external dependencies required.\n\n#### Optional: Cloud Support\n\nFor large or complex PDFs, [PageIndex Cloud](https:\u002F\u002Fdocs.pageindex.ai\u002F) can be used to access additional capabilities, including:\n\n- OCR support for scanned PDFs (via hosted VLM models)\n- Faster structure generation\n- Scalable indexing for large documents\n\nSet `PAGEINDEX_API_KEY` in your `.env` to enable cloud features:\n\n```\nPAGEINDEX_API_KEY=your_pageindex_api_key\n```\n\n### AGENTS.md\n\nThe `wiki\u002FAGENTS.md` file defines wiki structure and conventions. It's the LLM's instruction manual for maintaining the wiki. Customize it to change how your wiki is organized.\n\nAt runtime, the LLM reads `AGENTS.md` from disk, so your edits take effect immediately.\n\n### Using with Obsidian\n\nOpenKB's wiki is a directory of Markdown files with `[[wikilinks]]`. Obsidian renders it natively.\n\n1. Open `wiki\u002F` as an Obsidian vault\n2. Browse summaries, concepts, and explorations\n3. Use graph view to see knowledge connections\n4. 
Use Obsidian Web Clipper to add web articles to `raw\u002F`\n\n### Using with Claude Code \u002F Codex \u002F Gemini CLI\n\nOpenKB ships a `SKILL.md` so any agent CLI can read your compiled wiki — no extra runtime, no MCP setup, just install the skill once.\n\n**Claude Code**:\n\n```\n\u002Fplugin marketplace add VectifyAI\u002FOpenKB\n\u002Fplugin install openkb@vectify\n```\n\n**Gemini CLI**:\n\n```bash\ngemini skills install https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FOpenKB.git --path skills\u002Fopenkb --consent\n```\n\n**OpenAI Codex CLI** (no marketplace command yet — manual symlink):\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FOpenKB.git ~\u002Fopenkb-src\nmkdir -p ~\u002F.agents\u002Fskills\nln -s ~\u002Fopenkb-src\u002Fskills\u002Fopenkb ~\u002F.agents\u002Fskills\u002Fopenkb\n```\n\nThe skill is read-only — it won't run `openkb add`, `remove`, or `lint --fix` without you asking. See [`skills\u002Fopenkb\u002FSKILL.md`](skills\u002Fopenkb\u002FSKILL.md) for the full instruction set.\n\n# 🧭 Learn More\n\n### Compared to Karpathy's Approach\n\n| | Karpathy's workflow | OpenKB |\n|---|---|---|\n| Short documents | LLM reads directly | markitdown → LLM reads |\n| Long documents | Context limits, context rot | PageIndex tree index |\n| Supported formats | Web clipper → .md | PDF, Word, PPT, Excel, HTML, text, CSV, .md |\n| Wiki compilation | LLM agent | LLM agent (same) |\n| Q&A | Query over wiki | Wiki + PageIndex retrieval |\n\n### The Stack\n\n- [PageIndex](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex) — Vectorless, reasoning-based document indexing and retrieval\n- [markitdown](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown) — Universal file-to-markdown conversion\n- [OpenAI Agents SDK](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-agents-python) — Agent framework (supports non-OpenAI models via LiteLLM)\n- [LiteLLM](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm) — Multi-provider 
LLM gateway\n- [Click](https:\u002F\u002Fclick.palletsprojects.com\u002F) — CLI framework\n- [watchdog](https:\u002F\u002Fgithub.com\u002Fgorakhargosh\u002Fwatchdog) — Filesystem monitoring\n\n### Roadmap\n\n- [ ] Extend long document handling to non-PDF formats\n- [ ] Scale to large document collections with nested folder support\n- [ ] Hierarchical concept (topic) indexing for massive knowledge bases\n- [ ] Database-backed storage engine\n- [ ] Web UI for browsing and managing wikis\n\n### Contributing\n\nContributions are welcome! Please submit a pull request, or open an [issue](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FOpenKB\u002Fissues) for bugs or feature requests. For larger changes, consider opening an issue first to discuss the approach.\n\n### License\n\nApache 2.0. See [LICENSE](LICENSE).\n\n### Support Us\n\nIf you find OpenKB useful, please give us a star 🌟 — and check out [PageIndex](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex) too!  \n\n\u003Cdiv>\n\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-000000?style=for-the-badge&logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FPageIndexAI)&ensp;\n[![LinkedIn](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fvectify-ai\u002F)&ensp;\n[![Contact Us](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContact_Us-3B82F6?style=for-the-badge&logo=envelope&logoColor=white)](https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T)\n\n\u003C\u002Fdiv>\n","OpenKB is an open-source knowledge base system built on large language models (LLMs) that compiles raw documents into a structured, interlinked, wiki-style knowledge base. Its core features include support for many document formats (such as PDF, Word, and Markdown), the ability to handle long documents, and understanding of multimodal content such as figures, tables, and images. It uses PageIndex for long-document retrieval without a vector database, and automatically maintains the summaries, concept pages, and cross-references in the knowledge base, so that knowledge accumulates rather than being regenerated on every query. OpenKB suits scenarios that require efficiently extracting and managing knowledge from large volumes of complex documents, such as research institutions, enterprise knowledge management, and organizing personal study materials.",2,"2026-06-11 03:49:30","high_star"]