[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71948":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":40,"readmeContent":41,"aiSummary":42,"trendingCount":16,"starSnapshotCount":16,"syncStatus":43,"lastSyncTime":44,"discoverSource":45},71948,"LEANN","yichuan-w\u002FLEANN","yichuan-w","[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08276",null,"Python",11803,1053,74,38,0,125,44.07,"MIT License",false,"main",true,[24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39],"ai","faiss","gpt-oss","langchain","llama-index","llm","localstorage","offline-first","ollama","privacy","python","rag","retrieval-augmented-generation","vector-database","vector-search","vectors","2026-06-12 02:02:56","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo-text.png\" alt=\"LEANN Logo\" width=\"400\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F15049\" target=\"_blank\">\n    \u003Cimg src=\"https:\u002F\u002Ftrendshift.io\u002Fapi\u002Fbadge\u002Frepositories\u002F15049\" alt=\"yichuan-w\u002FLEANN | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue.svg\" alt=\"Python Versions\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fyichuan-w\u002FLEANN\u002Factions\u002Fworkflows\u002Fbuild-and-publish.yml\u002Fbadge.svg\" alt=\"CI Status\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlatform-Ubuntu%20%26%20Arch%20%26%20WSL%20%7C%20macOS%20(ARM64%2FIntel)%20%7C%20Windows-lightgrey\" alt=\"Platform\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green.svg\" alt=\"MIT License\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMCP-Native%20Integration-blue\" alt=\"MCP Integration\">\n  \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fleann-e2u9779\u002Fshared_invite\u002Fzt-3ol2ww9ic-Eg_kB8omwe6xmYVd0epr4Q\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Join-4A154B?logo=slack&logoColor=white\" alt=\"Join Slack\">\n  \u003C\u002Fa>\n\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fforms.gle\u002FrDbZf864gMNxhpTq8\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📣_Community_Survey-Help_Shape_v0.4-007ec6?style=for-the-badge&logo=google-forms&logoColor=white\" alt=\"Take Survey\">\n  \u003C\u002Fa>\n  \u003Cp>\n    We track \u003Cb>zero telemetry\u003C\u002Fb>. 
This survey is the ONLY way to tell us if you want \u003Cbr>\n    \u003Cb>GPU Acceleration\u003C\u002Fb> or \u003Cb>More Integrations\u003C\u002Fb> next.\u003Cbr>\n    👉 \u003Ca href=\"https:\u002F\u002Fforms.gle\u002FrDbZf864gMNxhpTq8\">\u003Cb>Click here to cast your vote (2 mins)\u003C\u002Fb>\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ch3>💬 Join our Slack community!\u003C\u002Fh3>\n  \u003Cp>\n    We'd love for you to be part of the LEANN community!\u003Cbr>\n    👉 \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fleann-e2u9779\u002Fshared_invite\u002Fzt-3ol2ww9ic-Eg_kB8omwe6xmYVd0epr4Q\">\u003Cb>Join LEANN Slack\u003C\u002Fb>\u003C\u002Fa>\u003Cbr>\n    If the invite link has expired or you have trouble joining, please \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fyichuan-w\u002FLEANN\u002Fissues\">open an issue\u003C\u002Fa> and we'll help you get in!\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Ch2 align=\"center\" tabindex=\"-1\" class=\"heading-element\" dir=\"auto\">\n    The smallest vector index in the world. RAG Everything with LEANN!\n\u003C\u002Fh2>\n\nLEANN is an innovative vector database that democratizes personal AI. Transform your laptop into a powerful RAG system that can index and search through millions of documents while using **97% less storage** than traditional solutions **without accuracy loss**.\n\n\nLEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. 
[Illustration Fig →](#️-architecture--how-it-works) | [Paper →](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08276)\n\n**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantically search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)** ([WeChat](#-wechat-detective-unlock-your-golden-memories), [iMessage](#-imessage-history-your-personal-conversation-archive)), **[agent memory](#-chatgpt-chat-history-your-personal-ai-conversation-archive)** ([ChatGPT](#-chatgpt-chat-history-your-personal-ai-conversation-archive), [Claude](#-claude-chat-history-your-personal-ai-conversation-archive)), **[live data](#mcp-integration-rag-on-live-data-from-any-platform)** ([Slack](#slack-messages-search-your-team-conversations), [Twitter](#-twitter-bookmarks-your-personal-tweet-library)), **[codebase](#-claude-code-integration-transform-your-development-workflow)**\\*, or external knowledge bases (e.g., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.\n\n\n\\* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow. 🔥 Check out [the easy setup →](packages\u002Fleann-mcp\u002FREADME.md)\n\n\n\n## Why LEANN?\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Feffects.png\" alt=\"LEANN vs Traditional Vector DB Storage Comparison\" width=\"70%\">\n\u003C\u002Fp>\n\n> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. 
[See detailed benchmarks for different applications below ↓](#-storage-comparison)\n\n\n🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no \"terms of service\".\n\n🪶 **Lightweight:** Graph-based recomputation eliminates heavy embedding storage, while smart graph pruning and CSR format minimize graph storage overhead. Always less storage, less memory usage!\n\n📦 **Portable:** Transfer your entire knowledge base between devices (even with others) with minimal cost - your personal AI memory travels with you.\n\n📈 **Scalability:** Handle messy personal data that would crash traditional vector DBs, easily managing your growing personalized data and agent generated memory!\n\n✨ **No Accuracy Loss:** Maintain the same search quality as heavyweight solutions while using 97% less storage.\n\n## Installation\n\n### 📦 Prerequisites: Install uv\n\n[Install uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F#installation-methods) first if you don't have it. Typically, you can install it with:\n\n```bash\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n```\n\n### 🚀 Quick Install\n\nClone the repository to access all examples and try amazing applications,\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fyichuan-w\u002FLEANN.git leann\ncd leann\n```\n\nand install LEANN from [PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Fleann\u002F) to run them immediately:\n\n```bash\nuv venv\nsource .venv\u002Fbin\u002Factivate\nuv pip install leann\n\n# CPU-only (Linux): use the `cpu` extra (e.g. `leann[cpu]`)\n```\n\n\u003C!--\n> Low-resource? See \"Low-resource setups\" in the [Configuration Guide](docs\u002Fconfiguration-guide.md#low-resource-setups). 
-->\n\n\u003Cdetails>\n\u003Csummary>\n\u003Cstrong>🔧 Build from Source (Recommended for development)\u003C\u002Fstrong>\n\u003C\u002Fsummary>\n\n\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fyichuan-w\u002FLEANN.git leann\ncd leann\ngit submodule update --init --recursive\n```\n\n**macOS:**\n\nNote: DiskANN requires MacOS 13.3 or later.\n\n```bash\nbrew install libomp boost protobuf zeromq pkgconf\nuv sync --extra diskann\n```\n\n**Linux (Ubuntu\u002FDebian):**\n\nNote: On Ubuntu 20.04, you may need to build a newer Abseil and pin Protobuf (e.g., v3.20.x) for building DiskANN. See [Issue #30](https:\u002F\u002Fgithub.com\u002Fyichuan-w\u002FLEANN\u002Fissues\u002F30) for a step-by-step note.\n\nYou can manually install [Intel oneAPI MKL](https:\u002F\u002Fwww.intel.com\u002Fcontent\u002Fwww\u002Fus\u002Fen\u002Fdeveloper\u002Ftools\u002Foneapi\u002Fonemkl.html) instead of `libmkl-full-dev` for DiskANN. You can also use `libopenblas-dev` for building HNSW only, by removing `--extra diskann` in the command below.\n\n```bash\nsudo apt-get update && sudo apt-get install -y \\\n  libomp-dev libboost-all-dev protobuf-compiler libzmq3-dev \\\n  pkg-config libabsl-dev libaio-dev libprotobuf-dev \\\n  libmkl-full-dev\n\nuv sync --extra diskann\n```\n\n**Linux (Arch Linux):**\n\n```bash\nsudo pacman -Syu && sudo pacman -S --needed base-devel cmake pkgconf git gcc \\\n  boost boost-libs protobuf abseil-cpp libaio zeromq\n\n# For MKL in DiskANN\nsudo pacman -S --needed base-devel git\ngit clone https:\u002F\u002Faur.archlinux.org\u002Fparu-bin.git\ncd paru-bin && makepkg -si\nparu -S intel-oneapi-mkl intel-oneapi-compiler\nsource \u002Fopt\u002Fintel\u002Foneapi\u002Fsetvars.sh\n\nuv sync --extra diskann\n```\n\n**Linux (RHEL \u002F CentOS Stream \u002F Oracle \u002F Rocky \u002F AlmaLinux):**\n\nSee [Issue #50](https:\u002F\u002Fgithub.com\u002Fyichuan-w\u002FLEANN\u002Fissues\u002F50) for more details.\n\n```bash\nsudo dnf groupinstall -y \"Development 
Tools\"\nsudo dnf install -y libomp-devel boost-devel protobuf-compiler protobuf-devel \\\n  abseil-cpp-devel libaio-devel zeromq-devel pkgconf-pkg-config\n\n# For MKL in DiskANN\nsudo dnf install -y intel-oneapi-mkl intel-oneapi-mkl-devel \\\n  intel-oneapi-openmp || sudo dnf install -y intel-oneapi-compiler\nsource \u002Fopt\u002Fintel\u002Foneapi\u002Fsetvars.sh\n\nuv sync --extra diskann\n```\n\n**Windows:**\n\nRequires [Visual Studio 2022 Build Tools](https:\u002F\u002Fvisualstudio.microsoft.com\u002Fdownloads\u002F#build-tools-for-visual-studio-2022) with the **C++ desktop development** workload, and [vcpkg](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvcpkg).\n\n```powershell\n# Install toolchain (if not already present)\nchoco install cmake swig pkgconfiglite nuget.commandline -y\n\n# Install C++ dependencies via vcpkg\nvcpkg install zeromq:x64-windows openblas:x64-windows lapack:x64-windows `\n  boost-program-options:x64-windows protobuf:x64-windows\n\n# Set environment variables (adjust VCPKG_ROOT to your vcpkg path)\n$env:CMAKE_PREFIX_PATH = \"$env:VCPKG_ROOT\\installed\\x64-windows\"\n$env:PKG_CONFIG_PATH = \"$env:VCPKG_ROOT\\installed\\x64-windows\\lib\\pkgconfig\"\n$env:PKG_CONFIG_EXECUTABLE = \"C:\\ProgramData\\chocolatey\\bin\\pkg-config.exe\"\n$env:OPENBLAS_LIB = \"$env:VCPKG_ROOT\\installed\\x64-windows\\lib\\openblas.lib\"\n$env:PATH += \";$env:VCPKG_ROOT\\installed\\x64-windows\\bin\"\n$env:PATH += \";$env:VCPKG_ROOT\\installed\\x64-windows\\tools\\protobuf\"\n\nuv sync --extra diskann\n```\n\n\u003C\u002Fdetails>\n\n\n## Quick Start\n\nOur declarative API makes RAG as easy as writing a config file.\n\nCheck out [demo.ipynb](demo.ipynb) or [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fyichuan-w\u002FLEANN\u002Fblob\u002Fmain\u002Fdemo.ipynb)\n\n```python\nfrom leann import LeannBuilder, LeannSearcher, LeannChat\nfrom pathlib import 
Path\nINDEX_PATH = str(Path(\".\u002F\").resolve() \u002F \"demo.leann\")\n\n# Build an index\nbuilder = LeannBuilder(backend_name=\"hnsw\")\nbuilder.add_text(\"LEANN saves 97% storage compared to traditional vector databases.\")\nbuilder.add_text(\"Tung Tung Tung Sahur called—they need their banana‑crocodile hybrid back\")\nbuilder.build_index(INDEX_PATH)\n\n# Search\nsearcher = LeannSearcher(INDEX_PATH)\nresults = searcher.search(\"fantastical AI-generated creatures\", top_k=1)\n\n# Chat with your data\nchat = LeannChat(INDEX_PATH, llm_config={\"type\": \"hf\", \"model\": \"Qwen\u002FQwen3-0.6B\"})\nresponse = chat.ask(\"How much storage does LEANN save?\", top_k=1)\n```\n\n## RAG on Everything!\n\nLEANN supports RAG on various data sources including documents (`.pdf`, `.txt`, `.md`), Apple Mail, Google Search History, WeChat, ChatGPT conversations, Claude conversations, iMessage conversations, and **live data from any platform through MCP (Model Context Protocol) servers** - including Slack, Twitter, and more.\n\n\n\n### Generation Model Setup\n\n#### LLM Backend\n\nLEANN supports many LLM providers for text generation (HuggingFace, Ollama, Anthropic, and Any OpenAI compatible API).\n\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🔑 OpenAI API Setup (Default)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nSet your OpenAI API key as an environment variable:\n\n```bash\nexport OPENAI_API_KEY=\"your-api-key-here\"\n```\n\nMake sure to use `--llm openai` flag when using the CLI.\nYou can also specify the model name with `--llm-model \u003Cmodel-name>` flag.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🛠️ Supported LLM & Embedding Providers (via OpenAI Compatibility)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nThanks to the widespread adoption of the OpenAI API format, LEANN is compatible out-of-the-box with a vast array of LLM and embedding providers. 
Simply set the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables to connect to your preferred service.\n\n```sh\nexport OPENAI_API_KEY=\"xxx\"\nexport OPENAI_BASE_URL=\"http:\u002F\u002Flocalhost:1234\u002Fv1\" # base url of the provider\n```\n\nTo use OpenAI compatible endpoint with the CLI interface:\n\nIf you are using it for text generation, make sure to use `--llm openai` flag and specify the model name with `--llm-model \u003Cmodel-name>` flag.\n\nIf you are using it for embedding, set the `--embedding-mode openai` flag and specify the model name with `--embedding-model \u003CMODEL>`.\n\n-----\n\n\nBelow is a list of base URLs for common providers to get you started.\n\n\n### 🖥️ Local Inference Engines (Recommended for full privacy)\n\n| Provider         | Sample Base URL             |\n| ---------------- | --------------------------- |\n| **Ollama** | `http:\u002F\u002Flocalhost:11434\u002Fv1` |\n| **LM Studio** | `http:\u002F\u002Flocalhost:1234\u002Fv1`  |\n| **vLLM** | `http:\u002F\u002Flocalhost:8000\u002Fv1`  |\n| **llama.cpp** | `http:\u002F\u002Flocalhost:8080\u002Fv1`  |\n| **SGLang** | `http:\u002F\u002Flocalhost:30000\u002Fv1` |\n| **LiteLLM** | `http:\u002F\u002Flocalhost:4000`     |\n\n-----\n\n### ☁️ Cloud Providers\n\n> **🚨 A Note on Privacy:** Before choosing a cloud provider, carefully review their privacy and data retention policies. 
Depending on their terms, your data may be used for their own purposes, including but not limited to human reviews and model training, which can lead to serious consequences if not handled properly.\n\n\n| Provider         | Base URL                                                   |\n| ---------------- | ---------------------------------------------------------- |\n| **OpenAI** | `https:\u002F\u002Fapi.openai.com\u002Fv1`                                |\n| **OpenRouter** | `https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1`                             |\n| **Gemini** | `https:\u002F\u002Fgenerativelanguage.googleapis.com\u002Fv1beta\u002Fopenai\u002F` |\n| **x.AI (Grok)** | `https:\u002F\u002Fapi.x.ai\u002Fv1`                                      |\n| **Groq AI** | `https:\u002F\u002Fapi.groq.com\u002Fopenai\u002Fv1`                           |\n| **DeepSeek** | `https:\u002F\u002Fapi.deepseek.com\u002Fv1`                              |\n| **SiliconFlow** | `https:\u002F\u002Fapi.siliconflow.cn\u002Fv1`                            |\n| **Zhipu (BigModel)** | `https:\u002F\u002Fopen.bigmodel.cn\u002Fapi\u002Fpaas\u002Fv4\u002F`                |\n| **Mistral AI** | `https:\u002F\u002Fapi.mistral.ai\u002Fv1`                                |\n| **Anthropic** | `https:\u002F\u002Fapi.anthropic.com\u002Fv1`                             |\n| **Jina AI** (Embeddings) | `https:\u002F\u002Fapi.jina.ai\u002Fv1`                         |\n\n> **💡 Tip: Separate Embedding Provider**\n>\n> To use a different provider for embeddings (e.g., Jina AI) while using another for LLM, use `--embedding-api-base` and `--embedding-api-key`:\n> ```bash\n> leann build my-index --docs .\u002Fdocs \\\n>   --embedding-mode openai \\\n>   --embedding-model jina-embeddings-v3 \\\n>   --embedding-api-base https:\u002F\u002Fapi.jina.ai\u002Fv1 \\\n>   --embedding-api-key $JINA_API_KEY\n> ```\n\nIf your provider isn't on this list, don't worry! 
Check their documentation for an OpenAI-compatible endpoint—chances are, it's OpenAI Compatible too!\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🔧 Ollama Setup (Recommended for full privacy)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**macOS:**\n\nFirst, [download Ollama for macOS](https:\u002F\u002Follama.com\u002Fdownload\u002Fmac).\n\n```bash\n# Pull a lightweight model (recommended for consumer hardware)\nollama pull llama3.2:1b\n```\n\n**Linux:**\n\n```bash\n# Install Ollama\ncurl -fsSL https:\u002F\u002Follama.ai\u002Finstall.sh | sh\n\n# Start Ollama service manually\nollama serve &\n\n# Pull a lightweight model (recommended for consumer hardware)\nollama pull llama3.2:1b\n```\n\n\u003C\u002Fdetails>\n\n\n## ⭐ Flexible Configuration\n\nLEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.\n\n📚 **Need configuration best practices?** Check our [Configuration Guide](docs\u002Fconfiguration-guide.md) for detailed optimization tips, model selection advice, and solutions to common issues like slow embeddings or poor search quality.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Common Parameters (Available in All Examples)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nAll RAG examples share these common parameters. **Interactive mode** is available in all examples - simply run without `--query` to start a continuous Q&A session where you can ask multiple questions. Type 'quit' to exit.\n\n```bash\n# Environment Variables (GPU Device Selection)\nLEANN_EMBEDDING_DEVICE       # GPU for embedding model (e.g., cuda:0, cuda:1, cpu)\nLEANN_LLM_DEVICE             # GPU for HFChat LLM (e.g., cuda:1, or \"cuda\" for multi-GPU auto)\n\n# Core Parameters (General preprocessing for all examples)\n--index-dir DIR              # Directory to store the index (default: current directory)\n--query \"YOUR QUESTION\"      # Single query mode. 
Omit for interactive chat (type 'quit' to exit), and now you can play with your index interactively\n--max-items N                # Limit data preprocessing (default: -1, process all data)\n--force-rebuild              # Force rebuild index even if it exists\n\n# Embedding Parameters\n--embedding-model MODEL      # e.g., facebook\u002Fcontriever, text-embedding-3-small, mlx-community\u002FQwen3-Embedding-0.6B-8bit or nomic-embed-text\n--embedding-mode MODE        # sentence-transformers, openai, mlx, or ollama\n\n# LLM Parameters (Text generation models)\n--llm TYPE                   # LLM backend: openai, ollama, hf, or anthropic (default: openai)\n--llm-model MODEL            # Model name (default: gpt-4o) e.g., gpt-4o-mini, llama3.2:1b, Qwen\u002FQwen2.5-1.5B-Instruct\n--thinking-budget LEVEL      # Thinking budget for reasoning models: low\u002Fmedium\u002Fhigh (supported by o3, o3-mini, GPT-Oss:20b, and other reasoning models)\n\n# Search Parameters\n--top-k N                    # Number of results to retrieve (default: 20)\n--search-complexity N        # Search complexity for graph traversal (default: 32)\n\n# Chunking Parameters\n--chunk-size N               # Size of text chunks (default varies by source: 256 for most, 192 for WeChat)\n--chunk-overlap N            # Overlap between chunks (default varies: 25-128 depending on source)\n\n# Index Building Parameters\n--backend-name NAME          # Backend to use: hnsw or diskann (default: hnsw)\n--graph-degree N             # Graph degree for index construction (default: 32)\n--build-complexity N         # Build complexity for index construction (default: 64)\n--compact \u002F --no-compact     # Use compact storage (default: true). Must be `no-compact` for `no-recompute` build.\n--recompute \u002F --no-recompute # Enable\u002Fdisable embedding recomputation (default: enabled). 
Should not do a `no-recompute` search in a `recompute` build.\n```\n\n\u003C\u002Fdetails>\n\n### 📄 Personal Data Manager: Process Any Documents (`.pdf`, `.txt`, `.md`)!\n\nAsk questions directly about your personal PDFs, documents, and any directory containing your files!\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"videos\u002Fpaper_clear.gif\" alt=\"LEANN Document Search Demo\" width=\"600\">\n\u003C\u002Fp>\n\nThe example below asks a question about summarizing our paper (uses default data in `data\u002F`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a Technical report about LLM in Huawei in Chinese), and this is the **easiest example** to run here:\n\n```bash\nsource .venv\u002Fbin\u002Factivate # Don't forget to activate the virtual environment\npython -m apps.document_rag --query \"What are the main techniques LEANN explores?\"\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Document-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--data-dir DIR           # Directory containing documents to process (default: data)\n--file-types .ext .ext   # Filter by specific file types (optional - all LlamaIndex supported types if omitted)\n```\n\n#### Example Commands\n```bash\n# Process all documents with larger chunks for academic papers\npython -m apps.document_rag --data-dir \"~\u002FDocuments\u002FPapers\" --chunk-size 1024\n\n# Filter only markdown and Python files with smaller chunks\npython -m apps.document_rag --data-dir \".\u002Fdocs\" --chunk-size 256 --file-types .md .py\n\n# Enable AST-aware chunking for code files\npython -m apps.document_rag --enable-code-chunking --data-dir \".\u002Fmy_project\"\n\n# Or use the specialized code RAG for better code understanding\npython -m apps.code_rag --repo-dir \".\u002Fmy_codebase\" --query \"How does authentication work?\"\n```\n\n\u003C\u002Fdetails>\n\n### 🎨 ColQwen: Multimodal PDF Retrieval with Vision-Language 
Models\n\nSearch through PDFs using both text and visual understanding with ColQwen2\u002FColPali models. Perfect for research papers, technical documents, and any PDFs with complex layouts, figures, or diagrams.\n\n> **🍎 Mac Users**: ColQwen is optimized for Apple Silicon with MPS acceleration for faster inference!\n\n```bash\n# Build index from PDFs\npython -m apps.colqwen_rag build --pdfs .\u002Fmy_papers\u002F --index research_papers\n\n# Search with text queries\npython -m apps.colqwen_rag search research_papers \"How does attention mechanism work?\"\n\n# Interactive Q&A\npython -m apps.colqwen_rag ask research_papers --interactive\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: ColQwen Setup & Usage\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Prerequisites\n```bash\n# Install dependencies\nuv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn\nbrew install poppler  # macOS only, for PDF processing\n```\n\n#### Build Index\n```bash\npython -m apps.colqwen_rag build \\\n  --pdfs .\u002Fpdf_directory\u002F \\\n  --index my_index \\\n  --model colqwen2  # or colpali\n```\n\n#### Search\n```bash\npython -m apps.colqwen_rag search my_index \"your question here\" --top-k 5\n```\n\n#### Models\n- **ColQwen2** (`colqwen2`): Latest vision-language model with improved performance\n- **ColPali** (`colpali`): Proven multimodal retriever\n\nFor detailed usage, see the [ColQwen Guide](docs\u002FCOLQWEN_GUIDE.md).\n\n\u003C\u002Fdetails>\n\n### 📧 Your Personal Email Secretary: RAG on Apple Mail!\n\n> **Note:** The examples below currently support macOS only. 
Windows support coming soon.\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"videos\u002Fmail_clear.gif\" alt=\"LEANN Email Search Demo\" width=\"600\">\n\u003C\u002Fp>\n\nBefore running the example below, you need to grant full disk access to your terminal\u002FVS Code in System Preferences → Privacy & Security → Full Disk Access.\n\n```bash\npython -m apps.email_rag --query \"What's the food I ordered by DoorDash or Uber Eats mostly?\"\n```\n**780K email chunks → 78MB storage.** Finally, search your email like you search Google.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Email-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--mail-path PATH         # Path to specific mail directory (auto-detects if omitted)\n--include-html          # Include HTML content in processing (useful for newsletters)\n```\n\n#### Example Commands\n```bash\n# Search work emails from a specific account\npython -m apps.email_rag --mail-path \"~\u002FLibrary\u002FMail\u002FV10\u002FWORK_ACCOUNT\"\n\n# Find all receipts and order confirmations (includes HTML)\npython -m apps.email_rag --query \"receipt order confirmation invoice\" --include-html\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nOnce the index is built, you can ask questions like:\n- \"Find emails from my boss about deadlines\"\n- \"What did John say about the project timeline?\"\n- \"Show me emails about travel expenses\"\n\u003C\u002Fdetails>\n\n### 🔍 Time Machine for the Web: RAG Your Entire Chrome Browser History!\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"videos\u002Fgoogle_clear.gif\" alt=\"LEANN Browser History Search Demo\" width=\"600\">\n\u003C\u002Fp>\n\n```bash\npython -m apps.browser_rag --query \"Tell me my browser history about machine learning?\"\n```\n**38K browser entries → 6MB storage.** Your browser history becomes your personal 
search engine.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Browser-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--chrome-profile PATH    # Path to Chrome profile directory (auto-detects if omitted)\n```\n\n#### Example Commands\n```bash\n# Search academic research from your browsing history\npython -m apps.browser_rag --query \"arxiv papers machine learning transformer architecture\"\n\n# Track competitor analysis across work profile\npython -m apps.browser_rag --chrome-profile \"~\u002FLibrary\u002FApplication Support\u002FGoogle\u002FChrome\u002FWork Profile\" --max-items 5000\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: How to find your Chrome profile\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nThe default Chrome profile path is configured for a typical macOS setup. If you need to find your specific Chrome profile:\n\n1. Open Terminal\n2. Run: `ls ~\u002FLibrary\u002FApplication\\ Support\u002FGoogle\u002FChrome\u002F`\n3. Look for folders like \"Default\", \"Profile 1\", \"Profile 2\", etc.\n4. 
Use the full path as your `--chrome-profile` argument\n\n**Common Chrome profile locations:**\n- macOS: `~\u002FLibrary\u002FApplication Support\u002FGoogle\u002FChrome\u002FDefault`\n- Linux: `~\u002F.config\u002Fgoogle-chrome\u002FDefault`\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💬 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nOnce the index is built, you can ask questions like:\n\n- \"What websites did I visit about machine learning?\"\n- \"Find my search history about programming\"\n- \"What YouTube videos did I watch recently?\"\n- \"Show me websites I visited about travel planning\"\n\n\u003C\u002Fdetails>\n\n### 💬 WeChat Detective: Unlock Your Golden Memories!\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"videos\u002Fwechat_clear.gif\" alt=\"LEANN WeChat Search Demo\" width=\"600\">\n\u003C\u002Fp>\n\n```bash\npython -m apps.wechat_rag --query \"Show me all group chats about weekend plans\"\n```\n**400K messages → 64MB storage** Search years of chat history in any language.\n\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🔧 Click to expand: Installation Requirements\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nFirst, you need to install the [WeChat exporter](https:\u002F\u002Fgithub.com\u002Fsunnyyoung\u002FWeChatTweak-CLI),\n\n```bash\nbrew install sunnyyoung\u002Frepo\u002Fwechattweak-cli\n```\n\nor install it manually (if you have issues with Homebrew):\n\n```bash\nsudo packages\u002Fwechat-exporter\u002Fwechattweak-cli install\n```\n\n**Troubleshooting:**\n- **Installation issues**: Check the [WeChatTweak-CLI issues page](https:\u002F\u002Fgithub.com\u002Fsunnyyoung\u002FWeChatTweak-CLI\u002Fissues\u002F41)\n- **Export errors**: If you encounter the error below, try restarting WeChat\n  ```bash\n  Failed to export WeChat data. Please ensure WeChat is running and WeChatTweak is installed.\n  Failed to find or export WeChat data. 
Exiting.\n  ```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: WeChat-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--export-dir DIR         # Directory to store exported WeChat data (default: wechat_export_direct)\n--force-export          # Force re-export even if data exists\n```\n\n#### Example Commands\n```bash\n# Search for travel plans discussed in group chats\npython -m apps.wechat_rag --query \"travel plans\" --max-items 10000\n\n# Re-export and search recent chats (useful after new messages)\npython -m apps.wechat_rag --force-export --query \"work schedule\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💬 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nOnce the index is built, you can ask questions like:\n\n- \"我想买魔术师约翰逊的球衣，给我一些对应聊天记录?\" (Chinese: Show me chat records about buying Magic Johnson's jersey)\n\n\u003C\u002Fdetails>\n\n### 🤖 ChatGPT Chat History: Your Personal AI Conversation Archive!\n\nTransform your ChatGPT conversations into a searchable knowledge base! Search through all your ChatGPT discussions about coding, research, brainstorming, and more.\n\n```bash\npython -m apps.chatgpt_rag --export-path chatgpt_export.html --query \"How do I create a list in Python?\"\n```\n\n**Unlock your AI conversation history.** Never lose track of valuable insights from your ChatGPT discussions again.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: How to Export ChatGPT Data\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**Step-by-step export process:**\n\n1. **Sign in to ChatGPT**\n2. **Click your profile icon** in the top right corner\n3. **Navigate to Settings** → **Data Controls**\n4. **Click \"Export\"** under Export Data\n5. **Confirm the export** request\n6. **Download the ZIP file** from the email link (expires in 24 hours)\n7. 
**Extract or use directly** with LEANN\n\n**Supported formats:**\n- `.html` files from ChatGPT exports\n- `.zip` archives from ChatGPT\n- Directories with multiple export files\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: ChatGPT-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--export-path PATH           # Path to ChatGPT export file (.html\u002F.zip) or directory (default: .\u002Fchatgpt_export)\n--separate-messages         # Process each message separately instead of concatenated conversations\n--chunk-size N              # Text chunk size (default: 512)\n--chunk-overlap N           # Overlap between chunks (default: 128)\n```\n\n#### Example Commands\n```bash\n# Basic usage with HTML export\npython -m apps.chatgpt_rag --export-path conversations.html\n\n# Process ZIP archive from ChatGPT\npython -m apps.chatgpt_rag --export-path chatgpt_export.zip\n\n# Search with specific query\npython -m apps.chatgpt_rag --export-path chatgpt_data.html --query \"Python programming help\"\n\n# Process individual messages for fine-grained search\npython -m apps.chatgpt_rag --separate-messages --export-path chatgpt_export.html\n\n# Process directory containing multiple exports\npython -m apps.chatgpt_rag --export-path .\u002Fchatgpt_exports\u002F --max-items 1000\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💡 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nOnce your ChatGPT conversations are indexed, you can search with queries like:\n- \"What did I ask ChatGPT about Python programming?\"\n- \"Show me conversations about machine learning algorithms\"\n- \"Find discussions about web development frameworks\"\n- \"What coding advice did ChatGPT give me?\"\n- \"Search for conversations about debugging techniques\"\n- \"Find ChatGPT's recommendations for learning resources\"\n\n\u003C\u002Fdetails>\n\n### 🤖 Claude Chat History: Your 
Personal AI Conversation Archive!\n\nTransform your Claude conversations into a searchable knowledge base! Search through all your Claude discussions about coding, research, brainstorming, and more.\n\n```bash\npython -m apps.claude_rag --export-path claude_export.json --query \"What did I ask about Python dictionaries?\"\n```\n\n**Unlock your AI conversation history.** Never lose track of valuable insights from your Claude discussions again.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: How to Export Claude Data\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**Step-by-step export process:**\n\n1. **Open Claude** in your browser\n2. **Navigate to Settings** (look for gear icon or settings menu)\n3. **Find Export\u002FDownload** options in your account settings\n4. **Download conversation data** (usually in JSON format)\n5. **Place the file** in your project directory\n\n*Note: Claude export methods may vary depending on the interface you're using. Check Claude's help documentation for the most current export instructions.*\n\n**Supported formats:**\n- `.json` files (recommended)\n- `.zip` archives containing JSON data\n- Directories with multiple export files\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Claude-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--export-path PATH           # Path to Claude export file (.json\u002F.zip) or directory (default: .\u002Fclaude_export)\n--separate-messages         # Process each message separately instead of concatenated conversations\n--chunk-size N              # Text chunk size (default: 512)\n--chunk-overlap N           # Overlap between chunks (default: 128)\n```\n\n#### Example Commands\n```bash\n# Basic usage with JSON export\npython -m apps.claude_rag --export-path my_claude_conversations.json\n\n# Process ZIP archive from Claude\npython -m apps.claude_rag --export-path claude_export.zip\n\n# Search with specific 
query\npython -m apps.claude_rag --export-path claude_data.json --query \"machine learning advice\"\n\n# Process individual messages for fine-grained search\npython -m apps.claude_rag --separate-messages --export-path claude_export.json\n\n# Process directory containing multiple exports\npython -m apps.claude_rag --export-path .\u002Fclaude_exports\u002F --max-items 1000\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💡 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nOnce your Claude conversations are indexed, you can search with queries like:\n- \"What did I ask Claude about Python programming?\"\n- \"Show me conversations about machine learning algorithms\"\n- \"Find discussions about software architecture patterns\"\n- \"What debugging advice did Claude give me?\"\n- \"Search for conversations about data structures\"\n- \"Find Claude's recommendations for learning resources\"\n\n\u003C\u002Fdetails>\n\n### 💬 iMessage History: Your Personal Conversation Archive!\n\nTransform your iMessage conversations into a searchable knowledge base! Search through all your text messages, group chats, and conversations with friends, family, and colleagues.\n\n```bash\npython -m apps.imessage_rag --query \"What did we discuss about the weekend plans?\"\n```\n\n**Unlock your message history.** Never lose track of important conversations, shared links, or memorable moments from your iMessage history.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: How to Access iMessage Data\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**iMessage data location:**\n\niMessage conversations are stored in a SQLite database on your Mac at:\n```\n~\u002FLibrary\u002FMessages\u002Fchat.db\n```\n\n**Important setup requirements:**\n\n1. 
**Grant Full Disk Access** to your terminal or IDE:\n   - Open **System Preferences** → **Security & Privacy** → **Privacy** (on macOS 13 and later: **System Settings** → **Privacy & Security**)\n   - Select **Full Disk Access** from the left sidebar\n   - Click the **+** button and add your terminal app (Terminal, iTerm2) or IDE (VS Code, etc.)\n   - Restart your terminal\u002FIDE after granting access\n\n2. **Alternative: Use a backup database**\n   - If you have a Time Machine backup or a manual copy of the database, you can index that instead\n   - Use `--db-path` to specify the custom location\n\n**Supported formats:**\n- Direct access to `~\u002FLibrary\u002FMessages\u002Fchat.db` (default)\n- Custom database path with `--db-path`\n- Works with backup copies of the database\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: iMessage-Specific Arguments\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n#### Parameters\n```bash\n--db-path PATH                    # Path to chat.db file (default: ~\u002FLibrary\u002FMessages\u002Fchat.db)\n--concatenate-conversations       # Group messages by conversation (default: True)\n--no-concatenate-conversations    # Process each message individually\n--chunk-size N                    # Text chunk size (default: 1000)\n--chunk-overlap N                 # Overlap between chunks (default: 200)\n```\n\n#### Example Commands\n```bash\n# Basic usage (requires Full Disk Access)\npython -m apps.imessage_rag\n\n# Search with specific query\npython -m apps.imessage_rag --query \"family dinner plans\"\n\n# Use custom database path\npython -m apps.imessage_rag --db-path \u002Fpath\u002Fto\u002Fbackup\u002Fchat.db\n\n# Process individual messages instead of conversations\npython -m apps.imessage_rag --no-concatenate-conversations\n\n# Limit processing for testing\npython -m apps.imessage_rag --max-items 100 --query \"weekend\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💡 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nOnce your 
iMessage conversations are indexed, you can search with queries like:\n- \"What did we discuss about vacation plans?\"\n- \"Find messages about restaurant recommendations\"\n- \"Show me conversations with John about the project\"\n- \"Search for shared links about technology\"\n- \"Find group chat discussions about weekend events\"\n- \"What did mom say about the family gathering?\"\n\n\u003C\u002Fdetails>\n\n### MCP Integration: RAG on Live Data from Any Platform\n\nConnect to live data sources through the Model Context Protocol (MCP). LEANN now supports real-time RAG on platforms like Slack, Twitter, and more through standardized MCP servers.\n\n**Key Benefits:**\n- **Live Data Access**: Fetch real-time data without manual exports\n- **Standardized Protocol**: Use any MCP-compatible server\n- **Easy Extension**: Add new platforms with minimal code\n- **Secure Access**: MCP servers handle authentication\n\n#### 💬 Slack Messages: Search Your Team Conversations\n\nTransform your Slack workspace into a searchable knowledge base! Find discussions, decisions, and shared knowledge across all your channels.\n\n```bash\n# Test MCP server connection\npython -m apps.slack_rag --mcp-server \"slack-mcp-server\" --test-connection\n\n# Index and search Slack messages\npython -m apps.slack_rag \\\n  --mcp-server \"slack-mcp-server\" \\\n  --workspace-name \"my-team\" \\\n  --channels general dev-team random \\\n  --query \"What did we decide about the product launch?\"\n```\n\n**📖 Comprehensive Setup Guide**: For detailed setup instructions, troubleshooting common issues (like \"users cache is not ready yet\"), and advanced configuration options, see our [**Slack Setup Guide**](docs\u002Fslack-setup-guide.md).\n\n**Quick Setup:**\n1. Install a Slack MCP server (e.g., `npm install -g slack-mcp-server`)\n2. Create a Slack App and get API credentials (see detailed guide above)\n3. 
Set environment variables:\n   ```bash\n   export SLACK_BOT_TOKEN=\"xoxb-your-bot-token\"\n   export SLACK_APP_TOKEN=\"xapp-your-app-token\"  # Optional\n   ```\n4. Test connection with `--test-connection` flag\n\n**Arguments:**\n- `--mcp-server`: Command to start the Slack MCP server\n- `--workspace-name`: Slack workspace name for organization\n- `--channels`: Specific channels to index (optional)\n- `--concatenate-conversations`: Group messages by channel (default: true)\n- `--max-messages-per-channel`: Limit messages per channel (default: 100)\n- `--max-retries`: Maximum retries for cache sync issues (default: 5)\n- `--retry-delay`: Initial delay between retries in seconds (default: 2.0)\n\n#### 🐦 Twitter Bookmarks: Your Personal Tweet Library\n\nSearch through your Twitter bookmarks! Find that perfect article, thread, or insight you saved for later.\n\n```bash\n# Test MCP server connection\npython -m apps.twitter_rag --mcp-server \"twitter-mcp-server\" --test-connection\n\n# Index and search Twitter bookmarks\npython -m apps.twitter_rag \\\n  --mcp-server \"twitter-mcp-server\" \\\n  --max-bookmarks 1000 \\\n  --query \"What AI articles did I bookmark about machine learning?\"\n```\n\n**Setup Requirements:**\n1. Install a Twitter MCP server (e.g., `npm install -g twitter-mcp-server`)\n2. Get Twitter API credentials:\n   - Apply for a Twitter Developer Account at [developer.twitter.com](https:\u002F\u002Fdeveloper.twitter.com)\n   - Create a new app in the Twitter Developer Portal\n   - Generate API keys and access tokens with \"Read\" permissions\n   - For bookmarks access, you may need Twitter API v2 with appropriate scopes\n   ```bash\n   export TWITTER_API_KEY=\"your-api-key\"\n   export TWITTER_API_SECRET=\"your-api-secret\"\n   export TWITTER_ACCESS_TOKEN=\"your-access-token\"\n   export TWITTER_ACCESS_TOKEN_SECRET=\"your-access-token-secret\"\n   ```\n3. 
Test the connection with the `--test-connection` flag\n\n**Arguments:**\n- `--mcp-server`: Command to start the Twitter MCP server\n- `--username`: Filter bookmarks by username (optional)\n- `--max-bookmarks`: Maximum bookmarks to fetch (default: 1000)\n- `--no-tweet-content`: Exclude tweet content and keep only metadata\n- `--no-metadata`: Exclude engagement metadata\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💡 Click to expand: Example queries you can try\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**Slack Queries:**\n- \"What did the team discuss about the project deadline?\"\n- \"Find messages about the new feature launch\"\n- \"Show me conversations about budget planning\"\n- \"What decisions were made in the dev-team channel?\"\n\n**Twitter Queries:**\n- \"What AI articles did I bookmark last month?\"\n- \"Find tweets about machine learning techniques\"\n- \"Show me bookmarked threads about startup advice\"\n- \"What Python tutorials did I save?\"\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🔧 Using MCP with CLI Commands\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**Want to use MCP data with the regular LEANN CLI?** You can combine MCP apps with CLI commands:\n\n```bash\n# Step 1: Use MCP app to fetch and index data\npython -m apps.slack_rag --mcp-server \"slack-mcp-server\" --workspace-name \"my-team\"\n\n# Step 2: The data is now indexed and available via CLI\nleann search slack_messages \"project deadline\"\nleann ask slack_messages \"What decisions were made about the product launch?\"\n\n# Same for Twitter bookmarks\npython -m apps.twitter_rag --mcp-server \"twitter-mcp-server\"\nleann search twitter_bookmarks \"machine learning articles\"\n```\n\n**MCP vs Manual Export:**\n- **MCP**: Live data, automatic updates, requires server setup\n- **Manual Export**: One-time setup, works offline, requires manual data export\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🔧 Adding New MCP 
Platforms\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nWant to add support for other platforms? LEANN's MCP integration is designed for easy extension:\n\n1. **Find or create an MCP server** for your platform\n2. **Create a reader class** following the pattern in `apps\u002Fslack_data\u002Fslack_mcp_reader.py`\n3. **Create a RAG application** following the pattern in `apps\u002Fslack_rag.py`\n4. **Test and contribute** back to the community!\n\n**Popular MCP servers to explore:**\n- GitHub repositories and issues\n- Discord messages\n- Notion pages\n- Google Drive documents\n- And many more in the MCP ecosystem!\n\n\u003C\u002Fdetails>\n\n### 🚀 Claude Code Integration: Transform Your Development Workflow!\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>AST‑Aware Code Chunking\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nLEANN features intelligent code chunking that preserves semantic boundaries (functions, classes, methods) for Python, Java, C#, and TypeScript, improving code understanding compared to text-based chunking.\n\n📖 Read the [AST Chunking Guide →](docs\u002Fast_chunking_guide.md)\n\n\u003C\u002Fdetails>\n\n**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. 
Index your entire codebase and get intelligent code assistance directly in your IDE.\n\n**Key features:**\n- 🔍 **Semantic code search** across your entire project, backed by a lightweight, fully local index\n- 🧠 **AST-aware chunking** preserves code structure (functions, classes)\n- 📚 **Context-aware assistance** for debugging and development\n- 🚀 **Zero-config setup** with automatic language detection\n\n```bash\n# Install LEANN globally for MCP integration\nuv tool install leann-core --with leann\nclaude mcp add --scope user leann-server -- leann_mcp\n# Setup is automatic - just start using Claude Code!\n```\n\nTry our fully agentic pipeline with auto query rewriting, semantic search planning, and more:\n\n![LEANN MCP Integration](assets\u002Fmcp_leann.png)\n\n**🔥 Ready to supercharge your coding?** [Complete Setup Guide →](packages\u002Fleann-mcp\u002FREADME.md)\n\n## Command Line Interface\n\nLEANN includes a powerful CLI for document processing and search. It is well suited to quick document indexing and interactive chat.\n\n### Installation\n\nIf you followed the Quick Start, `leann` is already installed in your virtual environment:\n```bash\nsource .venv\u002Fbin\u002Factivate\nleann --help\n```\n\n**To make it globally available:**\n```bash\n# Install the LEANN CLI globally using uv tool\nuv tool install leann-core --with leann\n\n# Now you can use leann from anywhere without activating the venv\nleann --help\n```\n\n> **Note**: Global installation is required for Claude Code integration. 
The `leann_mcp` server depends on the globally available `leann` command.\n\n### Usage Examples\n\n```bash\n# Build an index named my-docs from a directory (you can also pass multiple directories or files)\nleann build my-docs --docs .\u002Fyour_documents\n\n# Search your documents\nleann search my-docs \"machine learning concepts\"\n\n# Interactive chat with your documents\nleann ask my-docs --interactive\n\n# Ask a single question (non-interactive)\nleann ask my-docs \"Where are prompts configured?\"\n\n# Detect file changes since last build\u002Fwatch checkpoint\nleann watch my-docs\n\n# List all your indexes\nleann list\n\n# Remove an index\nleann remove my-docs\n```\n\n**Key CLI features:**\n- Auto-detects document formats (PDF, TXT, MD, DOCX, PPTX + code files)\n- **🧠 AST-aware chunking** for Python, Java, C#, TypeScript files\n- Smart text chunking with overlap for all other content\n- **📂 File change detection** via Merkle tree snapshots (`leann watch`)\n- Multiple LLM providers (Ollama, OpenAI, HuggingFace)\n- Organized index storage in `.leann\u002Findexes\u002F` (project-local)\n- Support for advanced search parameters\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>📋 Click to expand: Complete CLI Reference\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nRun `leann --help`, or the help for any subcommand (`leann build --help`, `leann search --help`, `leann watch --help`, `leann ask --help`, `leann list --help`, `leann remove --help`), to get the complete CLI reference.\n\n**Build Command:**\n```bash\nleann build INDEX_NAME --docs DIRECTORY|FILE [DIRECTORY|FILE ...] 
[OPTIONS]\n\nOptions:\n  --backend {hnsw,diskann}     Backend to use (default: hnsw)\n  --embedding-model MODEL      Embedding model (default: facebook\u002Fcontriever)\n  --graph-degree N             Graph degree (default: 32)\n  --complexity N               Build complexity (default: 64)\n  --force                      Force rebuild of an existing index\n  --compact \u002F --no-compact     Use compact storage (default: true). Use `--no-compact` when building with `--no-recompute`.\n  --recompute \u002F --no-recompute Enable recomputation (default: true)\n```\n\n**Search Command:**\n```bash\nleann search INDEX_NAME QUERY [OPTIONS]\n\nOptions:\n  --top-k N                     Number of results (default: 5)\n  --complexity N                Search complexity (default: 64)\n  --recompute \u002F --no-recompute  Enable\u002Fdisable embedding recomputation (default: enabled). Do not use `--no-recompute` to search an index built with `--recompute`.\n  --pruning-strategy {global,local,proportional}\n```\n\n**Watch Command:**\n```bash\nleann watch INDEX_NAME\n\n# Compares the current file system state against the last checkpoint (Merkle tree snapshot)\n# and reports which files have been added, removed, or modified, along with their chunk IDs.\n#\n# - Automatically saves a new checkpoint after detecting changes\n# - Each subsequent run compares against the most recent checkpoint\n# - File change detection uses SHA-256 content hashing via a Merkle tree\n#\n# Example output:\n#   === Changes since last checkpoint ===\n#   modified (1):\n#     - \u002Fpath\u002Fto\u002Ffile.py\n#       chunks: 42, 43, 44\n```\n\n**Ask Command:**\n```bash\nleann ask INDEX_NAME [OPTIONS]\n\nOptions:\n  --llm {ollama,openai,hf,anthropic}    LLM provider (default: ollama)\n  --model MODEL                         Model name (default: qwen3:8b)\n  --interactive                         Interactive chat mode\n  --top-k N                             Retrieval count (default: 20)\n```\n\n**List Command:**\n```bash\nleann 
list\n\n# Lists all indexes across all projects with status indicators:\n# ✅ - Index is complete and ready to use\n# ❌ - Index is incomplete or corrupted\n# 📁 - CLI-created index (in .leann\u002Findexes\u002F)\n# 📄 - App-created index (*.leann.meta.json files)\n```\n\n**Remove Command:**\n```bash\nleann remove INDEX_NAME [OPTIONS]\n\nOptions:\n  --force, -f    Force removal without confirmation\n\n# Smart removal: automatically finds and safely removes indexes\n# - Shows all matching indexes across projects\n# - Requires confirmation for cross-project removal\n# - Interactive selection when multiple matches found\n# - Supports both CLI and app-created indexes\n```\n\n\u003C\u002Fdetails>\n\n## 🚀 Advanced Features\n\n### 🎯 Metadata Filtering\n\nLEANN supports a simple metadata filtering system to enable sophisticated use cases like document filtering by date\u002Ftype, code search by file extension, and content management based on custom criteria.\n\n```python\n# Add metadata during indexing\nbuilder.add_text(\n    \"def authenticate_user(token): ...\",\n    metadata={\"file_extension\": \".py\", \"lines_of_code\": 25}\n)\n\n# Search with filters\nresults = searcher.search(\n    query=\"authentication function\",\n    metadata_filters={\n        \"file_extension\": {\"==\": \".py\"},\n        \"lines_of_code\": {\"\u003C\": 100}\n    }\n)\n```\n\n**Supported operators**: `==`, `!=`, `\u003C`, `\u003C=`, `>`, `>=`, `in`, `not_in`, `contains`, `starts_with`, `ends_with`, `is_true`, `is_false`\n\n📖 **[Complete Metadata filtering guide →](docs\u002Fmetadata_filtering.md)**\n\n### 🔍 Grep Search\n\nFor exact text matching instead of semantic search, use the `use_grep` parameter:\n\n```python\n# Exact text search\nresults = searcher.search(\"banana‑crocodile\", use_grep=True, top_k=1)\n```\n\n**Use cases**: Finding specific code patterns, error messages, function names, or exact phrases where semantic similarity isn't needed.\n\n📖 **[Complete grep search guide 
→](docs\u002Fgrep_search.md)**\n\n## 🏗️ Architecture & How It Works\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Farch.png\" alt=\"LEANN Architecture\" width=\"800\">\n\u003C\u002Fp>\n\n**The magic:** Most vector DBs store every single embedding (expensive). LEANN stores a pruned graph structure (cheap) and recomputes embeddings only when needed (fast).\n\n**Core techniques:**\n- **Graph-based selective recomputation:** Only compute embeddings for nodes in the search path\n- **High-degree preserving pruning:** Keep important \"hub\" nodes while removing redundant connections\n- **Dynamic batching:** Efficiently batch embedding computations for GPU utilization\n- **Two-level search:** Smart graph traversal that prioritizes promising nodes\n\n**Backends:**\n- **HNSW** (default): Ideal for most datasets with maximum storage savings through full recomputation\n- **DiskANN**: Advanced option with superior search performance, using PQ-based graph traversal with real-time reranking for the best speed-accuracy trade-off\n\n## Benchmarks\n\n**[DiskANN vs HNSW Performance Comparison →](benchmarks\u002Fdiskann_vs_hnsw_speed_comparison.py)** - Compare search performance between both backends\n\n**[Simple Example: Compare LEANN vs FAISS →](benchmarks\u002Fcompare_faiss_vs_leann.py)** - See storage savings in action\n\n### 📊 Storage Comparison\n\n| System | DPR (2.1M) | Wiki (60M) | Chat (400K) | Email (780K) | Browser (38K) |\n|--------|-------------|------------|-------------|--------------|---------------|\n| Traditional vector database (e.g., FAISS) | 3.8 GB      | 201 GB     | 1.8 GB     | 2.4 GB      | 130 MB        |\n| LEANN  | 324 MB      | 6 GB       | 64 MB       | 79 MB       | 6.4 MB        |\n| Savings| 91%         | 97%        | 97%         | 97%         | 95%           |\n\n\n\n## Reproduce Our Results\n\n```bash\nuv run benchmarks\u002Frun_evaluation.py    # Will auto-download evaluation data and run benchmarks\nuv run 
benchmarks\u002Frun_evaluation.py benchmarks\u002Fdata\u002Findices\u002Frpj_wiki\u002Frpj_wiki --num-queries 2000    # Once the data is downloaded, run the benchmark against our largest index\n```\n\nThe evaluation script downloads data automatically on first run. The last three results were measured on partial personal data, so you can reproduce them with your own data!\n\n## 🔬 Paper\n\nIf you find LEANN useful, please cite:\n\n**[LEANN: A Low-Storage Vector Index](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08276)**\n\n```bibtex\n@misc{wang2025leannlowstoragevectorindex,\n      title={LEANN: A Low-Storage Vector Index},\n      author={Yichuan Wang and Shu Liu and Zhifei Li and Yongji Wu and Ziming Mao and Yilong Zhao and Xiao Yan and Zhiying Xu and Yang Zhou and Ion Stoica and Sewon Min and Matei Zaharia and Joseph E. Gonzalez},\n      year={2025},\n      eprint={2506.08276},\n      archivePrefix={arXiv},\n      primaryClass={cs.DB},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08276},\n}\n```\n\n## ✨ [Detailed Features →](docs\u002Ffeatures.md)\n\n## 🤝 [CONTRIBUTING →](docs\u002FCONTRIBUTING.md)\n\n\n## ❓ [FAQ →](docs\u002Ffaq.md)\n\n\n## 📈 [Roadmap →](docs\u002Froadmap.md)\n\n## 📄 License\n\nMIT License - see [LICENSE](LICENSE) for details.\n\n## 🙏 Acknowledgments\n\nCore Contributors: [Yichuan Wang](https:\u002F\u002Fyichuan-w.github.io\u002F) & [Zhifei Li](https:\u002F\u002Fgithub.com\u002Fandylizf).\n\nActive Contributors: [Gabriel Dehan](https:\u002F\u002Fgithub.com\u002Fgabriel-dehan), [Aakash Suresh](https:\u002F\u002Fgithub.com\u002FASuresh0524)\n\n\nWe welcome more contributors! 
Feel free to open issues or submit PRs.\n\nThis work was done at the [**Berkeley Sky Computing Lab**](https:\u002F\u002Fsky.cs.berkeley.edu\u002F).\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=yichuan-w\u002FLEANN&type=Date)](https:\u002F\u002Fwww.star-history.com\u002F#yichuan-w\u002FLEANN&Date)\n\u003Cp align=\"center\">\n  \u003Cstrong>⭐ Star us on GitHub if LEANN is useful for your research or applications!\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  Made with ❤️ by the LEANN team\n\u003C\u002Fp>\n\n## 🤖 Explore LEANN with AI\n\nLEANN is indexed on [DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fyichuan-w\u002FLEANN), so you can use Deep Research to ask an LLM questions about the codebase and get help adding new features.\n","LEANN is an innovative vector database designed to turn personal devices into powerful retrieval-augmented generation (RAG) systems. By using graph-based selective recomputation and high-degree preserving pruning, LEANN saves 97% of storage compared with traditional approaches while maintaining query accuracy. Its core features include local processing, privacy protection, and semantic search across many file types. It suits scenarios that require efficiently managing and quickly retrieving large document collections on a personal device, and is especially well suited to researchers and individual users with strict data-privacy requirements.",2,"2026-05-29 03:47:44","high_star"]