[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73320":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":29,"discoverSource":30},73320,"chunkr","lumina-ai-inc\u002Fchunkr","lumina-ai-inc","Vision infrastructure to turn complex documents into RAG\u002FLLM-ready data","https:\u002F\u002Fchunkr.ai",null,"Rust",2948,182,17,11,0,2,4,6,65.39,"GNU Affero General Public License v3.0",false,"main",true,[],"2026-06-12 04:01:09","\u003Cbr \u002F>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flumina-ai-inc\u002Fchunkr\">\n    \u003Cimg src=\"images\u002Flogo.svg\" alt=\"Chunkr Logo\" width=\"80\" height=\"80\">\n  \u003C\u002Fa>\n\n\u003Ch3 align=\"center\">Chunkr | Open Source Document Intelligence API\u003C\u002Fh3>\n\n  \u003Cp align=\"center\">\n    Production-ready service for document layout analysis, OCR, and semantic chunking.\u003Cbr \u002F>\n    Convert PDFs, PPTs, Word docs & images into RAG\u002FLLM-ready chunks.\n    \u003Cbr \u002F>\u003Cbr \u002F>\n    \u003Cb>Layout Analysis\u003C\u002Fb> | \u003Cb>OCR + Bounding Boxes\u003C\u002Fb> | \u003Cb>Structured HTML & Markdown\u003C\u002Fb> | \u003Cb>Vision-Language Model Processing\u003C\u002Fb>\n    \u003Cbr \u002F>\u003Cbr \u002F>\n    👉 \u003Cb>Note:\u003C\u002Fb> The \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flumina-ai-inc\u002Fchunkr\">open-source AGPL version\u003C\u002Fa> is 
**different** from our fully managed \u003Ca href=\"https:\u002F\u002Fwww.chunkr.ai\">Cloud API\u003C\u002Fa>.  \n    The open-source release uses community\u002Fopen-source models, while the Cloud API runs **proprietary in-house models** for higher accuracy, speed, and enterprise reliability.\n    \u003Cbr \u002F>\u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Fwww.chunkr.ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTry_it_out-chunkr.ai-blue?style=flat&logo=rocket&height=20\" alt=\"Try it out\" height=\"20\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flumina-ai-inc\u002Fchunkr\u002Fissues\u002Fnew\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReport_Bug-GitHub_Issues-red?style=flat&logo=github&height=20\" alt=\"Report Bug\" height=\"20\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;\n    \u003Ca href=\"#connect-with-us\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContact-Get_in_Touch-green?style=flat&logo=mail&height=20\" alt=\"Contact\" height=\"20\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FXzKWFByKzW\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join_Community-5865F2?style=flat&logo=discord&logoColor=white&height=20\" alt=\"Discord\" height=\"20\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fdeepwiki.com\u002Flumina-ai-inc\u002Fchunkr\">\u003Cimg src=\"https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg\" alt=\"Ask DeepWiki\">\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.chunkr.ai\" width=\"1200\" height=\"630\">\n    \u003Cimg src=\"https:\u002F\u002Fchunkr.ai\u002Fog-image.png\" alt=\"Chunkr Cloud API\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n## Table of Contents\n- [Table of Contents](#table-of-contents)\n- [(Super) Quick Start](#super-quick-start)\n- 
[Documentation](#documentation)\n- [Open Source vs Cloud API vs Enterprise](#open-source-vs-cloud-api-vs-enterprise)\n- [Quick Start with Docker Compose](#quick-start-with-docker-compose)\n- [LLM Configuration](#llm-configuration)\n  - [Using models.yaml (Recommended)](#using-modelsyaml-recommended)\n  - [Using environment variables (Basic)](#using-environment-variables-basic)\n  - [Common LLM API Providers](#common-llm-api-providers)\n- [Licensing](#licensing)\n- [Connect With Us](#connect-with-us)\n\n## Open Source vs Cloud API vs Enterprise\n\n| Feature | Open Source Repo (good) | Cloud API - chunkr.ai (best) | Enterprise |\n|---------|--------------------|------------------------|------------|\n| **Perfect for** | Development & testing | Production workloads | Large-scale \u002F High-security |\n| **Layout Analysis** | Uses open-source models | Proprietary in-house models | In-house + custom-tuned |\n| **OCR Accuracy** | Community OCR engines | Optimized OCR stack | Optimized + domain-tuned |\n| **VLM Processing** | Basic open VLMs | Enhanced proprietary VLMs | Custom fine-tunes |\n| **Excel Support** | ❌ | ✅ Native parser | ✅ Native parser |\n| **Document Types** | PDF, PPT, Word, Images | PDF, PPT, Word, Images, Excel | PDF, PPT, Word, Images, Excel |\n| **Infrastructure** | Self-hosted | Fully managed cloud | Managed \u002F On-prem |\n| **Support** | Discord community | Dedicated support | Dedicated founding team |\n| **Migration Support** | Community-driven | Docs + email | Dedicated migration team |\n\n---\n\nThe **open-source release** is ideal if you want transparency, local hosting, or to experiment with Chunkr’s pipeline.  \nFor **best performance, production reliability, and access to in-house models**, we recommend the \u003Ca href=\"https:\u002F\u002Fwww.chunkr.ai\">Chunkr Cloud API\u003C\u002Fa>.  
\nFor **high-security or regulated industries**, our **Enterprise edition** offers on-prem or VPC deployments.\n\n\n## Quick Start with Docker Compose\n\n1. Prerequisites:\n   - [Docker and Docker Compose](https:\u002F\u002Fdocs.docker.com\u002Fget-docker\u002F)\n   - [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html) (for GPU support, optional)\n\n2. Clone the repo:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flumina-ai-inc\u002Fchunkr\ncd chunkr\n```\n\n3. Set up environment variables:\n```bash\n# Copy the example environment file\ncp .env.example .env\n\n# Configure your llm models\ncp models.example.yaml models.yaml\n```\n\nFor more information on how to set up LLMs, see [here](#llm-configuration).\n\n4. Start the services:\n```bash\n# For GPU deployment:\ndocker compose up -d\n\n# For CPU-only deployment:\ndocker compose -f compose.yaml -f compose.cpu.yaml up -d\n\n# For Mac ARM architecture (M1, M2, M3, etc.):\ndocker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d\n```\n\n5. Access the services:\n   - Web UI: `http:\u002F\u002Flocalhost:5173`\n   - API: `http:\u002F\u002Flocalhost:8000`\n\n6. Stop the services when done:\n```bash\n# For GPU deployment:\ndocker compose down\n\n# For CPU-only deployment:\ndocker compose -f compose.yaml -f compose.cpu.yaml down\n\n# For Mac ARM architecture (M1, M2, M3, etc.):\ndocker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml down\n```\n## LLM Configuration\n\nChunkr supports two ways to configure LLMs:\n\n1. **models.yaml file**: Advanced configuration for multiple LLMs with additional options\n2. **Environment variables**: Simple configuration for a single LLM\n\n### Using models.yaml (Recommended)\n\nFor more flexible configuration with multiple models, default\u002Ffallback options, and rate limits:\n\n1. 
Copy the example file to create your configuration:\n```bash\ncp models.example.yaml models.yaml\n```\n\n2. Edit the models.yaml file with your configuration. Example:\n```yaml\nmodels:\n  - id: gpt-4o\n    model: gpt-4o\n    provider_url: https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fchat\u002Fcompletions\n    api_key: \"your_openai_api_key_here\"\n    default: true\n    rate-limit: 200 # requests per minute - optional\n```\n\nBenefits of using models.yaml:\n- Configure multiple LLM providers simultaneously\n- Set default and fallback models\n- Add distributed rate limits per model\n- Reference models by ID in API requests (see docs for more info)\n\n>Read the `models.example.yaml` file for more information on the available options.\n\n### Using environment variables (Basic)\n\nYou can use any OpenAI API compatible endpoint by setting the following variables in your .env file:\n``` \nLLM__KEY:\nLLM__MODEL:\nLLM__URL:\n```\n\n### Common LLM API Providers\n\nBelow is a table of common LLM providers and their configuration details to get you started:\n\n| Provider         | API URL                                                                  | Documentation                                                                                                                          |\n| ---------------- | ------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------- |\n| OpenAI           | https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fchat\u002Fcompletions                               | [OpenAI Docs](https:\u002F\u002Fplatform.openai.com\u002Fdocs)                                                                                        |\n| Google AI Studio | https:\u002F\u002Fgenerativelanguage.googleapis.com\u002Fv1beta\u002Fopenai\u002Fchat\u002Fcompletions | [Google AI 
Docs](https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\u002Fopenai)                                                                         |\n| OpenRouter       | https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1\u002Fchat\u002Fcompletions                            | [OpenRouter Models](https:\u002F\u002Fopenrouter.ai\u002Fmodels)                                                                                      |\n| Self-Hosted      | http:\u002F\u002Flocalhost:8000\u002Fv1                                                 | [vLLM](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Fserving\u002Fopenai_compatible_server.html) or [Ollama](https:\u002F\u002Follama.com\u002Fblog\u002Fopenai-compatibility) |\n\n## Licensing\n\nThe core of this project is dual-licensed:\n\n1. [GNU Affero General Public License v3.0 (AGPL-3.0)](LICENSE)\n2. Commercial License\n\nTo use Chunkr without complying with the AGPL-3.0 license terms, [contact us](mailto:mehul@chunkr.ai) or visit our [website](https:\u002F\u002Fchunkr.ai).\n\n## Connect With Us\n- 📧 Email: [mehul@chunkr.ai](mailto:mehul@chunkr.ai)\n- 📅 Schedule a call: [Book a 30-minute meeting](https:\u002F\u002Fcal.com\u002Fmehulc\u002F30min)\n- 🌐 Visit our website: [chunkr.ai](https:\u002F\u002Fchunkr.ai)\n","Chunkr is vision infrastructure for converting complex documents into data suited to retrieval-augmented generation (RAG) and large language model (LLM) processing. Its core features include document layout analysis, optical character recognition (OCR), bounding-box annotation, and structured HTML and Markdown output, with support for PDF, PPT, Word documents, images, and other formats. It is written in Rust for performance and safety. It suits applications that need to extract information from unstructured documents for downstream natural language processing, such as knowledge management, automated content summarization, and intelligent document parsing. The open-source version is licensed under AGPLv3 and comes with community support; a cloud service based on proprietary models is also available for higher accuracy and enterprise needs.","2026-06-11 03:44:59","high_star"]