[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-5544":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":47,"readmeContent":48,"aiSummary":49,"trendingCount":16,"starSnapshotCount":16,"syncStatus":50,"lastSyncTime":51,"discoverSource":52},5544,"kreuzberg","kreuzberg-dev\u002Fkreuzberg","kreuzberg-dev","A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, TypeScript (Node\u002FBun\u002FWasm\u002FDeno)- or use via CLI, REST API, or MCP server.","https:\u002F\u002Fkreuzberg.dev\u002F",null,"Rust",8480,497,29,13,0,8,40,195,34,39.09,"Other",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46],"bun","csharp","document-intelligence","elixir","ffi","golang","java","metadata-extraction","node","pdf-extraction","pdfium","php","python","rag","ruby","rust","table-extraction","tesseract","text-extraction","wasm","2026-06-12 02:01:11","# Kreuzberg\n\n\u003Cdiv align=\"center\" style=\"display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;\">\n  \u003C!-- Language Bindings -->\n  \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fkreuzberg?label=Rust&color=007ec6\" alt=\"Rust\">\n  \u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fhex.pm\u002Fpackages\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fhexpm\u002Fv\u002Fkreuzberg?label=Elixir&color=007ec6\" alt=\"Elixir\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fkreuzberg\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fkreuzberg?label=Python&color=007ec6\" alt=\"Python\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@kreuzberg\u002Fnode\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002F@kreuzberg\u002Fnode?label=Node.js&color=007ec6\" alt=\"Node.js\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@kreuzberg\u002Fwasm\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002F@kreuzberg\u002Fwasm?label=WASM&color=007ec6\" alt=\"WASM\">\n  \u003C\u002Fa>\n\n  \u003Ca href=\"https:\u002F\u002Fcentral.sonatype.com\u002Fartifact\u002Fdev.kreuzberg\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fmaven-central\u002Fv\u002Fdev.kreuzberg\u002Fkreuzberg?label=Java&color=007ec6\" alt=\"Java\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Freleases\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Ftag\u002Fkreuzberg-dev\u002Fkreuzberg?label=Go&color=007ec6&filter=v4.9.5\" alt=\"Go\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.nuget.org\u002Fpackages\u002FKreuzberg\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fnuget\u002Fv\u002FKreuzberg?label=C%23&color=007ec6\" alt=\"C#\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpackagist.org\u002Fpackages\u002Fkreuzberg\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fv\u002Fkreuzberg\u002Fkreuzberg?label=PHP&color=007ec6\" alt=\"PHP\">\n  \u003C\u002Fa>\n  
\u003Ca href=\"https:\u002F\u002Frubygems.org\u002Fgems\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgem\u002Fv\u002Fkreuzberg?label=Ruby&color=007ec6\" alt=\"Ruby\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fkreuzberg-dev.r-universe.dev\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FR-kreuzberg-007ec6\" alt=\"R\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Fpkgs\u002Fcontainer\u002Fkreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocker-007ec6?logo=docker&logoColor=white\" alt=\"Docker\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Freleases\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FC-FFI-007ec6\" alt=\"C\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fartifacthub.io\u002Fpackages\u002Fsearch?repo=kreuzberg\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fartifacthub.io\u002Fbadge\u002Frepository\u002Fkreuzberg\" alt=\"Artifact Hub\">\n  \u003C\u002Fa>\n\n  \u003C!-- Project Info -->\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Fblob\u002Fmain\u002FLICENSE\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Elastic--2.0-blue.svg\" alt=\"License\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdocs.kreuzberg.dev\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-kreuzberg.dev-007ec6\" alt=\"Documentation\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdocs.kreuzberg.dev\u002Fdemo.html\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%E2%96%B6%EF%B8%8F_Live_Demo-007ec6\" alt=\"Live Demo\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FKreuzberg\">\n    \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97_Hugging_Face-007ec6\" alt=\"Hugging Face\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cimg width=\"3384\" height=\"573\" alt=\"Linkedin- Banner\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1b6c6ad7-3b6d-4171-b1c9-f2026cc9deb8\" \u002F>\n\n\u003Cdiv align=\"center\" style=\"margin-top: 20px;\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fxt9WY3GnKR\">\n      \u003Cimg height=\"22\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join%20our%20community-7289da?logo=discord&logoColor=white\" alt=\"Discord\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\nExtract text, metadata, and code intelligence from 97+ file formats and 305 programming languages at native speeds without needing a GPU.\n\n## Key Features\n\n- **Code intelligence** – Extract functions, classes, imports, symbols, and docstrings from [248 programming languages](https:\u002F\u002Fdocs.tree-sitter-language-pack.kreuzberg.dev) via tree-sitter. 
Results are available in `ExtractionResult.code_intelligence` with semantic chunking\n- **Extensible architecture** – Plugin system for custom OCR backends, validators, post-processors, document extractors, and renderers\n- **Polyglot** – Native bindings for Rust, Python, TypeScript\u002FNode.js, Ruby, Go, Java, C#, PHP, Elixir, R, and C\n- **91+ file formats** – PDF, Office documents, images, HTML, XML, emails, archives, and academic formats across 8 categories\n- **LLM intelligence** – VLM OCR (GPT-4o, Claude, Gemini, Ollama), structured JSON extraction with schema constraints, and provider-hosted embeddings via 146 LLM providers (including local engines: Ollama, LM Studio, vLLM, llama.cpp) through [liter-llm](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fliter-llm)\n- **OCR support** – Tesseract (all bindings, including Tesseract-WASM for browsers), PaddleOCR (all native bindings), EasyOCR (Python), VLM OCR (146 vision model providers including local engines), extensible via plugin API\n- **High performance** – Rust core with native PDFium, SIMD optimizations, and full parallelism\n- **Flexible deployment** – Use as a library, CLI tool, REST API server, or MCP server\n- **TOON wire format** – Token-efficient serialization for LLM\u002FRAG pipelines, ~30-50% fewer tokens than JSON\n- **GFM-quality output** – Comrak-based rendering with proper fenced code blocks, table nodes, bracket escaping, and cross-format parity (Markdown, HTML, Djot, Plain)\n- **HTML passthrough** – HTML-to-Markdown conversion uses html-to-markdown output directly, bypassing lossy intermediate round-trips\n- **Memory efficient** – Streaming parsers for multi-GB files\n\n**[Complete Documentation](https:\u002F\u002Fkreuzberg.dev\u002F)** | **[Live Demo](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fdemo.html)** | **[Installation Guides](#installation)**\n\n## Installation\n\nEach language binding provides comprehensive documentation with examples and best practices. 
Choose your platform to get started:\n\n**Scripting Languages:**\n\n- **[Python](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fpython)** – PyPI package, async\u002Fsync APIs, OCR backends (Tesseract, PaddleOCR, EasyOCR)\n- **[Ruby](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fruby)** – RubyGems package, idiomatic Ruby API, native bindings\n- **[PHP](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fphp)** – Composer package, modern PHP 8.4+ support, type-safe API, async extraction\n- **[Elixir](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Felixir)** – Hex package, OTP integration, concurrent processing\n- **[R](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fr)** – r-universe package, idiomatic R API, extendr bindings\n\n**JavaScript\u002FTypeScript:**\n\n- **[@kreuzberg\u002Fnode](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fcrates\u002Fkreuzberg-node)** – Native NAPI-RS bindings for Node.js\u002FBun, fastest performance\n- **[@kreuzberg\u002Fwasm](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Ftypescript)** – WebAssembly for browsers\u002FDeno\u002FCloudflare Workers, full feature parity (PDF, Excel, OCR, archives)\n\n**Compiled Languages:**\n\n- **[Go](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fgo)** – Go module with FFI bindings, context-aware async\n- **[Java](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fjava)** – Maven Central, Foreign Function & Memory API\n- **[C#](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fpackages\u002Fcsharp)** – 
NuGet package, .NET 6.0+, full async\u002Fawait support\n\n**Native:**\n\n- **[Rust](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fcrates\u002Fkreuzberg)** – Core library, flexible feature flags, zero-copy APIs\n- **[C (FFI)](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg\u002Ftree\u002Fmain\u002Fcrates\u002Fkreuzberg-ffi)** – C header + shared library, pkg-config\u002FCMake support, cross-platform\n\n**Containers:**\n\n- **[Docker](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fguides\u002Fdocker\u002F)** – Official images with API, CLI, and MCP server modes (Core: ~1.0-1.3GB, Full: ~1.0-1.3GB with OCR + legacy format support)\n\n**Command-Line:**\n\n- **[CLI](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fcli\u002Fusage\u002F)** – Cross-platform binary, batch processing, MCP server mode\n\n> All language bindings include precompiled binaries for x86_64 and aarch64 on Linux and for Apple Silicon (ARM64) on macOS.\n\n## Platform Support\n\nPlatform coverage for each binding and distribution:\n\n| Language | Linux x86_64 | Linux aarch64 | macOS ARM64 | Windows x64 |\n| -------- | :----------: | :-----------: | :---------: | :---------: |\n| Python   |      ✅      |      ✅       |     ✅      |     ✅      |\n| Node.js  |      ✅      |      ✅       |     ✅      |     ✅      |\n| WASM     |      ✅      |      ✅       |     ✅      |     ✅      |\n| Ruby     |      ✅      |      ✅       |     ✅      |      -      |\n| R        |      ✅      |      ✅       |     ✅      |     ✅      |\n| Elixir   |      ✅      |      ✅       |     ✅      |     ✅      |\n| Go       |      ✅      |      ✅       |     ✅      |     ✅      |\n| Java     |      ✅      |      ✅       |     ✅      |     ✅      |\n| C#       |      ✅      |      ✅       |     ✅      |     ✅      |\n| PHP      |      ✅      |      ✅       |     ✅      |     ✅      |\n| Rust     |      ✅      |      ✅       |     ✅      |     ✅      |\n| C (FFI)  |      ✅      |      ✅     
  |     ✅      |     ✅      |\n| CLI      |      ✅      |      ✅       |     ✅      |     ✅      |\n| Docker   |      ✅      |      ✅       |     ✅      |      -      |\n\n**Note**: ✅ = Precompiled binaries available with instant installation. WASM runs in any environment with WebAssembly support (browsers, Deno, Bun, Cloudflare Workers). All platforms are tested in CI. macOS support is Apple Silicon only.\n\n### Embeddings Support (Optional)\n\nTo use embeddings functionality:\n\n1. **Install ONNX Runtime 1.24+**:\n   - Linux: Download from [ONNX Runtime releases](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Freleases) (Debian packages may have older versions)\n   - macOS: `brew install onnxruntime`\n   - Windows: Download from [ONNX Runtime releases](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Freleases)\n\n2. Use embeddings in your code - see [Embeddings Guide](https:\u002F\u002Fdocs.kreuzberg.dev\u002Ffeatures\u002F#embeddings)\n\n**Note:** Kreuzberg requires ONNX Runtime version 1.24+ for embeddings. 
All other Kreuzberg features work without ONNX Runtime.\n\n## Supported Formats\n\n91+ file formats across 8 major categories with intelligent format detection and comprehensive metadata extraction.\n\n### Office Documents\n\n| Category            | Formats                                                                                          | Capabilities                                       |\n| ------------------- | ------------------------------------------------------------------------------------------------ | -------------------------------------------------- |\n| **Word Processing** | `.docx`, `.docm`, `.dotx`, `.dotm`, `.dot`, `.odt`, `.pages`                                     | Full text, tables, lists, images, metadata, styles |\n| **Spreadsheets**    | `.xlsx`, `.xlsm`, `.xlsb`, `.xls`, `.xla`, `.xlam`, `.xltm`, `.xltx`, `.xlt`, `.ods`, `.numbers` | Sheet data, formulas, cell metadata, charts        |\n| **Presentations**   | `.pptx`, `.pptm`, `.ppsx`, `.potx`, `.potm`, `.pot`, `.key`                                      | Slides, speaker notes, images, metadata            |\n| **PDF**             | `.pdf`                                                                                           | Text, tables, images, metadata, OCR support        |\n| **eBooks**          | `.epub`, `.fb2`                                                                                  | Chapters, metadata, embedded resources             |\n| **Database**        | `.dbf`                                                                                           | Table data extraction, field type support          |\n| **Hangul**          | `.hwp`, `.hwpx`                                                                                  | Korean document format, text extraction            |\n\n### Images (OCR-Enabled)\n\n| Category     | Formats                                                                          | Features                                           
          |\n| ------------ | -------------------------------------------------------------------------------- | ------------------------------------------------------------ |\n| **Raster**   | `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`, `.tiff`, `.tif`                | OCR, table detection, EXIF metadata, dimensions, color space |\n| **Advanced** | `.jp2`, `.jpx`, `.jpm`, `.mj2`, `.jbig2`, `.jb2`, `.pnm`, `.pbm`, `.pgm`, `.ppm` | Pure Rust decoders (JPEG 2000, JBIG2), OCR, table detection  |\n| **Vector**   | `.svg`                                                                           | DOM parsing, embedded text, graphics metadata                |\n\n### Web & Data\n\n| Category            | Formats                                                             | Features                                                          |\n| ------------------- | ------------------------------------------------------------------- | ----------------------------------------------------------------- |\n| **Markup**          | `.html`, `.htm`, `.xhtml`, `.xml`, `.svg`                           | DOM parsing, metadata (Open Graph, Twitter Card), link extraction |\n| **Structured Data** | `.json`, `.yaml`, `.yml`, `.toml`, `.csv`, `.tsv`                   | Schema detection, nested structures, validation                   |\n| **Text & Markdown** | `.txt`, `.md`, `.markdown`, `.djot`, `.mdx`, `.rst`, `.org`, `.rtf` | CommonMark, GFM, Djot, MDX, reStructuredText, Org Mode, Rich Text |\n\n### Email & Archives\n\n| Category     | Formats                              | Features                                                |\n| ------------ | ------------------------------------ | ------------------------------------------------------- |\n| **Email**    | `.eml`, `.msg`                       | Headers, body (HTML\u002Fplain), attachments, UTF-16 support |\n| **Archives** | `.zip`, `.tar`, `.tgz`, `.gz`, `.7z` | Recursive extraction, nested archives, metadata         
|\n\n### Academic & Scientific\n\n| Category          | Formats                                               | Features                                                    |\n| ----------------- | ----------------------------------------------------- | ----------------------------------------------------------- |\n| **Citations**     | `.bib`, `.ris`, `.nbib`, `.enw`, `.csl`               | BibTeX\u002FBibLaTeX, RIS, PubMed\u002FMEDLINE, EndNote XML, CSL JSON |\n| **Scientific**    | `.tex`, `.latex`, `.typ`, `.typst`, `.jats`, `.ipynb` | LaTeX, Typst, JATS journal articles, Jupyter notebooks      |\n| **Publishing**    | `.fb2`, `.docbook`, `.dbk`, `.opml`                   | FictionBook, DocBook XML, OPML outlines                     |\n| **Documentation** | `.pod`, `.mdoc`, `.troff`                             | Perl POD, man pages, troff                                  |\n\n**[Complete Format Reference →](https:\u002F\u002Fdocs.kreuzberg.dev\u002Freference\u002Fformats\u002F)**\n\n### Code Intelligence (248 Languages)\n\n| Feature                    | Description                                                   |\n| -------------------------- | ------------------------------------------------------------- |\n| **Structure Extraction**   | Functions, classes, methods, structs, interfaces, enums       |\n| **Import\u002FExport Analysis** | Module dependencies, re-exports, wildcard imports             |\n| **Symbol Extraction**      | Variables, constants, type aliases, properties                |\n| **Docstring Parsing**      | Google, NumPy, Sphinx, JSDoc, RustDoc, and 10+ formats        |\n| **Diagnostics**            | Parse errors with line\u002Fcolumn positions                       |\n| **Syntax-Aware Chunking**  | Split code by semantic boundaries, not arbitrary byte offsets |\n\nPowered by [tree-sitter-language-pack](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Ftree-sitter-language-pack) with dynamic grammar download. 
See [TSLP documentation](https:\u002F\u002Fdocs.tree-sitter-language-pack.kreuzberg.dev) for the full language list.\n\n## Key Features\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>OCR with Table Extraction\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nMultiple OCR backends (Tesseract, EasyOCR, PaddleOCR) with intelligent table detection and reconstruction. Extract structured data from scanned documents and images with configurable accuracy thresholds.\n\n**[OCR Backend Documentation →](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fguides\u002Focr\u002F)**\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Batch Processing\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nProcess multiple documents concurrently with configurable parallelism. Optimize throughput for large-scale document processing workloads with automatic resource management.\n\n**[Batch Processing Guide →](https:\u002F\u002Fdocs.kreuzberg.dev\u002Ffeatures\u002F#batch-processing)**\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Password-Protected PDFs\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nHandle encrypted PDFs with single or multiple password attempts. Supports both RC4 and AES encryption with automatic fallback strategies.\n\n**[PDF Configuration →](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fguides\u002Fconfiguration\u002F)**\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Language Detection\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nAutomatic language detection in extracted text using fast-langdetect. 
Configure confidence thresholds and access per-language statistics.\n\n**[Language Detection Guide →](https:\u002F\u002Fdocs.kreuzberg.dev\u002Ffeatures\u002F#language-detection)**\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Metadata Extraction\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nExtract comprehensive metadata from all supported formats: authors, titles, creation dates, page counts, EXIF data, and format-specific properties.\n\n**[Metadata Guide →](https:\u002F\u002Fdocs.kreuzberg.dev\u002Freference\u002Ftypes\u002F#metadata)**\n\n\u003C\u002Fdetails>\n\n## AI Coding Assistants\n\nKreuzberg ships with an [Agent Skill](https:\u002F\u002Fagentskills.io) that teaches AI coding assistants how to use the library correctly. It works with Claude Code, Codex, Gemini CLI, Cursor, VS Code, Amp, Goose, Roo Code, and any tool supporting the Agent Skills standard.\n\nInstall the skill into any project using the [Vercel Skills CLI](https:\u002F\u002Fgithub.com\u002Fvercel-labs\u002Fskills):\n\n```bash\nnpx skills add kreuzberg-dev\u002Fkreuzberg\n```\n\nThe skill is located at [`skills\u002Fkreuzberg\u002FSKILL.md`](skills\u002Fkreuzberg\u002FSKILL.md) and is automatically discovered by supported AI coding tools once installed.\n\n## Documentation\n\n- **[Installation Guide](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fgetting-started\u002Finstallation\u002F)** – Setup and dependencies\n- **[User Guide](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fguides\u002Fextraction\u002F)** – Comprehensive usage guide\n- **[API Reference](https:\u002F\u002Fdocs.kreuzberg.dev\u002Freference\u002Fapi-python\u002F)** – Complete API documentation\n- **[Format Support](https:\u002F\u002Fdocs.kreuzberg.dev\u002Freference\u002Fformats\u002F)** – Supported file formats\n- **[OCR Backends](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fguides\u002Focr\u002F)** – OCR engine setup\n- **[CLI Guide](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fcli\u002Fusage\u002F)** – Command-line 
usage\n- **[Migration Guides](https:\u002F\u002Fdocs.kreuzberg.dev\u002Fmigration\u002Ffrom-unstructured\u002F)** – Migrating from other libraries\n\n## Contributing\n\nContributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n## License\n\nElastic License 2.0 (ELv2) - see [LICENSE](LICENSE) for details. See [https:\u002F\u002Fwww.elastic.co\u002Flicensing\u002Felastic-license](https:\u002F\u002Fwww.elastic.co\u002Flicensing\u002Felastic-license) for the full license text.\n","Kreuzberg is a Rust-based polyglot document intelligence framework that extracts text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ other formats. The project provides bindings for many programming languages, including Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node\u002FBun\u002FWASM\u002FDeno), and can also be used through a CLI, a REST API, or an MCP server. Kreuzberg suits scenarios that require efficient, cross-platform processing of many document types, such as enterprise content management, automated office workflows, and data analysis. Its rich feature set and broad compatibility make it a strong choice for modern document-processing solutions.",2,"2026-06-11 03:03:53","top_language"]