[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73448":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":32,"discoverSource":33},73448,"zerox","getomni-ai\u002Fzerox","getomni-ai","OCR & Document Extraction using vision models","https:\u002F\u002Fgetomni.ai\u002Focr-demo",null,"TypeScript",12242,848,62,70,0,2,5,11,6,76.39,"MIT License",false,"main",true,[27,28],"ocr","pdf","2026-06-12 04:01:09","![Hero Image](.\u002Fassets\u002FheroImage.png)\n\n## Zerox OCR\n\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fsmg2QfwtJ6\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fcccc0e9a-e3b2-425e-9b54-e5024681b129\" alt=\"Join us on Discord\" width=\"200px\">\n\u003C\u002Fa>\n\nA dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. 
The vision models just make sense!\n\nThe general logic:\n\n- Pass in a file (PDF, DOCX, image, etc.)\n- Convert that file into a series of images\n- Pass each image to GPT and ask nicely for Markdown\n- Aggregate the responses and return Markdown\n\nTry out the hosted version here: \u003Chttps:\u002F\u002Fgetomni.ai\u002Focr-demo>\nOr visit our full documentation at: \u003Chttps:\u002F\u002Fdocs.getomni.ai\u002Fzerox>\n\n## Getting Started\n\nZerox is available as both a Node and Python package.\n\n- [Node README](#node-zerox) - [npm package](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fzerox)\n- [Python README](#python-zerox) - [pip package](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpy-zerox\u002F)\n\n| Feature                   | Node.js                      | Python                     |\n| ------------------------- | ---------------------------- | -------------------------- |\n| PDF Processing            | ✓ (requires graphicsmagick)  | ✓ (requires poppler)       |\n| Image Processing          | ✓                            | ✓                          |\n| OpenAI Support            | ✓                            | ✓                          |\n| Azure OpenAI Support      | ✓                            | ✓                          |\n| AWS Bedrock Support       | ✓                            | ✓                          |\n| Google Gemini Support     | ✓                            | ✓                          |\n| Vertex AI Support         | ✗                            | ✓                          |\n| Data Extraction           | ✓ (`schema`)                 | ✗                          |\n| Per-page Extraction       | ✓ (`extractPerPage`)         | ✗                          |\n| Custom System Prompts     | ✗                            | ✓ (`custom_system_prompt`) |\n| Maintain Format Option    | ✓ (`maintainFormat`)         | ✓ (`maintain_format`)      |\n| Async API                 | ✓                            | ✓                          |\n| 
Error Handling Modes      | ✓ (`errorMode`)              | ✗                          |\n| Concurrent Processing     | ✓ (`concurrency`)            | ✓ (`concurrency`)          |\n| Temp Directory Management | ✓ (`tempDir`)                | ✓ (`temp_dir`)             |\n| Page Selection            | ✓ (`pagesToConvertAsImages`) | ✓ (`select_pages`)         |\n| Orientation Correction    | ✓ (`correctOrientation`)     | ✗                          |\n| Edge Trimming             | ✓ (`trimEdges`)              | ✗                          |\n\n## Node Zerox\n\n(Node.js SDK - supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.)\n\n### Installation\n\n```sh\nnpm install zerox\n```\n\nZerox uses `graphicsmagick` and `ghostscript` for the PDF => image processing step. These should be pulled automatically, but you may need to manually install.\n\nOn linux use:\n\n```\nsudo apt-get update\nsudo apt-get install -y graphicsmagick\n```\n\n## Usage\n\n**With file URL**\n\n```ts\nimport { zerox } from \"zerox\";\n\nconst result = await zerox({\n  filePath: \"https:\u002F\u002Fomni-demo-data.s3.amazonaws.com\u002Ftest\u002Fcs101.pdf\",\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n```\n\n**From local path**\n\n```ts\nimport { zerox } from \"zerox\";\nimport path from \"path\";\n\nconst result = await zerox({\n  filePath: path.resolve(__dirname, \".\u002Fcs101.pdf\"),\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n```\n\n### Parameters\n\n```ts\nconst result = await zerox({\n  \u002F\u002F Required\n  filePath: \"path\u002Fto\u002Ffile\",\n  credentials: {\n    apiKey: \"your-api-key\",\n    \u002F\u002F Additional provider-specific credentials as needed\n  },\n\n  \u002F\u002F Optional\n  cleanup: true, \u002F\u002F Clear images from tmp after run\n  concurrency: 10, \u002F\u002F Number of pages to run at a time\n  correctOrientation: true, \u002F\u002F True by 
default, attempts to identify and correct page orientation\n  directImageExtraction: false, \u002F\u002F Extract data directly from document images instead of the markdown\n  errorMode: ErrorMode.IGNORE, \u002F\u002F ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE\n  extractionPrompt: \"\", \u002F\u002F LLM instructions for extracting data from document\n  extractOnly: false, \u002F\u002F Set to true to only extract structured data using a schema\n  extractPerPage, \u002F\u002F Extract data per page instead of the entire document\n  imageDensity: 300, \u002F\u002F DPI for image conversion\n  imageHeight: 2048, \u002F\u002F Maximum height for converted images\n  llmParams: {}, \u002F\u002F Additional parameters to pass to the LLM\n  maintainFormat: false, \u002F\u002F Slower but helps maintain consistent formatting\n  maxImageSize: 15, \u002F\u002F Maximum size of images to compress, defaults to 15MB\n  maxRetries: 1, \u002F\u002F Number of retries to attempt on a failed page, defaults to 1\n  maxTesseractWorkers: -1, \u002F\u002F Maximum number of Tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed\n  model: ModelOptions.OPENAI_GPT_4O, \u002F\u002F Model to use (supports various models from different providers)\n  modelProvider: ModelProvider.OPENAI, \u002F\u002F Choose from OPENAI, BEDROCK, GOOGLE, or AZURE\n  outputDir: undefined, \u002F\u002F Save combined result.md to a file\n  pagesToConvertAsImages: -1, \u002F\u002F Page numbers to convert to image as array (e.g. `[1, 2, 3]`) or a number (e.g. `1`). 
Set to -1 to convert all pages\n  prompt: \"\", \u002F\u002F LLM instructions for processing the document\n  schema: undefined, \u002F\u002F Schema for structured data extraction\n  tempDir: \"\u002Fos\u002Ftmp\", \u002F\u002F Directory to use for temporary files (default: system temp directory)\n  trimEdges: true, \u002F\u002F True by default, trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel\n});\n```\n\nThe `maintainFormat` option tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. This requires the requests to run synchronously, so it's a lot slower. But valuable if your documents have a lot of tabular data, or frequently have tables that cross pages.\n\n```\nRequest #1 => page_1_image\nRequest #2 => page_1_markdown + page_2_image\nRequest #3 => page_2_markdown + page_3_image\n```\n\n### Example Output\n\n```js\n{\n  completionTime: 10038,\n  fileName: 'invoice_36258',\n  inputTokens: 25543,\n  outputTokens: 210,\n  pages: [\n    {\n      page: 1,\n      content: '# INVOICE # 36258\\n' +\n        '**Date:** Mar 06 2012  \\n' +\n        '**Ship Mode:** First Class  \\n' +\n        '**Balance Due:** $50.10  \\n' +\n        '## Bill To:\\n' +\n        'Aaron Bergman  \\n' +\n        '98103, Seattle,  \\n' +\n        'Washington, United States  \\n' +\n        '## Ship To:\\n' +\n        'Aaron Bergman  \\n' +\n        '98103, Seattle,  \\n' +\n        'Washington, United States  \\n' +\n        '\\n' +\n        '| Item                                       | Quantity | Rate   | Amount  |\\n' +\n        '|--------------------------------------------|----------|--------|---------|\\n' +\n        \"| Global Push Button Manager's Chair, Indigo | 1        | $48.71 | $48.71  |\\n\" +\n        '| Chairs, Furniture, FUR-CH-4421             |          |        |         |\\n' +\n        '\\n' +\n        
'**Subtotal:** $48.71  \\n' +\n        '**Discount (20%):** $9.74  \\n' +\n        '**Shipping:** $11.13  \\n' +\n        '**Total:** $50.10  \\n' +\n        '---\\n' +\n        '**Notes:**  \\n' +\n        'Thanks for your business!  \\n' +\n        '**Terms:**  \\n' +\n        'Order ID : CA-2012-AB10015140-40974  ',\n      contentLength: 747,\n    }\n  ],\n  extracted: null,\n  summary: {\n    totalPages: 1,\n    ocr: {\n      failed: 0,\n      successful: 1,\n    },\n    extracted: null,\n  },\n}\n```\n\n### Data Extraction\n\nZerox supports structured data extraction from documents using a schema. This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion.\n\nSet `extractOnly: true` and provide a `schema` to extract structured data. The schema follows the [JSON Schema standard](https:\u002F\u002Fjson-schema.org\u002Funderstanding-json-schema\u002F).\n\nUse `extractPerPage` to extract data per page instead of from the whole document at once.\n\nYou can also set `extractionModel`, `extractionModelProvider`, and `extractionCredentials` to use a different model for extraction than OCR. 
By default, the same model is used.\n\n### Supported Models\n\nZerox supports a wide range of models across different providers:\n\n- **Azure OpenAI**\n\n  - GPT-4 Vision (gpt-4o)\n  - GPT-4 Vision Mini (gpt-4o-mini)\n  - GPT-4.1 (gpt-4.1)\n  - GPT-4.1 Mini (gpt-4.1-mini)\n\n- **OpenAI**\n\n  - GPT-4 Vision (gpt-4o)\n  - GPT-4 Vision Mini (gpt-4o-mini)\n  - GPT-4.1 (gpt-4.1)\n  - GPT-4.1 Mini (gpt-4.1-mini)\n\n- **AWS Bedrock**\n\n  - Claude 3 Haiku (2024.03, 2024.10)\n  - Claude 3 Sonnet (2024.02, 2024.06, 2024.10)\n  - Claude 3 Opus (2024.02)\n\n- **Google Gemini**\n  - Gemini 1.5 (Flash, Flash-8B, Pro)\n  - Gemini 2.0 (Flash, Flash-Lite)\n\n```ts\nimport { zerox } from \"zerox\";\nimport { ModelOptions, ModelProvider } from \"zerox\u002Fnode-zerox\u002Fdist\u002Ftypes\";\n\n\u002F\u002F OpenAI\nconst openaiResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.OPENAI,\n  model: ModelOptions.OPENAI_GPT_4O,\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n\n\u002F\u002F Azure OpenAI\nconst azureResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.AZURE,\n  model: ModelOptions.OPENAI_GPT_4O,\n  credentials: {\n    apiKey: process.env.AZURE_API_KEY,\n    endpoint: process.env.AZURE_ENDPOINT,\n  },\n});\n\n\u002F\u002F AWS Bedrock\nconst bedrockResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.BEDROCK,\n  model: ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10,\n  credentials: {\n    accessKeyId: process.env.AWS_ACCESS_KEY_ID,\n    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,\n    region: process.env.AWS_REGION,\n  },\n});\n\n\u002F\u002F Google Gemini\nconst geminiResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.GOOGLE,\n  model: ModelOptions.GOOGLE_GEMINI_1_5_PRO,\n  credentials: {\n    apiKey: process.env.GEMINI_API_KEY,\n  },\n});\n```\n\n## Python 
Zerox\n\n(Python SDK - supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, etc.)\n\n### Installation\n\n- Install **poppler** on the system, it should be available in path variable. See the [pdf2image documentation](https:\u002F\u002Fpdf2image.readthedocs.io\u002Fen\u002Flatest\u002Finstallation.html) for instructions by platform.\n- Install py-zerox:\n\n```sh\npip install py-zerox\n```\n\nThe `pyzerox.zerox` function is an asynchronous API that performs OCR (Optical Character Recognition) to markdown using vision models. It processes PDF files and converts them into markdown format. Make sure to set up the environment variables for the model and the model provider before using this API.\n\nRefer to the [LiteLLM Documentation](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders) for setting up the environment and passing the correct model name.\n\n### Usage\n\n```python\nfrom pyzerox import zerox\nimport os\nimport json\nimport asyncio\n\n### Model Setup (Use only Vision Models) Refer: https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders ###\n\n## placeholder for additional model kwargs which might be required for some models\nkwargs = {}\n\n## system prompt to use for the vision model\ncustom_system_prompt = None\n\n# to override\n# custom_system_prompt = \"For the below PDF page, do something..something...\" ## example\n\n###################### Example for OpenAI ######################\nmodel = \"gpt-4o-mini\" ## openai model\nos.environ[\"OPENAI_API_KEY\"] = \"\" ## your-api-key\n\n\n###################### Example for Azure OpenAI ######################\nmodel = \"azure\u002Fgpt-4o-mini\" ## \"azure\u002F\u003Cyour_deployment_name>\" -> format \u003Cprovider>\u002F\u003Cmodel>\nos.environ[\"AZURE_API_KEY\"] = \"\" # \"your-azure-api-key\"\nos.environ[\"AZURE_API_BASE\"] = \"\" # \"https:\u002F\u002Fexample-endpoint.openai.azure.com\"\nos.environ[\"AZURE_API_VERSION\"] = \"\" # 
\"2023-05-15\"\n\n\n###################### Example for Gemini ######################\nmodel = \"gemini\u002Fgpt-4o-mini\" ## \"gemini\u002F\u003Cgemini_model>\" -> format \u003Cprovider>\u002F\u003Cmodel>\nos.environ['GEMINI_API_KEY'] = \"\" # your-gemini-api-key\n\n\n###################### Example for Anthropic ######################\nmodel=\"claude-3-opus-20240229\"\nos.environ[\"ANTHROPIC_API_KEY\"] = \"\" # your-anthropic-api-key\n\n###################### Vertex ai ######################\nmodel = \"vertex_ai\u002Fgemini-1.5-flash-001\" ## \"vertex_ai\u002F\u003Cmodel_name>\" -> format \u003Cprovider>\u002F\u003Cmodel>\n## GET CREDENTIALS\n## RUN ##\n# !gcloud auth application-default login - run this to add vertex credentials to your env\n## OR ##\nfile_path = 'path\u002Fto\u002Fvertex_ai_service_account.json'\n\n# Load the JSON file\nwith open(file_path, 'r') as file:\n    vertex_credentials = json.load(file)\n\n# Convert to JSON string\nvertex_credentials_json = json.dumps(vertex_credentials)\n\nvertex_credentials=vertex_credentials_json\n\n## extra args\nkwargs = {\"vertex_credentials\": vertex_credentials}\n\n###################### For other providers refer: https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders ######################\n\n# Define main async entrypoint\nasync def main():\n    file_path = \"https:\u002F\u002Fomni-demo-data.s3.amazonaws.com\u002Ftest\u002Fcs101.pdf\" ## local filepath and file URL supported\n\n    ## process only some pages or all\n    select_pages = None ## None for all, but could be int or list(int) page numbers (1 indexed)\n\n    output_dir = \".\u002Foutput_test\" ## directory to save the consolidated markdown file\n    result = await zerox(file_path=file_path, model=model, output_dir=output_dir,\n                        custom_system_prompt=custom_system_prompt,select_pages=select_pages, **kwargs)\n    return result\n\n\n# run the main function:\nresult = asyncio.run(main())\n\n# print markdown 
result\nprint(result)\n```\n\n### Parameters\n\n```python\nasync def zerox(\n    cleanup: bool = True,\n    concurrency: int = 10,\n    file_path: Optional[str] = \"\",\n    maintain_format: bool = False,\n    model: str = \"gpt-4o-mini\",\n    output_dir: Optional[str] = None,\n    temp_dir: Optional[str] = None,\n    custom_system_prompt: Optional[str] = None,\n    select_pages: Optional[Union[int, Iterable[int]]] = None,\n    **kwargs\n) -> ZeroxOutput:\n  ...\n```\n\nParameters\n\n- **cleanup** (bool, optional):\n  Whether to clean up temporary files after processing. Defaults to True.\n- **concurrency** (int, optional):\n  The number of concurrent processes to run. Defaults to 10.\n- **file_path** (Optional[str], optional):\n  The path to the PDF file to process. Defaults to an empty string.\n- **maintain_format** (bool, optional):\n  Whether to maintain the format from the previous page. Defaults to False.\n- **model** (str, optional):\n  The model to use for generating completions. Defaults to \"gpt-4o-mini\".\n  Refer to LiteLLM Providers for the correct model name, as it may differ depending on the provider.\n- **output_dir** (Optional[str], optional):\n  The directory to save the markdown output. Defaults to None.\n- **temp_dir** (str, optional):\n  The directory to store temporary files. Defaults to a named folder in the system's temp directory. If the directory already exists, its contents are deleted before Zerox uses it.\n- **custom_system_prompt** (str, optional):\n  The system prompt to use for the model. This overrides Zerox's default system prompt. It is generally not required unless you want some specific behavior. Defaults to None.\n- **select_pages** (Optional[Union[int, Iterable[int]]], optional):\n  Pages to process. Can be a single page number or an iterable of page numbers. 
Defaults to None\n- **kwargs** (dict, optional):\n  Additional keyword arguments to pass to the litellm.completion method.\n  Refer to the LiteLLM Documentation and Completion Input for details.\n\nReturns\n\n- ZeroxOutput:\n  Contains the markdown content generated by the model and also some metadata (refer below).\n\n### Example Output (output from \"azure\u002Fgpt-4o-mini\")\n\nNote the output is manually wrapped for this documentation for better readability.\n\n````Python\nZeroxOutput(\n    completion_time=9432.975,\n    file_name='cs101',\n    input_tokens=36877,\n    output_tokens=515,\n    pages=[\n        Page(\n            content='| Type    | Description                          | Wrapper Class |\\n' +\n                    '|---------|--------------------------------------|---------------|\\n' +\n                    '| byte    | 8-bit signed 2s complement integer   | Byte          |\\n' +\n                    '| short   | 16-bit signed 2s complement integer  | Short         |\\n' +\n                    '| int     | 32-bit signed 2s complement integer  | Integer       |\\n' +\n                    '| long    | 64-bit signed 2s complement integer  | Long          |\\n' +\n                    '| float   | 32-bit IEEE 754 floating point number| Float         |\\n' +\n                    '| double  | 64-bit floating point number         | Double        |\\n' +\n                    '| boolean | may be set to true or false          | Boolean       |\\n' +\n                    '| char    | 16-bit Unicode (UTF-16) character    | Character     |\\n\\n' +\n                    'Table 26.2.: Primitive types in Java\\n\\n' +\n                    '### 26.3.1. Declaration & Assignment\\n\\n' +\n                    'Java is a statically typed language meaning that all variables must be declared before you can use ' +\n                    'them or refer to them. 
In addition, when declaring a variable, you must specify both its type and ' +\n                    'its identifier. For example:\\n\\n' +\n                    '```java\\n' +\n                    'int numUnits;\\n' +\n                    'double costPerUnit;\\n' +\n                    'char firstInitial;\\n' +\n                    'boolean isStudent;\\n' +\n                    '```\\n\\n' +\n                    'Each declaration specifies the variable’s type followed by the identifier and ending with a ' +\n                    'semicolon. The identifier rules are fairly standard: a name can consist of lowercase and ' +\n                    'uppercase alphabetic characters, numbers, and underscores but may not begin with a numeric ' +\n                    'character. We adopt the modern camelCasing naming convention for variables in our code. In ' +\n                    'general, variables must be assigned a value before you can use them in an expression. You do not ' +\n                    'have to immediately assign a value when you declare them (though it is good practice), but some ' +\n                    'value must be assigned before they can be used or the compiler will issue an error.\\n\\n' +\n                    'The assignment operator is a single equal sign, `=` and is a right-to-left assignment. That is, ' +\n                    'the variable that we wish to assign the value to appears on the left-hand-side while the value ' +\n                    '(literal, variable or expression) is on the right-hand-side. Using our variables from before, ' +\n                    'we can assign them values:\\n\\n' +\n                    '> 2 Instance variables, that is variables declared as part of an object do have default values. ' +\n                    'For objects, the default is `null`, for all numeric types, zero is the default value. 
For the ' +\n                    'boolean type, `false` is the default, and the default char value is `\\\\0`, the null-terminating ' +\n                    'character (zero in the ASCII table).',\n            content_length=2333,\n            page=1\n        )\n    ]\n)\n````\n\n## Supported File Types\n\nWe use a combination of `libreoffice` and `graphicsmagick` to do document => image conversion. For non-image \u002F non-PDF files, we use libreoffice to convert that file to a PDF, and then to an image.\n\n```js\n[\n  \"pdf\", \u002F\u002F Portable Document Format\n  \"doc\", \u002F\u002F Microsoft Word 97-2003\n  \"docx\", \u002F\u002F Microsoft Word 2007-2019\n  \"odt\", \u002F\u002F OpenDocument Text\n  \"ott\", \u002F\u002F OpenDocument Text Template\n  \"rtf\", \u002F\u002F Rich Text Format\n  \"txt\", \u002F\u002F Plain Text\n  \"html\", \u002F\u002F HTML Document\n  \"htm\", \u002F\u002F HTML Document (alternative extension)\n  \"xml\", \u002F\u002F XML Document\n  \"wps\", \u002F\u002F Microsoft Works Word Processor\n  \"wpd\", \u002F\u002F WordPerfect Document\n  \"xls\", \u002F\u002F Microsoft Excel 97-2003\n  \"xlsx\", \u002F\u002F Microsoft Excel 2007-2019\n  \"ods\", \u002F\u002F OpenDocument Spreadsheet\n  \"ots\", \u002F\u002F OpenDocument Spreadsheet Template\n  \"csv\", \u002F\u002F Comma-Separated Values\n  \"tsv\", \u002F\u002F Tab-Separated Values\n  \"ppt\", \u002F\u002F Microsoft PowerPoint 97-2003\n  \"pptx\", \u002F\u002F Microsoft PowerPoint 2007-2019\n  \"odp\", \u002F\u002F OpenDocument Presentation\n  \"otp\", \u002F\u002F OpenDocument Presentation Template\n];\n```\n\n## Credits\n\n- [Litellm](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm): \u003Chttps:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm> | This powers our python sdk to support all popular vision models from different providers.\n\n### License\n\nThis project is licensed under the MIT License.\n","Zerox 
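is a vision-model-based OCR and document extraction tool that converts files such as PDF, DOCX, or images into Markdown. It converts the input file into a series of images, processes each image with an AI model such as GPT, and aggregates the results into Markdown text. Zerox supports multiple AI providers (such as OpenAI, Azure OpenAI, and AWS Bedrock) and offers a rich set of options, including concurrent processing, error handling modes, and page selection. The project suits applications that need to efficiently extract information from documents with complex layouts and turn it into structured data, such as automated document processing and knowledge management platforms.","2026-06-11 03:45:35","high_star"]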