[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82330":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":15,"starSnapshotCount":15,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},82330,"parsel","shipfastlabs\u002Fparsel","shipfastlabs","A fast, helpful, and open-source document parser for PHP","",null,"PHP",295,8,3,0,4,54,159,22,80.86,"MIT License",false,"main",[25,26,27,28],"docs","ocr","pdf","text-extarction","2026-06-12 04:01:37","# Parsel\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\".\u002Fart\u002Fog.png\" height=\"300\" alt=\"Parsel\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fshipfastlabs\u002Fparsel\u002Factions\">\u003Cimg alt=\"Tests\" src=\"https:\u002F\u002Fgithub.com\u002Fshipfastlabs\u002Fparsel\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpackagist.org\u002Fpackages\u002Fshipfastlabs\u002Fparsel\">\u003Cimg alt=\"Latest Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fv\u002Fshipfastlabs\u002Fparsel\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpackagist.org\u002Fpackages\u002Fshipfastlabs\u002Fparsel\">\u003Cimg alt=\"Total Downloads\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fdt\u002Fshipfastlabs\u002Fparsel\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpackagist.org\u002Fpackages\u002Fshipfastlabs\u002Fparsel\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fl\u002Fshipfastlabs\u002Fparsel\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nParsel provides an expressive PHP API for parsing PDFs, Office documents, and images. Your documents are processed locally, allowing you to extract plain text, structured page data, coordinates, and screenshots without sending files to an external service.\n\nParsel may return plain text, structured document data, page screenshots, or one page at a time for larger documents. It is designed to feel natural in PHP applications while still giving you access to advanced parser options when you need them.\n\n```php\nuse Shipfastlabs\\Parsel;\n\n$text = Parsel::file('invoice.pdf')->text();\n\n$document = Parsel::file('invoice.pdf')\n    ->pageRange(1, 5)\n    ->withOcr(language: 'eng')\n    ->withDpi(300)\n    ->parse();\n\nforeach ($document->pages as $page) {\n    foreach ($page->items as $item) {\n        echo \"{$item->text} @ ({$item->x}, {$item->y})\\n\";\n    }\n}\n```\n\nParsel requires PHP 8.4 or greater and the `lit` binary.\n\n## Installation\n\nYou may install Parsel via Composer:\n\n```bash\ncomposer require shipfastlabs\u002Fparsel\n```\n\nInstall the required `lit` binary:\n\n```bash\nvendor\u002Fbin\u002Fparsel-install-lit\n```\n\nFor Office documents, spreadsheets, presentations, and images, you may also install the system dependencies:\n\n```bash\nvendor\u002Fbin\u002Fparsel-install-lit --with-system-dependencies\n```\n\nYou may choose the installer used for the `lit` binary:\n\n```bash\nvendor\u002Fbin\u002Fparsel-install-lit --manager=npm\nvendor\u002Fbin\u002Fparsel-install-lit --manager=pnpm\nvendor\u002Fbin\u002Fparsel-install-lit --manager=bun\nvendor\u002Fbin\u002Fparsel-install-lit --manager=pip\nvendor\u002Fbin\u002Fparsel-install-lit --manager=cargo\n```\n\nYou may also install `lit` yourself using one of the following commands:\n\n```bash\ncargo install liteparse\npip install liteparse\nnpm i -g @llamaindex\u002Fliteparse\npnpm add -g @llamaindex\u002Fliteparse\nbun add -g @llamaindex\u002Fliteparse\n```\n\nIf you prefer to install the system dependencies yourself, use the matching commands for your platform:\n\n```bash\n# macOS\nbrew install --cask libreoffice # For Office Document\nbrew install imagemagick # For Images\n\n# Ubuntu \u002F Debian\napt-get install libreoffice # For Office Document\napt-get install imagemagick # For Images\n\n# Windows\nchoco install libreoffice-fresh # For Office Document\nchoco install imagemagick.app # For Images\n```\n\nLibreOffice is used for Office document conversion, ImageMagick is used for image conversion, and OCR support uses Tesseract through liteparse.\n\n## Parsing Files\n\nThe `file` method creates a parser instance for a document that already exists on disk. Once a source has been selected, you may choose how the parsed output should be returned.\n\n```php\nuse Shipfastlabs\\Parsel;\n\n$text = Parsel::file('\u002Fpath\u002Fto\u002Freport.pdf')->text();\n```\n\nYou may also parse raw bytes. This is useful when working with uploaded files, database blobs, or documents that have not been persisted to disk. Because byte sources do not include a filename, you should provide the file extension.\n\n```php\n$document = Parsel::bytes($uploadedBytes, 'pdf')->parse();\n```\n\nParsel may be used with PDFs, Word documents, spreadsheets, presentations, and images. The same fluent API is used for each supported file type:\n\n```php\n$text = Parsel::file('contract.docx')->text();\n\n$rows = Parsel::file('report.xlsx')->text();\n\n$slides = Parsel::file('deck.pptx')->text();\n\n$scan = Parsel::file('receipt.png')\n    ->withOcr()\n    ->text();\n\n$photo = Parsel::file('invoice.jpg')\n    ->withOcr(language: 'eng')\n    ->parse();\n```\n\n## Plain Text\n\nThe `text` method returns the parsed document text as a string. Parsel removes page header markers from text output before returning it.\n\n```php\n$text = Parsel::file('document.pdf')\n    ->withoutOcr()\n    ->text();\n```\n\n## Structured Documents\n\nThe `parse` method returns a `Document` object containing the document text, metadata, pages, and positioned text items. This is useful when you need coordinates, font information, or OCR confidence values.\n\n```php\n$document = Parsel::file('document.pdf')->parse();\n\necho $document->text;\necho $document->pageCount();\n\nforeach ($document->pages as $page) {\n    foreach ($page->items as $item) {\n        echo $item->text;\n    }\n}\n```\n\nThe document may also be returned as an array:\n\n```php\n$array = Parsel::file('document.pdf')->toArray();\n```\n\n## Page Selection\n\nThe `page`, `pages`, and `pageRange` methods may be used to limit parsing to specific pages. These methods are additive, so you may combine multiple calls before parsing the document.\n\n```php\nParsel::file('document.pdf')->page(7);\n\nParsel::file('document.pdf')->pages(1, 3, 5);\n\nParsel::file('document.pdf')->pages('1-5', 10);\n\nParsel::file('document.pdf')->pageRange(1, 5);\n\nParsel::file('document.pdf')->pageRange(1, 5)->page(10);\n```\n\nIf you only need to limit how many pages are parsed, you may use the `maxPages` method:\n\n```php\nParsel::file('document.pdf')->maxPages(50)->text();\n```\n\n## OCR\n\nOCR is disabled by default so that parsing remains fast and predictable. You may enable OCR using the `withOcr` method:\n\n```php\n$text = Parsel::file('scan.pdf')->withOcr()->text();\n```\n\nThe `withOcr` method accepts named arguments for the most common OCR options:\n\n```php\n$text = Parsel::file('scan.pdf')\n    ->withOcr(\n        language: 'fra',\n        tessdataPath: '\u002Fusr\u002Fshare\u002Ftessdata',\n        serverUrl: 'http:\u002F\u002Flocalhost:8828\u002Focr',\n        workers: 8,\n    )\n    ->text();\n```\n\nIf you would like to be explicit that OCR should not be used, you may call `withoutOcr`:\n\n```php\n$text = Parsel::file('document.pdf')->withoutOcr()->text();\n```\n\n## Rendering Options\n\nParsel includes convenience methods for common parser options such as rendering DPI, small text preservation, encrypted documents, and per-call process settings.\n\n```php\nParsel::file('document.pdf')->withDpi(300);\n\nParsel::file('document.pdf')->preserveSmallText();\n\nParsel::file('secret.pdf')->withPassword('hunter2');\n\nParsel::file('document.pdf')->withBinary('\u002Fusr\u002Flocal\u002Fbin\u002Flit');\n\nParsel::file('document.pdf')->withTimeout(120);\n```\n\n## Additional Options\n\nIf you need to pass a flag that Parsel does not yet provide as a dedicated method, you may pass the option directly using the `option` method. Boolean options may be passed without a value, while options that require a value may receive one as the second argument.\n\n```php\nParsel::file('document.pdf')->option('some-new-flag');\n\nParsel::file('document.pdf')->option('some-new-flag', 42);\n```\n\n## Saving Output\n\nThe `save` method writes parsed output to disk and returns the path that was written. When the target path ends in `.json`, Parsel will write JSON output. For all other extensions, Parsel will write plain text.\n\n```php\nParsel::file('document.pdf')->save('document.txt');\n\nParsel::file('document.pdf')->save('document.json');\n```\n\n## Screenshots\n\nYou may use the `screenshots` method to render page screenshots into a directory. The method returns the image files found in the output directory after parsing has finished.\n\n```php\n$screenshots = Parsel::file('document.pdf')\n    ->pageRange(1, 5)\n    ->screenshots('\u002Ftmp\u002Fparsel-pages');\n```\n\nFor predictable results, you should pass a dedicated output directory that does not contain unrelated files.\n\n## Streaming Large Documents\n\nThe `parse` and `toArray` methods load the parsed document into memory. When working with large documents, you may use `lazyPages` to process one page at a time.\n\n```php\nforeach (Parsel::file('large-document.pdf')->lazyPages() as $page) {\n    foreach ($page->items as $item) {\n        \u002F\u002F Process one page at a time...\n    }\n}\n```\n\nThis allows Parsel to read pages incrementally instead of keeping the full document in memory.\n\n## Document Data\n\nA parsed `Document` contains the full text, metadata, and a list of pages. Each page contains its dimensions, page text, and text items with position data.\n\n```php\n$document->text;\n$document->metadata;\n$document->pages;\n$document->pageCount();\n\n$page->number;\n$page->width;\n$page->height;\n$page->text;\n$page->items;\n\n$item->text;\n$item->x;\n$item->y;\n$item->width;\n$item->height;\n$item->fontName;\n$item->fontSize;\n$item->confidence;\n```\n\n## Binary Resolution\n\nWhen Parsel needs to parse a document, it resolves the `lit` binary in the following order:\n\n1. The per-call binary configured with `withBinary`.\n2. The global binary configured with `Parsel::usingBinary`.\n3. The `PARSEL_LIT_BINARY` environment variable.\n4. The `lit` binary available on the system `PATH`.\n\nIf no binary can be resolved, Parsel will throw a `BinaryNotFoundException`.\n\n```php\nParsel::usingBinary('\u002Fusr\u002Flocal\u002Fbin\u002Flit');\n\nParsel::defaultTimeout(120);\n```\n\n## Testing\n\nParsel includes a fake runner that allows your tests to exercise parsing code without spawning the real binary. Response keys are matched against the command line as substrings. When multiple responses match, the longest matching key is used.\n\n```php\nuse Shipfastlabs\\Parsel;\n\n$fake = Parsel::fake([\n    '--format json' => file_get_contents(__DIR__.'\u002Ffixtures\u002Flit-output.json'),\n]);\n\n$document = Parsel::file('invoice.pdf')->parse();\n\nexpect($fake->recordedCommands()[0])->toContain('--format', 'json');\n```\n\nString responses are returned as successful stdout. If you need control over the exit code or stderr, you may provide a `ProcessResult` instance instead.\n\n## Development\n\nParsel uses Pint, Rector, PHPStan, and Pest to keep the codebase formatted and well tested.\n\n```bash\ncomposer lint\ncomposer test:types\ncomposer test:unit\ncomposer test\n```\n\nThe integration tests run against a real parser installation and are skipped when the `lit` binary is not available:\n\n```bash\n.\u002Fvendor\u002Fbin\u002Fpest --group=integration\n```\n\n## Credits\n\nParsel is maintained by [Shipfastlabs](https:\u002F\u002Fshipfastlabs.com) and released under the [MIT license](LICENSE.md).\n","Parsel 是一个快速、实用且开源的 PHP 文档解析库。它支持解析 PDF、Office 文档和图像，能够在本地处理文件，提取纯文本、结构化页面数据、坐标信息及页面截图，无需将文件上传到外部服务。Parsel 提供了丰富的 API 选项，允许用户根据需要调整解析参数，如 OCR 语言设置、DPI 等，以适应不同复杂度的应用场景。该工具非常适合需要在 PHP 应用中直接处理文档内容而不依赖云端服务的情况，尤其是在对数据隐私有严格要求或网络环境受限的环境中使用。",2,"2026-06-11 04:08:25","CREATED_QUERY"]