[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-8572":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":33,"discoverSource":34},8572,"tesseract-ocr-for-php","thiagoalessio\u002Ftesseract-ocr-for-php","thiagoalessio","A wrapper to work with Tesseract OCR inside PHP.","https:\u002F\u002Fpackagist.org\u002Fpackages\u002Fthiagoalessio\u002Ftesseract_ocr",null,"PHP",3033,554,116,7,0,2,5,1,61.73,"MIT License",false,"main",[25,26,27,28,29],"image-to-text","ocr","php","tesseract","text-recognition","2026-06-12 04:00:40","# Tesseract OCR for PHP\n\nA wrapper to work with Tesseract OCR inside PHP.\n\n[![CI][ci_badge]][ci]\n[![AppVeyor][appveyor_badge]][appveyor]\n[![Codacy][codacy_badge]][codacy]\n[![Test Coverage][test_coverage_badge]][test_coverage]\n\u003Cbr\u002F>\n[![Latest Stable Version][stable_version_badge]][packagist]\n[![Total Downloads][total_downloads_badge]][packagist]\n[![Monthly Downloads][monthly_downloads_badge]][packagist]\n\n## Installation\n\nVia [Composer][]:\n\n    $ composer require thiagoalessio\u002Ftesseract_ocr\n\n:bangbang: **This library depends on [Tesseract OCR][], version _3.02_ or later.**\n\n\u003Cbr\u002F>\n\n### ![][windows_icon] Note for Windows users\n\nThere are [many ways][tesseract_installation_on_windows] to install\n[Tesseract OCR][] on your system, but if you just want something quick to\nget up and running, I recommend installing the 
[Capture2Text][] package with\n[Chocolatey][].\n\n    choco install capture2text --version 3.9\n\n:warning: Recent versions of [Capture2Text][] stopped shipping the `tesseract` binary.\n\n\u003Cbr\u002F>\n\n### ![][macos_icon] Note for macOS users\n\nWith [MacPorts][] you can install support for individual languages, like so:\n\n    $ sudo port install tesseract-\u003Clangcode>\n\nBut that is not possible with [Homebrew][]. It comes only with **English** support\nby default, so if you intend to use it for other language, the quickest solution\nis to install them all:\n\n    $ brew install tesseract tesseract-lang\n\n\u003Cbr\u002F>\n\n## Usage\n\n### Basic usage\n\n\u003Cimg align=\"right\" width=\"50%\" title=\"The quick brown fox jumps over the lazy dog.\" src=\".\u002Ftests\u002FEndToEnd\u002Fimages\u002Ftext.png\"\u002F>\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('text.png'))\n    ->run();\n```\n\n```\nThe quick brown fox\njumps over\nthe lazy dog.\n```\n\n\u003Cbr\u002F>\n\n### Other languages\n\n\u003Cimg align=\"right\" width=\"50%\" title=\"Bülowstraße\" src=\".\u002Ftests\u002FEndToEnd\u002Fimages\u002Fgerman.png\"\u002F>\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('german.png'))\n    ->lang('deu')\n    ->run();\n```\n\n```\nBülowstraße\n```\n\n\u003Cbr\u002F>\n\n### Multiple languages\n\n\u003Cimg align=\"right\" width=\"50%\" title=\"I eat すし y Pollo\" src=\".\u002Ftests\u002FEndToEnd\u002Fimages\u002Fmixed-languages.png\"\u002F>\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('mixed-languages.png'))\n    ->lang('eng', 'jpn', 'spa')\n    ->run();\n```\n\n```\nI eat すし y Pollo\n```\n\n\u003Cbr\u002F>\n\n### Inducing recognition\n\n\u003Cimg align=\"right\" width=\"50%\" title=\"8055\" src=\".\u002Ftests\u002FEndToEnd\u002Fimages\u002F8055.png\"\u002F>\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('8055.png'))\n 
   ->allowlist(range('A', 'Z'))\n    ->run();\n```\n\n```\nBOSS\n```\n\n\u003Cbr\u002F>\n\n### Breaking CAPTCHAs\n\nYes, I know some of you might want to use this library for the *noble* purpose\nof breaking CAPTCHAs, so please take a look at this comment:\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fissues\u002F91#issuecomment-342290510>\n\n## API\n\n### run\n\nExecutes a `tesseract` command, optionally receiving an integer as `timeout`,\nin case you experience stalled tesseract processes.\n\n```php\n$ocr = new TesseractOCR();\n$ocr->run();\n```\n```php\n$ocr = new TesseractOCR();\n$timeout = 500;\n$ocr->run($timeout);\n```\n\n### image\n\nDefine the path of an image to be recognized by `tesseract`.\n\n```php\n$ocr = new TesseractOCR();\n$ocr->image('\u002Fpath\u002Fto\u002Fimage.png');\n$ocr->run();\n```\n\n### imageData\n\nSet the image to be recognized by `tesseract` from a string, with its size.\nThis can be useful when dealing with files that are already loaded in memory.\nYou can easily retrieve the image data and size of an image object :\n```php\n\u002F\u002FUsing Imagick\n$data = $img->getImageBlob();\n$size = $img->getImageLength();\n\u002F\u002FUsing GD\nob_start();\n\u002F\u002F Note that you can use any format supported by tesseract\nimagepng($img, null, 0);\n$size = ob_get_length();\n$data = ob_get_clean();\n\n$ocr = new TesseractOCR();\n$ocr->imageData($data, $size);\n$ocr->run();\n```\n\n### executable\n\nDefine a custom location of the `tesseract` executable,\nif by any reason it is not present in the `$PATH`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->executable('\u002Fpath\u002Fto\u002Ftesseract')\n    ->run();\n```\n\n### version\n\nReturns the current version of `tesseract`.\n\n```php\necho (new TesseractOCR())->version();\n```\n\n### availableLanguages\n\nReturns a list of available languages\u002Fscripts.\n\n```php\nforeach((new TesseractOCR())->availableLanguages() as $lang) echo 
$lang;\n```\n\n__More info:__ \u003Chttps:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fblob\u002Fmaster\u002Fdoc\u002Ftesseract.1.asc#languages-and-scripts>\n\n### tessdataDir\n\nSpecify a custom location for the tessdata directory.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->tessdataDir('\u002Fpath')\n    ->run();\n```\n\n### userWords\n\nSpecify the location of a user words file.\n\nThis is a plain text file containing a list of words that you want\n`tesseract` to treat as normal dictionary words.\n\nUseful when dealing with contents that contain technical terminology, jargon,\netc.\n\n```\n$ cat \u002Fpath\u002Fto\u002Fuser-words.txt\nfoo\nbar\n```\n\n```php\necho (new TesseractOCR('img.png'))\n    ->userWords('\u002Fpath\u002Fto\u002Fuser-words.txt')\n    ->run();\n```\n\n### userPatterns\n\nSpecify the location of a user patterns file.\n\nIf the contents you are dealing with have known patterns, this option can\nsignificantly improve tesseract's recognition accuracy.\n\n```\n$ cat \u002Fpath\u002Fto\u002Fuser-patterns.txt\n1-\\d\\d\\d-GOOG-441\nwww.\\n\\\\\\*.com\n```\n\n```php\necho (new TesseractOCR('img.png'))\n    ->userPatterns('\u002Fpath\u002Fto\u002Fuser-patterns.txt')\n    ->run();\n```\n\n### lang\n\nDefine one or more languages to be used during the recognition.\nA complete list of available languages can be found at:\n\u003Chttps:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fblob\u002Fmaster\u002Fdoc\u002Ftesseract.1.asc#languages>\n\n__Tip from [@daijiale][]:__ Use the combination `->lang('chi_sim', 'chi_tra')`\nfor proper recognition of Chinese.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->lang('lang1', 'lang2', 'lang3')\n    ->run();\n```\n\n### psm\n\nSpecify the Page Segmentation Method, which instructs `tesseract` how to\ninterpret the given image.\n\n__More info:__ 
\u003Chttps:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fwiki\u002FImproveQuality#page-segmentation-method>\n\n```php\necho (new TesseractOCR('img.png'))\n    ->psm(6)\n    ->run();\n```\n\n### oem\n\nSpecify the OCR Engine Mode. (see `tesseract --help-oem`)\n\n```php\necho (new TesseractOCR('img.png'))\n    ->oem(2)\n    ->run();\n```\n\n### dpi\n\nSpecify the image DPI. It is useful if your image does not contain this information in its metadata.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->dpi(300)\n    ->run();\n```\n\n### allowlist\n\nThis is a shortcut for `->config('tessedit_char_whitelist', 'abcdef....')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->allowlist(range('a', 'z'), range(0, 9), '-_@')\n    ->run();\n```\n\n### configFile\n\nSpecify a config file to be used. It can either be the path to your own\nconfig file or the name of one of the predefined config files:\n\u003Chttps:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Ftree\u002Fmaster\u002Ftessdata\u002Fconfigs>\n\n```php\necho (new TesseractOCR('img.png'))\n    ->configFile('hocr')\n    ->run();\n```\n\n### setOutputFile\n\nSpecify an Outputfile to be used. 
Be aware: If you set an outputfile then\nthe option `withoutTempFiles` is ignored.\nTempfiles are written (and deleted) even if `withoutTempFiles = true`.\n\nIn combination with `configFile` you are able to get the `hocr`, `tsv` or\n`pdf` files.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->configFile('pdf')\n    ->setOutputFile('\u002FPATH_TO_MY_OUTPUTFILE\u002Fsearchable.pdf')\n    ->run();\n```\n\n### digits\n\nShortcut for `->configFile('digits')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->digits()\n    ->run();\n```\n\n### hocr\n\nShortcut for `->configFile('hocr')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->hocr()\n    ->run();\n```\n\n### pdf\n\nShortcut for `->configFile('pdf')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->pdf()\n    ->run();\n```\n\n### quiet\n\nShortcut for `->configFile('quiet')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->quiet()\n    ->run();\n```\n\n### tsv\n\nShortcut for `->configFile('tsv')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->tsv()\n    ->run();\n```\n\n### txt\n\nShortcut for `->configFile('txt')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->txt()\n    ->run();\n```\n\n### tempDir\n\nDefine a custom directory to store temporary files generated by tesseract.\nMake sure the directory actually exists and the user running `php` is allowed\nto write in there.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->tempDir('.\u002Fmy\u002Fcustom\u002Ftemp\u002Fdir')\n    ->run();\n```\n\n### withoutTempFiles\n\nSpecify that `tesseract` should output the recognized text without writing to temporary files.\nThe data is gathered from the standard output of `tesseract` instead.\n\n```php\necho (new TesseractOCR('img.png'))\n    ->withoutTempFiles()\n    ->run();\n```\n\n### Other options\n\nAny configuration option offered by Tesseract can be used like that:\n\n```php\necho (new TesseractOCR('img.png'))\n    ->config('config_var', 'value')\n    
->config('other_config_var', 'other value')\n    ->run();\n```\n\nOr like that:\n\n```php\necho (new TesseractOCR('img.png'))\n    ->configVar('value')\n    ->otherConfigVar('other value')\n    ->run();\n```\n\n__More info:__ \u003Chttps:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fwiki\u002FControlParams>\n\n### Thread-limit\n\nSometimes, it may be useful to limit the number of threads that tesseract is\nallowed to use (e.g. in [this case](https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fissues\u002F898)).\nSet the maximum number of threads with `threadLimit`:\n\n```php\necho (new TesseractOCR('img.png'))\n    ->threadLimit(1)\n    ->run();\n```\n\n## How to contribute\n\nYou can contribute to this project by:\n\n* Opening an [Issue][] if you find a bug or wish to propose a new feature;\n* Placing a [Pull Request][] with code that fixes a bug, corrects missing\u002Fwrong documentation,\n  or implements a new feature.\n\nJust make sure you take a look at our [Code of Conduct][] and [Contributing][]\ninstructions.\n\n## License\n\ntesseract-ocr-for-php is released under the [MIT License][].\n\n\n\u003Ch2>\u003C\u002Fh2>\u003Cp align=\"center\">\u003Csub>Made with \u003Csub>\u003Ca href=\"#\">\u003Cimg src=\"https:\u002F\u002Fthiagoalessio.github.io\u002Ftesseract-ocr-for-php\u002Fimages\u002Fheart.svg\" alt=\"love\" width=\"14px\"\u002F>\u003C\u002Fa>\u003C\u002Fsub> in Berlin\u003C\u002Fsub>\u003C\u002Fp>\n\n[ci_badge]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fworkflows\u002FCI\u002Fbadge.svg?event=push&branch=main\n[ci]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Factions?query=workflow%3ACI\n[appveyor_badge]: https:\u002F\u002Fci.appveyor.com\u002Fapi\u002Fprojects\u002Fstatus\u002Fxwy5ls0798iwcim3\u002Fbranch\u002Fmain?svg=true\n[appveyor]: 
https:\u002F\u002Fci.appveyor.com\u002Fproject\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fbranch\u002Fmain\n[codacy_badge]: https:\u002F\u002Fapp.codacy.com\u002Fproject\u002Fbadge\u002FGrade\u002Fa81aa10012874f23a57df5b492d835f2\n[codacy]: https:\u002F\u002Fapp.codacy.com\u002Fgh\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fdashboard\n[test_coverage_badge]: https:\u002F\u002Fcodecov.io\u002Fgh\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg?token=Y0VnrqiSIf\n[test_coverage]: https:\u002F\u002Fcodecov.io\u002Fgh\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\n[stable_version_badge]: https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fv\u002Fthiagoalessio\u002Ftesseract_ocr.svg\n[packagist]: https:\u002F\u002Fpackagist.org\u002Fpackages\u002Fthiagoalessio\u002Ftesseract_ocr\n[total_downloads_badge]: https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fdt\u002Fthiagoalessio\u002Ftesseract_ocr.svg\n[monthly_downloads_badge]: https:\u002F\u002Fimg.shields.io\u002Fpackagist\u002Fdm\u002Fthiagoalessio\u002Ftesseract_ocr.svg\n[Tesseract OCR]: https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\n[Composer]: http:\u002F\u002Fgetcomposer.org\u002F\n[windows_icon]: https:\u002F\u002Fthiagoalessio.github.io\u002Ftesseract-ocr-for-php\u002Fimages\u002Fwindows-18.svg\n[macos_icon]: https:\u002F\u002Fthiagoalessio.github.io\u002Ftesseract-ocr-for-php\u002Fimages\u002Fapple-18.svg\n[tesseract_installation_on_windows]: https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fwiki#windows\n[Capture2Text]: https:\u002F\u002Fchocolatey.org\u002Fpackages\u002Fcapture2text\n[Chocolatey]: https:\u002F\u002Fchocolatey.org\n[MacPorts]: https:\u002F\u002Fwww.macports.org\n[Homebrew]: https:\u002F\u002Fbrew.sh\n[@daijiale]: https:\u002F\u002Fgithub.com\u002Fdaijiale\n[HOCR]: https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fwiki\u002FCommand-Line-Usage#hocr-output\n[TSV]: 
https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract\u002Fwiki\u002FCommand-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github\n[Issue]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fissues\n[Pull Request]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fpulls\n[Code of Conduct]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fblob\u002Fmain\u002F.github\u002FCODE_OF_CONDUCT.md\n[Contributing]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fblob\u002Fmain\u002F.github\u002FCONTRIBUTING.md\n[MIT License]: https:\u002F\u002Fgithub.com\u002Fthiagoalessio\u002Ftesseract-ocr-for-php\u002Fblob\u002Fmain\u002FMIT-LICENSE\n","This project is a wrapper library for working with Tesseract OCR in PHP. It recognizes and extracts text from images, with core features such as multi-language support and character allowlists, and it can handle text that mixes several languages, including English, Japanese, and Spanish. The library depends on Tesseract OCR 3.02 or later and is installed and managed through Composer. It suits scenarios that need OCR integrated into web applications, such as document digitization and automated validation of data entry.","2026-06-11 03:18:40","top_language"]