PaddleOCR

PaddlePaddle

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

AI Summary

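PaddleOCR is an open-source, high-performance optical character recognition (OCR) and document understanding toolkit. It supports more than 100 languages and accurately converts images and PDF documents into structured text data. Its core features include multilingual text detection and recognition, layout analysis, table recognition, key information extraction (KIE), and PDF parsing (including PDF-to-Markdown conversion). It is built on the PaddlePaddle framework, balances a lightweight footprint with high accuracy, and supports deployment on CPU, GPU, XPU, and NPU hardware. It suits scenarios that need robust document AI, such as intelligent document processing, building RAG systems, recognizing financial bills and receipts, digitizing government documents, and structuring educational materials.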
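As a rough illustration of what "structured text data" can mean downstream, the sketch below takes recognized text lines with positions and confidence scores and joins them into reading-order text that could be passed to an LLM or a RAG index. The `OcrLine` shape, the `to_reading_order_text` helper, and the sample values are all hypothetical; they are not PaddleOCR's actual output types or API, and the row-grouping rule is a simple assumption rather than the toolkit's layout analysis.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One recognized text line (hypothetical shape, not PaddleOCR's real output type)."""
    text: str
    confidence: float  # recognition score in [0, 1]
    x: float           # left edge of the bounding box, in pixels
    y: float           # top edge of the bounding box, in pixels


def to_reading_order_text(lines, min_confidence=0.5, row_tolerance=10.0):
    """Drop low-confidence lines, then join the rest top-to-bottom, left-to-right."""
    kept = [ln for ln in lines if ln.confidence >= min_confidence]
    # Sort by vertical position first so that rows are visited top to bottom.
    kept.sort(key=lambda ln: (ln.y, ln.x))

    rows = []
    for ln in kept:
        # A line joins the current row if its top edge is close to the row's first line.
        if rows and abs(ln.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(ln)
        else:
            rows.append([ln])

    # Within each row, order the pieces left to right and join them with spaces.
    return "\n".join(
        " ".join(ln.text for ln in sorted(row, key=lambda l: l.x)) for row in rows
    )


# Made-up sample data standing in for the output of an OCR pass over a receipt.
sample = [
    OcrLine("Total:", 0.98, 40, 200),
    OcrLine("Invoice", 0.99, 40, 20),
    OcrLine("#1042", 0.95, 180, 24),
    OcrLine("$12.50", 0.97, 180, 203),
    OcrLine("~~", 0.21, 300, 110),  # low-confidence noise, filtered out
]

print(to_reading_order_text(sample))
```

Both thresholds are illustrative defaults. A real pipeline would tune them per document type, or rely on the toolkit's own layout analysis instead of this heuristic.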

Python

Apache License 2.0

ai4science chineseocr document-parsing document-translation kie ocr paddleocr-vl pdf-extractor-rag pdf-parser pdf2markdown pp-ocr pp-structure rag

Stars: 85.2k
Forks: 11k
Watchers: 553
Issues: 154

Star growth
Today: 0
Last 7 days: 0
Last 30 days: +1999

Overall score: 80
Default branch: main
