[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82091":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},82091,"MinerU-Popo","opendatalab\u002FMinerU-Popo","opendatalab",null,"Python",147,10,35,3,0,42,63,109,126,3.12,"MIT License",false,"master",[],"2026-06-12 02:04:23","# MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2605.24973\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.12882-b31b1b?style=flat-square&logo=arxiv\" alt=\"arXiv\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FDreamEternal\u002FMinerU-Popo\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97_Dataset-HuggingFace-yellow?style=flat-square\" alt=\"Hugging Face dataset\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\".\u002FLICENSE.txt\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green?style=flat-square\" alt=\"License MIT\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cb>If you like our project, please give us a star ⭐ on GitHub for the latest update.\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  📖 \u003Ca href=\".\u002FREADME.md\">\u003Cb>English\u003C\u002Fb>\u003C\u002Fa> &nbsp;|&nbsp; \u003Ca 
href=\".\u002FREADME_zh.md\">\u003Cb>简体中文\u003C\u002Fb>\u003C\u002Fa>\n\u003C\u002Fp>\n\n![image](.\u002Ffigures\u002Fintro.png)\n\n## ✨ Introduction\n**MinerU-Popo** is a lightweight and universal framework for POst-Processing OCR outputs, bridging the gap between page-level OCR parsing and document-level semantic structure.\nIt constructs document tree structures with a 4B post-processing model that performs four subtasks: table truncation analysis, text truncation analysis, title hierarchy analysis, and image-text association analysis. We handle the challenges of cross-page geometric discontinuity, redundant document parsing, and scalability to long documents via:\n\n- **Task-Oriented Data Engine**: Generate representative training data and simplify the task-specific input.\n- **Dynamic Chunking and Synchronization**: Process long documents in dynamic chunks and reduce deviations across chunks to preserve global consistency.\n- **Document Enrichment**: Construct the document tree structurally, then generate summaries and split long-section nodes semantically.\n\n![image](.\u002Ffigures\u002Foverview.png)\n\n## 📊 Performance\n\n### Better Hierarchy (TEDS) after Post-Processing\n**Basic OCR** | **Before** | **After**\n:---:|:---:|:---:|\n MinerU | 53.7 | **90.6** |\n MonkeyOCR | 48.9 | **87.4** |\n Dolphin | 60.4 | **83.5** |\n PaddleOCR | 59.3 | **82.6** |\n GLM-OCR | 53.5 | **81.8** |\n\n### Advantages Compared to Directly Using a Pre-trained Model\n**Model** | **TEDS** | **Doc\u002Fs**\n:---:|:---:|:---:|\n MinerU-Popo | **90.6** | **0.37** |\n Qwen3-VL-2B | 21.2 | 0.22 |\n Qwen3-VL-4B | 56.5 | 0.20 |\n Qwen3-VL-8B | 65.9 | 0.16 |\n Qwen3-VL-32B | 78.0 | 0.04 |\n\n### Benefits for Downstream Retrieval and Analysis (Acc on ViDoRe V3)\n**Method** | **C.S.** | **Fin.** | **H.R.** | **Ind.** | **Phar.**\n:---:|:---:|:---:|:---:|:---:|:---:|\n MinerU-Popo | **84.4** | 49.5 | **66.8** | 58.7 | **71.6**\n Raw RAG | 82.3 | 48.7 | 63.2 | **60.4** | 64.4\n Visual RAG | 80.7 | **58.4** | 
64.8 | 59.7 | 67.6\n\n## ⚙️ Setup\n\n1. Prepare Environment\n```bash\nconda create -n popo python=3.10\nconda activate popo\npip install -r requirements.txt\n```\n\n2. Download Model\n\nDownload the MinerU-Popo post-processing model:\n\n```bash\nhf download DreamEternal\u002FMinerU-Popo --local-dir models\u002FMineru-Popo\n```\n\n- [MinerU-Popo](https:\u002F\u002Fhuggingface.co\u002FDreamEternal\u002FMinerU-Popo)\n\n3. Model Configuration\n\nIn the [Configuration](.\u002Fpost_processing\u002Fmodel_utils.py),\nfor Transformers inference, set the `POPO_MODEL_PATH` environment variable. For vLLM inference, edit the `url` and `key` in the `popo_generate` function.\n\nFor enrichment and question answering, also edit the `url` and `key` in `qwen_generate` and `gpt_generate`.\n\n## 💻 Usage\n\nThe post-processing pipeline takes page-level parsing results from OCR\u002Flayout systems, normalizes them into a unified schema, runs MinerU-Popo inference, and finally builds document trees.\n\n### Step 1: Prepare OCR\u002FLayout Outputs\n\nRun your preferred page-level parser first, such as MinerU, MonkeyOCR, Dolphin, PaddleOCR-VL, or GLM-OCR. 
Place each model's output under:\n\n```text\npost-process\u002F\u003Cmodel_name>\u002F\n```\n\nFor example:\n\n```text\npost-process\u002Fmineru\u002F\npost-process\u002Fmonkeyocr\u002F\npost-process\u002FPaddleOCR-VL-1.5\u002F\npost-process\u002Fdolphin\u002F\npost-process\u002Fglm-ocr\u002F\n```\n\n### Step 2: Normalize Labels\n\nConvert raw model-specific labels and bounding boxes into the unified MinerU-Popo input format:\n\n```bash\nbash scripts\u002Frun_label_normalization.sh\n```\n\nThe normalized outputs are written to:\n\n```text\noutputs\u002Flabel_normalization\u002F\u003Cmodel_name>\u002F\n```\n\n### Step 3: Run MinerU-Popo Inference\n\nRun MinerU-Popo on the normalized labels:\n\n```bash\nbash scripts\u002Frun_inference.sh\n```\n\nThe inference outputs are written to:\n\n```text\noutputs\u002Finference\u002F\u003Cmodel_name>\u002F\n```\n\n### Step 4: Build Document Trees\n\nBuild structured document trees from the inference outputs:\n\n```bash\nbash scripts\u002Fbuild_tree.sh\n```\n\nThe final tree outputs and text previews are written to:\n\n```text\noutputs\u002Fbuild_tree\u002F\u003Cmodel_name>\u002F\noutputs\u002Fbuild_tree_txt\u002F\u003Cmodel_name>\u002F\n```\n\nExample tree outputs are provided in:\n\n```text\noutput_cases\u002F\n```\n\n\n\n## 🙏 Acknowledgements\n- [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU) and other OCR systems (MonkeyOCR, Dolphin, PaddleOCR, GLM-OCR) for page-level parsing.\n- [ViDoRe V3](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fvidore\u002Fvidore-benchmark-v3) and [MMDA](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FDreamEternal\u002FMMDA_Bench) as benchmarks.\n\n## 📄 License\nThis project is licensed under the MIT License. 
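See the [LICENSE](.\u002FLICENSE) file for details.\n","MinerU-Popo is a lightweight, universal framework for post-processing OCR outputs that bridges the gap between page-level OCR parsing and document-level semantic structure. Its core functionality is a 4B model that performs four subtasks: table truncation analysis, text truncation analysis, title hierarchy analysis, and image-text association analysis. The project uses a task-oriented data engine to generate representative training data and simplify task-specific inputs; applies dynamic chunking and synchronization to process long documents while preserving global consistency; and uses document enrichment to build a tree structure, generate summaries, and split long nodes. The tool suits scenarios that need more accurate OCR results and better-structured documents, such as processing complex documents like financial reports and legal files.",2,"2026-06-11 04:07:42","CREATED_QUERY"]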