[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71976":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":37,"discoverSource":38},71976,"Dolphin","bytedance\u002FDolphin","bytedance","The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.","",null,"Python",9013,765,70,69,0,2,6,81,79.75,"Other",false,"master",[25,26,27,28,29,30,31,32,33],"document-analysis","layout-analysis","ocr","parser","pdf","pdf-converter","pdf-parser","python","vlm-ocr","2026-06-17 04:01:01","\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fdolphin.png\" width=\"300\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14059\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-red\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FByteDance\u002FDolphin-v2\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Dolphin-yellow\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbytedance\u002FDolphin\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode-Github-green\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT\">\n    \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-lightgray\">\n  \u003C\u002Fa>\n  \u003Cbr>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fdemo.gif\" width=\"800\">\n\u003C\u002Fdiv>\n\n# Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting\nDolphin-v2 is an enhanced universal document parsing model that substantially improves upon the original Dolphin. It seamlessly handles any document type—whether digital-born or photographed—through a document-type-aware two-stage architecture with scalable anchor prompting.\n\n\n## 📑 Overview\n\nDocument image parsing is challenging due to diverse document types and complexly intertwined elements such as text paragraphs, figures, formulas, tables, and code blocks. Dolphin-v2 addresses these challenges through a document-type-aware two-stage approach:\n\n1. **🔍 Stage 1**: Document type classification (digital vs. photographed) + layout analysis with reading order prediction\n2. **🧩 Stage 2**: Hybrid parsing strategy - holistic parsing for photographed documents, parallel element-wise parsing for digital documents\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fframework.png\" width=\"680\">\n\u003C\u002Fdiv>\n\nDolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.\n\n\u003C!-- ## 🚀 Demo\nTry our demo on [Demo-Dolphin](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FByteDance\u002FDolphin). -->\n\n## 📅 Changelog\n- 🔥 **2025.12.12** Released *Dolphin-v2* model. Upgraded to 3B parameters with 21-element detection, attribute field extraction, dedicated formula\u002Fcode parsing, and robust photographed document parsing. 
(Dolphin-1.5 moved to [v1.5 branch](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FDolphin\u002Ftree\u002Fv1.5))\n- 🔥 **2025.10.16** Released *Dolphin-1.5* model. While maintaining the lightweight 0.3B architecture, this version achieves significant parsing improvements. (Dolphin 1.0 moved to [v1.0 branch](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FDolphin\u002Ftree\u002Fv1.0))\n- 🔥 **2025.07.10** Released the *Fox-Page Benchmark*, a manually refined subset of the original [Fox dataset](https:\u002F\u002Fgithub.com\u002Fucaslcl\u002FFox). Download via: [Baidu Yun](https:\u002F\u002Fpan.baidu.com\u002Fshare\u002Finit?surl=t746ULp6iU5bUraVrPlMSw&pwd=fox1) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1yZQZqI34QCqvhB4Tmdl3X_XEvYvQyP0q\u002Fview?usp=sharing).\n- 🔥 **2025.06.30** Added [TensorRT-LLM support](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FDolphin\u002Fblob\u002Fmaster\u002Fdeployment\u002Ftensorrt_llm\u002FReadMe.md) for accelerated inference!\n- 🔥 **2025.06.27** Added [vLLM support](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FDolphin\u002Fblob\u002Fmaster\u002Fdeployment\u002Fvllm\u002FReadMe.md) for accelerated inference!\n- 🔥 **2025.06.13** Added multi-page PDF document parsing capability.\n- 🔥 **2025.05.21** Our demo is released at [link](http:\u002F\u002F115.190.42.15:8888\u002Fdolphin\u002F). Check it out!\n- 🔥 **2025.05.20** The pretrained model and inference code of Dolphin are released.\n- 🔥 **2025.05.16** Our paper has been accepted by ACL 2025. 
Paper link: [arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14059).\n\n## 📈 Performance\n\n\u003Ctable style=\"width:90%; border-collapse: collapse; text-align: center;\">\n    \u003Ccaption>Comprehensive evaluation of document parsing on OmniDocBench (v1.5)\u003C\u002Fcaption>\n    \u003Cthead>\n        \u003Ctr>\n            \u003Cth style=\"text-align: center !important;\">Model\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Size\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Overall&#x2191;\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Text\u003Csup>Edit\u003C\u002Fsup>&#x2193;\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Formula\u003Csup>CDM\u003C\u002Fsup>&#x2191;\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Table\u003Csup>TEDS\u003C\u002Fsup>&#x2191;\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Table\u003Csup>TEDS-S\u003C\u002Fsup>&#x2191;\u003C\u002Fth>\n            \u003Cth style=\"text-align: center !important;\">Read Order\u003Csup>Edit\u003C\u002Fsup>&#x2193;\u003C\u002Fth>\n        \u003C\u002Ftr>\n    \u003C\u002Fthead>\n    \u003Ctbody>\n        \u003Ctr>\n            \u003Ctd>Dolphin\u003C\u002Ftd>\n            \u003Ctd>0.3B\u003C\u002Ftd>\n            \u003Ctd>74.67\u003C\u002Ftd>\n            \u003Ctd>0.125\u003C\u002Ftd>\n            \u003Ctd>67.85\u003C\u002Ftd>\n            \u003Ctd>68.70\u003C\u002Ftd>\n            \u003Ctd>77.77\u003C\u002Ftd>\n            \u003Ctd>0.124\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>Dolphin-1.5\u003C\u002Ftd>\n            \u003Ctd>0.3B\u003C\u002Ftd>\n            \u003Ctd>85.06\u003C\u002Ftd>\n            \u003Ctd>0.085\u003C\u002Ftd>\n            \u003Ctd>79.44\u003C\u002Ftd>\n            \u003Ctd>84.25\u003C\u002Ftd>\n            \u003Ctd>88.06\u003C\u002Ftd>\n            
\u003Ctd>0.071\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>Dolphin-v2\u003C\u002Ftd>\n            \u003Ctd>3B\u003C\u002Ftd>\n            \u003Ctd>\u003Cstrong>89.78\u003C\u002Fstrong>\u003C\u002Ftd>\n            \u003Ctd>\u003Cstrong>0.054\u003C\u002Fstrong>\u003C\u002Ftd>\n            \u003Ctd>\u003Cstrong>87.63\u003C\u002Fstrong>\u003C\u002Ftd>\n            \u003Ctd>\u003Cstrong>87.02\u003C\u002Fstrong>\u003C\u002Ftd>\n            \u003Ctd>\u003Cstrong>90.48\u003C\u002Fstrong>\u003C\u002Ftd>\n            \u003Ctd>\u003Cstrong>0.054\u003C\u002Fstrong>\u003C\u002Ftd>\n        \u003C\u002Ftr>\n    \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n## 🛠️ Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FByteDance\u002FDolphin.git\n   cd Dolphin\n   ```\n\n2. Install the dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Download the pre-trained models of *Dolphin-v2*:\n\n   Visit our Huggingface [model card](https:\u002F\u002Fhuggingface.co\u002FByteDance\u002FDolphin-v2), or download model by:\n   \n   ```bash\n   # Download the model from Hugging Face Hub\n   git lfs install\n   git clone https:\u002F\u002Fhuggingface.co\u002FByteDance\u002FDolphin-v2 .\u002Fhf_model\n   # Or use the Hugging Face CLI\n   pip install huggingface_hub\n   huggingface-cli download ByteDance\u002FDolphin-v2 --local-dir .\u002Fhf_model\n   ```\n\n## ⚡ Inference\n\nDolphin provides two inference frameworks with support for two parsing granularities:\n- **Page-level Parsing**: Parse the entire document page into a structured JSON and Markdown format\n- **Element-level Parsing**: Parse individual document elements (text, table, formula)\n\n\n### 📄 Page-level Parsing\n\n```bash\n# Process a single document image\npython demo_page.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs\u002Fpage_1.png \n\n# Process a single document 
pdf\npython demo_page.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs\u002Fpage_6.pdf \n\n# Process all documents in a directory\npython demo_page.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs \n\n# Process with custom batch size for parallel element decoding\npython demo_page.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs \\\n    --max_batch_size 8\n```\n\n### 🧩 Element-level Parsing\n\n````bash\n# Process element images (specify element_type: table, formula, text, or code)\npython demo_element.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path  \\\n    --element_type [table|formula|text|code]\n````\n\n### 🎨 Layout Parsing\n````bash\n# Process a single document image\npython demo_layout.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs\u002Fpage_1.png\n\n# Process a single PDF document\npython demo_layout.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs\u002Fpage_6.pdf\n\n# Process all documents in a directory\npython demo_layout.py --model_path .\u002Fhf_model --save_dir .\u002Fresults \\\n    --input_path .\u002Fdemo\u002Fpage_imgs \n````\n\n\n## 🌟 Key Features\n\n- 🔄 Two-stage analyze-then-parse approach based on a single VLM\n- 📊 Promising performance on document parsing tasks\n- 🔍 Natural reading order element sequence generation\n- 🧩 Heterogeneous anchor prompting for different document elements\n- ⏱️ Efficient parallel parsing mechanism\n- 🤗 Support for Hugging Face Transformers for easier integration\n\n\n## 📮 Notice\n**Call for Bad Cases:** If you encounter cases where the model performs poorly, we would greatly appreciate it if you could share them in an issue. 
We are continuously working to optimize and improve the model.\n\n## 💖 Acknowledgement\n\nWe would like to acknowledge the following open-source projects that provided inspiration and reference for this work:\n- [OmniDocBench](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FOmniDocBench)\n- [Donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut\u002F)\n- [Nougat](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat)\n- [GOT](https:\u002F\u002Fgithub.com\u002FUcas-HaoranWei\u002FGOT-OCR2.0)\n- [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Ftree\u002Fmaster)\n- [Swin](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSwin-Transformer)\n- [Hugging Face Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)\n\n## 📝 Citation\n\nIf you find this code useful for your research, please use the following BibTeX entry.\n\n```bibtex\n@article{feng2025dolphin,\n  title={Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting},\n  author={Feng, Hao and Wei, Shu and Fei, Xiang and Shi, Wei and Han, Yingdong and Liao, Lei and Lu, Jinghui and Wu, Binghong and Liu, Qi and Lin, Chunhui and others},\n  journal={arXiv preprint arXiv:2505.14059},\n  year={2025}\n}\n```\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=bytedance\u002FDolphin&type=Date)](https:\u002F\u002Fwww.star-history.com\u002F#bytedance\u002FDolphin&Date)\n","Dolphin is a document image parsing model that significantly improves document parsing through heterogeneous anchor prompting. Its core capabilities include document type classification (digital-born vs. photographed), layout analysis, reading order prediction, and a document-type-aware two-stage parsing strategy: holistic parsing for photographed documents and parallel element-wise parsing for digital documents. Dolphin-v2 scales the model to 3 billion parameters and adds 21-element detection and dedicated formula\u002Fcode parsing, making it suitable for document processing scenarios such as PDF conversion and OCR. It performs especially well on documents with complex layouts or mixed content such as text paragraphs, figures, formulas, tables, and code blocks. The model uses a lightweight architecture to keep inference efficient.","2026-06-17 03:38:50","high_star"]