[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-495":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":43,"readmeContent":44,"aiSummary":45,"trendingCount":16,"starSnapshotCount":16,"syncStatus":46,"lastSyncTime":47,"discoverSource":48},495,"MinerU","opendatalab\u002FMinerU","opendatalab","Transforms complex documents like PDFs and Office docs into LLM-ready markdown\u002FJSON for your Agentic workflows.","https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002F",null,"Python",67244,5666,247,6,0,134,844,4660,665,120,"Other",false,"master",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42],"ai4science","document-analysis","docx","extract-data","layout-analysis","ocr","parser","pdf","pdf-converter","pdf-extractor-llm","pdf-extractor-pretrain","pdf-extractor-rag","pdf-parser","pptx","python","xlsx","2026-06-11 04:00:19","\u003Cdiv align=\"center\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fhtml\">\n\u003C!-- logo -->\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fgcore.jsdelivr.net\u002Fgh\u002Fopendatalab\u002FMinerU@master\u002Fdocs\u002Fimages\u002FMinerU-logo.png\" width=\"300px\" style=\"vertical-align:middle;\">\n\u003C\u002Fp>\n\n\u003C!-- icon 
-->\n\n[![stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendatalab\u002FMinerU.svg)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n[![forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopendatalab\u002FMinerU.svg)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n[![open issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-raw\u002Fopendatalab\u002FMinerU)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues)\n[![issue resolution](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-closed-raw\u002Fopendatalab\u002FMinerU)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues)\n[![PyPI version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmineru)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmineru\u002F)\n[![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmineru)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmineru\u002F)\n[![Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fmineru)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmineru)\n[![Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fmineru\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmineru)\n[![OpenDataLab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebapp_on_mineru.net-blue?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMTM0IiBoZWlnaHQ9IjEzNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJtMTIyLDljMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0idXJsKCNhKSIvPjxwYXRoIGQ9Im0xMjIsOWMwLDUtNCw5LTksOXMtOS00LTktOSw0LTksOS05LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZD0ibTkxLDE4YzAsNS00LDktOSw5cy05LTQtOS05LDQtOSw5LTksOSw0LDksOXoiIGZpbGw9InVybCgjYikiLz48cGF0aCBkPSJtOTEsMThjMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0iIzAxMDEwMSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw
0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0idXJsKCNjKSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0iIzAxMDEwMSIvPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYSIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYyIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjwvZGVmcz48L3N2Zz4=&labelColor=white)](https:\u002F\u002Fmineru.net\u002FOpenSourceTools\u002FExtractor?source=github)\n[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_HuggingFace-yellow.svg?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t\u002FAAAAk1BMVEVHcEz\u002FnQv\u002FnQv\u002FnQr\u002FnQv\u002FnQr\u002FnQv\u002FnQv\u002FnQr\u002FwRf\u002FtxT\u002Fpg7\u002FyRr\u002FrBD\u002FzRz\u002Fngv\u002FoAz\u002Fzhz\u002Fnwv\u002FtxT\u002Fngv\u002F0B3+zBz\u002FnQv\u002F0h7\u002Fwxn\u002FvRb\u002FthXkuiT\u002FrxH\u002FpxD\u002Fogzcqyf\u002FnQvTlSz\u002FczCxky7\u002FSjifdjT\u002FMj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEI
AAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9\u002FfxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw\u002F1f3UaWcSGYNKTdf\u002FP+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl\u002F6C4s\u002FZLAM45SOi\u002F1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8\u002FPhXiBXPMjLSxtwp8W9f\u002F1AngRierBkA+kk\u002FIpUSOeKByzn8y3kAAAfh\u002F\u002F0oXgV4roHm\u002Fkz4E2z\u002F\u002FzRc3\u002FlgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6\u002FPT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr\u002FcyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61\u002FUj\u002F9H\u002FVzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3kOAp2f1Kf0Weony7pn\u002FcPydvhQYV+eFOfmOu7VB\u002FViPe34\u002FEN3RFHY\u002FyRuT8ddCtMPH\u002FMcBAT5s+vRde\u002Fgf2c\u002FsPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO\u002FuOvHofxjrV\u002FTNS6iMJS+4TcSTgk9n5agJdBQbB\u002F\u002FIfF\u002FHpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ\u002FptaJq5T\u002F7WcgAZywR\u002FXlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzjI7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN\u002Fi1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7M
pOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi\u002FhnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX\u002Fe6479yZcLwCBmTxiawEwrOcleuu12t3tbLv\u002FN4RLYIBhYexm7Fcn4OJcn0+zc+s8\u002FVfPeddZHAGN6TT8eGczHdR\u002FGts1\u002FMzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG\u002FvsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fopendatalab\u002FMinerU)\n[![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_ModelScope-purple?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJt
NDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002FOpenDataLab\u002FMinerU)\n[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgist\u002Fmyhloli\u002Fa3cb16570ab3cfeadf9d8f0ac91b4fca\u002Fmineru_demo.ipynb)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.18839)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22186)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMinerU2.5%20Pro-Technical%20Report-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.04771)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fopendatalab\u002FMinerU)\n\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F11174\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Ftrendshift.io\u002Fapi\u002Fbadge\u002Frepositories\u002F11174\" alt=\"opendatalab%2FMinerU | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C!-- language -->\n\n[English](README.md) | [简体中文](README_zh-CN.md)\n\n\u003C!-- hot link -->\n\n\u003Cp align=\"center\">\n🚀\u003Ca href=\"https:\u002F\u002Fmineru.net\u002F?source=github\">Access MinerU Now→✅ Zero-Install Web Version ✅ Full-Featured Desktop Client ✅ Instant API Access; Skip deployment headaches – get all product formats in one click. 
Developers, dive in!\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C!-- join us -->\n\n\u003Cp align=\"center\">\n    👋 join us on \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq\" target=\"_blank\">Discord\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fmineru.net\u002Fcommunity-portal\u002F?aliasId=3c430f94\" target=\"_blank\">WeChat\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\n\u003Cdetails>\n\u003Csummary>MinerU — High-accuracy document parsing engine for LLM · RAG · Agent workflows\u003C\u002Fsummary>\nConverts PDF · DOCX · PPTX · XLSX · Images · Web pages into structured Markdown \u002F JSON · VLM+OCR dual engine · 109 languages \u003Cbr>\nMCP Server · LangChain \u002F Dify \u002F FastGPT native integration · 10+ domestic AI chip support\n\n**🔍 Core Parsing Capabilities**\n\n- Native support for `DOCX`, `PPTX`, and `XLSX` parsing\n- Formulas → LaTeX · Tables → HTML, accurate layout reconstruction\n- Supports scanned docs, handwriting, multi-column layouts, cross-page table merging\n- Output follows human reading order with automatic header\u002Ffooter removal\n- VLM + OCR dual engine, 109-language OCR recognition\n\n**🔌 Integration**\n\n| Use Case | Solution |\n|----------|----------|\n| AI Coding Tools | MCP Server — Cursor · Claude Desktop · Windsurf |\n| RAG Frameworks | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |\n| Development | Python \u002F Go \u002F TypeScript SDK · CLI · REST API · Docker |\n| No-Code | mineru.net online · Gradio WebUI · Desktop client |\n\n**🖥️ Deployment (Private · Fully Offline)**\n\n| Inference Backend | Best For |\n|------------------|---------|\n| pipeline         | Fast & stable, no hallucination, runs on CPU or GPU |\n| vlm-engine       | High accuracy, supports vLLM \u002F LMDeploy \u002F mlx ecosystem |\n| hybrid-engine    | High accuracy, native text extraction, low hallucination |\n\nDomestic AI chips: Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · 
Iluvatar · Hygon · Biren · T-Head\n\u003C\u002Fdetails>\n\n# Changelog\n\n- 2026\u002F04\u002F18 3.1.0 Released\n\n  This release focuses on **licensing openness, parsing accuracy, and full-format native support**. The main updates include:\n\n  - License upgrade\n    - MinerU has officially moved from `AGPLv3` to the [MinerU Open Source License](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fblob\u002Fmaster\u002FLICENSE.md), a custom license based on `Apache 2.0`.\n    - This change significantly reduces adoption friction for both community users and commercial deployments, making MinerU easier to integrate into real-world workflows.\n  - VLM main model upgrade\n    - The primary VLM model has been upgraded to `MinerU2.5-Pro-2604-1.2B`, bringing overall parsing accuracy to a state-of-the-art level.\n    - The new model now supports image and chart parsing, truncated paragraph merging, cross-page table merging, and image recognition inside tables, further strengthening performance on complex document layouts.\n  - Full-format native parsing support\n    - Native parsing support has now been extended to `PPTX` and `XLSX`.\n    - MinerU now fully supports parsing across images, `PDF`, `DOCX`, `PPTX`, and `XLSX`, providing a more complete multi-format document understanding workflow.\n\n  With the 3.1.0 release, MinerU becomes more open, more accurate, and easier to adopt in production. The new license lowers the barrier for both community and commercial use, `MinerU2.5-Pro-2604-1.2B` improves parsing quality on complex content, and native `PPTX` \u002F `XLSX` support completes end-to-end coverage of mainstream document formats.\n\n- 2026\u002F03\u002F29 3.0.0 Released\n\n  This release delivers a systematic upgrade centered on **parsing capability, system architecture, and engineering usability**. 
The main updates include:\n  \n  - Native `DOCX` parsing\n    - Official support for native `DOCX` parsing, delivering high-precision results without hallucinations.\n    - Compared with the traditional workflow of first converting `DOCX` to `PDF` and then parsing it, end-to-end speed is improved by tens of times, making it better suited for scenarios with high requirements for both accuracy and throughput.\n  - `pipeline` backend upgrade\n    - The `pipeline` backend achieves a score of `86.2` on OmniDocBench (v1.5), surpassing the accuracy of the previous-generation mainstream VLM `MinerU2.0-2505-0.9B`.\n    - Added support for parsing images\u002Fformulas inside tables, seal text recognition, vertical text support, and interline formula numbering recognition, continuously improving parsing quality for complex document scenarios.\n    - While maintaining high accuracy, it keeps resource usage extremely low and continues to support inference in pure CPU environments.\n  - `API \u002F CLI \u002F Router` orchestration upgrade\n    - `mineru` now runs as an orchestration client based on `mineru-api`; when `--api-url` is not provided, it will automatically start a local temporary service.\n    - `mineru-api` adds a new asynchronous task endpoint `POST \u002Ftasks`, supporting task submission, status querying, and result retrieval; meanwhile, it retains the synchronous parsing endpoint `POST \u002Ffile_parse` for compatibility with legacy plugins.\n    - Added `mineru-router`, designed for unified entry deployment and task routing across multiple services and multiple GPUs; its interfaces are fully compatible with `mineru-api` and support automatic task load balancing.\n  - Deployment and usability improvements\n    - Resolved compatibility issues with `torch >= 2.8`; the base image has been upgraded to `vllm0.11.2 + torch2.9.0`, unifying installation paths across different Compute Capabilities.\n    - Optimized the parsing pipeline with a sliding-window mechanism, 
significantly reducing peak memory usage in long-document scenarios, so documents with tens of thousands of pages no longer need to be split manually.\n    - Batch inference in `pipeline` now supports streaming writes to disk, allowing completed parsing results to be written out in time and further improving the experience for long-running tasks.\n    - Completed thread-safety optimization and now fully supports multi-threaded concurrent inference; together with `mineru-router`, this enables one-click multi-GPU deployment and makes it easy to build high-concurrency, high-throughput parsing systems.\n    - Completely removed the use of two AGPLv3 models (`doclayoutyolo` and `mfd_yolov8`) and one CC-BY-NC-SA 4.0 model (`layoutreader`).  \n  \n  This update is not just a set of feature enhancements, but a key leap forward in MinerU's overall system capabilities. We specifically addressed the peak memory usage issue in long-document parsing. Through optimizations such as sliding windows and streaming writes to disk, ultra-long document parsing has moved from “requiring manual splitting and careful handling” to being “stable, scalable, and ready for production workloads.” At the same time, we completed thread-safety optimization and fully enabled multi-threaded concurrent inference, further improving single-machine resource utilization and runtime stability under high-concurrency workloads. On top of this, with `mineru-router` and the new `API \u002F CLI` orchestration framework, MinerU now supports one-click multi-GPU deployment, unified access across multiple services, and automatic task load balancing, significantly reducing the difficulty of large-scale deployment. 
As a result, MinerU is evolving from a standalone data production tool into a large-scale document parsing foundation for high-concurrency and high-throughput scenarios, providing enterprise-grade document data processing with infrastructure that is more stable, more efficient, and easier to scale.\n\n> 📝 View the complete [Changelog](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Freference\u002Fchangelog\u002F) for more historical version information\n\n# MinerU\n\n## Project Introduction\n\nMinerU is a document parsing tool that converts `PDF`, image, `DOCX`, `PPTX`, and `XLSX` inputs into machine-readable formats such as Markdown and JSON for downstream retrieval, extraction, and processing.\nMinerU was born during the pre-training process of [InternLM](https:\u002F\u002Fgithub.com\u002FInternLM\u002FInternLM). We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models.\nCompared to well-known commercial products, MinerU is still young. 
If you encounter any issues or if the results are not as expected, please open an [issue](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues) and **attach the relevant document or sample file**.\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4bea02c9-6d54-4cd6-97ed-dff14340982c\n\n## Key Features\n\n- Support `PDF`, image, `DOCX`, `PPTX`, and `XLSX` inputs.\n- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.\n- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.\n- Preserve the structure of the original document, including headings, paragraphs, lists, etc.\n- Extract images, image descriptions, tables, table titles, and footnotes.\n- Automatically recognize and convert formulas in the document to LaTeX format.\n- Automatically recognize and convert tables in the document to HTML format.\n- Automatically detect scanned or garbled PDFs and enable OCR for them.\n- OCR supports detection and recognition of 109 languages.\n- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.\n- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.\n- Built-in CLI, FastAPI, and Gradio WebUI for local orchestration and multi-service deployment.\n- Supports running in a pure CPU environment, and also supports GPU\u002FMPS acceleration.\n- Compatible with Windows, Linux, and macOS.\n\n# Quick Start\n\nDocument parsing is a difficult and complex task. In scenarios such as complex layouts, scanned pages, and handwritten content, the parsing results may fall short of expectations. 
We recommend trying the online demo first to evaluate MinerU's parsing quality and suitability before choosing an appropriate deployment method based on your actual needs.\nIf you have **document** samples with unsatisfactory parsing results, feel free to share them in an [issue](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues). We will continue improving the parsing capabilities.\nIf you encounter any installation issues, please first consult the \u003Ca href=\"#faq\">FAQ\u003C\u002Fa>. \n\n## Online Experience\n\n### Official online web application\nThe official online version has the same functionality as the client, with a beautiful interface and rich features, requires login to use  \n \n- [![OpenDataLab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebapp_on_mineru.net-blue?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMTM0IiBoZWlnaHQ9IjEzNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJtMTIyLDljMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0idXJsKCNhKSIvPjxwYXRoIGQ9Im0xMjIsOWMwLDUtNCw5LTksOXMtOS00LTktOSw0LTksOS05LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZD0ibTkxLDE4YzAsNS00LDktOSw5cy05LTQtOS05LDQtOSw5LTksOSw0LDksOXoiIGZpbGw9InVybCgjYikiLz48cGF0aCBkPSJtOTEsMThjMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0iIzAxMDEwMSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0idXJsKCNjKSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw
0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0iIzAxMDEwMSIvPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYSIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYyIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjwvZGVmcz48L3N2Zz4=&labelColor=white)](https:\u002F\u002Fmineru.net\u002FOpenSourceTools\u002FExtractor?source=github)\n\n### Gradio-based online demo\nA WebUI developed based on Gradio, with a simple interface and only core parsing functionality, no login required  \n\n- 
[![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_ModelScope-purple?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002FOpenDataLab\u002FMinerU)\n- 
[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_HuggingFace-yellow.svg?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t\u002FAAAAk1BMVEVHcEz\u002FnQv\u002FnQv\u002FnQr\u002FnQv\u002FnQr\u002FnQv\u002FnQv\u002FnQr\u002FwRf\u002FtxT\u002Fpg7\u002FyRr\u002FrBD\u002FzRz\u002Fngv\u002FoAz\u002Fzhz\u002Fnwv\u002FtxT\u002Fngv\u002F0B3+zBz\u002FnQv\u002F0h7\u002Fwxn\u002FvRb\u002FthXkuiT\u002FrxH\u002FpxD\u002Fogzcqyf\u002FnQvTlSz\u002FczCxky7\u002FSjifdjT\u002FMj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9\u002FfxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw\u002F1f3UaWcSGYNKTdf\u002FP+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl\u002F6C4s\u002FZLAM45SOi\u002F1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8\u002FPhXiBXPMjLSxtwp8W9f\u002F1AngRierBkA+kk\u002FIpUSOeKByzn8y3kAAAfh\u002F\u002F0oXgV4roHm\u002Fkz4E2z\u002F\u002FzRc3\u002FlgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6\u002FPT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr\u002FcyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61\u002FUj\u002F9H\u002FVzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3kOAp2f1Kf0Weony7pn\u002FcPydvhQYV+eFOfmOu7VB\u002FViPe34\u002FEN3RFHY\u002FyRuT8ddCtMPH\u002FMcBAT5s+vRde\u002Fgf2c\u002FsPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO\u002FuOvHofxjrV\u002FTNS6iMJS+4TcSTgk9n5agJdBQbB\u002F\u002FIfF\u002FHpvPt3Tbi7b6I6K
0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ\u002FptaJq5T\u002F7WcgAZywR\u002FXlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzjI7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN\u002Fi1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi\u002FhnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX\u002Fe6479yZcLwCBmTxiawEwrOcleuu12t3tbLv\u002FN4RLYIBhYexm7Fcn4OJcn0+zc+s8\u002FVfPeddZHAGN6TT8eGczHdR\u002FGts1\u002FMzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG\u002FvsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fopendatalab\u002FMinerU)\n\n## Local Deployment\n\n\n> [!WARNING]\n> **Pre-installation Notice—Hardware and Software Environment Support**\n>\n> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.\n>\n> By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.\n>\n> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. 
Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\">Parsing Backend\u003C\u002Fth>\n      \u003Cth rowspan=\"2\">pipeline\u003C\u002Fth>\n      \u003Cth colspan=\"2\">*-auto-engine\u003C\u002Fth>\n      \u003Cth colspan=\"2\">*-http-client\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>hybrid\u003C\u002Fth>\n      \u003Cth>vlm\u003C\u002Fth>\n      \u003Cth>hybrid\u003C\u002Fth>\n      \u003Cth>vlm\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Cth>Backend Features\u003C\u002Fth>\n      \u003Ctd >Good Compatibility\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\">High Hardware Requirements\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\">For OpenAI Compatible Servers\u003Csup>2\u003C\u002Fsup>\u003C\u002Ftd>\n    \u003C\u002Ftr> \n    \u003Ctr>\n      \u003Cth>Accuracy\u003Csup>1\u003C\u002Fsup>\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">85+\u003C\u002Ftd>\n      \u003Ctd colspan=\"4\" style=\"text-align:center;\">95+\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Operating System\u003C\u002Fth>\n      \u003Ctd colspan=\"5\" style=\"text-align:center;\">Linux\u003Csup>3\u003C\u002Fsup> \u002F Windows\u003Csup>4\u003C\u002Fsup> \u002F macOS\u003Csup>5\u003C\u002Fsup>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Pure CPU Support\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">✅\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">❌\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">✅\u003C\u002Ftd>\n    \u003C\u002Ftr>\n        \u003Ctr>\n      \u003Cth>GPU Acceleration\u003C\u002Fth>\n      \u003Ctd colspan=\"4\" style=\"text-align:center;\">Volta and later architecture GPUs or Apple Silicon\u003C\u002Ftd>\n      \u003Ctd 
rowspan=\"2\">Not Required\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Min VRAM\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">4GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">8GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">8GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">2GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>RAM\u003C\u002Fth>\n      \u003Ctd colspan=\"3\" style=\"text-align:center;\">Min 16GB, Recommended 32GB or more\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">Min 16GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Disk Space\u003C\u002Fth>\n      \u003Ctd colspan=\"3\" style=\"text-align:center;\">Min 20GB, SSD Recommended\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">Min 2GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Python Version\u003C\u002Fth>\n      \u003Ctd colspan=\"5\" style=\"text-align:center;\">3.10-3.13\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\u003Csup>1\u003C\u002Fsup> Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.6), based on the latest version of `MinerU`.  \n\u003Csup>2\u003C\u002Fsup> Servers compatible with OpenAI API, such as local model servers or remote model services deployed via inference frameworks like `vLLM`\u002F`SGLang`\u002F`LMDeploy`.  \n\u003Csup>3\u003C\u002Fsup> Linux only supports distributions from 2019 and later.  \n\u003Csup>4\u003C\u002Fsup> Since the key dependency `ray` does not support Python 3.13 on Windows, only versions 3.10~3.12 are supported.  
\n\u003Csup>5\u003C\u002Fsup> macOS requires version 14.0 or later.\n\n\n### Install MinerU\n\n#### Install MinerU using pip or uv\n```bash\npip install --upgrade pip\npip install uv\nuv pip install -U \"mineru[all]\"\n```\n\n#### Install MinerU from source code\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU.git\ncd MinerU\nuv pip install -e .[all]\n```\n\n> [!TIP]\n> - `mineru[all]` includes all core features, compatible with Windows \u002F Linux \u002F macOS systems, suitable for most users.\n> - If CUDA acceleration is unavailable after installing on Windows, see the [Windows CUDA acceleration FAQ](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Ffaq\u002F#windows-cuda-acceleration).\n> - If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, please refer to the documentation [Extension Modules Installation Guide](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fquick_start\u002Fextension_modules\u002F).\n\n---\n \n#### Deploy MinerU using Docker\nMinerU provides a convenient Docker deployment method, which helps quickly set up the environment and solve some tricky environment compatibility issues.\n\n> [!TIP]\n> - Docker deployment is only supported on Linux and Windows environments with WSL2 support;\n> - macOS users should refer to the two installation methods above for installation instead of using Docker deployment.\n\nYou can get the [Docker Deployment Instructions](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fquick_start\u002Fdocker_deployment\u002F) in the documentation.\n\n---\n\n### Using MinerU\n\n\nIf your device meets the GPU acceleration requirements in the table above, you can use a simple command line for document parsing:\n```bash\nmineru -p \u003Cinput_path> -o \u003Coutput_path>\n```\nIf your device does not meet the GPU acceleration requirements, you can specify the backend as `pipeline` to run in a pure CPU 
environment:\n```bash\nmineru -p \u003Cinput_path> -o \u003Coutput_path> -b pipeline\n```\n\n`mineru` currently supports local `PDF`, image, `DOCX`, `PPTX`, and `XLSX` file or directory inputs, and can be used for document parsing through the CLI, API, WebUI, and `mineru-router`. For detailed instructions, please refer to the [Usage Guide](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fusage\u002F).\n\n# FAQ\n\n- If you encounter any issues during usage, you can first check the [FAQ](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Ffaq\u002F) for solutions.  \n- If your issue remains unresolved, you may also use [DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fopendatalab\u002FMinerU) to interact with an AI assistant, which can address most common problems.  \n- If you still cannot resolve the issue, you are welcome to join our community via [Discord](https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq) or [WeChat](https:\u002F\u002Fmineru.net\u002Fcommunity-portal\u002F?aliasId=3c430f94) to discuss with other users and developers.\n\n# All Thanks To Our Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=opendatalab\u002FMinerU\" \u002F>\n\u003C\u002Fa>\n\n# License Information\n\nThis repository is licensed under the [MinerU Open Source License](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fblob\u002Fmaster\u002FLICENSE.md), based on Apache 2.0 with additional conditions.\n\n# Acknowledgments\n\n- [UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet)\n- [TableStructureRec](https:\u002F\u002Fgithub.com\u002FRapidAI\u002FTableStructureRec)\n- [PaddleOCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR)\n- [PaddleOCR2Pytorch](https:\u002F\u002Fgithub.com\u002Ffrotms\u002FPaddleOCR2Pytorch)\n- 
[fast-langdetect](https:\u002F\u002Fgithub.com\u002FLlmKira\u002Ffast-langdetect)\n- [pypdfium2](https:\u002F\u002Fgithub.com\u002Fpypdfium2-team\u002Fpypdfium2)\n- [pdftext](https:\u002F\u002Fgithub.com\u002Fdatalab-to\u002Fpdftext)\n- [pdfminer.six](https:\u002F\u002Fgithub.com\u002Fpdfminer\u002Fpdfminer.six)\n- [pypdf](https:\u002F\u002Fgithub.com\u002Fpy-pdf\u002Fpypdf)\n- [magika](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmagika)\n- [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n- [LMDeploy](https:\u002F\u002Fgithub.com\u002FInternLM\u002Flmdeploy)\n\n# Citation\n\n```bibtex\n@article{wang2026mineru2,\n  title={MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale},\n  author={Wang, Bin and He, Tianyao and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Chu, Tao and Qu, Yuan and Jin, Zhenjiang and Zeng, Weijun and Miao, Ziyang and others},\n  journal={arXiv preprint arXiv:2604.04771},\n  year={2026}\n}\n\n@article{dong2026minerudiffusion,\n  title={MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding},\n  author={Dong, Hejun and Niu, Junbo and Wang, Bin and Zeng, Weijun and Zhang, Wentao and He, Conghui},\n  journal={arXiv preprint arXiv:2603.22458},\n  year={2026}\n}\n\n@article{niu2025mineru2,\n  title={MinerU2.5: A decoupled vision-language model for efficient high-resolution document parsing},\n  author={Niu, Junbo and Liu, Zheng and Gu, Zhuangcheng and Wang, Bin and Ouyang, Linke and Zhao, Zhiyuan and Chu, Tao and He, Tianyao and Wu, Fan and Zhang, Qintong and others},\n  journal={arXiv preprint arXiv:2509.22186},\n  year={2025}\n}\n\n@article{wang2024mineru,\n  title={MinerU: An open-source solution for precise document content extraction},\n  author={Wang, Bin and Xu, Chao and Zhao, Xiaomeng and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Xu, Rui and Liu, Kaiwen and Qu, Yuan and Shang, Fukai and others},\n  journal={arXiv preprint arXiv:2409.18839},\n  year={2024}\n}\n\n@article{he2024opendatalab,\n  title={OpenDataLab: Empowering general artificial intelligence with open datasets},\n  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},\n  journal={arXiv preprint arXiv:2407.13773},\n  year={2024}\n}\n```\n\n# Star History\n\n\u003Ca>\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=opendatalab\u002FMinerU&type=Date&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=opendatalab\u002FMinerU&type=Date\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=opendatalab\u002FMinerU&type=Date\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n\n# Links\n- [MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU-Diffusion)\n- [Easy Data Preparation with latest LLMs-based Operators and Pipelines](https:\u002F\u002Fgithub.com\u002FOpenDCAI\u002FDataFlow)\n- [Vis3 (OSS browser based on s3)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FVis3)\n- [LabelU (A Lightweight Multi-modal Data Annotation 
Tool)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FlabelU)\n- [LabelLLM (An Open-source LLM Dialogue Annotation Platform)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FLabelLLM)\n- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit)\n- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FOmniDocBench)\n- [Magic-HTML (Mixed web page extraction tool)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002Fmagic-html)\n- [Magic-Doc (Fast speed ppt\u002Fpptx\u002Fdoc\u002Fdocx\u002Fpdf extraction tool)](https:\u002F\u002Fgithub.com\u002FInternLM\u002Fmagic-doc) \n- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https:\u002F\u002Fgithub.com\u002FMigoXLab\u002Fdingo)\n","MinerU is a tool for converting complex documents (such as PDFs and Office documents) into Markdown or JSON suitable for large language models. It uses OCR, layout analysis, and parsing techniques to extract and convert the structure and content of documents, and supports file types including PDF, DOCX, PPTX, and XLSX. It is particularly suited to scenarios that require preprocessing large volumes of unstructured text for downstream AI applications, such as building document-based knowledge graphs or automated report generation systems. Its Python implementation makes it straightforward to integrate into existing workflows.",2,"2026-06-11 02:36:36","top_all"]