[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72011":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},72011,"MegaParse","QuivrHQ\u002FMegaParse","QuivrHQ","File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ","https:\u002F\u002Fmegaparse.com",null,"Python",7384,420,34,25,0,16,45.47,"Apache License 2.0",false,"main",[23,24,25,26,27],"docx","llm","parser","pdf","powerpoint","2026-06-11 04:05:09","# MegaParse - Your Parser for every type of document\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FQuivrHQ\u002FMegaParse\u002Fmain\u002Flogo.png\" alt=\"Quivr-logo\" width=\"30%\"  style=\"border-radius: 50%; padding-bottom: 20px\"\u002F>\n\u003C\u002Fdiv>\n\nMegaParse is a powerful and versatile parser that can handle various types of documents with ease. Whether you're dealing with text, PDFs, PowerPoint presentations, or Word documents, MegaParse has got you covered. 
The focus is on parsing with no information loss.\n\n## Key Features 🎯\n\n- **Versatile Parser**: MegaParse is a powerful and versatile parser that can handle various types of documents with ease.\n- **No Information Loss**: Focus on having no information loss during parsing.\n- **Fast and Efficient**: Designed with speed and efficiency at its core.\n- **Wide File Compatibility**: Supports text, PDF, PowerPoint presentations, Excel, CSV, and Word documents.\n- **Open Source**: Freedom is beautiful, and so is MegaParse. Open source and free to use.\n\n## Support\n\n- Files: ✅ PDF ✅ PowerPoint ✅ Word\n- Content: ✅ Tables ✅ TOC ✅ Headers ✅ Footers ✅ Images\n\n### Example\n\nhttps:\u002F\u002Fgithub.com\u002FQuivrHQ\u002FMegaParse\u002Fassets\u002F19614572\u002F1b4cdb73-8dc2-44ef-b8b4-a7509bc8d4f3\n\n## Installation\n\nRequires Python >= 3.11.\n\n```bash\npip install megaparse\n```\n\n## Usage\n\n1. Add your OpenAI or Anthropic API key to the .env file\n\n2. Install poppler on your computer (images and PDFs)\n\n3. Install tesseract on your computer (images and PDFs)\n\n4. 
If you are on a Mac, you also need to install libmagic: ```brew install libmagic```\n\nUse MegaParse as is:\n```python\nfrom megaparse import MegaParse\n\n# Parse the document with the default parser\nmegaparse = MegaParse()\nresponse = megaparse.load(\".\u002Ftest.pdf\")\nprint(response)\n```\n\n### Use MegaParse Vision\n\n```python\nimport os\n\nfrom langchain_openai import ChatOpenAI\nfrom megaparse.parser.megaparse_vision import MegaParseVision\n\n# Requires OPENAI_API_KEY to be set in the environment\nmodel = ChatOpenAI(model=\"gpt-4o\", api_key=os.getenv(\"OPENAI_API_KEY\"))  # type: ignore\nparser = MegaParseVision(model=model)\nresponse = parser.convert(\".\u002Ftest.pdf\")\nprint(response)\n```\n**Note**: The models supported by MegaParse Vision are multimodal ones such as Claude 3.5, Claude 4, gpt-4o and gpt-4.\n\n## Use as an API\nA Makefile is provided; simply run:\n```make dev```\nat the root of the project and you are good to go.\n\nSee localhost:8000\u002Fdocs for more information on the available endpoints.\n\n## Benchmark\n\n\u003C!---BENCHMARK-->\n| Parser                        | similarity_ratio |\n| ----------------------------- | ---------------- |\n| megaparse_vision              | 0.87             |\n| unstructured_with_check_table | 0.77             |\n| unstructured                  | 0.59             |\n| llama_parser                  | 0.33             |\n\u003C!---END_BENCHMARK-->\n\n_Higher is better_\n\nNote: Want to evaluate and compare your MegaParse module with ours? Please add your config in ```evaluations\u002Fscript.py``` and then run ```python evaluations\u002Fscript.py```. 
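If it performs better, open a PR so we can go higher together.\n\n## In Construction 🚧\n- Improve table checker\n- Create Checkers to add **modular postprocessing** ⚙️\n- Add Structured output, **let's get computers talking** 🤖\n\n\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=QuivrHQ\u002FMegaParse&type=Date)](https:\u002F\u002Fstar-history.com\u002F#QuivrHQ\u002FMegaParse&Date)\n","MegaParse is a file parsing tool designed for large language models (LLMs) that efficiently handles multiple document formats such as PDF, Docx, and PPTx. Its core features include document parsing with no information loss, fast and efficient processing, and broad file compatibility, supporting text, PDF, PowerPoint, Word documents, and other formats. In addition, MegaParse offers strong parsing of tables, tables of contents, headers and footers, and images, and it is open source and free. It suits scenarios that require converting various types of documents into a format suitable for LLM processing, such as knowledge management and automated document processing.",2,"2026-06-11 03:39:56","high_star"]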