[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72488":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},72488,"PIKE-RAG","microsoft\u002FPIKE-RAG","microsoft","PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation","https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.11551",null,"Python",2387,224,27,8,0,3,60.86,"MIT License",false,"main",true,[24,25,26,27],"domain-specific","industrial-ai","knowledge-extraction","rag","2026-06-12 04:01:06","\u003Cp align=\"center\">\r\n    \u003Cimg src=\".\u002Fdocs\u002Fsource\u002Fimages\u002Flogo\u002FPIKE-RAG_horizontal_black-font.svg\" alt=\"PIKE-RAG\" style=\"width: 80%; max-width: 100%; height: auto;\">\r\n\u003C\u002Fp>\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Ca href=\"https:\u002F\u002Fpike-rag.azurewebsites.net\u002F\">🌐Online Demo\u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.11551\">📊Technical Report\u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fopenreview.net\u002Fpdf?id=PAjCdkkNaU\">📑ICML 2025 
Paper\u003C\u002Fa>\r\n\u003C\u002Fp>\r\n\r\n[![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fmicrosoft\u002FPIKE-RAG)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Fblob\u002Fmain\u002FLICENSE)\r\n[![CodeQL](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Factions\u002Fworkflows\u002Fgithub-code-scanning\u002Fcodeql\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Factions\u002Fworkflows\u002Fgithub-code-scanning\u002Fcodeql)\r\n[![Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fmicrosoft\u002FPIKE-RAG)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Freleases)\r\n[![ReleaseDate](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease-date-pre\u002Fmicrosoft\u002FPIKE-RAG)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Freleases)\r\n[![Commits](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommits-since\u002Fmicrosoft\u002FPIKE-RAG\u002Flatest\u002Fmain)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Fcommits\u002Fmain)\r\n[![Pull Requests](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-pr\u002Fmicrosoft\u002FPIKE-RAG)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Fpulls)\r\n[![Issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fmicrosoft\u002FPIKE-RAG)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPIKE-RAG\u002Fissues)\r\n\r\n# PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation\r\n\r\n## Why PIKE-RAG?\r\n\r\nIn recent years, Retrieval Augmented Generation (RAG) systems have made significant progress in extending the capabilities of Large Language Models (LLM) through external retrieval. However, these systems still face challenges in meeting the complex and diverse needs of real-world industrial applications. 
Relying solely on direct retrieval is insufficient for extracting deep domain-specific knowledge from professional corpora and performing logical reasoning. To address this issue, we propose the PIKE-RAG (sPecIalized KnowledgE and Rationale Augmented Generation) method, which focuses on extracting, understanding, and applying domain-specific knowledge while building coherent reasoning logic to gradually guide LLMs toward accurate responses.\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Cimg src=\"docs\u002Fsource\u002Fimages\u002Freadme\u002Fpipeline.png\" alt=\"Overview of PIKE-RAG Framework\" style=\"width: 80%; max-width: 100%; height: auto;\">\r\n\u003C\u002Fp>\r\n\r\nThe PIKE-RAG framework consists mainly of several basic modules: document parsing, knowledge extraction, knowledge storage, knowledge retrieval, knowledge organization, knowledge-centric reasoning, and task decomposition and coordination. By adjusting the submodules within these main modules, it is possible to build RAG systems that emphasize different capabilities and meet the diverse needs of real-world scenarios.\r\n\r\nFor example, *searching a patient's historical medical records* primarily calls for *factual information retrieval capability*. The main challenges are that (1) inappropriate knowledge segmentation often disrupts semantic coherence, which hinders the understanding and extraction of knowledge and makes the retrieval process complex and inefficient; and (2) commonly used embedding-based knowledge retrieval is limited by how well embedding models can align professional terms with their aliases, which reduces system accuracy. 
With PIKE-RAG, we can improve the accuracy of knowledge extraction and retrieval by applying context-aware segmentation, automatic term-label alignment, and multi-granularity knowledge extraction during the knowledge extraction process, thereby strengthening factual information retrieval, as shown in the pipeline below.\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Cimg src=\"docs\u002Fsource\u002Fimages\u002Freadme\u002FL1_pipeline.png\" alt=\"A Pipeline Focusing on Factual Information Retrieval\" style=\"width: 80%; max-width: 100%; height: auto;\">\r\n\u003C\u002Fp>\r\n\r\nComplex tasks such as *suggesting reasonable treatment plans and coping measures for patients* require more advanced capabilities: strong domain-specific knowledge is needed to understand the task accurately and, where appropriate, decompose it sensibly; advanced data retrieval, processing, and organization techniques are needed to predict potential trends; and multi-agent planning helps balance creativity and reliability. In such cases, the richer pipeline below can be assembled.\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Cimg src=\"docs\u002Fsource\u002Fimages\u002Freadme\u002FL4_pipeline.png\" alt=\"A Pipeline Focusing on Fact-based Innovation and Generation\" style=\"width: 80%; max-width: 100%; height: auto;\">\r\n\u003C\u002Fp>\r\n\r\nIn public benchmarks, PIKE-RAG performed strongly on several multi-hop question answering datasets, including HotpotQA, 2WikiMultiHopQA, and MuSiQue, outperforming existing baseline methods on metrics such as accuracy and F1 score. It achieved an accuracy of 87.6% on HotpotQA, 82.0% on 2WikiMultiHopQA, and 59.6% on the more challenging MuSiQue dataset. 
These results indicate that PIKE-RAG has significant advantages in handling complex reasoning tasks, especially in scenarios that require integrating multi-source information and performing multi-step reasoning.\r\n\r\nPIKE-RAG has been tested in fields such as industrial manufacturing, mining, and pharmaceuticals, where it significantly improved question answering accuracy. In the future, we will explore its application in more fields, as well as other forms of knowledge and logic and how best to adapt them to specific scenarios.\r\n\r\n## For More Details\r\n\r\n- 📊 [Technical Report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.11551) describes the classification of industrial RAG problems, introduces the main components of PIKE-RAG, and presents experimental results on public benchmarks.\r\n- 🌐 [Online Demo](https:\u002F\u002Fpike-rag.azurewebsites.net\u002F) showcases our Knowledge-Aware decomposition pipeline for L2 RAG tasks.\r\n- 📑 [ICML 2025 Paper](https:\u002F\u002Fopenreview.net\u002Fpdf?id=PAjCdkkNaU): From Complex to Atomic: Enhancing Augmented Generation via Knowledge-Aware Dual Rewriting and Reasoning.\r\n\r\n## Quick Start\r\n\r\n1. Clone this repo and set up the Python environment; see [this document](docs\u002Fguides\u002Fenvironment.md).\r\n2. Create a `.env` file to store your endpoint information (and any other environment variables you need); see [this document](docs\u002Fguides\u002Fenv_file.md).\r\n3. Modify the *yaml config* files and try the scripts under *examples\u002F*; see [this document](docs\u002Fguides\u002Fexamples.md).\r\n4. Build your own pipeline and\u002For add your own components!\r\n\r\n🚀 A guide is available [here](docs\u002Fguides\u002Fmusique_example.md) for quickly reproducing the MuSiQue experiments from the technical report!\r\n\r\n## Contributing\r\n\r\nThis project welcomes contributions and suggestions.  
Most contributions require you to agree to a\r\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\r\nthe rights to use your contribution. For details, visit https:\u002F\u002Fcla.opensource.microsoft.com.\r\n\r\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\r\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\r\nprovided by the bot. You will only need to do this once across all repos using our CLA.\r\n\r\nThis project has adopted the [Microsoft Open Source Code of Conduct](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F).\r\nFor more information see the [Code of Conduct FAQ](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F) or\r\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\r\n\r\n## Trademarks\r\n\r\nThis project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft\r\ntrademarks or logos is subject to and must follow\r\n[Microsoft's Trademark & Brand Guidelines](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Flegal\u002Fintellectualproperty\u002Ftrademarks\u002Fusage\u002Fgeneral).\r\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\r\nAny use of third-party trademarks or logos are subject to those third-party's policies.\r\n\r\n","PIKE-RAG is a framework focused on domain-specific knowledge and augmented generation, designed to extend the capabilities of large language models through external retrieval. Its core functions include document parsing, knowledge extraction and storage, knowledge retrieval and organization, and knowledge-centric reasoning. It can extract deep knowledge from professional corpora and build coherent reasoning logic that gradually guides language models toward accurate answers. The framework has a modular design, so users can adjust each submodule to their actual needs and meet the capability requirements of different scenarios. PIKE-RAG is particularly suited to complex and diverse industrial-grade applications, such as factual information retrieval in medical record search.",2,"2026-06-11 03:42:16","high_star"]