[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80518":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":27,"discoverSource":28},80518,"sira","facebookresearch\u002Fsira","facebookresearch","Superintelligent Retrieval Agent (SIRA)",null,"Rust",87,15,2,0,3,6,14,9,54.01,"MIT License",false,"main",[],"2026-06-12 04:01:28","\u003C!-- Copyright (c) Meta Platforms, Inc. and affiliates.\nThis source code is licensed under the MIT license found in the\nLICENSE file in the root directory of this source tree. -->\n# SIRA  \n\n**S**uper**I**ntelligent **R**etrieval **A**gent\n\nSIRA is a multi-stage retrieval pipeline that uses LLMs to enrich both documents and queries, improving BM25 retrieval quality without training. The pipeline consists of five stages: data preparation, BM25 indexing, corpus enrichment (LLM-generated indexing phrases for documents), query expansion (LLM-generated search terms), and LLM-based pointwise reranking. 
SIRA achieves state-of-the-art results on BEIR benchmarks using only inference-time compute.\n\nPaper: https:\u002F\u002Farxiv.org\u002Fpdf\u002F2605.06647\n## Requirements\n\n- Python >= 3.12\n- CUDA-capable GPU(s) (tested on NVIDIA H100)\n- Rust toolchain (for building the bm25x extension)\n- Conda (recommended for environment management)\n\n## Setup\n\n```bash\n# Create and activate the conda environment\nconda create -n sira312 python=3.12 -y\nconda activate sira312\npip install -e .\n\n# Activate the development sandbox\nsource sandbox.sh\n```\n\n## Quick Start\n\n```bash\n# Run the full pipeline on a single dataset (auto-starts LLM server)\npython scripts\u002Frun_pipeline.py data=scifact server.auto_start=true\n\n# Run on multiple datasets\npython scripts\u002Frun_pipeline.py datasets='[scifact,arguana,fiqa]' server.auto_start=true\n\n# Run specific stages only\npython scripts\u002Frun_pipeline.py data=scifact stages='[enrich_query,rerank]'\n```\n\nSee [scripts\u002FREADME.md](scripts\u002FREADME.md) for the full pipeline documentation, configuration options, and data layout.\n\n## Citation\n\n```bibtex\n@article{yang2026sira,\n  title={Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval},\n  author={Yang, Zeyu and Ma, Qi and Chen, Jason and Shrivastava, Anshumali},\n  journal={arXiv preprint arXiv:2605.06647},\n  year={2026}\n}\n```\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\nThe `src\u002Fsira\u002Fbm25x\u002F` directory contains code derived from [bm25x](https:\u002F\u002Fgithub.com\u002Flightonai\u002Fbm25x) by LightOn, licensed under Apache 2.0. 
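See the [NOTICE](src\u002Fsira\u002Fbm25x\u002FNOTICE) file for details.\n","SIRA is a multi-stage retrieval pipeline that uses large language models (LLMs) to enrich documents and queries, improving BM25 retrieval quality without additional training. Its core stages are data preparation, BM25 indexing, corpus enrichment (LLM-generated indexing phrases), query expansion (LLM-generated search terms), and LLM-based pointwise reranking. The project is written in Rust, with a Python environment used for management and execution, and suits scenarios that need high-quality text retrieval, such as scientific literature search and enterprise knowledge base construction. SIRA achieves leading results on the BEIR benchmarks, demonstrating its efficiency and sophistication in information retrieval.","2026-06-11 04:01:04","CREATED_QUERY"]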