[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80352":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":13,"stars30d":13,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":14,"rankGlobal":9,"rankLanguage":9,"license":15,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":16,"hasPages":16,"topics":18,"createdAt":9,"pushedAt":9,"updatedAt":19,"readmeContent":20,"aiSummary":21,"trendingCount":13,"starSnapshotCount":13,"syncStatus":22,"lastSyncTime":23,"discoverSource":24},80352,"augur","willgitdata\u002Faugur","willgitdata","Adaptive retrieval orchestration layer for RAG — picks vector \u002F keyword \u002F hybrid \u002F rerank per query with explainable traces",null,"TypeScript",58,1,0,0.9,"MIT License",false,"main",[],"2026-06-12 02:04:01","\u003Cpicture>\n  \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"assets\u002Faugur-wordmark-dark.svg\">\n  \u003Cimg src=\"assets\u002Faugur-wordmark-light.svg\" alt=\"Augur\">\n\u003C\u002Fpicture>\n\n###### Named after the ancient Roman augurs who interpreted signs to foresee the best path forward. To augur is to predict, and this package predicts the optimal retrieval method for your use case.\n\nAdaptive retrieval orchestration for RAG and semantic search in TypeScript. Augur sits on top of your vector DB (pgvector, Pinecone, Turbopuffer, or in-memory) and picks per query which strategy to run: vector, BM25 keyword, weighted hybrid, or vector-then-cross-encoder rerank. 
The routing decision and timings come back in every search response.\n\n```ts\nimport { Augur, LocalEmbedder } from \"@augur-rag\u002Fcore\";\n\nconst augr = new Augur({ embedder: new LocalEmbedder() });\n\nawait augr.index([\n  { id: \"1\", content: \"PostgreSQL supports vector indexing via pgvector.\" },\n  { id: \"2\", content: \"Pinecone is a managed vector database.\" },\n]);\n\nconst { results, trace } = await augr.search({\n  query: \"How do I store vectors in Postgres?\",\n});\n\n\u002F\u002F results[0].chunk.documentId === \"1\"\n\u002F\u002F trace.decision.strategy === \"vector\"\n\u002F\u002F trace.decision.reasons === [\"natural-language question → semantic search\", ...]\n```\n\n## Why\n\nMost RAG pipelines pick one retrieval strategy and run it for everything. Pure vector misses exact matches (error codes, SKUs, named entities). Pure BM25 misses paraphrases. Hybrid is better but a fixed mix is wrong for whichever side the current query needs less of. Augur routes per query from cheap heuristics on query signals, with the cross-encoder reranker as the final precision stage. When the auto choice is wrong, the trace shows you why.\n\n## Performance\n\nOn-device stack: `Xenova\u002Fall-MiniLM-L6-v2` (22 MB embedder) + `Xenova\u002Fms-marco-MiniLM-L-6-v2` (22 MB cross-encoder). 44 MB total. 
No network at query time, no API keys, no per-corpus tuning.\n\nBEIR NDCG@10:\n\n| Dataset                          |    Auto |  BM25 | BM25+rerank | Contriever | ColBERTv2 | BGE-large (1.3GB) |\n| -------------------------------- | ------: | ----: | ----------: | ---------: | --------: | ----------------: |\n| SciFact (scientific claims)      |   0.707 | 0.665 |       0.688 |      0.677 |     0.694 |             0.745 |\n| FiQA (finance Q&A, 57K docs)     |   0.345 | 0.236 |       0.347 |      0.329 |     0.356 |             0.450 |\n| NFCorpus (medical literature)    |   0.324 | 0.325 |       0.350 |      0.328 |     0.339 |             0.380 |\n\nAuto numbers measured by the [`Eval matrix`](.github\u002Fworkflows\u002Feval.yml) workflow. Baseline columns are the published numbers from the BEIR, BGE, E5, and ColBERTv2 papers. Same router across all three corpora, no per-dataset tuning. Swap in BGE-large as the embedder if you want to match the 1.3 GB column.\n\n## Adapters\n\n| Adapter             | Capabilities                                          |\n| ------------------- | ----------------------------------------------------- |\n| `InMemoryAdapter`   | Zero-dep, BM25 + brute-force vector. Dev \u002F small datasets. |\n| `PgVectorAdapter`   | Postgres + `pgvector`. Vector + tsvector + RRF hybrid. |\n| `PineconeAdapter`   | Pinecone REST. Vector only.                           |\n| `TurbopufferAdapter`| Native vector + BM25 + hybrid.                        |\n\nA custom adapter is five methods. See [`examples\u002Fcustom-adapter`](.\u002Fexamples\u002Fcustom-adapter\u002Findex.ts).\n\n## Install\n\n```bash\nnpm install @augur-rag\u002Fcore @huggingface\u002Ftransformers\n```\n\n`@huggingface\u002Ftransformers` is an optional peer needed for `LocalEmbedder` and `LocalReranker`. 
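Skip it if you're wiring a hosted embedder via the one-method `Embedder` interface.\n\n## Quick start\n\n```bash\npnpm install && pnpm build\npnpm --filter example-basic-search start\n\n# Or the HTTP server:\ndocker compose up\n# http:\u002F\u002Flocalhost:3001\u002Fdocs   (Swagger UI)\n```\n\n## Docs\n\n- [docs\u002Farchitecture.md](.\u002Fdocs\u002Farchitecture.md) — how the pieces fit together\n- [docs\u002Fexamples.md](.\u002Fdocs\u002Fexamples.md) — hosted embedders, contextual retrieval, pgvector, MMR, trace inspection\n- [CHANGELOG.md](.\u002FCHANGELOG.md)\n\n## License\n\nMIT.\n","Augur is an adaptive retrieval orchestration layer for retrieval-augmented generation (RAG) and semantic search. It selects the most suitable retrieval strategy for each query (vector, keyword, hybrid, or rerank) and provides explainable traces. The project is written in TypeScript, supports multiple vector databases such as pgvector and Pinecone, and uses local embedding models to achieve efficient retrieval with no network dependency. Its core function is to choose the optimal retrieval path for each query, including semantic search based on natural language processing and hybrid retrieval tuned for specific scenarios. Augur is well suited to applications that need flexible, efficient information retrieval, such as knowledge-base query systems and customer-service chatbots, and can significantly improve retrieval quality and user experience.",2,"2026-06-11 04:00:25","CREATED_QUERY"]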