[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-85981":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":8,"language":10,"languages":8,"totalLinesOfCode":8,"stars":11,"forks":12,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":8,"pushedAt":24,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},85981,"benchpublic","primitive-bench\u002Fbenchpublic","primitive-bench",null,"https:\u002F\u002Fwww.primitivebench.com\u002F","Python",116,3,7,0,5,65,75,75.31,"Apache License 2.0",false,"main",true,[],"2026-06-20 18:43:50","2026-06-20 09:21:09","\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fassets\u002Fprimitive-bench-logo.svg\" alt=\"Primitive Bench\" width=\"110\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Primitive Bench\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cb>The vendor-neutral benchmark for AI infrastructure primitives.\u003C\u002Fb>\u003Cbr>\n  OCR · web search · vector DBs · rerankers · retrieval · extraction · chunking · crawling · memory\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"LICENSE\">\u003Cimg alt=\"License: Apache 2.0\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg\">\u003C\u002Fa>\n  \u003Ca href=\"CONTRIBUTING.md\">\u003Cimg alt=\"PRs welcome\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg\">\u003C\u002Fa>\n  \u003Cimg alt=\"Python 3.11+\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11%2B-blue.svg\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca 
href=\"https:\u002F\u002Fwww.primitivebench.com\u002F\">Website\u003C\u002Fa> ·\n  \u003Ca href=\"apps\u002Fdocs\">Docs\u003C\u002Fa> ·\n  \u003Ca href=\"apps\u002Fdocs\u002Fmethodology\u002Fv3.md\">Methodology\u003C\u002Fa> ·\n  \u003Ca href=\"CONTRIBUTING.md\">Contributing\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fprimitive-bench\">LinkedIn\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n## The vendor-neutral benchmark for AI infrastructure primitives\n\nModern AI products are assembled from **infrastructure primitives** — OCR, web search, vector\ndatabases, rerankers, retrieval, extraction, chunking, crawling, memory. Choosing the right one is\nmostly folklore today. **Primitive Bench turns that choice into evidence.**\n\n> ### \"One winner is a lie.\"\n> No primitive wins every slice. We publish **per-slice, per-constraint** results with confidence\n> intervals and statistical separability — never a single global leaderboard.\n\nThis repo is the **public trust anchor**: the harness engine, the statistics library, the adapter\nSDK, the per-primitive eval packages, and the **public dev splits**. 
Held-out golden answers never\nlive here — they sit behind the private eval server, so the scores stay honest.\n\n## What we benchmark\n\n| Primitive | Package | What it measures | Status |\n|---|---|---|---|\n| Web search | `eval-websearch` | hit@k against golden-URL equivalence classes, sliced by intent | ✅ Live |\n| Extraction | `eval-extraction` | token survival of clean main-content extraction | ✅ Live |\n| OCR | `eval-ocr` | text fidelity across document types | 🚧 Planned |\n| Vector DBs | `eval-vectordb` | recall \u002F latency \u002F cost across index configs | 🚧 Planned |\n| Rerankers | `eval-reranker` | nDCG \u002F MAP uplift over first-stage retrieval | 🚧 Planned |\n| Retrieval | `eval-retrieval` | nDCG@k, MAP@k, MRR@k, Recall@k | 🚧 Planned |\n| Chunking | `eval-chunking` | downstream retrieval quality by chunk strategy | 🚧 Planned |\n| Crawling | `eval-crawl` | coverage & freshness of fetched content | 🚧 Planned |\n| Memory | — | long-horizon recall (LoCoMo-style) | 🗺️ Roadmap |\n\nFilling in a 🚧 is the highest-impact **first contribution** — see [`CONTRIBUTING.md`](CONTRIBUTING.md).\n\n## Why Primitive Bench\n\n- **No fake #1.** A winner is named for a slice only when it's **statistically separable** from the\n  runner-up (McNemar *p* \u003C α, non-overlapping CIs); otherwise we publish a tie band.\n- **Real evals, not reviews.** Every claim is backed by a **canonical, citable** statistic — McNemar,\n  Wilson intervals, seeded bootstrap, Bradley-Terry \u002F Elo.\n- **Reproducible by anyone.** Deterministic seeds, pinned versions, and **public dev splits** reproduce\n  public runs bit-for-bit.\n- **Neutral arbiter.** No pay-to-rank. 
Three-tier ground truth (verified-external,\n  authoritative-registry, sentinel-planted) with canary markers for contamination detection.\n\n## Quickstart\n\n> Packages aren't on PyPI yet — run from a clone for now.\n\n```bash\nuv sync\nuv run bench run --primitive ocr --config configs\u002Focr.yaml\nuv run bench view .\u002Fruns\u002F\u003Crun_id>\n```\n\nThe `bench` CLI scaffolds a config (`bench init`), runs an eval (`bench run`), summarizes slices with\nseparability badges (`bench view`), and submits to the held-out eval server for scores only\n(`bench submit`).\n\n## How it works\n\nPrimitive Bench uses the proven harness shape — **dataset → Task → Adapter → Scorer → result schema**\n(converging with EleutherAI lm-eval, UK AISI Inspect, and Stanford HELM).\n\n**The Gate.** `bench-schemas` is the **frozen contract** (`v0.1.0`): every package imports types only\nfrom it and writes only files it owns — no shared mutable state. That boundary is what lets the build\nlanes run in parallel without colliding. See [`apps\u002Fdocs\u002FDECISIONS.md`](apps\u002Fdocs\u002FDECISIONS.md) (D-03)\nand the [methodology](apps\u002Fdocs\u002Fmethodology\u002Fv3.md).\n\n## Repo layout\n\n```\npackages\u002F\n  bench-schemas\u002F   # THE FROZEN CONTRACT — RunManifest, ItemResult, SliceResult, ScorerOutput, AdapterSpec\n  bench-core\u002F      # harness engine: deterministic seeding, run\u002Fmanifest, per-run dirs\n  bench-stats\u002F     # McNemar, Wilson, bootstrap CIs, hit@k, nDCG\u002FMAP\u002FMRR, Bradley-Terry\n  bench-adapters\u002F  # provider\u002Fprimitive adapter SDK (lm-eval registry pattern)\n  eval-*\u002F          # one package per primitive: public golden dev set + scorer + slice defs\napps\u002F\n  cli\u002F             # the `bench` CLI: init \u002F run \u002F view \u002F submit\n  docs\u002F            # methodology + DECISIONS.md\ngolden-sets-public\u002F  # PUBLIC dev splits only (canary-marked). 
Held-out answers NEVER here.\n```\n\n## Contributing\n\nWe love contributions big and small: a new vendor adapter, a slice that separates two adapters, or a\nwhole stubbed primitive. Start with [`CONTRIBUTING.md`](CONTRIBUTING.md); the best first issue is\nimplementing one of the 🚧 verticals using `eval-websearch` \u002F `eval-extraction` as the template.\n\n## License\n\n- **Code:** [Apache-2.0](LICENSE).\n- **Public datasets** under [`golden-sets-public\u002F`](golden-sets-public\u002F): [CC-BY-4.0](golden-sets-public\u002FLICENSE-DATA).\n- Third-party attribution is in [`NOTICE`](NOTICE). We learn from lm-evaluation-harness, Inspect, HELM,\n  ann-benchmarks, VectorDBBench, and OmniDocBench, and we do **not** vendor GPL\u002Fcommercial-dual code.\n\n---\n\n\u003Cp align=\"center\">\n  Built by the \u003Cb>Primitive Bench\u003C\u002Fb> team ·\n  \u003Ca href=\"https:\u002F\u002Fwww.primitivebench.com\u002F\">primitivebench.com\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fprimitive-bench\">LinkedIn\u003C\u002Fa>\n\u003C\u002Fp>\n","Primitive Bench is a vendor-neutral benchmarking platform for AI infrastructure primitives, covering OCR, web search, vector databases, rerankers, retrieval, extraction, chunking, crawling, and memory. Its core function is to provide an evidence-based basis for choosing among tools: it uses statistical methods to evaluate how different tools perform in specific scenarios rather than publishing a single global leaderboard. Technical characteristics include a Python 3.11+ development environment, an open-source adapter SDK, and public dataset splits. It suits companies and researchers who need objective comparisons of AI component performance, offering a reliable reference when selecting the technology stack best suited to a specific need.",2,"2026-06-21 04:02:12","CREATED_QUERY"]