[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11426":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},11426,"geometry-of-consolidation","niashwin\u002Fgeometry-of-consolidation","niashwin","NeurIPS 2026 paper: The Geometry of Consolidation — follow-up to HIDE and No-Escape.",null,"Python",110,8,44,0,4,2.86,"MIT License",false,"main",[],"2026-06-12 02:02:31","# The Geometry of Consolidation\n\n> **A scientific paper about a geometric law.** When a semantic memory replaces $n$ cluster members with $m\u003Cn$ representatives, what geometry decides whether retrieval still recovers the members? 
We prove a single inequality, measure it across six encoders and seven corpora, and derive the consolidation algorithm it implies.\n\n\u003Cp align=\"left\">\n  \u003Ca href=\"paper\u002Farxiv\u002Fmain.pdf\">\u003Cimg alt=\"arXiv version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpaper-arXiv_version-1f2937?style=flat-square\">\u003C\u002Fa>\n  \u003Ca href=\"paper\u002Fneurips\u002Fmain.pdf\">\u003Cimg alt=\"NeurIPS 2026 submission\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpaper-NeurIPS_2026_submission-1f2937?style=flat-square\">\u003C\u002Fa>\n  \u003Ca href=\"paper\u002Fneurips\u002Fsupp.pdf\">\u003Cimg alt=\"NeurIPS supplementary\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpaper-NeurIPS_supplementary-1f2937?style=flat-square\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg alt=\"MIT License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-4b5563?style=flat-square\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n## The trilogy\n\nThis paper is the third in a sequence on the geometry of meaning-organised memory. Each one asks the same question from a different angle: *what does it cost when a memory organises itself by meaning?*\n\n| Paper | One-line claim |\n|---|---|\n| **[The Price of Meaning](https:\u002F\u002Farxiv.org\u002Fabs\u002Fsubmit\u002F7412286)** · Barman et al., 2026 | Under mild axioms, any memory that organises itself by meaning must obey a strict no-escape trade-off between capacity and identity. |\n| **[The Geometry of Forgetting](https:\u002F\u002Farxiv.org\u002Fabs\u002Fsubmit\u002F7411865)** · Barman et al., 2026 | The shape of that trade-off is geometric, not informational: a cluster's *mean within-cluster distance* $\\bar d$ and *effective dimension* $d_{\\text{eff}}$ decide when retrieval must fail. |\n| **The Geometry of Consolidation** · this repo | The same geometry governs *compression*. 
Replacing $n$ cluster members with $m\u003Cn$ representatives does not escape the trap — it trades one failure mode for another along the same geometric axis. |\n\n---\n\n## The law (one sentence)\n\nFor any consolidator, whenever the retrieval cap half-angle $\\theta' = 1-\\theta$ is smaller than the mean within-cluster cosine distance $\\bar d$, the identity error satisfies\n\n$$\\varepsilon_{\\text{id}} \\;\\geq\\; 1 - c_1 \\, m \\, (\\theta' \u002F \\bar d)^{d_{\\text{eff}}\u002F2}$$\n\nwhere $c_1$ is an absolute constant and $d_{\\text{eff}}$ is the local participation-ratio dimension. The law separates a **tight regime** ($\\bar d \u003C \\theta'$, identity cheap) from a **spread regime** ($\\bar d \\geq \\theta'$, identity forced to collapse under any non-trivial compression).\n\nThe papers in `paper\u002F` give the proof, the experiments, and the algorithm it implies. The rest of this repo is the evidence.\n\n---\n\n## What's in this repository\n\n```\npaper\u002F\n  arxiv\u002F        Long-form arXiv version (48 pp): full exposition, proofs, intuition panels.\n  neurips\u002F      NeurIPS 2026 submission: 14-pp main + 11-pp supplementary.\n\ngac\u002F            Geometry-Aware Consolidation — the algorithm the law implies.\n                pip-installable; one interface, five baselines for comparison.\n\nexperiments\u002F    One script per experimental block (E1–E9 in the paper).\n                Each block is self-contained, resumable, and reproduces one figure.\n\nresults\u002F        Raw parquet outputs from each E-block, plus c_1 calibration JSON.\n                These are the numbers every figure and every table in the paper reads from.\n\nscripts\u002F\n  make_figures.py       Regenerates every PDF figure in the paper from results\u002F.\n  calibrate_c1.py       Reconstructs the tight \u002F spread \u002F global c_1 table.\n\nmodal_app\u002F      Modal orchestrator (H100 pool, resumable, cost-tracked) for reruns.\ndata\u002F           Corpus build 
scripts: Wikipedia, MS MARCO, ArXiv, NQ, HotpotQA, DRM.\ntests\u002F          Unit tests for GAC and the checkpointing layer.\n```\n\n---\n\n## The algorithm, in a line\n\n**GAC** (Geometry-Aware Consolidation) routes each cluster to the consolidator the geometry calls for: centroid when $\\bar d \u003C \\theta'$, residual-budgeted medoid otherwise. It Pareto-dominates centroid, medoid, importance-weighted, selective-pruning, PQ, OPQ, LSH, and HNSW-prune across 6 encoders × 7 corpora × 10K→1M scale.\n\n```python\nfrom gac import GACConsolidator, consolidate\n\n# The geometry-aware router: centroid when the cluster is tight,\n# residual-budgeted medoid when it is spread.\nstore = consolidate(embeddings, labels, strategy=\"gac\", theta=0.85)\nreps  = store.vectors   # the m \u003C n L2-normalised representatives the law says you can afford\n```\n\nSee [`paper\u002Farxiv\u002Fmain.pdf`](paper\u002Farxiv\u002Fmain.pdf) §7 for the algorithm and §8 for the Pareto frontier.\n\n---\n\n## Reproducing the paper\n\nEvery number in the paper is reproducible from `results\u002F`. 
Every figure is reproducible from the scripts.\n\n```bash\npip install -e .\n\n# Reproduce every figure in the paper (reads from results\u002F, writes to paper\u002F*\u002Ffigs\u002F).\npython scripts\u002Fmake_figures.py --out paper\u002Farxiv\u002Ffigs\npython scripts\u002Fmake_figures.py --out paper\u002Fneurips\u002Ffigs\n\n# Reconstruct the c_1 calibration table (tight \u002F spread \u002F global, with coverage).\npython scripts\u002Fcalibrate_c1.py\n\n# Rebuild the PDFs. Each line runs in a subshell so its cd does not carry over to the next line.\n(cd paper\u002Farxiv   && pdflatex main && pdflatex main)\n(cd paper\u002Fneurips && pdflatex main && bibtex main && pdflatex main && pdflatex main)\n(cd paper\u002Fneurips && pdflatex supp && pdflatex supp && pdflatex supp)\n```\n\nTo rerun experiments from scratch on Modal (H100 pool, ~$30 for the full sweep):\n\n```bash\nmodal run scripts\u002Fh100_concurrency_probe.py       # verify quota\nmodal run modal_app\u002Fapp.py::run_all               # everything, resumable\nmodal run modal_app\u002Fapp.py::run_one --exp e1      # single block\n```\n\n---\n\n## Citation\n\n```bibtex\n@article{vangara2026consolidation,\n  title   = {The Geometry of Consolidation},\n  author  = {Vangara, Anirudh Bharadwaj and Gopinath, Ashwin},\n  year    = {2026},\n  note    = {Submitted to NeurIPS 2026. 
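arXiv version in this repository.}\n}\n```\n\nThe sister papers:\n\n```bibtex\n@article{barman2026price,\n  title  = {The Price of Meaning},\n  author = {Barman and {collaborators}},\n  year   = {2026},\n  note   = {arXiv:submit\u002F7412286}\n}\n\n@article{barman2026forgetting,\n  title  = {The Geometry of Forgetting},\n  author = {Barman and {collaborators}},\n  year   = {2026},\n  note   = {arXiv:submit\u002F7411865}\n}\n```\n\n---\n\n## License & contact\n\nMIT (see [`LICENSE`](LICENSE)).\n\n- **Anirudh Bharadwaj Vangara** · Sentra; University of Waterloo Computer Engineering\n- **Ashwin Gopinath** · Sentra; MIT Mechanical Engineering · [ashwin@sentra.app](mailto:ashwin@sentra.app) (corresponding)\n","This project studies the geometric law that decides whether retrieval can still recover the original members when a semantic memory replaces many cluster members with fewer representatives. Its core contributions are a proof of a key inequality, measurements of that inequality across six encoders and seven corpora, and the consolidation algorithm derived from it. It is implemented in Python and combines rigorous mathematical proofs with extensive experimental validation. It suits research that needs to understand or optimise the performance of meaning-organised memory systems, such as information compression and retrieval tasks in natural language processing.",2,"2026-06-11 03:31:51","CREATED_QUERY"]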