[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74049":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},74049,"Engram","deepseek-ai\u002FEngram","deepseek-ai","Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models","",null,"Python",4448,340,41,14,0,5,11,58,15,29.6,"Apache License 2.0",false,"main",[],"2026-06-12 02:03:21","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Flogo.svg?raw=true\" width=\"60%\" alt=\"DeepSeek-V3\" \u002F>\n\u003C\u002Fdiv>\n\u003Chr>\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">\u003Cimg alt=\"Homepage\"\n    src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Fbadge.svg?raw=true\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fchat.deepseek.com\u002F\">\u003Cimg alt=\"Chat\"\n    src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\">\u003Cimg alt=\"Hugging 
Face\"\n    src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\"\u002F>\u003C\u002Fa>\n  \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\">\u003Cimg alt=\"Discord\"\n    src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Fqr.jpeg?raw=true\">\u003Cimg alt=\"Wechat\"\n    src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fdeepseek_ai\">\u003Cimg alt=\"Twitter Follow\"\n    src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-deepseek_ai-white?logo=x&logoColor=white\"\u002F>\u003C\u002Fa>\n  \u003Cbr>\n  \u003Ca href=\"LICENSE\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache 2.0-f5de53?&color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Cbr>\n\u003C\u002Fdiv>\n\n## 1. Introduction\n\nThis repository contains the official implementation for the paper: **[Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models](Engram_paper.pdf)**.\n\n> **Abstract:** While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup. 
To address this, we explore **conditional memory** as a complementary sparsity axis, instantiated via **Engram**, a module that modernizes classic $N$-gram embeddings for $\\mathcal{O}(1)$ lookup.\n\n**Key Contributions:**\n- **Sparsity Allocation:** We formulate the trade-off between neural computation (MoE) and static memory (Engram), identifying a U-shaped scaling law that guides optimal capacity allocation.\n- **Empirical Verification:** Under strict iso-parameter and iso-FLOPs constraints, the Engram-27B model demonstrates consistent improvements over MoE baselines across knowledge, reasoning, code and math domains.\n- **Mechanistic Analysis:** Our analysis suggests that Engram relieves early layers from static pattern reconstruction, potentially preserving effective depth for complex reasoning.\n- **System Efficiency:** The module employs deterministic addressing, enabling the offloading of massive embedding tables to host memory with minimal inference overhead.\n\n\n## 2. Architecture\n\nThe Engram module augments the backbone by retrieving static $N$-gram memory and fusing it with dynamic hidden states. The architecture is shown below ([drawio provided](drawio\u002FEngram.drawio)):\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"75%\" src=\"figures\u002Farch.png\" alt=\"Engram Architecture\">\n\u003C\u002Fp>\n\n## 3. Evaluation\n\n### Scaling Law\n\u003Cp align=\"center\">\n  \u003Cimg width=\"90%\" src=\"figures\u002Fscaling_law.png\" alt=\"Scaling Law\">\n\u003C\u002Fp>\n\n---\n\n### Large Scale Pre-training\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"figures\u002F27b_exp_results.png\" alt=\"Pre-training Results\">\n\u003C\u002Fp>\n\n---\n\n### Long-context Training\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"figures\u002Flong_context_results.png\" alt=\"Long Context Results\">\n\u003C\u002Fp>\n\n\n## 4. 
Case Study of Engram\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"figures\u002Fcase.png\" alt=\"Engram Case Study\">\n\u003C\u002Fp>\n\n## 5. Quick Start\n\nWe recommend using Python 3.8+ and PyTorch.\n```bash\npip install torch numpy transformers sympy\n```\nWe provide a standalone implementation to demonstrate the core logic of the Engram module:\n```bash\npython engram_demo_v1.py\n```\n\n> ⚠️ **Note:** The provided code is a demonstration version intended to illustrate the data flow. It mocks standard components (like Attention\u002FMoE\u002FmHC) to focus on the Engram module. \n\n\n## 6. License\nThe use of Engram models is subject to [the Model License](LICENSE).\n\n## 7. Contact\n\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).","The Engram project aims to implement conditional memory via scalable lookup, introducing a new axis of sparsity for large language models. Its core functionality uses modernized N-gram embeddings to achieve O(1) lookup, combined with Mixture-of-Experts (MoE) to optimize the allocation between neural computation and static memory. The study finds that, under strict iso-parameter and iso-FLOPs constraints, the Engram-27B model outperforms conventional MoE baselines across knowledge, reasoning, code, and math domains. In addition, the module uses deterministic addressing, which allows large embedding tables to be offloaded to host memory, improving system efficiency with minimal inference overhead. It suits natural language processing applications that need to process large volumes of data efficiently and run fast lookups.",2,"2026-06-11 03:48:33","high_star"]