[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74065":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":10,"languages":10,"totalLinesOfCode":10,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},74065,"open-infra-index","deepseek-ai\u002Fopen-infra-index","deepseek-ai","Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation","",null,8004,287,464,1,0,18,38.38,"Creative Commons Zero v1.0 Universal",false,"main",[],"2026-06-12 02:03:21","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Flogo.svg?raw=true\" width=\"60%\" alt=\"DeepSeek-Open-Infra\" \u002F>\n\u003C\u002Fdiv>\n\u003Chr>\n\n# Hello, DeepSeek Open Infra!\n\n## 202505 Industry Track Paper (ISCA25)  \n### Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures\n[**📄 Arxiv Paper Link**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.09343)\n\n## 202504 [The Path to Open-Sourcing the DeepSeek Inference Engine](OpenSourcing_DeepSeek_Inference_Engine\u002FREADME.md)\n\n## 202502 Open-Source Week\nWe're a tiny team @deepseek-ai pushing our limits in AGI exploration.\n\nStarting **this week** , Feb 24, 2025 we'll open-source 5 repos – one daily drop – not 
because we've made grand claims, \nbut simply as developers sharing our small-but-sincere progress with full transparency.\n\nThese are humble building blocks of our online service: documented, deployed, and battle-tested in production. \nNo vaporware, just sincere code that moved our tiny yet ambitious dream forward.\n\nWhy? Because every line shared becomes collective momentum that accelerates the journey.\nDaily unlocks begin soon. No ivory towers - just pure garage-energy and community-driven innovation 🔧\n\nStay tuned – let's geek out in the open together.\n\n### Day 1 - [FlashMLA](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FFlashMLA)\n\n**Efficient MLA Decoding Kernel for Hopper GPUs**  \nOptimized for variable-length sequences, battle-tested in production  \n\n🔗 [**FlashMLA GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FFlashMLA)  \n✅ BF16 support  \n✅ Paged KV cache (block size 64)  \n⚡ Performance: 3000 GB\u002Fs memory-bound | BF16 580 TFLOPS compute-bound on H800\n\n### Day 2 - [DeepEP](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepEP)\n\nExcited to introduce **DeepEP** - the first open-source EP communication library for MoE model training and inference.\n\n🔗 [**DeepEP GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepEP)  \n✅ Efficient and optimized all-to-all communication  \n✅ Both intranode and internode support with NVLink and RDMA  \n✅ High-throughput kernels for training and inference prefilling  \n✅ Low-latency kernels for inference decoding  \n✅ Native FP8 dispatch support  \n✅ Flexible GPU resource control for computation-communication overlapping  \n\n### Day 3 - [DeepGEMM](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepGEMM)\n\nIntroducing **DeepGEMM** - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3\u002FR1 training and inference.\n\n🔗 [**DeepGEMM GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepGEMM)  \n⚡ Up to 1350+ FP8 TFLOPS on 
Hopper GPUs  \n✅ No heavy dependency, as clean as a tutorial  \n✅ Fully Just-In-Time compiled  \n✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes  \n✅ Supports dense layout and two MoE layouts  \n\n### Day 4 - Optimized Parallelism Strategies\n\n✅ **DualPipe** - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3\u002FR1 training.  \n🔗 [**GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDualPipe)  \n\n✅ **EPLB** - an expert-parallel load balancer for V3\u002FR1.  \n🔗 [**GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Feplb)  \n\n📊 Analyze computation-communication overlap in V3\u002FR1.  \n🔗 [**GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fprofile-data)  \n\n### Day 5 - 3FS, Thruster for All DeepSeek Data Access\n\n**Fire-Flyer File System (3FS)** - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.\n\n⚡ 6.6 TiB\u002Fs aggregate read throughput in a 180-node cluster  \n⚡ 3.66 TiB\u002Fmin throughput on GraySort benchmark in a 25-node cluster  \n⚡ 40+ GiB\u002Fs peak throughput per client node for KVCache lookup  \n🧬 Disaggregated architecture with strong consistency semantics  \n✅ Training data preprocessing, dataset loading, checkpoint saving\u002Freloading, embedding vector search & KVCache lookups for inference in V3\u002FR1\n\n📥 **3FS** → 🔗[**GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS)  \n⛲ **Smallpond** - data processing framework on 3FS → 🔗[**GitHub Repo**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fsmallpond)\n\n### Day 6 - One More Thing: DeepSeek-V3\u002FR1 Inference System Overview\n\nOptimized throughput and latency via:  \n🔧 Cross-node EP-powered batch scaling  \n🔄 Computation-communication overlap  \n⚖️ Load balancing  \n\nProduction data of V3\u002FR1 online services:  \n⚡ **73.7k\u002F14.8k** input\u002Foutput tokens per second per H800 node  
\n🚀 Cost profit margin **545%**  \n\n![Cost And Theoretical Income.jpg](202502OpenSourceWeek\u002Ffigures\u002FCost%20And%20Theoretical%20Income.jpg)\n\n💡 We hope this week's insights offer value to the community and contribute to our shared AGI goals.\n\n📖 Deep Dive: 🔗[Day 6 - One More Thing: DeepSeek-V3\u002FR1 Inference System Overview](202502OpenSourceWeek\u002Fday_6_one_more_thing_deepseekV3R1_inference_system_overview.md)  \n📖 Chinese version: 🔗[DeepSeek-V3 \u002F R1 推理系统概览](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F27181462601)\n\n## 2024 AI Infrastructure Paper (SC24)  \n### Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning\n\n[**📄 Paper Link**](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1109\u002FSC41406.2024.00089)  \n[**📄 Arxiv Paper Link**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.14158)\n","deepseek-ai\u002Fopen-infra-index is a production-validated collection of AI infrastructure tools intended to support efficient AGI development and community-driven innovation. The project provides several core components, including FlashMLA (an efficient MLA decoding kernel optimized for Hopper GPUs), DeepEP (the first open-source EP communication library for MoE model training and inference), and DeepGEMM (a high-performance library supporting FP8 GEMM operations). These tools perform well in real deployments and are especially suited to scenarios that involve processing large-scale datasets, accelerating compute-intensive tasks, and optimizing cross-node communication efficiency. By offering concise, efficient code and flexible resource management mechanisms, the toolset gives developers a solid foundation for building next-generation AI systems.",2,"2026-06-11 03:48:39","high_star"]