[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72370":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":15,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72370,"DualPipe","deepseek-ai\u002FDualPipe","deepseek-ai","A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3\u002FR1 training.","",null,"Python",2965,326,31,3,0,7,14,9,70.44,"MIT License",false,"main",[],"2026-06-12 04:01:04","# DualPipe\n\nDualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the [DeepSeek-V3 Technical Report](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.19437). It achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles. For detailed information on computation-communication overlap, please refer to the [profile data](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fprofile-data).\n\n### Schedules\n\n![dualpipe](images\u002Fdualpipe.png)\n\nExample DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions.\nThe micro-batches in the reverse direction are symmetric to those in the forward direction, so\nwe omit their batch ID for illustration simplicity. 
Two cells enclosed by a shared black border\nhave mutually overlapped computation and communication.\n\n## DualPipeV\n\nDualPipeV is a concise V-shape schedule derived from DualPipe using the \"cut-in-half\" procedure introduced by Sea AI Lab in their [blog post](https:\u002F\u002Fhackmd.io\u002F@ufotalent\u002Fr1lVXsa9Jg). Thanks to them for this efficient schedule!\n\n### Schedules\n\n![dualpipev](images\u002Fdualpipev.png)\n\nExample DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.\n\n## Pipeline Bubbles and Memory Usage Comparison (based on the same number of PP stages)\n\n| Method      | Bubble                          | Parameter Per Device | Activation Per Device | #Devices |\n|-------------|---------------------------------|----------------------|-----------------------|----------|\n| 1F1B        | (*PP*-1)(𝐹+𝐵)                   | 1×                   | *PP*                  | *PP*     |\n| ZB1P        | (*PP*-1)(𝐹+𝐵-2𝑊)               | 1×                   | *PP*                  | *PP*     |\n| DualPipe    | (*PP*\u002F2-1)(𝐹&𝐵+𝐵-3𝑊)           | 2×                   | *PP*+1                | *PP*     |\n| DualPipeV   | (*PP*\u002F2-1)(𝐹&𝐵+𝐵-3𝑊)           | 2×                   | *PP*+1                | *PP*\u002F2   |\n\n*PP* denotes the number of PP stages (an even number).\n𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a\nfull backward chunk, 𝑊 denotes the execution time of a \"backward for weights\" chunk, and 𝐹&𝐵\ndenotes the execution time of two mutually overlapped forward and backward chunks.\n\n## Quick Start\n\nThe following examples show the usage:\n\n```bash\npython examples\u002Fexample_dualpipe.py\npython examples\u002Fexample_dualpipev.py\n```\n\nNote: For real-world applications, you will need to implement a custom `overlapped_forward_backward` method tailored to your specific module.\n\n## Requirements\n\n- PyTorch 2.0 or above\n\n## Developers\n\nDualPipe was created 
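and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.\n\n## Citation\n\n```bibtex\n@misc{deepseekai2025deepseekv3technicalreport,\n      title={DeepSeek-V3 Technical Report}, \n      author={DeepSeek-AI},\n      year={2025},\n      eprint={2412.19437},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.19437}, \n}\n```\n","DualPipe is a bidirectional pipeline parallelism algorithm used in DeepSeek V3\u002FR1 training, designed to fully overlap the forward and backward computation-communication phases and to reduce pipeline bubbles. Through an innovative scheduling strategy, the project maximizes the use of compute resources while keeping data transfer efficient, and it suits scenarios that require large-scale deep learning model training with high demands on computational efficiency. Technically, DualPipe supports PyTorch 2.0 and above and provides two schedules: the standard DualPipe and the simplified DualPipeV, the latter using a \"cut-in-half\" procedure to further optimize memory usage and execution efficiency. For researchers and engineers who want to improve distributed training efficiency, DualPipe offers a practical and efficient solution.",2,"2026-06-11 03:41:32","high_star"]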