[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2757":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},2757,"Megatron-LM","NVIDIA\u002FMegatron-LM","NVIDIA","Ongoing research training transformer models at scale","https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Fget-started\u002Fquickstart.html",null,"Python",16668,4065,167,356,0,12,85,371,64,45,"Other",false,"main",[26,27,28],"large-language-models","model-para","transformers","2026-06-12 02:00:43","\u003Cdiv align=\"center\">\n\nMegatron-LM and Megatron Core\n=============================\n\n\u003Ch4>GPU-optimized library for training transformer models at scale\u003C\u002Fh4>\n\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-latest-brightgreen.svg?style=flat)](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Findex.html)\n[![version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Frelease-0.15.0-green)](.\u002FCHANGELOG.md)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache-blue)](.\u002FLICENSE)\n\n\u003Cdiv align=\"left\">\n\n## About\n\nThis repository contains two components: **Megatron-LM** and **Megatron Core**.\n\n**Megatron-LM** is a reference example that includes Megatron Core plus pre-configured training scripts. 
Best for research teams, learning distributed training, and quick experimentation.\n\n**Megatron Core** is a composable library with GPU-optimized building blocks for custom training frameworks. It provides transformer building blocks, advanced parallelism strategies (TP, PP, DP, EP, CP), mixed precision support (FP16, BF16, FP8, FP4), and model architectures. Best for framework developers and ML engineers building custom training pipelines.\n\n**[Megatron Bridge](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge)** provides bidirectional Hugging Face ↔ Megatron checkpoint conversion with production-ready recipes.\n\n## Getting Started\n\n**Install from PyPI:**\n\n```bash\nuv pip install megatron-core\n```\n\n**Or clone and install from source:**\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM.git\ncd Megatron-LM\nuv pip install -e .\n```\n\n> **Note:** Building from source can use a lot of memory. If the build runs out of memory, limit parallel compilation jobs by setting `MAX_JOBS` (e.g. 
`MAX_JOBS=4 uv pip install -e .`).\n\nFor NGC container setup and all installation options, see the **[Installation Guide](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Fget-started\u002Finstall.html)**.\n\n- **[Your First Training Run](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Fget-started\u002Fquickstart.html)** - End-to-end training examples with data preparation\n- **[Parallelism Strategies](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Fuser-guide\u002Fparallelism-guide.html)** - Scale training across GPUs with TP, PP, DP, EP, and CP\n- **[Contribution Guide](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Fdeveloper\u002Fcontribute.html)** - How to contribute to Megatron Core\n\n# Latest News\n\n- **[2026\u002F03]** **Deprecating Python 3.10 support:** We're officially dropping Python 3.10 support with the upcoming 0.17.0 release. Downstream applications must raise their lower boundary to 3.12 to stay compatible with MCore.\n- **[2026\u002F01]** **[Dynamic Context Parallelism](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fspeeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core\u002F)** - Up to 1.48x speedup for variable-length sequence training with adaptive CP sizing.\n- **[2025\u002F12]** **Megatron Core development has moved to GitHub!** All development and CI now happens in the open. 
We welcome community contributions.\n- **[2025\u002F10]** **[Megatron Dev Branch](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fdev)** - Early access branch with experimental features.\n- **[2025\u002F10]** **[Megatron Bridge](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge)** - Bidirectional converter for interoperability between Hugging Face and Megatron checkpoints, featuring production-ready recipes for popular models.\n- **[2025\u002F08]** **[MoE Q3-Q4 2025 Roadmap](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Fissues\u002F1729)** - Comprehensive roadmap for MoE features including DeepSeek-V3, Qwen3, advanced parallelism strategies, FP8 optimizations, and Blackwell performance enhancements.\n- **[2025\u002F08]** **[GPT-OSS Model](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Fissues\u002F1739)** - Advanced features including YaRN RoPE scaling, attention sinks, and custom activation functions are being integrated into Megatron Core.\n- **[2025\u002F06]** **[Megatron MoE Model Zoo](https:\u002F\u002Fgithub.com\u002Fyanring\u002FMegatron-MoE-ModelZoo)** - Best practices and optimized configurations for training DeepSeek-V3, Mixtral, and Qwen3 MoE models with performance benchmarking and checkpoint conversion tools.\n- **[2025\u002F05]** Megatron Core v0.11.0 brings new capabilities for multi-data center LLM training ([blog](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fturbocharge-llm-training-across-long-haul-data-center-networks-with-nvidia-nemo-framework\u002F)).\n\n\u003Cdetails>\n\u003Csummary>Previous News\u003C\u002Fsummary>\n\n- **[2024\u002F07]** Megatron Core v0.7 improves scalability and training resiliency and adds support for multimodal training ([blog](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Ftrain-generative-ai-models-more-efficiently-with-new-nvidia-Megatron-Core-functionalities\u002F)).\n- **[2024\u002F06]** Megatron Core added support for 
Mamba-based models. Check out our paper [An Empirical Study of Mamba-based Language Models](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07887) and [code example](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fssm\u002Fexamples\u002Fmamba).\n- **[2024\u002F01 Announcement]** NVIDIA has released the core capabilities in **Megatron-LM** into [**Megatron Core**](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fmain\u002Fmegatron\u002Fcore) in this repository. Megatron Core expands upon Megatron-LM's GPU-optimized techniques with more cutting-edge innovations on system-level optimizations, featuring composable and modular APIs.\n\n\u003C\u002Fdetails>\n\n# Project Structure\n\n```\nMegatron-LM\u002F\n├── megatron\u002F\n│   ├── core\u002F                    # Megatron Core (kernels, parallelism, building blocks)\n│   │   ├── models\u002F              # Transformer models\n│   │   ├── transformer\u002F         # Transformer building blocks\n│   │   ├── tensor_parallel\u002F     # Tensor parallelism\n│   │   ├── pipeline_parallel\u002F   # Pipeline parallelism\n│   │   ├── distributed\u002F         # Distributed training (FSDP, DDP)\n│   │   ├── optimizer\u002F           # Optimizers\n│   │   ├── datasets\u002F            # Dataset loaders\n│   │   ├── inference\u002F           # Inference engines and server\n│   │   └── export\u002F              # Model export (e.g. 
TensorRT-LLM)\n│   ├── training\u002F                # Training scripts\n│   ├── legacy\u002F                  # Legacy components\n│   ├── post_training\u002F           # Post-training (quantization, distillation, pruning, etc.)\n│   └── rl\u002F                      # Reinforcement learning (RLHF, etc.)\n├── examples\u002F                    # Ready-to-use training examples\n├── tools\u002F                       # Utility tools\n├── tests\u002F                       # Comprehensive test suite\n└── docs\u002F                        # Documentation\n```\n\n# Performance Benchmarking\n\nFor our latest performance benchmarking results, please refer to [NVIDIA Megatron Bridge Performance Summary](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fmegatron-bridge\u002Flatest\u002Fperformance-summary.html).\n\nOur codebase efficiently trains models from 2B to 462B parameters across thousands of GPUs, achieving up to **47% Model FLOP Utilization (MFU)** on H100 clusters.\n\n![Model table](images\u002Fmodel_table.png)\n\n**Benchmark Configuration:**\n\n- **Vocabulary size**: 131,072 tokens\n- **Sequence length**: 4096 tokens\n- **Model scaling**: Varied hidden size, attention heads, and layers to achieve target parameter counts\n- **Communication optimizations**: Fine-grained overlapping with DP (`--overlap-grad-reduce`, `--overlap-param-gather`), TP (`--tp-comm-overlap`), and PP (enabled by default)\n\n**Key Results:**\n\n- **6144 H100 GPUs**: Successfully benchmarked 462B parameter model training\n- **Superlinear scaling**: MFU increases from 41% to 47-48% with model size\n- **End-to-end measurement**: Throughputs include all operations (data loading, optimizer steps, communication, logging)\n- **Production ready**: Full training pipeline with checkpointing and fault tolerance\n- *Note: Performance results measured without training to convergence*\n\n## Weak Scaling Results\n\nOur weak scaled results show superlinear scaling (MFU increases from 41% for the smallest model 
considered to 47-48% for the largest models); this is because larger GEMMs have higher arithmetic intensity and are consequently more efficient to execute.\n\n![Weak scaling](images\u002Fweak_scaling.png)\n\n## Strong Scaling Results\n\nWe also strong scaled the standard GPT-3 model (our version has slightly more than 175 billion parameters due to larger vocabulary size) from 96 H100 GPUs to 4608 GPUs, using the same batch size of 1152 sequences throughout. Communication becomes more exposed at larger scale, leading to a reduction in MFU from 47% to 42%.\n\n![Strong scaling](images\u002Fstrong_scaling.png)\n\n# Roadmaps\n\n- **[MoE Roadmap](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Fissues\u002F1729)** - DeepSeek-V3, Qwen3, advanced parallelism, FP8 optimizations, and Blackwell enhancements\n\n# Resources\n\n## Getting Help\n\n- 📖 **[Documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Findex.html)** - Official documentation\n- 🐛 **[Issues](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Fissues)** - Bug reports and feature requests\n\n## Contributing\n\nWe ❤️ contributions! 
Ways to contribute:\n\n- 🐛 **Report bugs** - Help us improve reliability\n- 💡 **Suggest features** - Shape the future of Megatron Core\n- 📝 **Improve docs** - Make Megatron Core more accessible\n- 🔧 **Submit PRs** - Contribute code improvements\n\n**→ [Contributing Guide](https:\u002F\u002Fdocs.nvidia.com\u002Fmegatron-core\u002Fdeveloper-guide\u002Flatest\u002Fdeveloper\u002Fcontribute.html)**\n\n## Citation\n\nIf you use Megatron in your research or project, please cite the following paper:\n\n```bibtex\n@article{megatron-lm,\n  title={Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism},\n  author={Shoeybi, Mohammad and Patwary, Mostofa and Puri, Raul and LeGresley, Patrick and Casper, Jared and Catanzaro, Bryan},\n  journal={arXiv preprint arXiv:1909.08053},\n  year={2019}\n}\n```\n","NVIDIA\u002FMegatron-LM is a GPU-optimized library for training Transformer models at scale. Its core features include efficient Transformer building blocks, multiple parallelism strategies (such as tensor parallelism, TP, and pipeline parallelism, PP), and mixed-precision support, all aimed at helping researchers and developers train large models more efficiently. Megatron-LM is particularly well suited to research teams that need distributed training, learners who want to experiment quickly with new ideas, and machine learning engineers building custom training frameworks. In addition, Megatron Bridge enables bidirectional conversion with Hugging Face models, further broadening its range of applications.",2,"2026-06-11 02:51:07","top_language"]