[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79853":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":9,"totalLinesOfCode":9,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":9,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":9,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},79853,"Megatron-Bridge","NVIDIA-NeMo\u002FMegatron-Bridge","NVIDIA-NeMo","Training library for Megatron-based models with bidirectional Hugging Face conversion capability",null,"https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge","Python",712,357,8,212,0,20,29,42,60,11.66,false,"main","2026-06-12 02:03:54","\u003Cdiv align=\"center\">\n\n# NeMo Megatron Bridge\n\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgithub\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fgraph\u002Fbadge.svg?token=4NMKZVOW2Z)](https:\u002F\u002Fcodecov.io\u002Fgithub\u002FNVIDIA-NeMo\u002FMegatron-Bridge)\n[![CICD NeMo](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Factions\u002Fworkflows\u002Fcicd-main.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Factions\u002Fworkflows\u002Fcicd-main.yml)\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-3100\u002F)\n[![GitHub 
Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVIDIA-NeMo\u002FMegatron-Bridge.svg?style=social&label=Star&cacheSeconds=14400)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fstargazers\u002F)\n\n[Documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fmegatron-bridge\u002Flatest\u002F) | [Supported Models](#supported-models) | [Examples](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples) | [Contributing](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002FCONTRIBUTING.md)\n\u003C\u002Fdiv>\n\n## 📣 News\n- [05\u002F28\u002F2026] [**Step-3.7-Flash**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fstepfun\u002Fstep37) is now merged on **main**! See the [examples README](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fstepfun\u002Fstep37\u002FREADME.md) for SFT training details.\n\n- [05\u002F20\u002F2026] [**DeepSeek V4**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fdeepseek_v4) is now merged on **main**! See the [examples README](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fmodels\u002Fdeepseek_v4\u002FREADME.md) for conversion and inference details.\n\n- [05\u002F20\u002F2026] [**Nemotron-3 Nano Omni**](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002FNemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16) day-0 branch support is now merged on **main**! The 30B-A3B MoE multimodal model supports image, video, audio, and text workflows with checkpoint conversion, inference, SFT, and PEFT (LoRA) examples. 
Read the [NVIDIA Blog](https:\u002F\u002Fblogs.nvidia.com\u002Fblog\u002Fnemotron-3-nano-omni-multimodal-ai-agents\u002F) and see the [examples README](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fmodels\u002Fnemotron\u002Fnemotron_3_omni\u002FREADME.md) for the full walkthrough.\n\n- [05\u002F19\u002F2026] [**Nemotron-Labs Diffusion**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fdiffusion\u002Frecipes\u002Fnemotron_labs_diffusion) is now supported on **main** with autoregressive-to-diffusion conversion, continuous pretraining, checkpoint conversion, and inference workflows. Read the [NVIDIA Research blog](https:\u002F\u002Fresearch.nvidia.com\u002Fpublication\u002F2026-05_nemotron-labs-diffusion-tri-mode-language-model-unifying-autoregressive) for the tri-mode language model overview.\n\n- [05\u002F06\u002F2026] [**Gemma 4 VL 26B-A4B**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fgemma\u002Fgemma4_vl) is now supported! Checkpoint conversion, SFT, and PEFT (LoRA) recipes for Google's MoE vision-language model (26B total \u002F 4B active params, 128 experts top-k=8, dual sliding\u002Fglobal attention with K=V tying on full-attention layers) are available on **main**. See the [examples README](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fmodels\u002Fgemma\u002Fgemma4_vl\u002FREADME.md) for the full walkthrough.\n\n- [04\u002F28\u002F2026] Day 0 support for [**Nemotron-3 Nano Omni**](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002FNemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16), a 30B-A3B MoE multimodal model that jointly processes image, video, audio, and text. 
Checkpoint conversion, SFT, and LoRA recipes are available on **main** — see the [examples README](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fmodels\u002Fnemotron\u002Fnemotron_3_omni\u002FREADME.md) for the full walkthrough.\n\n- [04\u002F19\u002F2026] [**Qwen3.6-35B-A3B**](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3.6-35B-A3B) is now supported! Qwen3.6 uses the same architecture as Qwen3.5 VL MoE (`Qwen3_5MoeForConditionalGeneration`) and works with the existing [Qwen3.5-VL bridge](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fmodels\u002Fqwen_vl) out of the box — no code changes needed. HF→Megatron conversion and inference verified.\n\n- [04\u002F16\u002F2026] **Megatron Bridge 0.4.0 released!** New model support (Kimi 2.5, Nemotron 3 Super, Qwen 3.5 VL, MiniMax M2, Sarvam, MiMo, and more), diffusion model collection, sequence-packing improvements, FP8 export, pruning & quantization, Transformers 5.x compatibility, and Python 3.12 migration. Huge thanks to our community contributors: [@HollowMan6](https:\u002F\u002Fgithub.com\u002FHollowMan6), [@shaltielshmid](https:\u002F\u002Fgithub.com\u002Fshaltielshmid), [@jaeminh](https:\u002F\u002Fgithub.com\u002Fjaeminh), [@pavelgein](https:\u002F\u002Fgithub.com\u002Fpavelgein), [@ShiftyBlock](https:\u002F\u002Fgithub.com\u002FShiftyBlock), [@erictang000](https:\u002F\u002Fgithub.com\u002Ferictang000), [@eternally-z](https:\u002F\u002Fgithub.com\u002Feternally-z), [@Hayak3](https:\u002F\u002Fgithub.com\u002FHayak3), and [@mohit-sarvam](https:\u002F\u002Fgithub.com\u002Fmohit-sarvam)! 
See the [full release notes](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Freleases\u002Ftag\u002Fv0.4.0).\n\n- [04\u002F12\u002F2026] [**MiniMax-M2.5 \u002F M2.7**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fminimax\u002Fminimax_m2) are now supported! Both models share the same architecture as MiniMax-M2 and work with the existing bridge out of the box; checkpoint conversion and inference have been verified on real FP8 checkpoints.\n\n- [04\u002F10\u002F2026] [**Qwen3-ASR**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fqwen\u002Fqwen3_asr) is now supported! Checkpoint conversion and inference for [Qwen3's ASR model](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR) are available on **main**.\n\n- [04\u002F09\u002F2026] [**Bailing MoE V2**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fbailing) is now supported! Checkpoint conversion and inference for the Bailing MoE V2 model are available on **main**. Thank you to [@ccclyu](https:\u002F\u002Fgithub.com\u002Fccclyu) for the community contribution!\n\n- [04\u002F07\u002F2026] Megatron Bridge’s PEFT support was featured in a [PyTorch Conference Europe 2026 talk](https:\u002F\u002Fpytorchconferenceeu2026.sched.com\u002Fevent\u002F2Juce\u002Foptimizing-reinforcement-learning-at-trillion-parameter-scale-songlin-jiang-aalto-university-mind-lab).\n\n- [04\u002F01\u002F2026] [**Kimi K2.5 VL**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fkimi\u002Fkimi_k25_vl) is now supported! 
Checkpoint conversion, inference, and training recipes for [Moonshot AI’s Kimi-K2.5-VL](https:\u002F\u002Fhuggingface.co\u002Fmoonshotai\u002FKimi-K2.5) vision-language model are available on **main**.\n\n- [03\u002F31\u002F2026] **Agent Skills for Megatron Bridge!** We've added a [`skills\u002F`](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fskills) directory with structured guides that AI coding agents (Cursor, Claude Code, Codex, etc.) can use to help you add model support, set up dev environments, tune performance, and more. Try them out, and PRs to improve or add new skills are very welcome!\n\n- [03\u002F26\u002F2026] [**Nemotron 3 Super**](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fnemotron\u002Fnemotron_3) is now on **main**! Checkpoint conversion and SFT\u002FLoRA recipes (120B-A12B) are available in the main branch. Read the [blog post](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fintroducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning\u002F).\n\n- [03\u002F12\u002F2026] **Deprecating Python 3.10 support:** We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their minimum supported Python version to 3.12 to stay compatible with Megatron-Bridge.\n\n- [12\u002F16\u002F2025] [Mind Lab](https:\u002F\u002Fmacaron.im\u002Fmindlab) successfully used Megatron-Bridge and [VeRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) to train GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their [tech blog](https:\u002F\u002Fmacaron.im\u002Fmindlab\u002Fresearch\u002Fbuilding-trillion-parameter-reasoning-rl-with-10-gpus).\n\n- [12\u002F15\u002F2025] Day 0 support for [NVIDIA-Nemotron-3-Nano-30B-A3B-FP8](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002FNVIDIA-Nemotron-3-Nano-30B-A3B-FP8)! 
[Reproducible code](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fnemotron\u002Fnemotron_3\u002Fnano) and custom NGC container: [nvcr.io\u002Fnvidia\u002Fnemo:25.11.nemotron_3_nano](https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fnemo?version=25.11.nemotron_3_nano)\n\n## Overview\n\nNeMo Megatron Bridge is a PyTorch-native library within the [NeMo Framework](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo) that provides pretraining, SFT and LoRA for popular language, vision-language, audio, and multimodal models. It serves as a powerful **bridge, conversion, and verification layer** between 🤗 Hugging Face and [Megatron Core](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fmain\u002Fmegatron\u002Fcore). It provides bidirectional checkpoint conversion between these formats, enabling other projects to leverage Megatron Core's parallelism capabilities or export models for various inference engines. The bridge includes built-in verification mechanisms to ensure conversion accuracy and checkpoint integrity across different model formats.\n\nOn top of the bridge, NeMo Megatron Bridge provides a performant and scalable PyTorch-native training loop that leverages [Megatron Core](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fmain\u002Fmegatron\u002Fcore) to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features like tensor and pipeline parallelism, and mixed precision (FP8, BF16, FP4, etc.). 
Users can either use existing 🤗 Hugging Face models or define custom PyTorch model definitions for flexible end-to-end workflows.\n\nNeMo Megatron Bridge is a refactor of the [previous NeMo](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNeMo) training stack that adopts a PyTorch-native training loop to provide greater flexibility and customizability for developers.\n\n![image](Repo-Mbridge.png)\n\n## 🔧 Installation\n\n### 🐳 NeMo Framework container\n\nThe best experience, highest performance, and full feature support are provided by the [NeMo Framework container](https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fnemo\u002Ftags). Fetch the most recent $TAG and run the following to start a container:\n\n```bash\ndocker run --rm -it -w \u002Fworkdir -v $(pwd):\u002Fworkdir \\\n  --entrypoint bash \\\n  --gpus all \\\n  nvcr.io\u002Fnvidia\u002Fnemo:${TAG}\n```\n\nFor development installation and additional details, please refer to our [Contribution guide](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002FCONTRIBUTING.md).\n\n### Megatron-Core Submodule (main & dev)\n\nMegatron Bridge pins [Megatron-Core](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM) as a git submodule at `3rdparty\u002FMegatron-LM`. The repository tracks two pinned commits — one from the upstream **main** branch (default) and one from **dev** — managed by `scripts\u002Fswitch_mcore.sh`.\n\nThe submodule committed to the repo always points to the **main** commit. 
Use the **dev** commit when you need a Megatron-Core feature or fix that has not yet landed on main, or to validate forward-compatibility with upcoming MCore changes:\n\n```bash\n.\u002Fscripts\u002Fswitch_mcore.sh status   # Show current commit\n.\u002Fscripts\u002Fswitch_mcore.sh dev      # Switch to dev; then run: uv sync\n.\u002Fscripts\u002Fswitch_mcore.sh main     # Switch back; then run: uv sync --locked\n```\n\n> **Note:** `uv.lock` is generated against the main commit. After switching to dev, use `uv sync` (without `--locked`). After switching back to main, use `uv sync --locked`.\n\nThe dev branch follows Megatron-LM's upstream [dev branch philosophy](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fdev) — features are experimental, follow a streamlined review process, and must graduate to stable within 6 months or be deprecated.\n\n## ⚡ Quickstart\n\nTo get started, install Megatron Bridge or download a NeMo Framework container as described [above](#-installation).\n\nLog in to Hugging Face Hub:\n\n```sh\nhuggingface-cli login --token \u003Cyour token>\n```\n\nConversion-only quickstart (✅ Core):\n\n```python\nfrom megatron.bridge import AutoBridge\n\n# 1) Create a bridge from a Hugging Face model (hub or local path)\nbridge = AutoBridge.from_hf_pretrained(\"meta-llama\u002FLlama-3.2-1B\", trust_remote_code=True)\n\n# 2) Get a Megatron provider and configure parallelism before instantiation\nprovider = bridge.to_megatron_provider()\nprovider.tensor_model_parallel_size = 1\nprovider.pipeline_model_parallel_size = 1\nprovider.finalize()\n# 3) Materialize Megatron Core model(s)\nmodel = provider.provide_distributed_model(wrap_with_ddp=False)\n\n# 4a) Export Megatron → Hugging Face (full HF folder with config\u002Ftokenizer\u002Fweights)\nbridge.save_hf_pretrained(model, \".\u002Fhf_exports\u002Fllama32_1b\")\n\n# 4b) Or stream only weights (Megatron → HF)\nfor name, weight in bridge.export_hf_weights(model, cpu=True):\n    
print(name, tuple(weight.shape))\n```\n\nTraining quickstart using pre-configured recipes:\n\n```python\nfrom megatron.bridge.recipes.llama import llama32_1b_pretrain_config\nfrom megatron.bridge.training.gpt_step import forward_step\nfrom megatron.bridge.training.pretrain import pretrain\n\nif __name__ == \"__main__\":\n    # The recipe uses the Llama 3.2 1B model configuration from HuggingFace\n    cfg = llama32_1b_pretrain_config()\n\n    # Override training parameters\n    cfg.train.train_iters = 10\n    cfg.scheduler.lr_decay_iters = 10000\n    cfg.model.vocab_size = 8192\n    cfg.tokenizer.vocab_size = cfg.model.vocab_size\n\n    pretrain(cfg, forward_step)\n```\n\nYou can launch the above script with:\n\n```sh\nuv run python -m torch.distributed.run --nproc-per-node=\u003Cnum devices> \u002Fpath\u002Fto\u002Fscript.py\n```\n\nMore examples:\n\n- [Conversion scripts overview](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fconversion\u002FREADME.md)\n- [Import\u002FExport checkpoints](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fconversion\u002Fconvert_checkpoints.py)\n- [Generation with bridge](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fconversion\u002Fhf_to_megatron_generate_text.py)\n- [Multi-GPU loading from HF](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fconversion\u002Fhf_megatron_roundtrip_multi_gpu.py)\n- [Compare HF vs Megatron outputs](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Fconversion\u002Fcompare_models.py)\n- [Toy RLHF with Bridge (HF inference + Megatron training)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fexamples\u002Frl\u002Frlhf_with_bridge.py)\n\nFor a deeper dive into conversion design 
and advanced usage, see the [models README](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fmodels\u002FREADME.md).\n\n## 🚀 Key Features\n\n- **Bridge with 🤗 Hugging Face**: Seamless bidirectional conversion between 🤗 Hugging Face and Megatron formats for interoperability ([model bridges](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fmodels), [auto bridge](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fmodels\u002Fconversion\u002Fauto_bridge.py), [conversion examples](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fconversion))\n  - Online import\u002Fexport without intermediate full checkpoints\n  - Parallelism-aware (TP\u002FPP\u002FVPP\u002FCP\u002FEP\u002FETP) during conversion\n  - Memory-efficient per-parameter streaming\n  - Simple high-level `AutoBridge` API with architecture auto-detection\n  - Optimized paths when Transformer Engine is available\n- **Flexible to Customize**: Lightweight custom training loop making it easy to configure custom logic in data loading, distributed training, checkpointing, evaluation and logging ([training framework](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Ftraining), [training utilities](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Ftraining\u002Futils))\n- **Supervised & Parameter-Efficient Finetuning**: SFT & PEFT implementation tailored for Megatron-based models that supports LoRA, DoRA, and user-defined PEFT methods ([PEFT 
implementations](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fpeft), [finetune module](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Ftraining\u002Ffinetune.py), [SFT dataset](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fdata\u002Fdatasets\u002Fsft.py))\n- **SOTA Training Recipes**: Pre-configured production-ready training recipes for popular models like Llama 3, with optimized hyperparameters and distributed training configuration ([Llama recipes](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Frecipes\u002Fllama), [recipe examples](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels))\n- **Performance Optimization**: Built-in support for FP8 training, model parallelism, and memory-efficient techniques to offer high utilization and near-linear scalability to thousands of nodes. 
([mixed precision](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Ftraining\u002Fmixed_precision.py), [communication overlap](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Ftraining\u002Fcomm_overlap.py), [optimizer utilities](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Frecipes\u002Futils\u002Foptimizer_utils.py))\n\n## Supported Models\n\nMegatron Bridge provides out-of-the-box bridges and training recipes for a wide range of models, built on top of base model architectures from [Megatron Core](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM\u002Ftree\u002Fmain\u002Fmegatron\u002Fcore). Refer to the [models directory](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Ftree\u002Fmain\u002Fsrc\u002Fmegatron\u002Fbridge\u002Fmodels) for the full list of model bridges.\n\n| Family | Supported variants |\n|----------------|--------------------|\n| [**Bailing**](docs\u002Fmodels\u002Fbailing\u002Findex.md) | Ling 2.0 (Bailing) |\n| [**DeepSeek**](docs\u002Fmodels\u002Fdeepseek\u002Findex.md) | DeepSeek V2 \u002F V2 Lite, DeepSeek V3, DeepSeek V4 |\n| [**Falcon**](docs\u002Fmodels\u002Ffalcon\u002Findex.md) | Falcon H1 |\n| [**Gemma**](docs\u002Fmodels\u002Fgemma\u002Findex.md) | Gemma \u002F Gemma 2, Gemma 3, Gemma 3-VL, Gemma 4-VL (26B-A4B MoE) |\n| [**GLM**](docs\u002Fmodels\u002Fglm\u002Findex.md) | GLM-4.5 \u002F 4.7 \u002F 4.7-Flash, GLM-4.5V, GLM-5 \u002F 5.1 |\n| [**GPT-OSS**](docs\u002Fmodels\u002Fgpt_oss\u002Findex.md) | GPT-oss |\n| [**Kimi**](docs\u002Fmodels\u002Fkimi\u002Findex.md) | Kimi K2, Kimi-K2.5-VL |\n| [**Llama**](docs\u002Fmodels\u002Fllama\u002Findex.md) | Llama 2, Llama 3 \u002F 3.1 \u002F 3.2 \u002F 3.3 |\n| [**MiniMax**](docs\u002Fmodels\u002Fminimax\u002Findex.md) | 
MiniMax-M2 \u002F M2.5 \u002F M2.7 |\n| [**Mistral**](docs\u002Fmodels\u002Fmistral\u002Findex.md) | Mistral, Ministral 3 (3B\u002F8B\u002F14B) |\n| [**Xiaomi-MiMo**](docs\u002Fmodels\u002Fmimo\u002Findex.md) | Xiaomi-MiMo |\n| [**Moonlight**](docs\u002Fmodels\u002Fmoonlight\u002Findex.md) | Moonlight |\n| [**Nemotron**](docs\u002Fmodels\u002Fnemotron\u002Findex.md) | Nemotron H, Nemotron Nano v2, Nemotron-3 Nano, Nemotron-3 Super, Llama Nemotron, Nemotron Nano v2 VL, Nemotron-3 Nano Omni |\n| [**OLMoE**](docs\u002Fmodels\u002Folmoe\u002Findex.md) | OLMoE |\n| [**Qwen**](docs\u002Fmodels\u002Fqwen\u002Findex.md) | Qwen2 \u002F Qwen2.5, Qwen3, Qwen3-MoE, Qwen3 Next, Qwen2.5-VL, Qwen3-VL, Qwen3.5-VL, Qwen3.6-VL, Qwen2 Audio, Qwen2.5-Omni, Qwen3-Omni, Qwen3-ASR |\n| [**Sarvam**](docs\u002Fmodels\u002Fsarvam\u002Findex.md) | Sarvam |\n| [**StepFun**](docs\u002Fmodels\u002Fstepfun\u002Findex.md) | Step-3.5-Flash |\n\n### Launching Recipes\n\nFor a conceptual overview of how recipes are structured, overridden, and launched with either `torchrun` or NeMo-Run, read the [Using Recipes guide](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fmegatron-bridge\u002Flatest\u002Frecipe-usage.html).\n\nRunnable tutorials in `tutorials\u002Frecipes\u002Fllama` cover:\n\n- `00_quickstart_pretrain.py` for mock-data pretraining\n- `01_quickstart_finetune.py` + LoRA configs\n- YAML-driven flows and launch helpers\n\n## Performance Benchmarks\n\nFor detailed performance benchmarks, including throughput metrics across different GPU systems (DGX-GB200, DGX-B200, DGX-H100) and model configurations, see the [Performance Summary](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fmegatron-bridge\u002Flatest\u002Fperformance-summary.html) in our documentation.\n\n## Project Structure\n\n```\nMegatron-Bridge\u002F\n├── examples\u002F\n│   ├── models\u002F                  # Bridge usage examples\n│   └── recipes\u002F                 # Training examples\n├── 
src\u002Fmegatron\u002Fbridge\u002F\n│   ├── data\u002F                    # Dataloaders and iterators\n│   ├── models\u002F                  # Hugging Face bridge infrastructure and model-specific implementations\n│   │   ├── llama\u002F               # Llama model providers\n│   │   └── ...\u002F                 # Other models (gpt, t5, etc.)\n│   ├── peft\u002F                    # PEFT transformations and wrappers\n│   ├── recipes\u002F                 # Complete training recipes\n│   ├── training\u002F                # Training loop components\n│   │   ├── tokenizers\u002F          # Tokenizer library\n│   │   └── utils\u002F               # Training-specific utilities\n│   └── utils\u002F                   # Generic utilities for repo-wide usage\n└── tests\u002F                       # Comprehensive test suite\n```\n\n## Acknowledgement & Contributing\n\nMegatron-Bridge is the continuation of [MBridge](https:\u002F\u002Fgithub.com\u002FISEEKYAN\u002Fmbridge) by [Yan Bai](https:\u002F\u002Fgithub.com\u002FISEEKYAN). 
We appreciate all the contributions and adoptions by our community partners:\n\n- [Mind Lab](https:\u002F\u002Fmacaron.im\u002Fmindlab) successfully used Megatron-Bridge and [VeRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) to train GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their [tech blog](https:\u002F\u002Fmacaron.im\u002Fmindlab\u002Fresearch\u002Fbuilding-trillion-parameter-reasoning-rl-with-10-gpus).\n- [VeRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) has adopted Megatron-Bridge as a connector to Megatron-Core and for LoRA support.\n- [Slime](https:\u002F\u002Fgithub.com\u002FTHUDM\u002Fslime) has adopted Megatron-Bridge as a Megatron-Core checkpoint converter.\n- [SkyRL](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL) has adopted Megatron-Bridge as a Megatron-Core connector.\n- [Nemo-RL](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnemo-rl) has adopted Megatron-Bridge as a Megatron-Core connector.\n- Community contributions: Special thanks to [Guanyou He](https:\u002F\u002Fgithub.com\u002FThaurun) and [Junyu Wu](https:\u002F\u002Fgithub.com\u002Fnrailg) from the Weixin Group Infrastructure Center.\n\nPlease see our [Contributor Guidelines](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FMegatron-Bridge\u002Fblob\u002Fmain\u002FCONTRIBUTING.md) for more information on how to get involved.\n","NeMo Megatron Bridge is a library for training Megatron-based models that supports bidirectional Hugging Face model conversion. Its core features include efficient model training, flexible model conversion, and support for many pretrained models such as DeepSeek V4 and Nemotron-3 Nano Omni, and it also provides autoregressive-to-diffusion model conversion. The project is written in Python, which aids compatibility and ease of use. It suits workloads that need large-scale language model training and multimodal processing, such as natural language processing, image recognition, and audio analysis, and it ships detailed documentation and example code to help developers get started quickly and deploy models.",2,"2026-06-11 03:58:17","trending"]