[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72555":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72555,"picotron","huggingface\u002Fpicotron","huggingface","Minimalistic 4D-parallelism distributed training framework for education purpose","",null,"Python",2217,190,13,0,8,16,39,24,28.84,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:04","# picotron\nIn the spirit of [NanoGPT](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT), we created Picotron: the minimalist and most hackable repository for pre-training Llama-like models with [4D Parallelism](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.21783) (Data, Tensor, Pipeline, Context parallel). It is designed with simplicity and **educational** purposes in mind, making it an excellent tool for learning and experimentation.\n\n![](assets\u002Fbanière.png)\n- The code itself is simple and readable: `train.py`, `model.py` and `[data|tensor|pipeline|context]_parallel.py` are all under **300** lines of code.\n\n- Performance is not yet the best, and the project is still under active development. We observed 38% MFU on a LLaMA-2-7B model using 64 H100 GPUs and nearly 50% MFU on the SmolLM-1.7B model with 8 H100 GPUs. 
Benchmarks are coming soon\n- Compared to [Nanotron](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fnanotron\u002Ftree\u002Fmain), Picotron is primarily for educational purposes, helping people quickly get familiar with all the techniques in distributed training\n\n# Tutorial videos\n\n- A step-by-step tutorial on how to build the Picotron distributed training framework from scratch:\n    - [Picotron tutorial (playlist)](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL-_armZiJvAnhcRr6yTJ0__f3Oi-LLi9S) 🎬\n    - [Picotron tutorial (codebase)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpicotron_tutorial) 👷🏻‍♂️\n\n# Install\n\n```\npip install -e .\n```\n\n# Quick start\n- Get an HF token [here](https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens) to download models from HuggingFace\n\n- GPU\n    ```sh\n    # To create a config file in json format under tmp by default\n    python create_config.py --out_dir tmp --exp_name llama-1B --dp 8 --model_name HuggingFaceTB\u002FSmolLM-1.7B --num_hidden_layers 15  --grad_acc_steps 32 --mbs 4 --seq_len 1024 --hf_token \u003CHF_TOKEN>\n\n    # Locally\n    torchrun --nproc_per_node 8 train.py --config tmp\u002Fllama-1B\u002Fconfig.json \n\n    # 3D Parallelism\n    python create_config.py --out_dir tmp --dp 4 --tp 2 --pp 2 --pp_engine 1f1b --exp_name llama-7B --model_name meta-llama\u002FLlama-2-7b-hf  --grad_acc_steps 32 --mbs 4 --seq_len 1024 --hf_token \u003CHF_TOKEN>\n\n    # Slurm\n    python submit_slurm_jobs.py --inp_dir tmp\u002Fllama-7B --qos high --hf_token \u003CHF_TOKEN>\n    ```\n\n-  CPU (expect it to be slow)\n    ```sh\n    # 3D Parallelism on CPU\n    python create_config.py --out_dir tmp --exp_name llama-1B-cpu --dp 2 --tp 2 --pp 2 --pp_engine 1f1b --model_name HuggingFaceTB\u002FSmolLM-1.7B --num_hidden_layers 5  --grad_acc_steps 2 --mbs 4 --seq_len 128 --hf_token \u003CHF_TOKEN> --use_cpu\n\n    # Locally\n    torchrun --nproc_per_node 8 train.py --config 
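tmp\u002Fllama-1B-cpu\u002Fconfig.json\n    ```\n\n# Citation\nIf you use Picotron, please cite it as:\n\n```bibtex\n@misc{zhao2025picotron,\n  author = {Haojun Zhao and Ferdinand Mom},\n  title = {Picotron: Distributed training framework for education and research experimentation},\n  year = {2025},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpicotron}}\n}\n```\n\n# Acknowledgements\n\n- [Megatron-LM](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM)\n- [FairScale](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairscale)\n- [LitGPT](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-gpt)\n","Picotron is a minimalist 4D-parallel distributed training framework designed for educational purposes and used to pre-train Llama-like models. It supports four forms of parallelism (data, tensor, pipeline, and context), and its code is simple and readable: core files such as `train.py`, `model.py`, and the individual parallelism modules are each under 300 lines, which helps learners quickly get familiar with distributed training techniques. Performance is not yet optimal and is still being improved; training LLaMA-2-7B on 64 H100 GPUs reaches 38% MFU (Model FLOPs Utilization). The project is well suited to teaching and to individual research experiments, especially for researchers and students who want to understand the principles behind training modern large language models.",2,"2026-06-11 03:42:34","high_star"]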