[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72116":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72116,"build-nanogpt","karpathy\u002Fbuild-nanogpt","karpathy","Video+code lecture on building nanoGPT from scratch","",null,"Python",5254,828,49,18,0,15,36,257,45,39.76,false,"master",[],"2026-06-12 02:02:58","# build nanoGPT\n\nThis repo holds the from-scratch reproduction of [nanoGPT](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT\u002Ftree\u002Fmaster). The git commits were specifically kept step by step and clean so that one can easily walk through the git commit history to see it built slowly. Additionally, there is an accompanying [video lecture on YouTube](https:\u002F\u002Fyoutu.be\u002Fl8pRSuU81PU) where you can see me introduce each commit and explain the pieces along the way.\n\nWe basically start from an empty file and work our way to a reproduction of the [GPT-2](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf) (124M) model. If you have more patience or money, the code can also reproduce the [GPT-3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14165) models. While the GPT-2 (124M) model probably trained for quite some time back in the day (2019, ~5 years ago), today, reproducing it is a matter of ~1hr and ~$10. 
You'll need a cloud GPU box if you don't have enough; for that I recommend [Lambda](https:\u002F\u002Flambdalabs.com).\n\nNote that GPT-2 and GPT-3 are both simple language models, trained on internet documents, and all they do is \"dream\" internet documents. So this repo\u002Fvideo does not cover Chat finetuning, and you can't talk to it like you can talk to ChatGPT. The finetuning process (while quite simple conceptually - SFT is just about swapping out the dataset and continuing the training) comes after this part and will be covered at a later time. For now this is the kind of stuff that the 124M model says if you prompt it with \"Hello, I'm a language model,\" after 10B tokens of training:\n\n```\nHello, I'm a language model, and my goal is to make English as easy and fun as possible for everyone, and to find out the different grammar rules\nHello, I'm a language model, so the next time I go, I'll just say, I like this stuff.\nHello, I'm a language model, and the question is, what should I do if I want to be a teacher?\nHello, I'm a language model, and I'm an English person. In languages, \"speak\" is really speaking. Because for most people, there's\n```\n\nAnd after 40B tokens of training:\n\n```\nHello, I'm a language model, a model of computer science, and it's a way (in mathematics) to program computer programs to do things like write\nHello, I'm a language model, not a human. This means that I believe in my language model, as I have no experience with it yet.\nHello, I'm a language model, but I'm talking about data. You've got to create an array of data: you've got to create that.\nHello, I'm a language model, and all of this is about modeling and learning Python. I'm very good in syntax, however I struggle with Python due\n```\n\nLol. 
Anyway, once the video comes out, this will also be a place for FAQ, and a place for fixes and errata, of which I am sure there will be a number :)\n\nFor discussions and questions, please use the [Discussions tab](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fbuild-nanogpt\u002Fdiscussions), and for faster communication, have a look at my [Zero To Hero Discord](https:\u002F\u002Fdiscord.gg\u002F3zy8kqD9Cp), channel **#nanoGPT**:\n\n[![](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002F3zy8kqD9Cp?compact=true&style=flat)](https:\u002F\u002Fdiscord.gg\u002F3zy8kqD9Cp)\n\n## Video\n\n[Let's reproduce GPT-2 (124M) YouTube lecture](https:\u002F\u002Fyoutu.be\u002Fl8pRSuU81PU)\n\n## Errata\n\nMinor cleanup: we forgot to delete the `register_buffer` of the bias once we switched to flash attention. This was fixed with a recent PR.\n\nEarlier versions of PyTorch may have difficulty converting from uint16 to long. Inside `load_tokens`, we added `npt = npt.astype(np.int32)` to use numpy to convert uint16 to int32 before converting to a torch tensor and then converting to long.\n\nThe `torch.autocast` function takes an arg `device_type`, to which I tried to stubbornly just pass `device` hoping it works ok, but PyTorch actually really wants just the type and raises errors in some versions of PyTorch. So we want e.g. the device `cuda:3` to get stripped to `cuda`. Currently, device `mps` (Apple Silicon) would become `device_type` `cpu`; I'm not 100% sure this is the intended PyTorch way.\n\nConfusingly, `model.require_backward_grad_sync` is actually used by both the forward and backward pass. We moved the line up so that it also gets applied to the forward pass. 
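\n\n## Prod\n\nFor more production-grade runs that are very similar to nanoGPT, I recommend looking at the following repos:\n\n- [litGPT](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flitgpt)\n- [TinyLlama](https:\u002F\u002Fgithub.com\u002Fjzhang38\u002FTinyLlama)\n\n## FAQ\n\n## License\n\nMIT\n","This project is a video tutorial and code implementation for building nanoGPT from scratch. Its core function is to reproduce the GPT-2 (124M) model step by step, with a detailed Git commit history and a YouTube video walkthrough that let learners clearly follow each step of development. It is written in Python and suits developers, researchers, and students interested in natural language processing, particularly Transformer-based language models. The project not only offers a low-cost way to reproduce GPT-2 but also serves as a valuable resource for understanding how large language models work. Note that it does not cover chat finetuning.",2,"2026-06-11 03:40:27","high_star"]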