[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83217":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":10,"rankLanguage":10,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":10,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":26,"discoverSource":27},83217,"LLM-Step-by-Step","P-Slark\u002FLLM-Step-by-Step","P-Slark","Building a small LLM in PyTorch from scratch - a step-by-step journey inspired by Stanford CS336.","",null,"Python",105,2,0,33,1.43,"MIT License",false,"main",[21,22,23],"cs336","llm","pytorch","2026-06-12 02:04:32","\u003Cp align=\"center\">\n  \u003Cimg alt=\"LLM step by step\" src=\"docs\u002Flogo.png\" width=\"55%\">\n\u003C\u002Fp>\n\n\u003Ch3 align=\"center\">\nWhat worked, what didn't, and why — a small LLM, end-to-end\n\u003C\u002Fh3>\n\n\u003Cp align=\"center\">\n| \u003Ca href=\".\u002FJOURNEY.md\">\u003Cb>Journey\u003C\u002Fb>\u003C\u002Fa> |\n\u003Ca href=\"https:\u002F\u002Fstanford-cs336.github.io\u002F\">\u003Cb>CS336 (curriculum)\u003C\u002Fb>\u003C\u002Fa> |\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FP-Slark\u002FLLM-Step-by-Step\u002Fissues\">\u003Cb>Issues\u003C\u002Fb>\u003C\u002Fa> |\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"LICENSE\">\u003Cimg alt=\"License: MIT\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-blue.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.python.org\u002F\">\u003Cimg alt=\"Python 3.12+\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.12+-blue.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fpytorch.org\u002F\">\u003Cimg alt=\"PyTorch ~2.11\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-~2.11-red.svg\">\u003C\u002Fa>\n\u003Cimg alt=\"Status: WIP\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstatus-work%20in%20progress-yellow.svg\">\n\u003C\u002Fp>\n\n\n## 👉 Two things to read\n\n\u003Ctable>\n\u003Ctr>\n\u003Ctd width=\"50%\" valign=\"top\">\n\n### 📖 [JOURNEY.md](.\u002FJOURNEY.md)\n\n**The writing.** A chapter-by-chapter narrative of *what I tried, what was slow, and what each round of optimization actually bought me.* Each chapter maps to a commit. Most \"build an LLM\" tutorials show you the final code — this one shows the path.\n\nIf you only read **one** thing in this repo, read this.\n\n\u003C\u002Ftd>\n\u003Ctd width=\"50%\" valign=\"top\">\n\n### 💻 [`cs336_basics\u002F`](.\u002Fcs336_basics\u002F)\n\n**The code.** Clean, dependency-light implementations of every piece: BPE tokenizer, RMSNorm\u002FRoPE\u002FSwiGLU\u002FMHA, AdamW, cosine LR schedule, gradient clipping, training loop, sampling.\n\nRead alongside JOURNEY.md. Each module is short and meant to be read top-to-bottom.\n\n\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\nEverything else in the repo (`scripts\u002F`, `tests\u002F`, fixtures, the assignment PDF) is scaffolding around those two things.\n\n---\n\nThe curriculum is borrowed from Stanford's [CS336](https:\u002F\u002Fstanford-cs336.github.io\u002F)\n(\"Language Models from Scratch\").\n\n## About\n\n- **🧱 Built from PyTorch primitives, not framework abstractions.** No `torch.nn.Transformer`, no HuggingFace, no `transformers` library, no pre-trained weights. Every layer, optimizer, and training step is written from scratch in this repo — autograd and tensor ops are the only things borrowed.\n- **📖 A step-by-step journey, not just code.** [JOURNEY.md](.\u002FJOURNEY.md) walks through *what was tried, what was slow, and what each round of optimization actually bought* — chapter by chapter, commit by commit. Most \"build an LLM\" repos hand you the final code. This one shows the path.\n\n## What's in here\n\n| Part | Topic | Code | Journey chapter |\n|---|---|---|---|\n| I  | BPE tokenizer (train + encode\u002Fdecode, parallel + incremental) | `cs336_basics\u002Fbpe.py`, `tokenizer.py` | [Iterations 1–4](.\u002FJOURNEY.md#iteration-1) |\n| II | Transformer model (RMSNorm, RoPE, SwiGLU, MHA, tied embeddings) | `cs336_basics\u002Fmodel.py` | [Part II](.\u002FJOURNEY.md#part-ii--transformer-building-blocks) |\n| III | Training building blocks (cross-entropy, AdamW, cosine LR, grad clip) | `cs336_basics\u002Foptim.py`, `training.py`, `nn_utils.py` | [Part III](.\u002FJOURNEY.md#part-iii--training) |\n| IV | Training loop (data loader, checkpointing, `scripts\u002Ftrain.py`) | `cs336_basics\u002Fdata.py`, `scripts\u002F` | [Part IV](.\u002FJOURNEY.md#part-iv--training-loop) |\n| V  | Text generation (temperature, top-k, top-p, EOS stopping) | `cs336_basics\u002Fdecoding.py`, `scripts\u002Fgenerate.py` | [Part V](.\u002FJOURNEY.md#part-v--text-generation) |\n\n## Quickstart\n\n```sh\n# Install (uses uv for env management)\npip install uv\nuv sync\n\n# Run the unit tests\nuv run pytest\n\n# Download TinyStories\nmkdir -p data && cd data\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Froneneldan\u002FTinyStories\u002Fresolve\u002Fmain\u002FTinyStoriesV2-GPT4-train.txt\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Froneneldan\u002FTinyStories\u002Fresolve\u002Fmain\u002FTinyStoriesV2-GPT4-valid.txt\ncd ..\n\n# Train a BPE tokenizer, encode the corpus, train a small model, generate text\n# (see JOURNEY.md for the full pipeline)\nuv run scripts\u002Fencode_corpus.py --help\nuv run scripts\u002Ftrain.py --help\nuv run scripts\u002Fgenerate.py --help\n```\n\n\n## Credits & honest scope\n\nThe curriculum, the spec, and the test fixtures all come from Stanford's\nCS336 (\"Language Models from Scratch\"), which generously publishes its\nmaterials online. All the implementations and the writing in JOURNEY.md\nare mine. If you're a current CS336 student, please respect your course's\ncollaboration policy — this repo is a learning log, not a copy-paste solution.\n\nThe original assignment scaffolding (handout PDF, test adapters,\nsubmission script) lives under [`assignment\u002F`](.\u002Fassignment\u002F).\n","2026-06-11 04:10:26","CREATED_QUERY"]