[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75098":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},75098,"TorchCode","duoan\u002FTorchCode","duoan","🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.","https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fduoan\u002FTorchCode",null,"Jupyter Notebook",4150,349,9,8,0,27,118,252,81,106.63,false,"master",true,[26,27,28],"interview","leetcode","pytorch","2026-06-12 04:01:17","---\ntitle: TorchCode\nemoji: 🔥\ncolorFrom: red\ncolorTo: yellow\nsdk: docker\napp_port: 7860\npinned: false\n---\n\n\u003Cdiv align=\"center\">\n\n# 🔥 TorchCode\n\n**Crack the PyTorch interview.**\n\nPractice implementing operators and architectures from scratch — the exact skills top ML teams test for.\n\n*Like LeetCode, but for tensors. Self-hosted. Jupyter-based. 
Instant feedback.*\n\n[![PyTorch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white)](https:\u002F\u002Fpytorch.org)\n[![Jupyter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJupyter-F37626?style=for-the-badge&logo=jupyter&logoColor=white)](https:\u002F\u002Fjupyter.org)\n[![Docker](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocker-2496ED?style=for-the-badge&logo=docker&logoColor=white)](https:\u002F\u002Fwww.docker.com)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython_3.11-3776AB?style=for-the-badge&logo=python&logoColor=white)](https:\u002F\u002Fpython.org)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow?style=for-the-badge)](LICENSE)\n\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fduoan\u002FTorchCode?style=social)](https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode)\n[![GitHub Container Registry](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fghcr.io-TorchCode-blue?style=flat-square&logo=github)](https:\u002F\u002Fghcr.io\u002Fduoan\u002Ftorchcode)\n[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Spaces-TorchCode-blue?style=flat-square)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fduoan\u002FTorchCode)\n![Problems](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fproblems-40-orange?style=flat-square)\n![GPU](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGPU-not%20required-brightgreen?style=flat-square)\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=duoan\u002FTorchCode&type=Date)](https:\u002F\u002Fstar-history.com\u002F#duoan\u002FTorchCode&Date)\n\n\u003C\u002Fdiv>\n\n---\n\n## 🎯 Why TorchCode?\n\nTop companies (Meta, Google DeepMind, OpenAI, etc.) expect ML engineers to implement core operations **from memory on a whiteboard**. 
Reading papers isn't enough — you need to write `softmax`, `LayerNorm`, `MultiHeadAttention`, and full Transformer blocks in code.\n\nTorchCode gives you a **structured practice environment** with:\n\n| | Feature | |\n|---|---|---|\n| 🧩 | **40 curated problems** | The most frequently asked PyTorch interview topics |\n| ⚖️ | **Automated judge** | Correctness checks, gradient verification, and timing |\n| 🎨 | **Instant feedback** | Colored pass\u002Ffail per test case, just like competitive programming |\n| 💡 | **Hints when stuck** | Nudges without full spoilers |\n| 📖 | **Reference solutions** | Study optimal implementations after your attempt |\n| 📊 | **Progress tracking** | What you've solved, best times, and attempt counts |\n| 🔄 | **One-click reset** | Toolbar button to reset any notebook back to its blank template — practice the same problem as many times as you want |\n| [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](#) | **Open in Colab** | Every notebook has an \"Open in Colab\" badge + toolbar button — run problems in Google Colab with zero setup |\n\nNo cloud. No signup. No GPU needed. Just `make run` — or try it instantly on Hugging Face.\n\n---\n\n## 🚀 Quick Start\n\n### Option 0 — Try it online (zero install)\n\n**[Launch on Hugging Face Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fduoan\u002FTorchCode)** — opens a full JupyterLab environment in your browser. 
Nothing to install.\n\nOr open any problem directly in Google Colab — every notebook has an [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F01_relu.ipynb) badge.\n\n### Option 0b — Use the judge in Colab (pip)\n\nIn Google Colab, install the judge from PyPI so you can run `check(...)` without cloning the repo:\n\n```bash\n!pip install torch-judge\n```\n\nThen in a notebook cell:\n\n```python\nfrom torch_judge import check, status, hint, reset_progress\nstatus()           # list all problems and your progress\ncheck(\"relu\")      # run tests for the \"relu\" task\nhint(\"relu\")       # show a hint\n```\n\n### Option 1 — Pull the pre-built image (fastest)\n\n```bash\ndocker run -p 8888:8888 -e PORT=8888 ghcr.io\u002Fduoan\u002Ftorchcode:latest\n```\n\nIf the registry image is unavailable for your platform, use Option 2 instead. This is the common path on Apple Silicon \u002F `arm64`.\n\n### Option 2 — Build locally\n\n```bash\nmake run\n```\n\n`make run` will try the prebuilt image first and automatically fall back to a local build when needed.\n\nOpen **\u003Chttp:\u002F\u002Flocalhost:8888>** — that's it. Works with both Docker and Podman (auto-detected).\n\n---\n\n## 📋 Problem Set\n\n> **Frequency**: 🔥 = very likely in interviews, ⭐ = commonly asked, 💡 = emerging \u002F differentiator\n\n### 🧱 Fundamentals — \"Implement X from scratch\"\n\nThe bread and butter of ML coding interviews. 
You'll be asked to write these without `torch.nn`.\n\n| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |\n|:---:|---------|----------------------|:----------:|:----:|--------------|\n| 1 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F01_relu.ipynb\" target=\"_blank\">ReLU\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F01_relu.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `relu(x)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | 🔥 | Activation functions, element-wise ops |\n| 2 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F02_softmax.ipynb\" target=\"_blank\">Softmax\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F02_softmax.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `my_softmax(x, dim)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | 🔥 | Numerical stability, exp\u002Flog tricks |\n| 16 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F16_cross_entropy.ipynb\" target=\"_blank\">Cross-Entropy Loss\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F16_cross_entropy.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In 
Colab\" height=\"20\">\u003C\u002Fa> | `cross_entropy_loss(logits, targets)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | 🔥 | Log-softmax, logsumexp trick |\n| 17 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F17_dropout.ipynb\" target=\"_blank\">Dropout\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F17_dropout.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `MyDropout` (nn.Module) | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | 🔥 | Train\u002Feval mode, inverted scaling |\n| 18 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F18_embedding.ipynb\" target=\"_blank\">Embedding\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F18_embedding.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `MyEmbedding` (nn.Module) | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | 🔥 | Lookup table, `weight[indices]` |\n| 19 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F19_gelu.ipynb\" target=\"_blank\">GELU\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F19_gelu.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" 
height=\"20\">\u003C\u002Fa> | `my_gelu(x)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | ⭐ | Gaussian error linear unit, `torch.erf` |\n| 20 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F20_weight_init.ipynb\" target=\"_blank\">Kaiming Init\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F20_weight_init.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `kaiming_init(weight)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | ⭐ | `std = sqrt(2\u002Ffan_in)`, variance scaling |\n| 21 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F21_gradient_clipping.ipynb\" target=\"_blank\">Gradient Clipping\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F21_gradient_clipping.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `clip_grad_norm(params, max_norm)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | ⭐ | Norm-based clipping, direction preservation |\n| 31 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F31_gradient_accumulation.ipynb\" target=\"_blank\">Gradient Accumulation\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F31_gradient_accumulation.ipynb\" target=\"_blank\">\u003Cimg 
src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `accumulated_step(model, opt, ...)` | ![Easy](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEasy-4CAF50?style=flat-square) | 💡 | Micro-batching, loss scaling |\n| 40 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F40_linear_regression.ipynb\" target=\"_blank\">Linear Regression\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F40_linear_regression.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `LinearRegression` (3 methods) | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 🔥 | Normal equation, GD from scratch, nn.Linear |\n| 3 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F03_linear.ipynb\" target=\"_blank\">Linear Layer\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F03_linear.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `SimpleLinear` (nn.Module) | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 🔥 | `y = xW^T + b`, Kaiming init, `nn.Parameter` |\n| 4 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F04_layernorm.ipynb\" target=\"_blank\">LayerNorm\u003C\u002Fa> \u003Ca 
href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F04_layernorm.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `my_layer_norm(x, γ, β)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 🔥 | Normalization over the feature dimension, affine transform |\n| 7 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F07_batchnorm.ipynb\" target=\"_blank\">BatchNorm\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F07_batchnorm.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `my_batch_norm(x, γ, β)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | Batch vs layer statistics, train\u002Feval behavior |\n| 8 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F08_rmsnorm.ipynb\" target=\"_blank\">RMSNorm\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F08_rmsnorm.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `rms_norm(x, weight)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | LLaMA-style norm, simpler than LayerNorm |\n| 15 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F15_mlp.ipynb\" 
target=\"_blank\">SwiGLU MLP\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F15_mlp.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `SwiGLUMLP` (nn.Module) | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | Gated FFN, `SiLU(gate) * up`, LLaMA\u002FMistral-style |\n| 22 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F22_conv2d.ipynb\" target=\"_blank\">Conv2d\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F22_conv2d.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `my_conv2d(x, weight, ...)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 🔥 | Convolution, unfold, stride\u002Fpadding |\n\n### 🧠 Attention Mechanisms — The heart of modern ML interviews\n\nIf you're interviewing for any role touching LLMs or Transformers, expect at least one of these.\n\n| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |\n|:---:|---------|----------------------|:----------:|:----:|--------------|\n| 23 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F23_cross_attention.ipynb\" target=\"_blank\">Cross-Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F23_cross_attention.ipynb\" target=\"_blank\">\u003Cimg 
src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `MultiHeadCrossAttention` (nn.Module) | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | Encoder-decoder, Q from decoder, K\u002FV from encoder |\n| 5 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F05_attention.ipynb\" target=\"_blank\">Scaled Dot-Product Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F05_attention.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `scaled_dot_product_attention(Q, K, V)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 🔥 | `softmax(QK^T\u002F√d_k)V`, the foundation of everything |\n| 6 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F06_multihead_attention.ipynb\" target=\"_blank\">Multi-Head Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F06_multihead_attention.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `MultiHeadAttention` (nn.Module) | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 🔥 | Parallel heads, split\u002Fconcat, projection matrices |\n| 9 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F09_causal_attention.ipynb\" target=\"_blank\">Causal Self-Attention\u003C\u002Fa> \u003Ca 
href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F09_causal_attention.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `causal_attention(Q, K, V)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 🔥 | Autoregressive masking with `-inf`, GPT-style |\n| 10 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F10_gqa.ipynb\" target=\"_blank\">Grouped Query Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F10_gqa.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `GroupQueryAttention` (nn.Module) | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | ⭐ | GQA (LLaMA 2), KV sharing across heads |\n| 11 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F11_sliding_window.ipynb\" target=\"_blank\">Sliding Window Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F11_sliding_window.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `sliding_window_attention(Q, K, V, w)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | ⭐ | Mistral-style local attention, O(n·w) complexity |\n| 12 | \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F12_linear_attention.ipynb\" target=\"_blank\">Linear Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F12_linear_attention.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `linear_attention(Q, K, V)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | Kernel trick, `φ(Q)(φ(K)^TV)`, O(n·d²) |\n| 14 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F14_kv_cache.ipynb\" target=\"_blank\">KV Cache Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F14_kv_cache.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `KVCacheAttention` (nn.Module) | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 🔥 | Incremental decoding, cache K\u002FV, prefill vs decode |\n| 24 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F24_rope.ipynb\" target=\"_blank\">RoPE\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F24_rope.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `apply_rope(q, k)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 🔥 | Rotary 
position embedding, relative position via rotation |\n| 25 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F25_flash_attention.ipynb\" target=\"_blank\">Flash Attention\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F25_flash_attention.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `flash_attention(Q, K, V, block_size)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | Tiled attention, online softmax, memory-efficient |\n\n### 🏗️ Architecture & Adaptation — Put it all together\n\n| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |\n|:---:|---------|----------------------|:----------:|:----:|--------------|\n| 26 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F26_lora.ipynb\" target=\"_blank\">LoRA\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F26_lora.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `LoRALinear` (nn.Module) | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | Low-rank adaptation, frozen base + `BA` update |\n| 27 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F27_vit_patch.ipynb\" target=\"_blank\">ViT Patch Embedding\u003C\u002Fa> \u003Ca 
href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F27_vit_patch.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `PatchEmbedding` (nn.Module) | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 💡 | Image → patches → linear projection |\n| 13 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F13_gpt2_block.ipynb\" target=\"_blank\">GPT-2 Block\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F13_gpt2_block.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `GPT2Block` (nn.Module) | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | ⭐ | Pre-norm, causal MHA + MLP (4x, GELU), residual connections |\n| 28 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F28_moe.ipynb\" target=\"_blank\">Mixture of Experts\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F28_moe.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `MixtureOfExperts` (nn.Module) | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | ⭐ | Mixtral-style, top-k routing, expert MLPs |\n\n### ⚙️ Training & Optimization\n\n| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts 
|\n|:---:|---------|----------------------|:----------:|:----:|--------------|\n| 29 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F29_adam.ipynb\" target=\"_blank\">Adam Optimizer\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F29_adam.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `MyAdam` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | Momentum + RMSProp, bias correction |\n| 30 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F30_cosine_lr.ipynb\" target=\"_blank\">Cosine LR Scheduler\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F30_cosine_lr.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `cosine_lr_schedule(step, ...)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | ⭐ | Linear warmup + cosine annealing |\n\n### 🎯 Inference & Decoding\n\n| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |\n|:---:|---------|----------------------|:----------:|:----:|--------------|\n| 32 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F32_topk_sampling.ipynb\" target=\"_blank\">Top-k \u002F Top-p Sampling\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F32_topk_sampling.ipynb\" target=\"_blank\">\u003Cimg 
src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `sample_top_k_top_p(logits, ...)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 🔥 | Nucleus sampling, temperature scaling |\n| 33 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F33_beam_search.ipynb\" target=\"_blank\">Beam Search\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F33_beam_search.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `beam_search(log_prob_fn, ...)` | ![Medium](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMedium-FF9800?style=flat-square) | 🔥 | Hypothesis expansion, pruning, eos handling |\n| 34 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F34_speculative_decoding.ipynb\" target=\"_blank\">Speculative Decoding\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F34_speculative_decoding.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `speculative_decode(target, draft, ...)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | Accept\u002Freject, draft model acceleration |\n\n### 🔬 Advanced — Differentiators\n\n| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |\n|:---:|---------|----------------------|:----------:|:----:|--------------|\n| 35 | \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F35_bpe.ipynb\" target=\"_blank\">BPE Tokenizer\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F35_bpe.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `SimpleBPE` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | Byte-pair encoding, merge rules, subword splits |\n| 36 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F36_int8_quantization.ipynb\" target=\"_blank\">INT8 Quantization\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F36_int8_quantization.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `Int8Linear` (nn.Module) | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | Per-channel quantize, scale\u002Fzero-point, buffer vs param |\n| 37 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F37_dpo_loss.ipynb\" target=\"_blank\">DPO Loss\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F37_dpo_loss.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `dpo_loss(chosen, rejected, ...)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | 
Direct preference optimization, alignment training |\n| 38 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F38_grpo_loss.ipynb\" target=\"_blank\">GRPO Loss\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F38_grpo_loss.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `grpo_loss(logps, rewards, group_ids, eps)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | Group relative policy optimization, RLAIF, within-group normalized advantages |\n| 39 | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F39_ppo_loss.ipynb\" target=\"_blank\">PPO Loss\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fduoan\u002FTorchCode\u002Fblob\u002Fmaster\u002Ftemplates\u002F39_ppo_loss.ipynb\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\" height=\"20\">\u003C\u002Fa> | `ppo_loss(new_logps, old_logps, advantages, clip_ratio)` | ![Hard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHard-F44336?style=flat-square) | 💡 | PPO clipped surrogate loss, policy gradient, trust region |\n\n---\n\n## ⚙️ How It Works\n\nEach problem has **two** notebooks:\n\n| File | Purpose |\n|------|---------|\n| `01_relu.ipynb` | ✏️ Blank template — write your code here |\n| `01_relu_solution.ipynb` | 📖 Reference solution — check when stuck |\n\n### Workflow\n\n```text\n1. Open a blank notebook           →  Read the problem description\n2. Implement your solution         →  Use only basic PyTorch ops\n3. Debug freely                    →  print(x.shape), check gradients, etc.\n4. 
Run the judge cell              →  check(\"relu\")\n5. See instant colored feedback    →  ✅ pass \u002F ❌ fail per test case\n6. Stuck? Get a nudge              →  hint(\"relu\")\n7. Review the reference solution   →  01_relu_solution.ipynb\n8. Click 🔄 Reset in the toolbar  →  Blank slate — practice again!\n```\n\n### In-Notebook API\n\n```python\nfrom torch_judge import check, hint, status\n\ncheck(\"relu\")               # Judge your implementation\nhint(\"causal_attention\")    # Get a hint without a full spoiler\nstatus()                    # Progress dashboard — solved \u002F attempted \u002F todo\n```\n\n---\n\n## 📅 Suggested Study Plan\n\n> **Total: ~12–16 hours spread across 3–4 weeks. Perfect for interview prep on a deadline.**\n\n| Week | Focus | Problems | Time |\n|:----:|-------|----------|:----:|\n| **1** | 🧱 Foundations | ReLU → Softmax → CE Loss → Dropout → Embedding → GELU → Linear → LayerNorm → BatchNorm → RMSNorm → SwiGLU MLP → Conv2d | 2–3 hrs |\n| **2** | 🧠 Attention Deep Dive | SDPA → MHA → Cross-Attn → Causal → GQA → KV Cache → Sliding Window → RoPE → Linear Attn → Flash Attn | 3–4 hrs |\n| **3** | 🏗️ Architecture + Training | GPT-2 Block → LoRA → MoE → ViT Patch → Adam → Cosine LR → Grad Clip → Grad Accumulation → Kaiming Init | 3–4 hrs |\n| **4** | 🎯 Inference + Advanced | Top-k\u002Fp Sampling → Beam Search → Speculative Decoding → BPE → INT8 Quant → DPO Loss → GRPO Loss → PPO Loss + speed run | 3–4 hrs |\n\n---\n\n## 🏛️ Architecture\n\n```text\n┌──────────────────────────────────────────┐\n│           Docker \u002F Podman Container      │\n│                                          │\n│  JupyterLab (:8888)                      │\n│    ├── templates\u002F  (reset on each run)   │\n│    ├── solutions\u002F  (reference impl)      │\n│    ├── torch_judge\u002F (auto-grading)       │\n│    ├── torchcode-labext (JLab plugin)    │\n│    │     🔄 Reset — restore template     │\n│    │     🔗 Colab — open in Colab        │\n│    └── PyTorch (CPU), 
NumPy              │\n│                                          │\n│  Judge checks:                           │\n│    ✓ Output correctness (allclose)       │\n│    ✓ Gradient flow (autograd)            │\n│    ✓ Shape consistency                   │\n│    ✓ Edge cases & numerical stability    │\n└──────────────────────────────────────────┘\n```\n\nSingle container. Single port. No database. No frontend framework. No GPU.\n\n## 🛠️ Commands\n\n```bash\nmake run    # Build & start (http:\u002F\u002Flocalhost:8888)\nmake stop   # Stop the container\nmake clean  # Stop + remove volumes + reset all progress\n```\n\n## 🧩 Adding Your Own Problems\n\nTorchCode uses auto-discovery — just drop a new file in `torch_judge\u002Ftasks\u002F`:\n\n```python\nTASK = {\n    \"id\": \"my_task\",\n    \"title\": \"My Custom Problem\",\n    \"difficulty\": \"medium\",\n    \"function_name\": \"my_function\",\n    \"hint\": \"Think about broadcasting...\",\n    \"tests\": [ ... ],\n}\n```\n\nNo registration needed. The judge picks it up automatically.\n\n---\n\n## 📦 Publishing `torch-judge` to PyPI (maintainers)\n\nThe judge is published as a separate package so Colab\u002Fusers can `pip install torch-judge` without cloning the repo.\n\n### Automatic (GitHub Action)\n\nPushing to `master` after changing the package version triggers [`.github\u002Fworkflows\u002Fpypi-publish.yml`](.github\u002Fworkflows\u002Fpypi-publish.yml), which builds and uploads to PyPI. No git tag is required.\n\n1. **Bump version** in `torch_judge\u002F_version.py` (e.g. `__version__ = \"0.1.1\"`).\n2. **Configure PyPI Trusted Publisher** (one-time):\n   - PyPI → Your project **torch-judge** → **Publishing** → **Add a new pending publisher**\n   - Owner: `duoan`, Repository: `TorchCode`, Workflow: `pypi-publish.yml`, Environment: (leave empty)\n   - Run the workflow once (push a version bump to `master` or **Actions → Publish torch-judge to PyPI → Run workflow**); PyPI will then link the publisher.\n3. 
**Release**: commit the version bump and `git push origin master`.\n\nIf you prefer not to use Trusted Publishing, use an API token instead: add a repository secret `PYPI_API_TOKEN` (value = `pypi-...` from PyPI), then set `TWINE_USERNAME=__token__` and populate `TWINE_PASSWORD` from that secret in the workflow.\n\n### Manual\n\n```bash\npip install build twine\npython -m build\ntwine upload dist\u002F*\n```\n\nThe version lives in `torch_judge\u002F_version.py`; bump it before each release.\n\n---\n\n## ❓ FAQ\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Do I need a GPU?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\nNo. Everything runs on CPU. The problems test correctness and understanding, not throughput.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Can I keep my solutions between runs?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\nBlank templates reset on every \u003Ccode>make run\u003C\u002Fcode> so you practice from scratch. Save your work under a different filename if you want to keep it. You can also click the \u003Cb>🔄 Reset\u003C\u002Fb> button in the notebook toolbar at any time to restore the blank template without restarting.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Can I use Google Colab instead?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\nYes! Every notebook has an \u003Cb>Open in Colab\u003C\u002Fb> badge at the top. Click it to open the problem directly in Google Colab — no Docker or local setup needed. 
You can also use the \u003Cb>Colab\u003C\u002Fb> toolbar button inside JupyterLab.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>How are solutions graded?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\nThe judge runs your function against multiple test cases using \u003Ccode>torch.allclose\u003C\u002Fcode> for numerical correctness, verifies gradients flow properly via autograd, and checks edge cases specific to each operation.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Who is this for?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\nAnyone preparing for ML\u002FAI engineering interviews at top tech companies, or anyone who wants to deeply understand how PyTorch operations work under the hood.\n\u003C\u002Fdetails>\n\n---\n\n## 🤝 Contributors\n\nThanks to everyone who has contributed to TorchCode.\n\n\u003C!-- readme: contributors -start -->\n\u003Ctable>\n\t\u003Ctbody>\n\t\t\u003Ctr>\n            \u003Ctd align=\"center\">\n                \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fduoan\">\n                    \u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F2378740?v=4\" width=\"100;\" alt=\"duoan\"\u002F>\n                    \u003Cbr \u002F>\n                    \u003Csub>\u003Cb>duoan\u003C\u002Fb>\u003C\u002Fsub>\n                \u003C\u002Fa>\n            \u003C\u002Ftd>\n            \u003Ctd align=\"center\">\n                \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAndo233\">\n                    \u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F74404658?v=4\" width=\"100;\" alt=\"Ando233\"\u002F>\n                    \u003Cbr \u002F>\n                    \u003Csub>\u003Cb>Ando233\u003C\u002Fb>\u003C\u002Fsub>\n                \u003C\u002Fa>\n            \u003C\u002Ftd>\n            \u003Ctd align=\"center\">\n                \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FThierryHJ\">\n                    \u003Cimg 
src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F51846529?v=4\" width=\"100;\" alt=\"ThierryHJ\"\u002F>\n                    \u003Cbr \u002F>\n                    \u003Csub>\u003Cb>ThierryHJ\u003C\u002Fb>\u003C\u002Fsub>\n                \u003C\u002Fa>\n            \u003C\u002Ftd>\n\t\t\u003C\u002Ftr>\n\t\u003C\u002Ftbody>\n\u003C\u002Ftable>\n\u003C!-- readme: contributors -end -->\n\nAuto-generated from the [GitHub contributors graph](https:\u002F\u002Fgithub.com\u002Fduoan\u002FTorchCode\u002Fgraphs\u002Fcontributors) with avatars and GitHub usernames.\n\n---\n\n\u003Cdiv align=\"center\">\n\n**Built for engineers who want to deeply understand what they build.**\n\nIf this helped your interview prep, consider giving it a ⭐\n\n---\n\n### ☕ Buy Me a Coffee\n\n\u003Ca href=\"https:\u002F\u002Fbuymeacoffee.com\u002Fduoan\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fcdn.buymeacoffee.com\u002Fbuttons\u002Fdefault-orange.png\" alt=\"Buy Me A Coffee\" height=\"41\" width=\"174\">\u003C\u002Fa>\n\n\u003Cimg src=\".\u002Fbmc_qr.png\" alt=\"BMC QR Code\" width=\"150\" height=\"150\">\n\n*Scan to support*\n\n\u003C\u002Fdiv>\n","TorchCode is a hands-on practice platform for PyTorch learners. It helps users implement key components such as softmax, attention, and GPT-2 from scratch, and checks their code with instant auto-grading. The project is built on Jupyter Notebook, can be self-hosted or used online, and offers core features including 40 curated problems, an automated judge (with correctness checks and gradient verification), instant feedback, and progress tracking. It is especially suited to developers preparing for machine learning interviews, looking to sharpen their PyTorch programming skills, or interested in the fundamental operations of deep learning. It runs without any cloud service sign-up or GPU.",2,"2026-06-11 03:52:18","high_star"]