[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-464":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":39,"readmeContent":40,"aiSummary":41,"trendingCount":16,"starSnapshotCount":16,"syncStatus":42,"lastSyncTime":43,"discoverSource":44},464,"annotated_deep_learning_paper_implementations","labmlai\u002Fannotated_deep_learning_paper_implementations","labmlai","🧑‍🏫 60+ Implementations\u002Ftutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 
🧠","https:\u002F\u002Fnn.labml.ai",null,"Python",66923,6707,498,28,0,31,335,14,45,"MIT License",false,"master",true,[26,27,28,29,30,31,32,33,34,35,36,37,38],"attention","deep-learning","deep-learning-tutorial","gan","literate-programming","lora","machine-learning","neural-networks","optimizers","pytorch","reinforcement-learning","transformer","transformers","2026-06-12 02:00:13","[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Flabmlai?style=social)](https:\u002F\u002Ftwitter.com\u002Flabmlai)\n\n# [labml.ai Deep Learning Paper Implementations](https:\u002F\u002Fnn.labml.ai\u002Findex.html)\n\nThis is a collection of simple PyTorch implementations of\nneural networks and related algorithms.\nThese implementations are documented with explanations,\nand [the website](https:\u002F\u002Fnn.labml.ai\u002Findex.html)\nrenders them as side-by-side formatted notes.\nWe believe these notes will help you understand the algorithms better.\n\n![Screenshot](https:\u002F\u002Fnn.labml.ai\u002Fdqn-light.png)\n\nWe are actively maintaining this repo and adding new\nimplementations almost weekly.\nFollow us on [![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Flabmlai?style=social)](https:\u002F\u002Ftwitter.com\u002Flabmlai) for updates.\n\n## Paper Implementations\n\n#### ✨ [Transformers](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Findex.html)\n\n* [JAX implementation](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fjax_transformer\u002Findex.html)\n* [Multi-headed attention](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fmha.html)\n* [Triton Flash Attention](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fflash\u002Findex.html)\n* [Transformer building blocks](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fmodels.html)\n* [Transformer XL](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fxl\u002Findex.html)\n    * [Relative multi-headed 
attention](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fxl\u002Frelative_mha.html)\n* [Rotary Positional Embeddings](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Frope\u002Findex.html)\n* [Attention with Linear Biases (ALiBi)](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Falibi\u002Findex.html)\n* [RETRO](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fretro\u002Findex.html)\n* [Compressive Transformer](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fcompressive\u002Findex.html)\n* [GPT Architecture](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fgpt\u002Findex.html)\n* [GLU Variants](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fglu_variants\u002Fsimple.html)\n* [kNN-LM: Generalization through Memorization](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fknn)\n* [Feedback Transformer](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Ffeedback\u002Findex.html)\n* [Switch Transformer](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fswitch\u002Findex.html)\n* [Fast Weights Transformer](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Ffast_weights\u002Findex.html)\n* [FNet](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Ffnet\u002Findex.html)\n* [Attention Free Transformer](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Faft\u002Findex.html)\n* [Masked Language Model](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fmlm\u002Findex.html)\n* [MLP-Mixer: An all-MLP Architecture for Vision](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fmlp_mixer\u002Findex.html)\n* [Pay Attention to MLPs (gMLP)](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fgmlp\u002Findex.html)\n* [Vision Transformer (ViT)](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fvit\u002Findex.html)\n* [Primer EZ](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fprimer_ez\u002Findex.html)\n* [Hourglass](https:\u002F\u002Fnn.labml.ai\u002Ftransformers\u002Fhour_glass\u002Findex.html)\n\n#### ✨ [Low-Rank Adaptation 
(LoRA)](https:\u002F\u002Fnn.labml.ai\u002Flora\u002Findex.html)\n\n#### ✨ [Eleuther GPT-NeoX](https:\u002F\u002Fnn.labml.ai\u002Fneox\u002Findex.html)\n* [Generate on a 48GB GPU](https:\u002F\u002Fnn.labml.ai\u002Fneox\u002Fsamples\u002Fgenerate.html)\n* [Finetune on two 48GB GPUs](https:\u002F\u002Fnn.labml.ai\u002Fneox\u002Fsamples\u002Ffinetune.html)\n* [LLM.int8()](https:\u002F\u002Fnn.labml.ai\u002Fneox\u002Futils\u002Fllm_int8.html)\n\n#### ✨ [Diffusion models](https:\u002F\u002Fnn.labml.ai\u002Fdiffusion\u002Findex.html)\n\n* [Denoising Diffusion Probabilistic Models (DDPM)](https:\u002F\u002Fnn.labml.ai\u002Fdiffusion\u002Fddpm\u002Findex.html)\n* [Denoising Diffusion Implicit Models (DDIM)](https:\u002F\u002Fnn.labml.ai\u002Fdiffusion\u002Fstable_diffusion\u002Fsampler\u002Fddim.html)\n* [Latent Diffusion Models](https:\u002F\u002Fnn.labml.ai\u002Fdiffusion\u002Fstable_diffusion\u002Flatent_diffusion.html)\n* [Stable Diffusion](https:\u002F\u002Fnn.labml.ai\u002Fdiffusion\u002Fstable_diffusion\u002Findex.html)\n\n#### ✨ [Generative Adversarial Networks](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Findex.html)\n* [Original GAN](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Foriginal\u002Findex.html)\n* [GAN with deep convolutional network](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Fdcgan\u002Findex.html)\n* [Cycle GAN](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Fcycle_gan\u002Findex.html)\n* [Wasserstein GAN](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Fwasserstein\u002Findex.html)\n* [Wasserstein GAN with Gradient Penalty](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Fwasserstein\u002Fgradient_penalty\u002Findex.html)\n* [StyleGAN 2](https:\u002F\u002Fnn.labml.ai\u002Fgan\u002Fstylegan\u002Findex.html)\n\n#### ✨ [Recurrent Highway Networks](https:\u002F\u002Fnn.labml.ai\u002Frecurrent_highway_networks\u002Findex.html)\n\n#### ✨ [LSTM](https:\u002F\u002Fnn.labml.ai\u002Flstm\u002Findex.html)\n\n#### ✨ [HyperNetworks - 
HyperLSTM](https:\u002F\u002Fnn.labml.ai\u002Fhypernetworks\u002Fhyper_lstm.html)\n\n#### ✨ [ResNet](https:\u002F\u002Fnn.labml.ai\u002Fresnet\u002Findex.html)\n\n#### ✨ [ConvMixer](https:\u002F\u002Fnn.labml.ai\u002Fconv_mixer\u002Findex.html)\n\n#### ✨ [Capsule Networks](https:\u002F\u002Fnn.labml.ai\u002Fcapsule_networks\u002Findex.html)\n\n#### ✨ [U-Net](https:\u002F\u002Fnn.labml.ai\u002Funet\u002Findex.html)\n\n#### ✨ [Sketch RNN](https:\u002F\u002Fnn.labml.ai\u002Fsketch_rnn\u002Findex.html)\n\n#### ✨ Graph Neural Networks\n\n* [Graph Attention Networks (GAT)](https:\u002F\u002Fnn.labml.ai\u002Fgraphs\u002Fgat\u002Findex.html)\n* [Graph Attention Networks v2 (GATv2)](https:\u002F\u002Fnn.labml.ai\u002Fgraphs\u002Fgatv2\u002Findex.html)\n\n#### ✨ [Counterfactual Regret Minimization (CFR)](https:\u002F\u002Fnn.labml.ai\u002Fcfr\u002Findex.html)\n\nSolving games with incomplete information, such as poker, using CFR.\n\n* [Kuhn Poker](https:\u002F\u002Fnn.labml.ai\u002Fcfr\u002Fkuhn\u002Findex.html)\n\n#### ✨ [Reinforcement Learning](https:\u002F\u002Fnn.labml.ai\u002Frl\u002Findex.html)\n* [Proximal Policy Optimization](https:\u002F\u002Fnn.labml.ai\u002Frl\u002Fppo\u002Findex.html) with\n [Generalized Advantage Estimation](https:\u002F\u002Fnn.labml.ai\u002Frl\u002Fppo\u002Fgae.html)\n* [Deep Q Networks](https:\u002F\u002Fnn.labml.ai\u002Frl\u002Fdqn\u002Findex.html) with\n [Dueling Network](https:\u002F\u002Fnn.labml.ai\u002Frl\u002Fdqn\u002Fmodel.html),\n [Prioritized Replay](https:\u002F\u002Fnn.labml.ai\u002Frl\u002Fdqn\u002Freplay_buffer.html)\n and Double Q Network.\n\n#### ✨ [Optimizers](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Findex.html)\n* [Adam](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Fadam.html)\n* [AMSGrad](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Famsgrad.html)\n* [Adam Optimizer with warmup](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Fadam_warmup.html)\n* [Noam 
Optimizer](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Fnoam.html)\n* [Rectified Adam Optimizer](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Fradam.html)\n* [AdaBelief Optimizer](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Fada_belief.html)\n* [Sophia-G Optimizer](https:\u002F\u002Fnn.labml.ai\u002Foptimizers\u002Fsophia.html)\n\n#### ✨ [Normalization Layers](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Findex.html)\n* [Batch Normalization](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Fbatch_norm\u002Findex.html)\n* [Layer Normalization](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Flayer_norm\u002Findex.html)\n* [Instance Normalization](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Finstance_norm\u002Findex.html)\n* [Group Normalization](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Fgroup_norm\u002Findex.html)\n* [Weight Standardization](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Fweight_standardization\u002Findex.html)\n* [Batch-Channel Normalization](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Fbatch_channel_norm\u002Findex.html)\n* [DeepNorm](https:\u002F\u002Fnn.labml.ai\u002Fnormalization\u002Fdeep_norm\u002Findex.html)\n\n#### ✨ [Distillation](https:\u002F\u002Fnn.labml.ai\u002Fdistillation\u002Findex.html)\n\n#### ✨ [Adaptive Computation](https:\u002F\u002Fnn.labml.ai\u002Fadaptive_computation\u002Findex.html)\n\n* [PonderNet](https:\u002F\u002Fnn.labml.ai\u002Fadaptive_computation\u002Fponder_net\u002Findex.html)\n\n#### ✨ [Uncertainty](https:\u002F\u002Fnn.labml.ai\u002Funcertainty\u002Findex.html)\n\n* [Evidential Deep Learning to Quantify Classification Uncertainty](https:\u002F\u002Fnn.labml.ai\u002Funcertainty\u002Fevidence\u002Findex.html)\n\n#### ✨ [Activations](https:\u002F\u002Fnn.labml.ai\u002Factivations\u002Findex.html)\n\n* [Fuzzy Tiling Activations](https:\u002F\u002Fnn.labml.ai\u002Factivations\u002Ffta\u002Findex.html)\n\n#### ✨ [Language Model Sampling 
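Techniques](https:\u002F\u002Fnn.labml.ai\u002Fsampling\u002Findex.html)\n* [Greedy Sampling](https:\u002F\u002Fnn.labml.ai\u002Fsampling\u002Fgreedy.html)\n* [Temperature Sampling](https:\u002F\u002Fnn.labml.ai\u002Fsampling\u002Ftemperature.html)\n* [Top-k Sampling](https:\u002F\u002Fnn.labml.ai\u002Fsampling\u002Ftop_k.html)\n* [Nucleus Sampling](https:\u002F\u002Fnn.labml.ai\u002Fsampling\u002Fnucleus.html)\n\n#### ✨ [Scalable Training\u002FInference](https:\u002F\u002Fnn.labml.ai\u002Fscaling\u002Findex.html)\n* [Zero3 memory optimizations](https:\u002F\u002Fnn.labml.ai\u002Fscaling\u002Fzero3\u002Findex.html)\n\n### Installation\n\n```bash\npip install labml-nn\n```\n","This project is a collection of PyTorch implementations and tutorials for 60+ deep learning papers, accompanied by side-by-side notes. Its core content covers a range of Transformer architectures (e.g., the original Transformer, Transformer-XL, and Switch Transformer), optimizers (e.g., Adam and AdaBelief), generative adversarial networks (e.g., CycleGAN and StyleGAN2), and reinforcement learning algorithms (e.g., PPO and DQN). The project is written in Python on the PyTorch framework, and its detailed side notes help readers understand complex neural network architectures and related algorithms. It is suited to researchers, developers, and students who want to understand and practice recent deep learning techniques in depth. Licensed under the MIT License.",2,"2026-06-11 02:36:13","top_all"]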