[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71123":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":16,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":16,"starSnapshotCount":16,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},71123,"lit-llama","Lightning-AI\u002Flit-llama","Lightning-AI","Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.","",null,"Python",6079,518,66,100,0,64.15,"Apache License 2.0",false,"main",[],"2026-06-12 04:00:59","\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fpl-public-data.s3.amazonaws.com\u002Fassets_lightning\u002FLit_LLaMA_Badge3x.png\" alt=\"Lit-LLaMA\" width=\"128\"\u002F>\n\n# ⚡ Lit-LLaMA ️\n\n![cpu-tests](https:\u002F\u002Fgithub.com\u002Flightning-AI\u002Flit-llama\u002Factions\u002Fworkflows\u002Fcpu-tests.yml\u002Fbadge.svg) [![Build Status](https:\u002F\u002Fdev.azure.com\u002FLightning-AI\u002Flit%20Models\u002F_apis\u002Fbuild\u002Fstatus%2FLightning-AI.lit-LLaMA?branchName=main)](https:\u002F\u002Fdev.azure.com\u002FLightning-AI\u002Flit%20Models\u002F_build\u002Flatest?definitionId=49&branchName=main) [![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-blue.svg)](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-llama\u002Fblob\u002Fmaster\u002FLICENSE) 
[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1077906959069626439?style=plastic)](https:\u002F\u002Fdiscord.gg\u002FVptPCZkGNa)\n\n\u003Cpre>\n\u003Cb>⚠️ Warning: Not Actively Maintained\u003C\u002Fb>\n\nThis repository is no longer actively maintained. For a more up-to-date alternative, please visit the LitGPT project:\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flitgpt\">https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flitgpt\u003C\u002Fa>, which serves as the successor to this repository.\n\nFeel free to explore, reuse, or fork, but be aware that no further updates or support will be provided.\n\u003C\u002Fpre>\n\n\u003Cimg src=\"https:\u002F\u002Fpl-public-data.s3.amazonaws.com\u002Fassets_lightning\u002FLlama_pineapple.gif\" alt=\"Lit-LLaMA and pineapple pizza\" width=\"500px\"\u002F>\n\n\u003C\u002Fdiv>\n\n# ⚡ Lit-LLaMA ️\nIndependent implementation of [LLaMA](\u003Chttps:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama>) pretraining, finetuning, and inference code that is fully open source under the **Apache 2.0 license.**\n\nThis implementation builds on [nanoGPT](\u003Chttps:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT>).\n\nThe open-source code in this repository works with the original LLaMA weights that are distributed by Meta under a [research-only license](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama\u002Fblob\u002Fmain\u002FMODEL_CARD.md#model-details).\n\n## Looking for LLaMA 2?\n\nMeta AI has since released LLaMA 2. 
Additionally, new Apache 2.0 licensed weights are being released as part of the [Open LLaMA project](https:\u002F\u002Fgithub.com\u002Fopenlm-research\u002Fopen_llama).\n\nTo run LLaMA 2 weights, Open LLaMA weights, or Vicuna weights (among other LLaMA-like checkpoints), **check out the [Lit-GPT repository](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-gpt)**.\n\n## Why?\n\nWe believe that AI should be fully open source and part of the collective knowledge.\n\nThe original [LLaMA code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama) is [GPL licensed](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama\u002Fblob\u002Fmain\u002FLICENSE) which means any project using it must also be released under GPL.\n\nThis \"taints\" any other code and prevents integration with the rest of the ecosystem.\n\n**Lit-LLaMA solves that for good.**\n\n&nbsp;\n\n## Design principles\n**Lit-LLaMA** is:\n\n- **Simple:** Single-file implementation without boilerplate.\n- **Correct:** Numerically equivalent to the original model.\n- **Optimized:** Runs on consumer hardware or at scale.\n- **Open-source:** No strings attached.\n\n## Get involved!\n[Join our Discord](https:\u002F\u002Fdiscord.gg\u002FVptPCZkGNa) to build high-performance, truly open-source models for the common benefit of the community.\n\n&nbsp;\n\n## Setup\n\nClone the repo\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-llama\ncd lit-llama\n```\n\ninstall dependencies\n\n```bash\npip install -e \".[all]\"\n```\n\nYou are all set! 🎉\n\n&nbsp;\n\n## Use the model\n\nTo generate text predictions, you need to download the model weights. 
**If you don't have them, check out our [guide](howto\u002Fdownload_weights.md).**\n\nRun inference:\n\n```bash\npython generate.py --prompt \"Hello, my name is\"\n```\n\nThis will run the 7B model and require ~26 GB of GPU memory (A100 GPU).\n\n[Full guide for generating samples from the model](howto\u002Finference.md).\n\n### Run Lit-LLaMA on consumer devices\n\nOn GPUs with `bfloat16` support, the `generate.py` script will automatically convert the weights and consume about 14 GB.\nFor GPUs with less memory, or ones that don't support `bfloat16`, enable quantization (`--quantize llm.int8`):\n\n```bash\npython generate.py --quantize llm.int8 --prompt \"Hello, my name is\"\n```\n\nSee `python generate.py --help` for more options.\n\nYou can also use GPTQ-style int4 quantization, but this requires converting the weights first:\n\n```bash\npython quantize\u002Fgptq.py --output_path checkpoints\u002Flit-llama\u002F7B\u002Fllama-gptq.4bit.pth --dtype bfloat16 --quantize gptq.int4\n```\n\nGPTQ-style int4 quantization brings GPU usage down to about 5 GB. As only the weights of the Linear layers are quantized, it is useful to also use `--dtype bfloat16` even with the quantization enabled.\n\nOnce the quantized checkpoint has been generated, generation works as usual with `--quantize gptq.int4` and the newly generated checkpoint file:\n\n```bash\npython generate.py --quantize gptq.int4 --checkpoint_path checkpoints\u002Flit-llama\u002F7B\u002Fllama-gptq.4bit.pth\n```\n\n[Full guide for generating samples from the model](howto\u002Finference.md).\n\n## Finetune the model\n\nWe provide simple training scripts in `finetune\u002Flora.py` and `finetune\u002Fadapter.py` that instruction-tune a pretrained model on the [Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) dataset using the techniques of [LoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.09685) and [Adapter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.16199).\n\n1. 
Download the data and generate an instruction-tuning dataset:\n\n   ```bash\n   python scripts\u002Fprepare_alpaca.py\n   ```\n\n2. Run the finetuning script\n\n   ```bash\n   python finetune\u002Flora.py\n   ```\n   or \n   ```bash\n   python finetune\u002Fadapter.py\n   ```\n\nIt is expected that you have downloaded the pretrained weights as described above.\nThe finetuning requires at least one GPU with ~24 GB memory (RTX 3090). Follow the instructions in the script to fit the run into your GPU memory efficiently.\nNote: For some GPU models you might need to set `torch.backends.cuda.enable_flash_sdp(False)` (see comments at the top of the script).\n\nMore details about each finetuning method and how you can apply it to your own data can be found in our technical how-to guides.\n\n### Finetuning How-To Guides\n\nThese technical tutorials illustrate how to run the finetuning code.\n\n- [Finetune with LoRA](howto\u002Ffinetune_lora.md)\n- [Finetune with Adapters](howto\u002Ffinetune_adapter.md)\n\n### Understanding Finetuning -- Conceptual Tutorials\n\nLooking for conceptual tutorials and explanations? 
We have some additional articles below:\n\n- [Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters](https:\u002F\u002Flightning.ai\u002Fpages\u002Fcommunity\u002Farticle\u002Funderstanding-llama-adapters\u002F)\n\n## Pre-training\n\nWe provide a simple training script based on Fabric if you want to venture into pre-training on RedPajama, a reproduction of the original LLaMA dataset.\nConversion scripts for our optimized streaming `PackedDataset` are included.\n\nFollow this guide to start pre-training on the RedPajama dataset:\n\n- [Pretrain on RedPajama](howto\u002Ftrain_redpajama.md)\n\n## Get involved!\n\nWe are on a quest towards fully open source AI.\n\n\u003Cimg align=\"right\" src=\"https:\u002F\u002Fpl-public-data.s3.amazonaws.com\u002Fassets_lightning\u002FLit_LLaMA_Illustration3x.png\" alt=\"Lit-LLaMA\" width=\"128\"\u002F>\n\nJoin us and start contributing, especially on the following areas:\n\n- [ ] [Pre-training](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-llama\u002Flabels\u002Fpre-training)\n- [ ] [Fine-tuning (full and LoRA)](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-llama\u002Flabels\u002Ffine-tuning)\n- [ ] [Quantization](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-llama\u002Flabels\u002Fquantization)\n- [ ] [Sparsification](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flit-llama\u002Flabels\u002Fsparsification)\n\nLook at `train.py` for a starting point towards pre-training \u002F fine-tuning using [Lightning Fabric](https:\u002F\u002Flightning.ai\u002Fdocs\u002Ffabric\u002Fstable\u002F).\n\nWe welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment. \n\nUnsure about contributing? 
Check out our [Contributing to Lit-LLaMA: A Hitchhiker’s Guide to the Quest for Fully Open-Source AI](https:\u002F\u002Flightning.ai\u002Fpages\u002Fcommunity\u002Ftutorial\u002Fcontributing-to-lit-llama-a-hitchhikers-guide-to-the-quest-for-fully-open-source-ai\u002F) guide.\n\nDon't forget to [join our Discord](https:\u002F\u002Fdiscord.gg\u002FVptPCZkGNa)!\n\n## Acknowledgements\n\n- [@karpathy](https:\u002F\u002Fgithub.com\u002Fkarpathy) for [nanoGPT](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT)\n- [@FacebookResearch](https:\u002F\u002Fgithub.com\u002Ffacebookresearch) for the original [LLaMA implementation](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama)\n- [@TimDettmers](https:\u002F\u002Fgithub.com\u002FTimDettmers) for [bitsandbytes](https:\u002F\u002Fgithub.com\u002FTimDettmers\u002Fbitsandbytes)\n- [@Microsoft](https:\u002F\u002Fgithub.com\u002Fmicrosoft) for [LoRA](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FLoRA)\n- [@IST-DASLab](https:\u002F\u002Fgithub.com\u002FIST-DASLab) for [GPTQ](https:\u002F\u002Fgithub.com\u002FIST-DASLab\u002Fgptq)\n\n## License\n\nLit-LLaMA is released under the [Apache 2.0](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flightning-llama\u002Fblob\u002Fmain\u002FLICENSE) license.\n","Lit-LLaMA is an implementation of the LLaMA language model based on nanoGPT. It supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. The project is written in Python, uses a concise single-file implementation, is numerically equivalent to the original model, and is optimized to run on consumer hardware or at scale. It suits language model use cases that need pre-training, fine-tuning, and inference under an open-source license (Apache 2.0). Note that this repository is no longer actively maintained; its successor project, LitGPT, is the recommended alternative.",2,"2026-06-11 03:36:00","high_star"]