[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71031":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":22,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},71031,"LMFlow","OptimalScale\u002FLMFlow","OptimalScale","An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.","https:\u002F\u002Foptimalscale.github.io\u002FLMFlow\u002F",null,"Python",8488,828,65,77,0,4,67.16,"Apache License 2.0",false,"main",true,[24,25,26,27,28,29,30],"chatgpt","deep-learning","instruction-following","language-model","pretrained-models","pytorch","transformer","2026-06-12 04:00:58","\u003Cp align=\"center\" width=\"50%\">\n\u003Cimg src=\"docs\u002Fassets\u002Flogo.png\" alt=\"LMFlow\" style=\"width: 50%; min-width: 200px; display: block; margin: auto; background-color: transparent;\">\n\u003C\u002Fp>\n\n# LMFlow\n\n\u003Ch4 align=\"center\">\n    \u003Cp>\n        \u003Cb>English\u003C\u002Fb> |\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fdocs\u002Freadme\u002FREADME_zh-hans.md\">简体中文\u003C\u002Fa> |\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fdocs\u002Freadme\u002FREADME_es.md\">Español\u003C\u002Fa> |\n        \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fdocs\u002Freadme\u002FREADME_jp.md\">日本語\u003C\u002Fa> |\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fdocs\u002Freadme\u002FREADME_ko.md\">한국어\u003C\u002Fa> |\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fdocs\u002Freadme\u002FREADME_hindi.md\">हिंदी\u003C\u002Fa>\n    \u003Cp>\n\u003C\u002Fh4>\n\n[![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-Demo-20B2AA.svg)](https:\u002F\u002Flmflow.com)\n[![Code License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode%20License-Apache_2.0-green.svg)](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002FLICENSE)\n[![Python 3.9+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.9+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-390\u002F)\n[![Doc](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-Doc-ff69b4.svg)](https:\u002F\u002Foptimalscale.github.io\u002FLMFlow\u002F)\n[![Embark](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-LMFlow-%237289da.svg?logo=discord)](https:\u002F\u002Fdiscord.gg\u002Fu9VJNpzhvA)\n[![slack badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Join-blueviolet?logo=slack&amp)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Flmflow\u002Fshared_invite\u002Fzt-1wju9nicy-woXbNtS~5MavHSAtiMxmxQ)\n[![WeChat badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-Join-brightgreen?logo=wechat&amp)](https:\u002F\u002Fibb.co\u002FZhM4hhn)\n\nAn extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"docs\u002Fassets\u002Ffeatures.png\" alt=\"LMFlow-features\" style=\"width: 100%; 
min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n## Latest News\n> [!IMPORTANT]\n> * :exclamation: [2025-07-09] We have a major update to LMFlow with full Accelerate support and extensive streamlining. If you're looking for the previous version, please use `git checkout v0.0.10`, or check out the [v0.0.10 branch](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Ftree\u002Fv0.0.10). View all releases [here](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Ftags).\n\n* [2024-12-02] Support [Hymba](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fhymba), a new family of small language models featuring a hybrid-head parallel architecture. Check out [Post-training Hymba](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Ftree\u002Fmain\u002Fexperimental\u002FHymba) for more details.\n* [2024-07-01] 🏆 LMFlow receives the [**Best Demo Paper Award**](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F1TVDooAZqkNObz5ysVhDFtqnnVHR-u8wqYvgix-gzPMs\u002Fedit#slide=id.g2e55907bbcc_0_70) at **NAACL 2024**! 🎉\n* [2024-06-30] Expanding Optimization Options! We now support custom optimizer training with a variety of optimizers. Dive into the details and try out the new features with our updated script at [custom_optimizers](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Frun_finetune_with_custom_optim.sh).\n* [2024-04-25] :rocket: Support conversation template! We've preset the latest [Llama-3](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3-70B) and [Phi-3](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FPhi-3-mini-128k-instruct) conversation templates as well as some frequently used templates such as `chatml` (see all templates [here](https:\u002F\u002Foptimalscale.github.io\u002FLMFlow\u002Fexamples\u002FDATASETS.html#conversation-template)), and we are working on adding more preset templates. 
Add the corresponding `--conversation_template` to the shell script and you are all set! :rocket:\n\n\u003Cdetails> \u003Csummary>More news...\u003C\u002Fsummary>\n\n* [2024-03-27] Support [LISA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.17919), enabling 7B training in 24 GB of memory without offloading! \n* [2023-09-11] Support [speculative decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.17192). Check out [speculative_decoding](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Fspeculative_decoding\u002FREADME.md) for the usage and acceleration details.\n* [2023-08-14] Support long context inference with position interpolation (Linear & NTK scaling) for LLaMA models. Check out [position_interpolation](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Freadme\u002FPosition_Interpolation.md) for more details.\n* [2023-08-07] Support [Flash Attention-2](https:\u002F\u002Fcrfm.stanford.edu\u002F2023\u002F07\u002F17\u002Fflash2.html). Check out [flash_attention](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Freadme\u002Fflash_attn2.md) for more details.\n* [2023-08-02] Support [Llama2](https:\u002F\u002Fai.meta.com\u002Fllama\u002F), [ChatGLM2](https:\u002F\u002Fhuggingface.co\u002FTHUDM\u002Fchatglm2-6b), and [Baichuan](https:\u002F\u002Fhuggingface.co\u002Fbaichuan-inc\u002FBaichuan-7B) models.\n* [2023-07-23] [LMFlow multimodal chatbot](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Frun_vis_chatbot_gradio_minigpt4.sh) is now available! Support multimodal inputs of images and texts. 
[Online Demo](http:\u002F\u002Fmultimodal.lmflow.online) is also provided (We hold the service on a single GPU, hence one may experience \"queuing\" or \"application busy\" sometimes when multiple users are accessing at the same time, please wait and attempt again later when such event happens)![image](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Frpan-vision-encoder\u002Fdocs\u002Fassets\u002Fmultimodal-chatbot-demo.gif)\n* [2023-06-22]  [LMFlow paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12420) is out! Check out our implementation details at https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12420\n* [2023-06-16] Our finetuned Robin-33B-V2 scored an impressive 64.1 on the Huggingface LLM leaderboard in our offline evaluation, outperforming major open-source LLMs! All checkpoints (7B, 13B, 33B, and 65B) are [released](https:\u002F\u002Fhuggingface.co\u002FOptimalScale)! Checkout the performance [here](https:\u002F\u002Fmedium.com\u002F@hkust.ml\u002Frobin-v2-launches-achieves-unparalleled-performance-on-openllm-4f6886e822c1).\n* [2023-06-07] LMFlow is now officially available on PyPI! Install it with `pip install lmflow-finetune`!\n* [2023-05-30] Release [Robin-13B-v2](https:\u002F\u002Fhuggingface.co\u002FOptimalScale\u002Frobin-13b-v2-delta) and [Robin-33B-v2](https:\u002F\u002Fhuggingface.co\u002FOptimalScale\u002Frobin-33b-v2-delta)!\n\n* [2023-05-15] Release [LMFlow-data](http:\u002F\u002Flmflow.org:5000\u002Flmflow_data.tar.gz), the training dataset of Robin-7B-v2. A new [test data](http:\u002F\u002Flmflow.org:5000\u002Flmflow_chat_en_dialog_multiturn_single_nll_text2text.tar.gz) is also released.\n* [2023-05-09] Release [Robin-7B-v2](http:\u002F\u002Flmflow.org:5000\u002Frobin-7b-v2-delta.tar.gz), achieving competitive performance on chitchat, commonsense reasoning and instruction-following tasks. 
Refer to our [comprehensive study](https:\u002F\u002Fmedium.com\u002F@hkust.ml\u002Flmflow-benchmark-an-automatic-evaluation-framework-for-open-source-llms-ef5c6f142418).\n* [2023-05-08] Release [LMFlow Benchmark](https:\u002F\u002Fmedium.com\u002F@hkust.ml\u002Flmflow-benchmark-an-automatic-evaluation-framework-for-open-source-llms-ef5c6f142418), an automatic evaluation framework for open-source chat-style LLMs. [Benchmark results](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1JYh4_pxNzmNA9I0YM2epgRA7VXBIeIGS64gPJBg5NHA\u002Fedit#gid=0) on 31 popular models are reported. [Participate in LMFlow Benchmark](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow#33-lmflow-benchmark).\n* [2023-04-21] Release [Robin-7B](http:\u002F\u002Flmflow.org:5000\u002Frobin-7b.tar.gz) (based on LLaMA-7B), and two models for commercial use: Parakeets-2.7B (based on GPT-NEO-2.7B) and Cokatoo-7B (based on StableLM-7B) [Download here](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Ftree\u002Fmain#model-zoo)\n* [2023-04-15] Inference: Support streaming output and ChatGLM.\n* [2023-04-10] We propose a new alignment algorithm: [Reward rAnked FineTuning (RAFT)](https:\u002F\u002Foptimalscale.github.io\u002FLMFlow\u002Fexamples\u002Fraft.html), which is more efficient than conventional (PPO-based) RLHF. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.06767)]\n* [2023-04-02] [Web service](https:\u002F\u002Flmflow.com\u002F) is online!\n* [2023-04-01] Release three instruction-tuned checkpoints and three medical checkpoints in [model zoo](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow#model-zoo): LLaMA-7B-tuned, LLaMA-13B-tuned, LLaMA-33B-tuned, LLaMA-7B-medical, LLaMA-13B-medical, and LLaMA-33B-medical.\n* [2023-03-27] Support full tuning and lora tuning for all decoder models.\n* [2023-03-27] [Tasked tuned model beats ChatGPT on medical domain](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow#model-performance).\n* [2023-03-27] Release code and checkpoints - [version 0.0.1](https:\u002F\u002Foptimalscale.github.io\u002FLMFlow\u002F)! [Our tasked-tuned model beats ChatGPT on medical domain](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow#model-performance).\n\n\u003C\u002Fdetails>\n\n## Table of Contents\n\n- [LMFlow](#lmflow)\n  - [Latest News](#latest-news)\n  - [Table of Contents](#table-of-contents)\n  - [Quick Start](#quick-start)\n    - [Setup](#setup)\n    - [Prepare Dataset](#prepare-dataset)\n    - [Finetuning](#finetuning)\n      - [Estimated Hardware Requirement](#estimated-hardware-requirement)\n      - [Full Finetuning](#full-finetuning)\n      - [LISA](#lisa)\n      - [LoRA](#lora)\n    - [Inference](#inference)\n    - [Deployment](#deployment)\n    - [Evaluation](#evaluation)\n  - [Supported Features](#supported-features)\n  - [Support](#support)\n  - [License](#license)\n  - [Citation](#citation)\n\n\n## Quick Start\n\n### Setup\n\nOur package has been tested on Linux OS (Ubuntu 20.04). Other OS platforms (MacOS, Windows) are not fully tested, where you may encounter unexpected errors. 
If you are using LMFlow for the first time, we recommend trying it on a Linux machine or Google Colab.\n\n```bash\ngit clone -b v1.0.0 https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow.git\ncd LMFlow\nconda create -n lmflow python=3.9 -y\nconda activate lmflow\nconda install mpi4py\npip install -e .\n```\n\n\u003Cdetails>\u003Csummary> Looking for a previous version? \u003C\u002Fsummary>\n\n```bash\ngit clone -b v0.0.10 https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow.git\ncd LMFlow\nconda create -n lmflow python=3.9 -y\nconda activate lmflow\nconda install mpi4py\npip install -e .\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary> For CUDA versions 10.3-11.7 \u003C\u002Fsummary>\n\n```bash\ngit clone -b v0.0.5 https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow.git\ncd LMFlow\nconda create -n lmflow python=3.9 -y\nconda activate lmflow\nconda install mpi4py\npip install -e .\n```\n\n\u003C\u002Fdetails>\n\n> [!TIP]\n> We use WandB to track and visualize the training process by default. Before running the training scripts, you may need to log in to WandB using the command: \n>\n>```bash\n>wandb login\n>```\n>\n> For detailed instructions, refer to the [WandB Quickstart Guide](https:\u002F\u002Fdocs.wandb.ai\u002Fquickstart\u002F). Step 1 (registration) and Step 2 (login using your WandB API key) should be sufficient to set up your environment.\n>\n> \u003Cdetails>\u003Csummary>Disabling wandb\u003C\u002Fsummary>  \n>\n> You can disable wandb in either of two ways:  \n>\n> 1. Set an environment variable before running the training command.\n>\n>```bash\n>export WANDB_MODE=disabled\n>```\n>\n> 2. Or specify which integrations the results and logs are reported to. 
In the training script, add:\n>\n>```bash\n>--report_to none \\\n>```\n>\n> \u003C\u002Fdetails>\n\n### Prepare Dataset\n\nPlease refer to our [doc](https:\u002F\u002Foptimalscale.github.io\u002FLMFlow\u002Fexamples\u002FDATASETS.html).\n\n### Finetuning\n\n#### Estimated Hardware Requirement\n\n| Method                 | 0.5B |  3B  |  7B  |  14B  |  30B  |  70B  |  `x`B   |\n| ---------------------- | ---- | ---- | ---- | ----- | ----- | ----- | ------- |\n| Full `bf16`\u002F`fp16`     |  9GB | 55GB |120GB | 240GB | 600GB | 1200GB| `18x`GB |\n| LoRA                   |  1GB | 6GB  | 16GB |  32GB |  64GB | 160GB |  `2x`GB |\n| QLoRA `quant_bit=8`    | 0.7GB| 3GB  | 10GB |  20GB |  40GB |   80GB|  `x`GB  |\n| QLoRA `quant_bit=4`    | 0.4GB| 1.5GB|  6GB |  12GB |  24GB |   48GB| `x\u002F2`GB |\n\n\n#### Full Finetuning\n\nFull training updates all the parameters to finetune a language model.\nHere is an example to finetune a GPT-2 base model.\n\n```sh\ncd data && .\u002Fdownload.sh alpaca && cd -\n\nbash .\u002Fscripts\u002Frun_finetune.sh \\\n  --model_name_or_path gpt2 \\\n  --dataset_path data\u002Falpaca\u002Ftrain_conversation \\\n  --output_model_path output_models\u002Ffinetuned_gpt2\n```\n\n> [!TIP]\n> For conversation dataset, specify a conversation template for better performance by adding `--conversation_template` to the command.\n>\n> \u003Cdetails>\u003Csummary>Llama-3-8B conversation dataset example\u003C\u002Fsummary>  \n>\n>```bash\n>cd data && .\u002Fdownload.sh alpaca && cd -\n>\n>bash .\u002Fscripts\u002Frun_finetune.sh \\\n>  --model_name_or_path meta-llama\u002FMeta-Llama-3-8B \\\n>  --dataset_path data\u002Falpaca\u002Ftrain_conversation \\\n>  --conversation_template llama3 \\\n>  --output_model_path output_models\u002Ffinetuned_llama3_8b\n>```\n>\n> \u003C\u002Fdetails>\n\n#### LISA\n\n[LISA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.17919) is a memory-efficient finetuning algorithm that allows tradeoff between memory and the number of 
randomly unfrozen layers. This script has currently been tested only on a single GPU. Please stay tuned for our latest updates :smile:\n\n```sh\ncd data && .\u002Fdownload.sh alpaca && cd -\n\nbash .\u002Fscripts\u002Frun_finetune_with_lisa.sh \\\n  --model_name_or_path meta-llama\u002FLlama-2-7b-hf \\\n  --dataset_path data\u002Falpaca\u002Ftrain_conversation \\\n  --output_model_path output_models\u002Ffinetuned_llama2_7b \\\n  --lisa_activated_layers 1 \\\n  --lisa_interval_steps 20\n```\n\n> [!TIP]\n> \u003Cdetails>\u003Csummary>Llama-2-7B conversation dataset example\u003C\u002Fsummary>  \n>\n>```bash\n>cd data && .\u002Fdownload.sh alpaca && cd -\n>\n>bash .\u002Fscripts\u002Frun_finetune_with_lisa.sh \\\n>  --model_name_or_path meta-llama\u002FLlama-2-7b-hf \\\n>  --dataset_path data\u002Falpaca\u002Ftrain_conversation \\\n>  --conversation_template llama2 \\\n>  --output_model_path output_models\u002Ffinetuned_llama2_7b_lisa \\\n>  --lisa_activated_layers 1 \\\n>  --lisa_interval_steps 20\n>```\n>\n> \u003C\u002Fdetails>\n\n#### LoRA\n\nLoRA is a parameter-efficient finetuning algorithm that trains small low-rank adapter matrices instead of the full model weights, so it needs far less memory than full finetuning.\n\n```sh\ncd data && .\u002Fdownload.sh alpaca && cd -\n\nbash .\u002Fscripts\u002Frun_finetune_with_lora.sh \\\n  --model_name_or_path facebook\u002Fgalactica-1.3b \\\n  --dataset_path data\u002Falpaca\u002Ftrain_conversation \\\n  --output_lora_path output_models\u002Ffinetuned_galactica_lora\n```\n\n> [!TIP]\n> \u003Cdetails>\u003Csummary>Llama-2-7B conversation dataset example\u003C\u002Fsummary>  \n>\n>```bash\n>cd data && .\u002Fdownload.sh alpaca && cd -\n>\n>bash .\u002Fscripts\u002Frun_finetune_with_lora.sh \\\n>  --model_name_or_path meta-llama\u002FLlama-2-7b-hf \\\n>  --dataset_path data\u002Falpaca\u002Ftrain_conversation \\\n>  --conversation_template llama2 \\\n>  --output_model_path output_models\u002Ffinetuned_llama2_7b_lora\n>```\n>\n> \u003C\u002Fdetails>\n>\n> \u003Cdetails>\u003Csummary>Merge LoRA 
Weight\u003C\u002Fsummary>\n>\n>Merge the LoRA weights and the base model into a single model using:  \n>\n>```sh\n>bash .\u002Fscripts\u002Frun_merge_lora.sh \\\n>  --model_name_or_path Qwen\u002FQwen1.5-1.8B \\\n>  --lora_model_path output_models\u002Flora \\\n>  --output_model_path output_models\u002Flora_merged\n>```\n>\n>\u003C\u002Fdetails>\n\n### Inference\n\nAfter finetuning, you can run the following command to chat with the model.\n\n```sh\nbash .\u002Fscripts\u002Frun_chatbot.sh output_models\u002Ffinetuned_gpt2\n```\n\n> [!TIP]\n> We recommend using SGLang for faster batch inference.\n>\n> \u003Cdetails>\u003Csummary>Faster inference using SGLang\u003C\u002Fsummary>  \n>\n>```bash\n>bash .\u002Fscripts\u002Frun_sglang_inference.sh\n>```\n> Note: If you encounter the error `ModuleNotFoundError: No module named 'common_ops'` when using SGLang, try running `apt-get update` and then `apt install numactl`. \n> \u003C\u002Fdetails>\n\n### Deployment\n\nIf you want to deploy your own model locally, we provide a Gradio-based UI for building chatbots. \nRunning the following command will launch the demo for robin-7b:\n\n```sh\npip install gradio\npython .\u002Fexamples\u002Fchatbot_gradio.py --deepspeed configs\u002Fds_config_chatbot.json --model_name_or_path YOUR-LLAMA  --lora_model_path .\u002Frobin-7b --prompt_structure \"A chat between a curious human and an artificial intelligence assistant. 
The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: {input_text}###Assistant:\"       --end_string \"#\" --max_new_tokens 200\n```\n\n### Evaluation\n\nWe recommend using [LM Evaluation Harness](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) for most evaluation purposes.\n\n## Supported Features\n\n\u003Cdetails> \u003Csummary>Finetune Acceleration & Memory Optimization\u003C\u002Fsummary>\n\n* LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning\n  \n  LISA is a novel and memory-efficient training strategy for large language models that outperforms existing methods like LoRA by selectively freezing layers during optimization. Check out [LISA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.17919) for more details.  \n  In LMFlow, activate LISA using `--use_lisa 1` in your training command. Control the number of activated layers with `--lisa_activated_layers 2`, and adjust the interval (in steps) at which the activated layers are resampled using `--lisa_interval_steps 20`. \n\n* LoRA\n  \n  LoRA is a parameter-efficient finetuning algorithm and is more efficient than full finetuning. Check out [LoRA](#lora) for more details.\n\n* FlashAttention\n\n  LMFlow supports both FlashAttention-1 and the latest FlashAttention-2. Check out [flash_attention](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Freadme\u002Fflash_attn2.md) for more details.\n\n* Gradient Checkpointing\n  \n  [Gradient checkpointing](https:\u002F\u002Fgithub.com\u002Fcybertronai\u002Fgradient-checkpointing) is a memory optimization technique that trades compute for memory.\n  It is useful when the model is too large to fit into GPU memory. \n  To use it, add `--gradient_checkpointing` to your training command.\n\n* Deepspeed Zero3\n  \n  LMFlow supports [Deepspeed Zero-3 Offload](https:\u002F\u002Fwww.deepspeed.ai\u002F2021\u002F03\u002F07\u002Fzero3-offload.html). 
\n  We provide an example [deepspeed config](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fconfigs\u002Fds_config_zero3.json) that you can use directly.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails> \u003Csummary>Inference Acceleration\u003C\u002Fsummary>\n\n* LLaMA Inference on CPU\n\n  Thanks to the great efforts of [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp), everyone can run their LLaMA models on CPU with 4-bit quantization. We provide a script to convert LLaMA LoRA weights to `.pt` files. You only need to use `convert-pth-to-ggml.py` in llama.cpp to perform quantization.\n\n* FlashAttention\n\n  LMFlow supports both FlashAttention-1 and the latest FlashAttention-2. Check out [flash_attention](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Freadme\u002Fflash_attn2.md) for more details.\n\n* vLLM\n\n  Try vLLM for fast and easy-to-use LLM inference and serving. Thanks for the [great work](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)!\n\n\u003C\u002Fdetails>\n\n\u003Cdetails> \u003Csummary>Long Context\u003C\u002Fsummary>\n\n* Position Interpolation for LLaMA Models\n\n  LMFlow now supports the latest Linear & NTK (Neural Tangent Kernel) scaling techniques for LLaMA models. Check out [position_interpolation](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Freadme\u002FPosition_Interpolation.md) for more details.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails> \u003Csummary>Model Customization\u003C\u002Fsummary>\n\n* Vocabulary Extension\n\n  You can now train your own sentencepiece tokenizer and merge it with the model's original Hugging Face tokenizer. 
Check out [vocab_extension](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Fvocab_extension) for more details.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails> \u003Csummary>Multimodal\u003C\u002Fsummary>\n\n* Multimodal Chatbot\n\n  LMFlow supports multimodal inputs of images and texts. Check out our [LMFlow multimodal chatbot](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Frun_vis_chatbot_gradio_minigpt4.sh).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails> \u003Csummary>Custom Optimization\u003C\u002Fsummary>\n\n* Custom Optimization\n\n  LMFlow now supports custom optimizer training with a variety of optimizers. Elevate your model's performance with tailored optimization strategies. Dive into the details and try out the new features with our updated script at [custom_optimizers](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Frun_finetune_with_custom_optim.sh).\n\n  The following table evaluates the performance of custom optimizers in the fine-tuning process of GPT-2 on the Alpaca dataset, emphasizing their individual impacts on the training loss. The specific hyperparameter settings utilize default configurations, which can be customized and adjusted at [custom_optimizers](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002Fscripts\u002Frun_finetune_with_custom_optim.sh). 
It is important to note that the evaluations were conducted over a duration of 0.1 epochs to provide a preliminary insight into the optimizers' effectiveness.\n\n  | Optimizer Name | Train Loss |\n  |----------------|------------|\n  | RMSprop        | 2.4016     |\n  | LION-32bit     | 2.4041     |\n  | Adam           | 2.4292     |\n  | AdamP          | 2.4295     |\n  | AdamW          | 2.4469     |\n  | AdaFactor      | 2.4543     |\n  | AdaBound       | 2.4547     |\n  | AdamWScheduleFree       | 2.4677     |\n  | Adan           | 2.5063     |\n  | NAdam          | 2.5569     |\n  | AdaBelief      | 2.5857     |\n  | AdaMax         | 2.5924     |\n  | RAdam          | 2.6104     |\n  | AdaDelta       | 2.6298     |\n  | AdaGrad        | 2.8657     |\n  | Yogi           | 2.9314     |\n  | NovoGrad       | 3.1071     |\n  | Sophia         | 3.1517     |\n  | LAMB           | 3.2350     |\n  | LARS           | 3.3329     |\n  | SGDScheduleFree        | 3.3541     |\n  | SGDP           | 3.3567     |\n  | SGD            | 3.3734     |\n\n\u003C\u002Fdetails>\n\n## Support\n\nIf you need any help, please submit a Github issue.\n\n## License\n\nThe code included in this project is licensed under the [Apache 2.0 license](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FLMFlow\u002Fblob\u002Fmain\u002FLICENSE).\nIf you wish to use the codes and models included in this project for commercial purposes, please sign this [document](https:\u002F\u002Fdocs.google.com\u002Fforms\u002Fd\u002Fe\u002F1FAIpQLSfJYcci6cbgpIvx_Fh1xDL6pNkzsjGDH1QIcm4cYk88K2tqkw\u002Fviewform?usp=pp_url) to obtain authorization.\n\n## Citation\n\nIf you find this repository useful, please consider giving ⭐ and citing our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12420):\n\n```citation\n@article{diao2023lmflow,\n  title={Lmflow: An extensible toolkit for finetuning and inference of large foundation models},\n  author={Diao, Shizhe and Pan, Rui and Dong, Hanze and Shum, Ka Shun and 
Zhang, Jipeng and Xiong, Wei and Zhang, Tong},\n  journal={arXiv preprint arXiv:2306.12420},\n  year={2023}\n}\n```\n\n```citation\n@article{dong2023raft,\n  title={Raft: Reward ranked finetuning for generative foundation model alignment},\n  author={Dong, Hanze and Xiong, Wei and Goyal, Deepanshu and Pan, Rui and Diao, Shizhe and Zhang, Jipeng and Shum, Kashun and Zhang, Tong},\n  journal={arXiv preprint arXiv:2304.06767},\n  year={2023}\n}\n```\n\n```citation\n@article{pan2024lisa,\n  title={LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning}, \n  author={Pan, Rui and Liu, Xiang and Diao, Shizhe and Pi, Renjie and Zhang, Jipeng and Han, Chi and Zhang, Tong},\n  journal={arXiv preprint arXiv:2403.17919},\n  year={2024}\n}\n```\n","LMFlow is an extensible toolkit for finetuning and inference of large foundation models. Built on PyTorch, it supports a variety of pretrained models and optimizers and is designed to be user-friendly, efficient, and reliable. Its core features include, among others, finetuning large language models, strengthening instruction-following ability, and support for custom optimizers. LMFlow also provides extensive documentation and community support, helping developers get started quickly. The project is particularly well suited to researchers and engineers who need to adapt existing large models to a specific task or domain.",2,"2026-06-11 03:35:34","high_star"]