[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70958":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":38,"discoverSource":39},70958,"LLMSurvey","RUCAIBox\u002FLLMSurvey","RUCAIBox","The official GitHub page for the survey paper \"A Survey of Large Language Models\".","https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.18223",null,"Python",12168,934,166,24,0,2,8,47.71,false,"main",true,[24,25,26,27,28,29,30,31,32,33,34],"chain-of-thought","chatgpt","in-context-learning","instruction-tuning","large-language-models","llm","llms","natural-language-processing","pre-trained-language-models","pre-training","rlhf","2026-06-06 04:04:35","# LLMSurvey\n\n\n> A collection of papers and resources related to Large Language Models. \n>\n> The organization of papers refers to our survey [**\"A Survey of Large Language Models\"**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.18223). 
[![Paper page](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fraw\u002Fmain\u002Fpaper-page-sm-dark.svg)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2303.18223)\n>\n> Please let us know by e-mail if you find a mistake or have any suggestions: batmanfly@gmail.com\n>\n> (We suggest also cc'ing francis_kun_zhou@163.com in case of any delivery failure.)\n>\n>\n> If you find our survey useful for your research, please cite the following paper:\n\n```\n@article{LLMSurvey,\n    title={A Survey of Large Language Models},\n    author={Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},\n    year={2023},\n    journal={arXiv preprint arXiv:2303.18223},\n    url={http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.18223}\n}\n```\n\n## 🚀(New) We have released the Chinese book of our survey!\n\nThe Chinese book focuses on providing explanations for beginners in the field of LLMs, aiming to present a comprehensive framework and roadmap for LLMs. This book is suitable for senior undergraduate students and junior graduate students with a foundation in deep learning and can serve as an introductory technical book.\nYou can download the Chinese book at [https:\u002F\u002Fllmbook-zh.github.io\u002F](https:\u002F\u002Fllmbook-zh.github.io\u002F).\n\nHere is our [Chinese book sales page](https:\u002F\u002Fitem.jd.com\u002F14901508.html).\n\n![chinese_version](assets\u002Fchinese_book_cover.jpg)\n\n## 🚀(New) The content about long CoT reasoning \n\nIn our latest version, we add new content on the recently popular reasoning paradigm in which a model allocates more time to thinking before responding to a problem. 
We focus on long CoT reasoning, which is the mainstream approach taken by recent LLMs, such as DeepSeek-R1 and OpenAI's o-series models. We first discuss the reasoning patterns and advantages of the long CoT paradigm. Then we present the construction approaches of long CoT data, including data distillation, search-based data synthesis, and multi-agent collaboration. Moreover, we introduce two commonly used training methods: long CoT instruction tuning and scaling reinforcement learning training. Finally, we conduct an in-depth discussion of recent test-time scaling efforts for LLMs.\n\n\u003Cdiv align=center>\u003Cimg src=\"assets\u002Fr1_example.png\" alt=\"Cover\" width=\"60%\"\u002F>\u003C\u002Fdiv>\n\n## The trends of the number of papers related to LLMs on arXiv\n\nHere are the trends of the cumulative numbers of arXiv papers that contain the keyphrases “language model” (since June 2018)\nand “large language model” (since October 2019), respectively.\n\n![arxiv_llms](assets\u002Farxiv_llms.png)\n\nThe statistics are calculated using exact match by querying the keyphrases in title or abstract by month. We set different x-axis ranges for the two keyphrases, because “language models” have been explored since an earlier time. We label the points corresponding to important landmarks in the research progress of LLMs. A sharp increase occurs after the release of ChatGPT: the average number of published arXiv papers that contain “large language model” in title or abstract goes from 0.40 per day to 8.58 per day.\n\n\n\n## Technical Evolution of GPT-series Models\n\nA brief illustration of the technical evolution of GPT-series models. We plot this figure mainly based on the papers, blog articles and official APIs from OpenAI. 
Here, solid lines denote that there is explicit evidence (e.g., the official statement that a new model is developed based on a base model) on the evolution path between two models, while dashed lines denote a relatively weaker evolution relation.\n\n\n\n![gpt-series](assets\u002Fgpt-series.png)\n\n\n\n## Evolutionary Graph of LLaMA Family\n\nAn evolutionary graph of the research work conducted on LLaMA. Due to the huge number of variants, we cannot include all\nof them in this figure, even though much excellent work is left out. \n\n\n\n![LLaMA_family](assets\u002Fllama-0628-final.png)\n\n\n\nTo support incremental updates, **we share the source file of this figure, and welcome readers to add the desired models by submitting pull requests on our GitHub page. If you're interested, please request by application.**\n\n\n\n\n## Prompts\n\nWe collect some useful tips for designing prompts from online notes and our authors' experience, and we also show the related ingredients and principles (introduced in Section 8.1). \n\n![prompt examples](assets\u002Fprompts_main.png)\n\nPlease click [here](Prompts\u002FREADME.md) to view more detailed information.\n\n**Everyone is welcome to provide us with more relevant tips in the form of [issues](https:\u002F\u002Fgithub.com\u002FRUCAIBox\u002FLLMSurvey\u002Fissues\u002F34)**. After selection, we will regularly update them on GitHub and indicate the source.\n\n\n\n## Experiments\n\n### Instruction Tuning Experiments\n\nWe explore the effect of different types of instructions in fine-tuning LLMs (i.e., 7B LLaMA), as well as examine the usefulness of several instruction improvement strategies.\n\n\n\n![instruction_tuning_table](assets\u002Finstruction_tuning_table.png)\n\n\n\nPlease click [here](Experiments\u002FREADME.md) to view more detailed information.\n\n### Ability Evaluaition Experiments\n\nWe conduct a fine-grained evaluation of the abilities discussed in Section 7.1 and Section 7.2. 
For each kind of ability, we select representative tasks and datasets for conducting evaluation experiments to examine the corresponding performance of LLMs. \n\n\n\n![ability_main](assets\u002Fability_main.png)\n\n\n\nPlease click [here](Experiments\u002FREADME.md) to view more detailed information.\n\n\n\n**We also call for support of computing power for conducting more comprehensive experiments.**\n\n\n\n## Table of Contents\n\n- [LLMSurvey](#llmsurvey)\n  - [Chinese Version](#chinese-version)\n  - [🚀(New) The trends of the number of papers related to LLMs on arXiv](#new-the-trends-of-the-number-of-papers-related-to-llms-on-arxiv)\n  - [🚀(New) Technical Evolution of GPT-series Models](#new-technical-evolution-of-gpt-series-models)\n  - [🚀(New) Evolutionary Graph of LLaMA Family](#new-evolutionary-graph-of-llama-family)\n  - [🚀(New) Prompts](#new-prompts)\n  - [🚀(New) Experiments](#new-experiments)\n    - [Instruction Tuning Experiments](#instruction-tuning-experiments)\n    - [Ability Evaluaition Experiments](#ability-evaluaition-experiments)\n  - [Table of Contents](#table-of-contents)\n  - [Timeline of LLMs](#timeline-of-llms)\n  - [List of LLMs](#list-of-llms)\n  - [Paper List](#paper-list)\n    - [Resources of LLMs](#resources-of-llms)\n      - [Publicly Available Models](#publicly-available-models)\n      - [Closed-source Models](#closed-source-models)\n      - [Commonly Used Corpora](#commonly-used-corpora)\n      - [Library Resource](#library-resource)\n      - [Deep Learning Frameworks](#deep-learning-frameworks)\n    - [Pre-training](#pre-training)\n      - [Data Collection](#data-collection)\n      - [Architecture](#architecture)\n        - [Mainstream Architectures](#mainstream-architectures)\n        - [Detailed Configuration](#detailed-configuration)\n        - [Analysis](#analysis)\n      - [Training Algorithms](#training-algorithms)\n      - [Pre-training on Code](#pre-training-on-code)\n        - [LLMs for Program 
Synthesis](#llms-for-program-synthesis)\n        - [NLP Tasks Formatted as Code](#nlp-tasks-formatted-as-code)\n    - [Adaptation Tuning](#adaptation-tuning)\n      - [Instruction Tuning](#instruction-tuning)\n      - [Alignment Tuning](#alignment-tuning)\n      - [Parameter-Efficient Model Adaptation](#parameter-efficient-model-adaptation)\n      - [Memory-Efficient Model Adaptation](#memory-efficient-model-adaptation)\n    - [Utilization](#utilization)\n      - [In-Context Learning (ICL)](#in-context-learning-icl)\n      - [Chain-of-Thought Reasoning (CoT)](#chain-of-thought-reasoning-cot)\n      - [Planning for Complex Task Solving](#planning-for-complex-task-solving)\n    - [Capacity Evaluation](#capacity-evaluation)\n    - [The Team](#the-team)\n  - [Acknowledgments](#acknowledgments)\n  - [Update Log](#update-log)\n\n## Timeline of LLMs\n\n![LLMs_timeline](assets\u002Ffig2_updated_time_line.png)\n\n\n\n\n\n## List of LLMs\n\n\u003Ctable class=\"tg\">\n\u003Cthead>\n  \u003Ctr>\n    \u003Cth class=\"tg-nrix\" align=\"center\" rowspan=\"2\">Category\u003C\u002Fth>\n    \u003Cth class=\"tg-baqh\" align=\"center\" rowspan=\"2\">Model\u003C\u002Fth>\n    \u003Cth class=\"tg-0lax\" align=\"center\" rowspan=\"2\">Release Time\u003C\u002Fth>\n    \u003Cth class=\"tg-baqh\" align=\"center\" rowspan=\"2\">Size(B)\u003C\u002Fth>\n    \u003Cth class=\"tg-0lax\" align=\"center\" rowspan=\"2\">Link\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n  \u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\n  \u003Ctr>\n    \u003Ctd class=\"tg-nrix\" align=\"center\" rowspan=\"27\">Publicly \u003Cbr>Accessible\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">T5\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2019\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">mT5\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">13\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.11934\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">PanGu-α\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F05\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">13\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.12369\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">CPM-2\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F05\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">198\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.10715\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">T0\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.08207\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">GPT-NeoX-20B\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F02\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">20\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" 
align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.06745\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">CodeGen\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">16\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.13474\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Tk-Instruct\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F04\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\" align=\"center\">11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">UL2\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F02\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">20\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.05131\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">OPT\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F05\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">175\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.01068\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">YaLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F06\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">100\u003C\u002Ftd>\n    
\u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fyandex\u002FYaLM-100B\">GitHub\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">NLLB\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F07\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">55\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.04672\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">BLOOM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F07\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">176\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.05100\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">GLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F08\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">130\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.02414\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Flan-T5\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">mT0\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">13\u003C\u002Ftd>\n 
   \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.01786\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Galactica\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">120\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.09085\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">BLOOMZ\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F11\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">176\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.01786\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">OPT-IML\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">175\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.12017\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Pythia\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F01\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01373\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">LLaMA\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F02\u003C\u002Ftd>\n    
\u003Ctd class=\"tg-baqh\" align=\"center\">65\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.13971v1\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Vicuna\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">13\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Flmsys.org\u002Fblog\u002F2023-03-30-vicuna\u002F\">Blog\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">ChatGLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">6\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FTHUDM\u002FChatGLM-6B\">GitHub\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">CodeGeeX\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">13\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17568\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Alpaca\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">7\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fcrfm.stanford.edu\u002F2023\u002F03\u002F13\u002Falpaca.html\">Blog\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Koala\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" 
align=\"center\">2023\u002F04\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">13\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fbair.berkeley.edu\u002Fblog\u002F2023\u002F04\u002F03\u002Fkoala\u002F\">Blog\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n    \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Mistral\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F09\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">7\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fnews\u002Fannouncing-mistral-7b\u002F\">Blog\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-nrix\" align=\"center\" rowspan=\"31\">Closed\u003Cbr>Source\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">GShard\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2020\u002F01\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\" align=\"center\">600\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2006.16668v1\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">GPT-3\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2020\u002F05\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">175\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14165\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">LaMDA\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F05\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">137\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.08239\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">HyperCLOVA\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F06\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">82\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.04650\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Codex\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F07\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.03374\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">ERNIE 3.0\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\" align=\"center\">2021\u002F07\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.02137\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Jurassic-1\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F08\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">178\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fassets.website-files.com\u002F60fd4503684b466578c0d307\u002F61138924626a6981ee09caf6_jurassic_tech_paper.pdf\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\" align=\"center\">FLAN\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F10\u003C\u002Ftd>\n 
   \u003Ctd class=\"tg-baqh\" align=\"center\">137\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.01652\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">MT-NLG\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">530\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11990\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Yuan 1.0\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">245\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.04725\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Anthropic\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">52\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00861\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">WebGPT\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">175\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.09332\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Gopher\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" 
align=\"center\">2021\u002F12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">280\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2112.11446v2\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">ERNIE 3.0 Titan\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">260\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.12731\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">GLaM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2021\u002F12\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">1200\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.06905\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">InstructGPT\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F01\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">175\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.02155v1\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">AlphaCode\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F02\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">41\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.07814v1\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" 
align=\"center\">Chinchilla\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">70\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.15556\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">PaLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F04\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">540\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.02311\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Cohere\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F06\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">54\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fcohere.ai\u002F\">Homepage\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">AlexaTM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F08\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">20\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.01448\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Luminous\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F09\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">70\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Fdocs.aleph-alpha.com\u002Fdocs\u002Fintroduction\u002Fluminous\u002F\">Docs\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd 
class=\"tg-baqh\" align=\"center\">Sparrow\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F09\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">70\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14375v1\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">WeLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F09\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.10372\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">U-PaLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">540\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11399\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Flan-PaLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\" align=\"center\">540\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">Flan-U-PaLM\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2022\u002F10\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">540\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  
\u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">GPT-4\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">-\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08774v2\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">PanGu-Σ\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">2023\u002F03\u003C\u002Ftd>\n    \u003Ctd class=\"tg-baqh\" align=\"center\">1085\u003C\u002Ftd>\n    \u003Ctd class=\"tg-0lax\" align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.10845\">Paper\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n## Paper List\n\n### Resources of LLMs\n\n#### Publicly Available Models\n\n1. \u003Cu>T5\u003C\u002Fu>: **\"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer\"**. *Colin Raffel et al.* JMLR 2019. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Ft5-11b)]\n2. \u003Cu>mT5\u003C\u002Fu>: **\"mT5: A massively multilingual pre-trained text-to-text transformer\"**. *Linting Xue et al.* NAACL 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.11934)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fmt5-xxl\u002Ftree\u002Fmain)]\n3. \u003Cu>PanGu-α\u003C\u002Fu>: **\"PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation\"**. *Wei Zeng et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.12369)] [[Checkpoint](https:\u002F\u002Fopeni.pcl.ac.cn\u002FPCL-Platform.Intelligence\u002FPanGu-Alpha)]\n4. \u003Cu>CPM-2\u003C\u002Fu>: **\"CPM-2: Large-scale Cost-effective Pre-trained Language Models\"**. *Zhengyan Zhang et al.* arXiv 2021. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.10715)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002FTsinghuaAI\u002FCPM)]\n5. \u003Cu>T0\u003C\u002Fu>: **\"Multitask Prompted Training Enables Zero-Shot Task Generalization\"**. *Victor Sanh et al.* ICLR 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.08207)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fbigscience\u002FT0)]\n6. \u003Cu>GPT-NeoX-20B\u003C\u002Fu>: **\"GPT-NeoX-20B: An Open-Source Autoregressive Language Model\"**. *Sid Black et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.06745)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002FEleutherAI\u002Fgpt-neox-20b\u002Ftree\u002Fmain)]\n7. \u003Cu>CodeGen\u003C\u002Fu>: **\"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis\"**. *Erik Nijkamp et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.13474)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002FSalesforce\u002Fcodegen-16B-nl)]\n8. \u003Cu>Tk-Instruct\u003C\u002Fu>: **\"Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks\"**. *Yizhong Wang et al.* EMNLP 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fallenai\u002Ftk-instruct-11b-def-pos)]\n9. \u003Cu>UL2\u003C\u002Fu>: **\"UL2: Unifying Language Learning Paradigms\"**. *Yi Tay et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.05131)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Ful2)]\n10. \u003Cu>OPT\u003C\u002Fu>: **\"OPT: Open Pre-trained Transformer Language Models\"**. *Susan Zhang et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.01068)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmetaseq\u002Ftree\u002Fmain\u002Fprojects\u002FOPT)]\n11. 
\u003Cu>NLLB\u003C\u002Fu>: **\"No Language Left Behind: Scaling Human-Centered Machine Translation\"**. *NLLB Team.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.04672)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq\u002Ftree\u002Fnllb)]\n12. \u003Cu>BLOOM\u003C\u002Fu>: **\"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model\"**. *BigScience Workshop*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.05100)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fbigscience\u002Fbloom)]\n13. \u003Cu>GLM\u003C\u002Fu>: **\"GLM-130B: An Open Bilingual Pre-trained Model\"**. *Aohan Zeng et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.02414)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FGLM-130B)]\n14. \u003Cu>Flan-T5\u003C\u002Fu>: **\"Scaling Instruction-Finetuned Language Models\"**. *Hyung Won Chung et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ft5x\u002Fblob\u002Fmain\u002Fdocs\u002Fmodels.md#flan-t5-checkpoints)]\n15. \u003Cu>mT0 && BLOOMZ\u003C\u002Fu>: **\"Crosslingual Generalization through Multitask Finetuning\"**. *Niklas Muennighoff et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.01786)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fxmtf)]\n16. \u003Cu>Galactica\u003C\u002Fu>: **\"Galactica: A Large Language Model for Science\"**. *Ross Taylor et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.09085)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fgalactica-120b)]\n17. \u003Cu>OPT-IML\u003C\u002Fu>: **\"OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization\"**. *Srinivasan Iyer et al.* arXiv 2022. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.12017)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fopt-iml-30b)]\n18. \u003Cu>CodeGeeX\u003C\u002Fu>: **\"CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X\"**. *Qinkai Zheng et al.* arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17568)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCodeGeeX)]\n19. \u003Cu>Pythia\u003C\u002Fu>: **\"Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling\"**. *Stella Biderman et al.* arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01373)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fpythia)]\n20. \u003Cu>LLaMA\u003C\u002Fu>: **\"LLaMA: Open and Efficient Foundation Language Models\"**. *Hugo Touvron et al.* arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.13971v1)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama)]\n\n#### Closed-source Models\n\n1. \u003Cu>GShard\u003C\u002Fu>: **\"GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding\"**. *Dmitry Lepikhin et al.* ICLR 2021. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2006.16668v1)]\n2. \u003Cu>GPT-3\u003C\u002Fu>: **\"Language Models are Few-Shot Learners\"**. *Tom B. Brown et al.* NeurIPS 2020. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14165)]\n3. \u003Cu>LaMDA\u003C\u002Fu>: **\"LaMDA: Language Models for Dialog Applications\"**. *Romal Thoppilan et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.08239)]\n4. \u003Cu>HyperCLOVA\u003C\u002Fu>: **\"What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers\"**. *Boseop Kim et al.* EMNLP 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.04650)]\n5. 
\u003Cu>CodeX\u003C\u002Fu>: **\"Evaluating Large Language Models Trained on Code\"**. *Mark Chen et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.03374)]\n6. \u003Cu>ERNIE 3.0\u003C\u002Fu>: **\"ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation\"**. *Yu Sun et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.02137)]\n7. \u003Cu>Jurassic-1\u003C\u002Fu>: **\"Jurassic-1: Technical details and evaluation\"**. *Opher Lieber et al.* 2021. [[Paper](https:\u002F\u002Fassets.website-files.com\u002F60fd4503684b466578c0d307\u002F61138924626a6981ee09caf6_jurassic_tech_paper.pdf)]\n8. \u003Cu>FLAN\u003C\u002Fu>: **\"Finetuned Language Models Are Zero-Shot Learners\"**. *Jason Wei et al.* ICLR 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.01652)]\n9. \u003Cu>MT-NLG\u003C\u002Fu>: **\"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model\"**. *Shaden Smith et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11990)]\n10. \u003Cu>Yuan 1.0\u003C\u002Fu>: **\"Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning\"**. *Shaohua Wu et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.04725)]\n11. \u003Cu>Anthropic\u003C\u002Fu>: **\"A General Language Assistant as a Laboratory for Alignment\"**. *Amanda Askell et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00861)]\n12. \u003Cu>WebGPT\u003C\u002Fu>: **\"WebGPT: Browser-assisted question-answering with human feedback\"**. *Reiichiro Nakano et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.09332)]\n13. \u003Cu>Gopher\u003C\u002Fu>: **\"Scaling Language Models: Methods, Analysis & Insights from Training Gopher\"**. *Jack W. Rae et al.* arXiv 2021. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2112.11446v2)]\n14. 
\u003Cu>ERNIE 3.0 Titan\u003C\u002Fu>: **\"ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation\"**. *Shuohuan Wang et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.12731)]\n15. \u003Cu>GLaM\u003C\u002Fu>: **\"GLaM: Efficient Scaling of Language Models with Mixture-of-Experts\"**. *Nan Du et al.* ICML 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.06905)]\n16. \u003Cu>InstructGPT\u003C\u002Fu>: **\"Training language models to follow instructions with human feedback\"**. *Long Ouyang et al.* arXiv 2022. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.02155v1)]\n17. \u003Cu>AlphaCode\u003C\u002Fu>: **\"Competition-Level Code Generation with AlphaCode\"**. *Yujia Li et al.* arXiv 2022. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.07814v1)]\n18. \u003Cu>Chinchilla\u003C\u002Fu>: **\"Training Compute-Optimal Large Language Models\"**. *Jordan Hoffmann et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.15556)]\n19. \u003Cu>PaLM\u003C\u002Fu>: **\"PaLM: Scaling Language Modeling with Pathways\"**. *Aakanksha Chowdhery et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.02311)]\n20. \u003Cu>AlexaTM\u003C\u002Fu>: **\"AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model\"**. *Saleh Soltan et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.01448)]\n21. \u003Cu>Sparrow\u003C\u002Fu>: **\"Improving alignment of dialogue agents via targeted human judgements\"**. *Amelia Glaese et al.* arXiv 2022. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14375v1)]\n22. \u003Cu>WeLM\u003C\u002Fu>: **\"WeLM: A Well-Read Pre-trained Language Model for Chinese\"**. *Hui Su et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.10372)]\n23. \u003Cu>U-PaLM\u003C\u002Fu>: **\"Transcending Scaling Laws with 0.1% Extra Compute\"**. 
*Yi Tay et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11399)]\n24. \u003Cu>Flan-PaLM && Flan-U-PaLM\u003C\u002Fu>: **\"Scaling Instruction-Finetuned Language Models\"**. *Hyung Won Chung et al.* arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416)]\n25. \u003Cu>GPT-4\u003C\u002Fu>: **\"GPT-4 Technical Report\"**. *OpenAI*. arXiv 2023. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08774v2)]\n26. \u003Cu>PanGu-Σ\u003C\u002Fu>: **\"PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing\"**. *Xiaozhe Ren et al.* arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.10845)]\n\n#### Commonly Used Corpora\n\n1. \u003Cu>BookCorpus\u003C\u002Fu>: **\"Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books\"**. *Yukun Zhu et al.* ICCV 2015. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1506.06724v1)] [[Source](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbookcorpus)]\n2. \u003Cu>Gutenberg\u003C\u002Fu>: [[Source](https:\u002F\u002Fwww.gutenberg.org\u002F)]\n3. \u003Cu>CommonCrawl\u003C\u002Fu>: [[Source](https:\u002F\u002Fcommoncrawl.org\u002F)]\n4. \u003Cu>C4\u003C\u002Fu>: **\"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer\"**. *Colin Raffel et al.* JMLR 2019. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683v3)] [[Source](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Fc4)]\n5. \u003Cu>CC-stories-R\u003C\u002Fu>: **\"A Simple Method for Commonsense Reasoning\"**. *Trieu H. Trinh et al.* arXiv 2018. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1806.02847v2)] [[Source](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fspacemanidol\u002Fcc-stories)]\n6. \u003Cu>CC-NEWS\u003C\u002Fu>: **\"RoBERTa: A Robustly Optimized BERT Pretraining Approach\"**. *Yinhan Liu et al.* arXiv 2019. 
[[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1907.11692v1)] [[Source](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fcc_news)]\n7. \u003Cu>RealNews\u003C\u002Fu>: **\"Defending Against Neural Fake News\"**. *Rowan Zellers et al.* NeurIPS 2019. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1905.12616v3)] [[Source](https:\u002F\u002Fgithub.com\u002Frowanz\u002Fgrover\u002Ftree\u002Fmaster\u002Frealnews)]\n8. \u003Cu>OpenWebText\u003C\u002Fu>: [[Source](https:\u002F\u002Fskylion007.github.io\u002FOpenWebTextCorpus\u002F)]\n9. \u003Cu>Pushshift.io\u003C\u002Fu>: **\"The Pushshift Reddit Dataset\"**. *Jason Baumgartner et al*. AAAI 2020. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2001.08435v1)] [[Source](https:\u002F\u002Ffiles.pushshift.io\u002Freddit\u002F)]\n10. \u003Cu>Wikipedia\u003C\u002Fu>: [[Source](https:\u002F\u002Fdumps.wikimedia.org\u002F)]\n11. \u003Cu>BigQuery\u003C\u002Fu>: [[Source](https:\u002F\u002Fcloud.google.com\u002Fbigquery\u002Fpublic-data?hl=zh-cn)]\n12. \u003Cu>The Pile\u003C\u002Fu>: **\"The Pile: An 800GB Dataset of Diverse Text for Language Modeling\"**. *Leo Gao et al*. arXiv 2021. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2101.00027v1)] [[Source](https:\u002F\u002Fpile.eleuther.ai\u002F)]\n13. \u003Cu>ROOTS\u003C\u002Fu>: **\"The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset\"**. *Laurençon et al*. NeurIPS 2022 Datasets and Benchmarks Track. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.03915)]\n\n#### Library Resource\n\n1. \u003Cu>Transformers\u003C\u002Fu>: **\"Transformers: State-of-the-Art Natural Language Processing\"**. *Thomas Wolf et al.* EMNLP 2020. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.03771)] [[Source](https:\u002F\u002Fhuggingface.co\u002F)]\n2. \u003Cu>DeepSpeed\u003C\u002Fu>: **\"DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters\"**. *Rasley et al.* KDD 2020. 
[[Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3394486.3406703)] [[Source](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed)]\n3. \u003Cu>Megatron-LM\u003C\u002Fu>: **\"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism\"**. *Mohammad Shoeybi et al.* arXiv 2019. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.08053)] [[Source](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FMegatron-LM)]\n4. \u003Cu>JAX\u003C\u002Fu>: [[Source](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax)]\n5. \u003Cu>Colossal-AI\u003C\u002Fu>: **\"Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training\"**. *Zhengda Bian et al.* arXiv 2021. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2110.14883v2)] [[Source](https:\u002F\u002Fgithub.com\u002Fhpcaitech\u002FColossalAI)]\n6. \u003Cu>BMTrain\u003C\u002Fu>: [[Source](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FBMTrain)]\n7. \u003Cu>FastMoE\u003C\u002Fu>: **\"FastMoE: A Fast Mixture-of-Expert Training System\"**. *Jiaao He et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.13262)] [[Source](https:\u002F\u002Fgithub.com\u002Flaekov\u002Ffastmoe)]\n\n#### Deep Learning Frameworks\n\n1. \u003Cu>PyTorch\u003C\u002Fu>: **\"PyTorch: An Imperative Style, High-Performance Deep Learning Library\"**. *Adam Paszke et al.* NeurIPS 2019. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.01703)] [[Source](https:\u002F\u002Fpytorch.org\u002F)]\n2. \u003Cu>TensorFlow\u003C\u002Fu>: **\"TensorFlow: A system for large-scale machine learning\"**. *Martín Abadi et al.* OSDI 2016. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.08695)] [[Source](https:\u002F\u002Fwww.tensorflow.org\u002F)]\n3. \u003Cu>MXNet\u003C\u002Fu>: **\"MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems\"**. *Tianqi Chen et al.* arXiv 2015. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.01274)] [[Source](https:\u002F\u002Fgithub.com\u002Fapache\u002Fmxnet)]\n4. \u003Cu>PaddlePaddle\u003C\u002Fu>: **\"PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice\"**. *Yanjun Ma et al.* Frontiers of Data and Computing 2019. [[Paper](http:\u002F\u002Fwww.jfdc.cnic.cn\u002FEN\u002Fabstract\u002Fabstract2.shtml)] [[Source](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddle)]\n5. \u003Cu>MindSpore\u003C\u002Fu>: **\"Huawei MindSpore AI Development Framework\"**. *Huawei Technologies Co., Ltd.* Artificial Intelligence Technology 2022. [[Paper](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-981-19-2879-6_5)] [[Source](https:\u002F\u002Fgithub.com\u002Fmindspore-ai\u002Fmindspore)]\n6. \u003Cu>OneFlow\u003C\u002Fu>: **\"OneFlow: Redesign the Distributed Deep Learning Framework from Scratch\"**. *Jinhui Yuan et al.* arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.15032)] [[Source](https:\u002F\u002Fgithub.com\u002FOneflow-Inc\u002Foneflow)]\n\n### Pre-training\n#### Data Collection\n\n1. **\"The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset\"**. *Laurençon et al*. NeurIPS 2022 Datasets and Benchmarks Track. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.03915)]\n1. **\"Deduplicating Training Data Makes Language Models Better\"**. *Katherine Lee et al*. ACL 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.06499)]\n1. **\"Deduplicating Training Data Mitigates Privacy Risks in Language Models\"**. *Nikhil Kandpal et al*. ICML 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.06539)]\n1. **\"Scaling Laws and Interpretability of Learning from Repeated Data\"**. *Danny Hernandez et al*. arXiv 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.10487)]\n1. **\"A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity\"**. 
*Shayne Longpre et al*. arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13169)]\n\n#### Architecture\n\n##### Mainstream Architectures\n\n**Causal Decoder**\n\n1. **\"Language Models are Few-Shot Learners\"**. *Tom B. Brown et al*. NeurIPS 2020. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14165)]\n1. **\"OPT: Open Pre-trained Transformer Language Models\"**. *Susan Zhang et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2205.01068)]\n1. **\"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model\"**. *Teven Le Scao et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2211.05100)]\n1. **\"Training Compute-Optimal Large Language Models\"**. *Jordan Hoffmann et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.15556)]\n1. **\"Scaling Language Models: Methods, Analysis & Insights from Training Gopher\"**. *Jack W. Rae et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2112.11446)]\n1. **\"Galactica: A Large Language Model for Science\"**. *Ross Taylor et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2211.09085)]\n1. **\"PaLM: Scaling Language Modeling with Pathways\"**. *Aakanksha Chowdhery et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2204.02311)]\n1. **\"Jurassic-1: Technical Details and Evaluation\"**. *Opher Lieber et al*. AI21 Labs. [[paper](https:\u002F\u002Fuploads-ssl.webflow.com\u002F60fd4503684b466578c0d307\u002F61138924626a6981ee09caf6_jurassic_tech_paper.pdf)]\n1. **\"LaMDA: Language Models for Dialog Applications\"**. *Romal Thoppilan et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2201.08239)]\n\n**Prefix Decoder**\n1. **\"GLM-130B: An Open Bilingual Pre-trained Model\"**. *Aohan Zeng et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2210.02414)]\n1. **\"GLM: General Language Model Pretraining with Autoregressive Blank Infilling\"**. *Zhengxiao Du et al*. 
ACL 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2103.10360)]\n1. **\"Transcending Scaling Laws with 0.1% Extra Compute\"**. *Yi Tay et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11399)]\n\n**MoE**\n1. **\"Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity\"**. *William Fedus et al*. JMLR 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2101.03961)]\n1. **\"Unified Scaling Laws for Routed Language Models\"**. *Aidan Clark et al*. ICML 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2202.01169)]\n\n**SSM**\n1. **\"Pretraining Without Attention\"**. *Junxiong Wang et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10544)]\n1. **\"Efficiently Modeling Long Sequences with Structured State Spaces\"**. *Albert Gu et al*. ICLR 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2111.00396)]\n1. **\"Long Range Language Modeling via Gated State Spaces\"**. *Harsh Mehta et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2206.13947)]\n1. **\"Hungry Hungry Hippos: Towards Language Modeling with State Space Models\"**. *Daniel Y. Fu et al*. ICLR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.14052)]\n\n##### Detailed Configuration\n\n**Layer Normalization**\n1. \u003Cu>RMSNorm\u003C\u002Fu>: **\"Root Mean Square Layer Normalization\"**. *Biao Zhang et al*. NeurIPS 2019. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1910.07467)]\n1. \u003Cu>DeepNorm\u003C\u002Fu>: **\"DeepNet: Scaling Transformers to 1,000 Layers\"**. *Hongyu Wang et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.00555)]\n1. \u003Cu>Sandwich-LN\u003C\u002Fu>: **\"CogView: Mastering Text-to-Image Generation via Transformers\"**. *Ming Ding et al*. NeurIPS 2021. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.13290)]\n\n**Position Encoding**\n1. 
\u003Cu>T5 bias\u003C\u002Fu>: **\"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer\"**. *Colin Raffel et al.* JMLR 2019. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683)]\n1. \u003Cu>ALiBi\u003C\u002Fu>: **\"Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation\"**. *Ofir Press et al*. ICLR 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2108.12409)]\n1. \u003Cu>RoPE\u003C\u002Fu>: **\"RoFormer: Enhanced Transformer with Rotary Position Embedding\"**. *Jianlin Su et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2104.09864)]\n1. \u003Cu>xPos\u003C\u002Fu>: **\"A Length-Extrapolatable Transformer\"**. *Yutao Sun et al*. arXiv 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10554)]\n\n**Attention**\n1. \u003Cu>Multi-query attention\u003C\u002Fu>: **\"Fast Transformer Decoding: One Write-Head is All You Need\"**. *Noam Shazeer*. arXiv 2019. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.02150)]\n1. \u003Cu>FlashAttention\u003C\u002Fu>: **\"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness\"**. *Tri Dao et al*. NeurIPS 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.14135)]\n1. \u003Cu>PagedAttention\u003C\u002Fu>: **\"vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention\"**. *Woosuk Kwon et al*. 2023. Paper (stay tuned) [[Official Website](https:\u002F\u002Fvllm.ai\u002F)]\n\n##### Analysis\n\n1. **\"What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?\"**. *Thomas Wang et al*. ICML 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2204.05832)]\n1. **\"What Language Model to Train if You Have One Million GPU Hours?\"**. *Teven Le Scao et al*. Findings of EMNLP 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2210.15424)]\n1. **\"Examining Scaling and Transfer of Language Model Architectures for Machine Translation\"**. 
*Biao Zhang et al*. ICML 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2202.00528)]\n1. **\"Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?\"**. *Yi Tay et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2207.10551)]\n1. **\"Do Transformer Modifications Transfer Across Implementations and Applications?\"**. *Sharan Narang et al*. EMNLP 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2102.11972)]\n\n#### Training Algorithms\n\n1. **\"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism\"**. *Mohammad Shoeybi et al*. arXiv 2019. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1909.08053)]\n1. **\"An Efficient 2D Method for Training Super-Large Deep Learning Models\"**. *Qifan Xu et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2104.05343)]\n1. **\"Tesseract: Parallelize the Tensor Parallelism Efficiently\"**. *Boxiang Wang et al*. ICPP 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2105.14500)]\n1. **\"Maximizing Parallelism in Distributed Training for Huge Neural Networks\"**. *Zhengda Bian et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2105.14450)]\n1. **\"GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism\"**. *Yanping Huang et al*. NeurIPS 2019. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1811.06965)]\n1. **\"PipeDream: Fast and Efficient Pipeline Parallel DNN Training\"**. *Aaron Harlap et al*. arXiv 2018. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1806.03377)]\n1. **\"ZeRO: Memory Optimizations Toward Training Trillion Parameter Models\"**. *Samyam Rajbhandari et al*. SC 2020. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F1910.02054)]\n1. **\"ZeRO-Offload: Democratizing Billion-Scale Model Training\"**. *Jie Ren et al*. USENIX 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2101.06840)]\n\n#### Pre-training on Code\n\n##### LLMs for Program Synthesis\n\n1. 
**\"Evaluating Large Language Models Trained on Code\"**. *Mark Chen et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2107.03374)]\n1. **\"Program Synthesis with Large Language Models\"**. *Jacob Austin et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2108.07732)]\n1. **\"Show Your Work: Scratchpads for Intermediate Computation with Language Models\"**. *Maxwell Nye et al*. arXiv 2021. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00114)]\n1. **\"A Systematic Evaluation of Large Language Models of Code\"**. *Frank F. Xu et al*. arXiv 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2202.13169)]\n1. **\"Competition-Level Code Generation with AlphaCode\"**. *Yujia Li et al*. Science. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.07814)]\n1. **\"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis\"**. *Erik Nijkamp et al*. ICLR 2023. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2203.13474)]\n1. **\"InCoder: A Generative Model for Code Infilling and Synthesis\"**. *Daniel Fried et al*. ICLR 2023. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2204.05999)]\n1. **\"CodeT: Code Generation with Generated Tests\"**. *Bei Chen et al*. ICLR 2023. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2207.10397)]\n1. **\"StarCoder: may the source be with you!\"**. *Raymond Li et al*. arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.06161)]\n\n##### NLP Tasks Formatted as Code\n\n1. **\"Language Models of Code are Few-Shot Commonsense Learners\"**. *Aman Madaan et al*. EMNLP 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07128)]\n1. **\"Autoformalization with Large Language Models\"**. *Yuhuai Wu et al*. NeurIPS 2022. [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2205.12615)]\n\n### Adaptation Tuning\n\n#### Instruction Tuning\n\n1. **\"Multi-Task Deep Neural Networks for Natural Language Understanding\"**. *Xiaodong Liu et al*. ACL 2019. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.11504)] [[Homepage](https:\u002F\u002Fgithub.com\u002Fnamisan\u002Fmt-dnn)]\n1. **\"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer\"**. *Colin Raffel et al*. JMLR 2020. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ftext-to-text-transfer-transformer#released-model-checkpoints)]\n1. **\"Muppet: Massive Multi-task Representations with Pre-Finetuning\"**. *Armen Aghajanyan et al*. EMNLP 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.11038)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fmodels?other=arxiv:2101.11038)]\n1. **\"Cross-Task Generalization via Natural Language Crowdsourcing Instructions\"**. *Swaroop Mishra et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08773)] [[Collection](https:\u002F\u002Finstructions.apps.allenai.org\u002F#data)]\n1. **\"Finetuned Language Models Are Zero-Shot Learners\"**. *Jason Wei et al*. ICLR 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.01652)] [[Homepage](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002FFLAN)]\n1. **\"Multitask Prompted Training Enables Zero-Shot Task Generalization\"**. *Victor Sanh et al*. ICLR 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.08207)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fbigscience\u002FT0#how-to-use)]\n1. **\"PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts\"**. *Stephen H. Bach et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.01279)] [[Collection](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpromptsource)]\n1.  **\"Training language models to follow instructions with human feedback\"**. *Long Ouyang et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.02155)]\n1. 
**\"Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks\"**. *Yizhong Wang et al*. EMNLP 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705)] [[Collection](https:\u002F\u002Finstructions.apps.allenai.org\u002F#data)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002Fmodels?search=tk-instruct-)]\n1. **\"MVP: Multi-task Supervised Pre-training for Natural Language Generation\"**. *Tianyi Tang et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.12131)] [[Collection](https:\u002F\u002Fhuggingface.co\u002FRUCAIBox)] [[Checkpoint](https:\u002F\u002Fhuggingface.co\u002FRUCAIBox)]\n1. **\"Crosslingual Generalization through Multitask Finetuning\"**. *Niklas Muennighoff et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.01786)] [[Collection](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fxmtf#data)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fxmtf#models)]\n1. **\"Scaling Instruction-Finetuned Language Models\"**. *Hyung Won Chung et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416)] [[Homepage](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002FFLAN)]\n1. **\"Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor\"**. *Or Honovich et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.09689)] [[Homepage](https:\u002F\u002Fgithub.com\u002Forhonovich\u002Funnatural-instructions)]\n1. **\"Self-Instruct: Aligning Language Model with Self Generated Instructions\"**. *Yizhong Wang et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10560)] [[Homepage](https:\u002F\u002Fgithub.com\u002Fyizhongw\u002Fself-instruct)]\n1. **\"OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization\"**. *Srinivasan Iyer et al*. arXiv 2022. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.12017)] [[Checkpoint](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmetaseq\u002Ftree\u002Fmain\u002Fprojects\u002FOPT-IML)]\n1. **\"The Flan Collection: Designing Data and Methods for Effective Instruction Tuning\"**. *Shayne Longpre et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.13688)] [[Homepage](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002FFLAN)]\n1. **\"Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning\"**. *Renze Lou et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.10475)]\n1. **\"Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning\"**. *Hao Chen et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09246)]\n1. **\"LIMA: Less Is More for Alignment\"**. *Chunting Zhou et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.11206)]\n\n\n#### Alignment Tuning\n\n1. **\"TAMER: Training an Agent Manually via Evaluative Reinforcement\"**. *W. Bradley Knox et al*. ICDL 2008. [[Paper](https:\u002F\u002Fwww.cs.utexas.edu\u002F~bradknox\u002Fpapers\u002Ficdl08-knox.pdf)]\n1. **\"Interactive Learning from Policy-Dependent Human Feedback\"**. *James MacGlashan et al*. ICML 2017. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.06049)]\n1. **\"Deep Reinforcement Learning from Human Preferences\"**. *Paul Christiano et al*. NIPS 2017. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03741)]\n1. **\"Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces\"**. *Garrett Warnell et al*. AAAI 2018. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.10163)]\n1. **\"Fine-Tuning Language Models from Human Preferences\"**. *Daniel M. Ziegler et al*. arXiv 2019. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.08593)]\n1. **\"Learning to summarize from human feedback\"**. *Nisan Stiennon et al*. 
NeurIPS 2020. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.01325)]\n1. **\"Alignment of Language Agents\"**. *Zachary Kenton et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.14659)]\n1. **\"Recursively Summarizing Books with Human Feedback\"**. *Jeff Wu et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.10862)]\n1. **\"A General Language Assistant as a Laboratory for Alignment\"**. *Amanda Askell et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00861)]\n1. **\"WebGPT: Browser-assisted question-answering with human feedback\"**. *Reiichiro Nakano et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.09332)]\n1. **\"Training language models to follow instructions with human feedback\"**. *Long Ouyang et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.02155)]\n1. **\"Teaching language models to support answers with verified quotes\"**. *Jacob Menick et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11147)]\n1. **\"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback\"**. *Yuntao Bai et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.05862)]\n1. **\"Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning\"**. *Deborah Cohen et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.02294)]\n1. **\"Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned\"**. *Deep Ganguli et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.07858)]\n1. **\"Improving alignment of dialogue agents via targeted human judgements\"**. *Amelia Glaese et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14375)]\n1. 
**\"Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization\"**. *Rajkumar Ramamurthy et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.01241)]\n1. **\"Scaling Laws for Reward Model Overoptimization\"**. *Leo Gao et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.10760)]\n1. **\"The Wisdom of Hindsight Makes Language Models Better Instruction Followers\"**. *Tianjun Zhang et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05206)]\n1. **\"RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment\"**. *Hanze Dong et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.06767)]\n1. **\"Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment\"**. *Rishabh Bhardwaj et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.09662)]\n\n#### Parameter-Efficient Model Adaptation\n1. **\"Parameter-Efficient Transfer Learning for NLP\"**. *Neil Houlsby et al*. ICML 2019. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.00751)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fadapter-bert)]\n1. **\"MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer\"**. *Jonas Pfeiffer et al*. EMNLP 2020. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.00052)] [[GitHub](https:\u002F\u002Fgithub.com\u002FAdapter-Hub\u002Fadapter-transformers)]\n1. **\"AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts\"**. *Taylor Shin et al*. EMNLP 2020. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.15980)] [[GitHub](https:\u002F\u002Fucinlp.github.io\u002Fautoprompt\u002F)]\n1. **\"Prefix-Tuning: Optimizing Continuous Prompts for Generation\"**. *Xiang Lisa Li et al*. ACL 2021. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.00190)] [[GitHub](https:\u002F\u002Fgithub.com\u002FXiangLi1999\u002FPrefixTuning)]\n1. **\"GPT Understands, Too\"**. *Xiao Liu et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.10385)] [[GitHub](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FP-tuning)]\n1. **\"The Power of Scale for Parameter-Efficient Prompt Tuning\"**. *Brian Lester et al*. EMNLP 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.08691)]\n1. **\"LoRA: Low-Rank Adaptation of Large Language Models\"**. *Edward J. Hu et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.09685)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FLoRA)]\n1. **\"Towards a Unified View of Parameter-Efficient Transfer Learning\"**. *Junxian He et al*. ICLR 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.04366)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fjxhe\u002Funify-parameter-efficient-tuning)]\n1. **\"P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks\"**. *Xiao Liu et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.07602)] [[GitHub](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FP-tuning-v2)]\n1. **\"DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation\"**. *Mojtaba Valipour et al*. EACL 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07558)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fhuawei-noah\u002FKD-NLP\u002Ftree\u002Fmain\u002FDyLoRA)]\n1. **\"Parameter-efficient fine-tuning of large-scale pre-trained language models\"**. *Ning Ding et al*. Nat Mach Intell 2023. [[Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs42256-023-00626-4)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FOpenDelta)]\n1. **\"Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning\"**. *Qingru Zhang et al*. arXiv 2023. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.10512)] [[GitHub](https:\u002F\u002Fgithub.com\u002FQingruZhang\u002FAdaLoRA)]\n1. **\"LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention\"**. *Renrui Zhang et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.16199)] [[GitHub](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter)]\n1. **\"LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models\"**. *Zhiqiang Hu et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01933)] [[GitHub](https:\u002F\u002Fgithub.com\u002FAGI-Edgerunners\u002FLLM-Adapters)]\n\n\n#### Memory-Efficient Model Adaptation\n1. **\"A Survey of Quantization Methods for Efficient Neural Network Inference\"**. *Amir Gholami et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.13630)]\n1. **\"8-bit Optimizers via Block-wise Quantization\"**. *Tim Dettmers et al*. arXiv 2021. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02861)]\n1. **\"Compression of Generative Pre-trained Language Models via Quantization\"**. *Chaofan Tao et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.10705)]\n1. **\"ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers\"**. *Zhewei Yao et al*. NeurIPS 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.01861)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed)]\n1. **\"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale\"**. *Tim Dettmers et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.07339)] [[GitHub](https:\u002F\u002Fgithub.com\u002FTimDettmers\u002Fbitsandbytes)]\n1. **\"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers\"**. *Elias Frantar et al*. ICLR 2023. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.17323)] [[GitHub](https:\u002F\u002Fgithub.com\u002FIST-DASLab\u002Fgptq)]\n1. **\"SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models\"**. *Guangxuan Xiao et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.10438)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fsmoothquant)]\n1. **\"The case for 4-bit precision: k-bit Inference Scaling Laws\"**. *Tim Dettmers et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.09720)]\n1. **\"ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation\"**. *Zhewei Yao et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08302)]\n1. **\"QLoRA: Efficient Finetuning of Quantized LLMs\"**. *Tim Dettmers et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14314)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora)]\n1. **\"LLM-QAT: Data-Free Quantization Aware Training for Large Language Models\"**. *Zechun Liu et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17888)]\n1. **\"AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration\"**. *Ji Lin et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.00978)] [[GitHub](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fllm-awq)]\n\n\n### Utilization\n\n#### In-Context Learning (ICL)\n\n1. **\"An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels\"**. *Taylor Sorensen et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11364)]\n2. **\"What Makes Good In-Context Examples for GPT-3?\"**. *Jiachang Liu et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.06804)]\n3. **\"Learning to retrieve prompts for in-context learning\"**. *Ohad Rubin et al*. NAACL 2022. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.08633)]\n4. **\"Diverse demonstrations improve in-context compositional generalization\"**. *Itay Levy et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06800)]\n5. **\"Demystifying Prompts in Language Models via Perplexity Estimation\"**. *Hila Gonen et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.04037)]\n6. **\"Active Example Selection for In-Context Learning\"**. *Yiming Zhang et al*. EMNLP 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.04486)]\n7. **\"Self-adaptive In-context Learning\"**. *Zhiyong Wu et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10375)]\n8. **\"Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity\"**. *Yao Lu et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08786)]\n9. **\"Structured Prompting: Scaling In-Context Learning to 1,000 Examples\"**. *Yaru Hao et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06713)]\n10. **\"The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning\"**. *Xi Ye et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.03401)]\n11. **\"Cross-Task Generalization via Natural Language Crowdsourcing Instructions\"**. *Swaroop Mishra et al*. ACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08773)]\n12. **\"Prompt-Augmented Linear Probing: Scaling Beyond the Limit of Few-shot In-Context Learner\"**. *Hyunsoo Cho et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10873)]\n13. **\"An Explanation of In-context Learning as Implicit Bayesian Inference\"**. *Sang Michael Xie et al*. ICLR 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.02080)]\n14. **\"Calibrate Before Use: Improving Few-Shot Performance of Language Models\"**. *Zihao Zhao et al*. ICML 2021. 
[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.09690)]\n15. **\"Data distributional properties drive emergent in-context learning in transformers\"**. *Stephanie C. Y. Chan et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.05055)]\n16. **\"In-context Learning and Induction Heads\"**. *Catherine Olsson et al*. arXiv 2022. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2209.11895)]\n17. **\"On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model\"**. *Seongjin Shin et al*. NAACL 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.13509)]\n18. **\"Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?\"**. *Sewon Min et al*. EMNLP 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.12837)]\n19. **\"Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale\"**. *Hritik Bansal et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.09095)]\n20. **\"Transformers as algorithms: Generalization and implicit model selection in in-context learning\"**. *Yingcong Li et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.07067)]\n21. **\"Transformers learn in-context by gradient descent\"**. *Johannes von Oswald et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.07677)]\n22. **\"What learning algorithm is in-context learning? investigations with linear models\"**. *Ekin Akyürek et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.15661)]\n23. **\"A Survey for In-context Learning\"**. *Qingxiu Dong et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.00234)]\n24. **What In-Context Learning \"Learns\" In-Context: Disentangling Task Recognition and Task Learning**. *Jane Pan et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09731)]\n25. 
**The Learnability of In-Context Learning**. *Noam Wies et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.07895)]\n26. **Do Prompt-Based Models Really Understand the Meaning of Their Prompts?** *Albert Webson et al*. NAACL 2022. [[Paper](https:\u002F\u002Faclanthology.org\u002F2022.naacl-main.167\u002F)]\n27. **Larger language models do in-context learning differently**. *Jerry Wei et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.03846)]\n28. **Meta-in-context learning in large language models**. *Julian Coda-Forno et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12907)]\n29. **Symbol tuning improves in-context learning in language models**. *Jerry Wei et al*. arXiv 2023. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.08298)]\n\n#### Chain-of-Thought Reasoning (CoT)\n\n1. **\"Automatic Chain of Thought Prompting in Large Language Models\"**. *Zhuosheng Zhang et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03493)]\n2. **\"Chain of Thought Prompting Elicits Reasoning in Large Language Models\"**. *Jason Wei et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903)]\n3. **\"STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning\"**. *Eric Zelikman et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.14465)]\n4. **\"Large language models are zero-shot reasoners\"**. *Takeshi Kojima et al*. arXiv 2022. [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.11916)]\n5. **\"Automatic Chain of Thought Prompting in Large Language Models\"**. *Zhuosheng Zhang et al*. arXiv 2022. [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03493)]\n6.","LLMSurvey is a project that collects a large number of papers and resources related to large language models. It is based on the survey paper of the same name, \"A Survey of Large Language Models\", and aims to give researchers a comprehensive framework and roadmap, with particular attention to beginners. Its core function is to organize and categorize existing research, covering topics from pre-training methods to instruction tuning and in-context learning, and the latest version adds content on long chain-of-thought reasoning. It suits any researcher or student who wants to understand the technical details of large language models, and is especially useful in academic research and education.","2026-06-06 03:36:11","high_star"]