[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72357":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72357,"matmulfreellm","ridgerchu\u002Fmatmulfreellm","ridgerchu","Implementation for MatMul-free LM.","",null,"Python",3064,202,50,23,0,8,59.72,"Apache License 2.0",false,"master",[23,24,25],"large-language-model","linear-transformer","llm","2026-06-12 04:01:04","\u003Cdiv align=center>\n\u003Cimg src=\"__assets__\u002Flogo.png\" width=\"200px\">\n\u003C\u002Fdiv>\n\u003Ch2 align=\"center\">MatMul-Free LM\u003C\u002Fh2>\n\u003Ch5 align=\"center\"> If you like our project, please give us a star ⭐ on GitHub for the latest updates.\u003C\u002Fh5>\n\u003Ch5 align=\"center\"> This repo is adapted from \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fsustcsonglin\u002Fflash-linear-attention\">flash-linear-attention\u003C\u002Fa>. 
\u003C\u002Fh5>\n\n\u003Ch5 align=\"center\">\n\n[![hf_model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Models-blue.svg)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fridger\u002Fmatmulfree-lm-665f4d2b4e4648756e0dd13c) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2406.02528-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.02528)\n\n\u003C\u002Fh5>\n\n# Introduction\n\u003Cdiv align=center>\n\u003Cimg src=\"__assets__\u002Fmain.png\">\n\u003C\u002Fdiv>\n\nMatMul-Free LM is a language model architecture that eliminates matrix multiplication (MatMul) operations. This repository provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library.\n\n# Scaling Law\n\u003Cdiv align=center>\n\u003Cimg src=\"__assets__\u002Fscaling_law.png\">\n\u003C\u002Fdiv>\n\nWe evaluate how well scaling laws fit the 370M, 1.3B, and 2.7B parameter versions of both Transformer++ and our model. For a fair comparison, each operation is treated identically, even though our model uses more efficient ternary weights in some layers. 
Interestingly, the scaling projection for our model descends more steeply than that of Transformer++, suggesting that our architecture converts additional compute into performance gains more efficiently.\n\n# Installation\n\nThe following requirements should be satisfied:\n- [PyTorch](https:\u002F\u002Fpytorch.org\u002F) >= 2.0\n- [Triton](https:\u002F\u002Fgithub.com\u002Fopenai\u002Ftriton) >= 2.2\n- [einops](https:\u002F\u002Feinops.rocks\u002F)\n\n```sh\npip install -U git+https:\u002F\u002Fgithub.com\u002Fridgerchu\u002Fmatmulfreellm\n```\n\n# Usage\n## Pre-trained Model Zoo\n| Model Size | Layers | Hidden Dimension | Trained Tokens |\n|:-----------|:------:|:----------------:|:--------------:|\n| [370M](https:\u002F\u002Fhuggingface.co\u002Fridger\u002FMMfreeLM-370M) | 24 | 1024 | 15B |\n| [1.3B](https:\u002F\u002Fhuggingface.co\u002Fridger\u002FMMfreeLM-1.3B) | 24 | 2048 | 100B |\n| [2.7B](https:\u002F\u002Fhuggingface.co\u002Fridger\u002FMMfreeLM-2.7B) | 32 | 2560 | 100B |\n\n## Model\n\nWe provide model implementations that are compatible with the 🤗 Transformers library.
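
The ternary weights mentioned in the Scaling Law section are what allow the dense layers of these models to avoid matrix multiplication. As a rough illustration only, the pure-Python sketch below quantizes a weight matrix to {-1, 0, +1} and then applies it to an input vector using only additions and subtractions, followed by a single per-tensor rescale. It assumes BitNet b1.58-style absmean quantization; the helper names `ternarize` and `ternary_linear` are hypothetical, and this is not the repository's `FusedBitLinear` layer (the printed model structure shows that each `FusedBitLinear` also carries its own `RMSNorm`, which this sketch omits).

```python
# Hypothetical helpers, not the repository's FusedBitLinear: once weights are
# restricted to {-1, 0, +1}, a dense layer needs only additions and subtractions.


def ternarize(weights, eps=1e-5):
    # Assumed absmean quantization (BitNet b1.58 style): gamma is the mean
    # absolute weight; each weight is divided by gamma, rounded, and clipped
    # to the range [-1, 1], so every entry ends up as -1, 0, or +1.
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma


def ternary_linear(x, ternary, gamma):
    # y_i = gamma * (sum of x_j where w_ij == +1, minus sum of x_j where w_ij == -1)
    # No activation-weight multiplications; zero weights are simply skipped.
    out = []
    for row in ternary:
        acc = 0.0
        for xj, w in zip(x, row):
            if w == 1:
                acc += xj
            elif w == -1:
                acc -= xj
        out.append(gamma * acc)  # one per-tensor rescale per output
    return out


# Toy numbers chosen only for illustration.
W = [[0.9, -0.05, -1.2], [0.3, 0.6, -0.4]]
x = [1.0, 2.0, 3.0]
Wt, g = ternarize(W)
y = ternary_linear(x, Wt, g)
print(Wt)
print(y)
```

Each output equals what an ordinary dense product with the ternary matrix, scaled by `gamma`, would give; the difference is that the inner loop only adds, subtracts, or skips inputs.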
\nBecause the library is compatible with 🤗 Transformers, a model can be initialized from the default config in `mmfreelm` using `AutoModel`:\n\n```py\n>>> from mmfreelm.models import HGRNBitConfig\n>>> from transformers import AutoModel\n>>> config = HGRNBitConfig()\n>>> AutoModel.from_config(config)\nHGRNBitModel(\n  (embeddings): Embedding(32000, 2048)\n  (layers): ModuleList(\n    (0): HGRNBitBlock(\n      (attn_norm): RMSNorm(2048, eps=1e-06)\n      (attn): HGRNBitAttention(\n        (i_proj): FusedBitLinear(\n          in_features=2048, out_features=2048, bias=False\n          (norm): RMSNorm(2048, eps=1e-08)\n        )\n        (f_proj): FusedBitLinear(\n          in_features=2048, out_features=2048, bias=False\n          (norm): RMSNorm(2048, eps=1e-08)\n        )\n        (g_proj): FusedBitLinear(\n          in_features=2048, out_features=2048, bias=False\n          (norm): RMSNorm(2048, eps=1e-08)\n        )\n        (g_norm): FusedRMSNormSwishGate()\n        (o_proj): FusedBitLinear(\n          in_features=2048, out_features=2048, bias=False\n          (norm): RMSNorm(2048, eps=1e-08)\n        )\n      )\n      (mlp_norm): RMSNorm(2048, eps=1e-06)\n      (mlp): HGRNBitMLP(\n        (gate_proj): FusedBitLinear(\n          in_features=2048, out_features=11264, bias=False\n          (norm): RMSNorm(2048, eps=1e-08)\n        )\n        (down_proj): FusedBitLinear(\n          in_features=5632, out_features=2048, bias=False\n          (norm): RMSNorm(5632, eps=1e-08)\n        )\n        (act_fn): SiLU()\n      )\n    )\n  ...\n)\n```\n\n## Generation\n\nA pretrained model, such as one of the released checkpoints listed above, can generate text through the 🤗 text generation APIs. The following example is taken from `generate.py`:\n\n```py\nimport os\n# Disable 🤗 tokenizers parallelism (avoids fork-related warnings)\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n# Importing mmfreelm makes the HGRNBit model classes available to the 🤗 Auto* classes\nimport mmfreelm\nfrom transformers import AutoModelForCausalLM, 
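AutoTokenizer\n\n# Change this to one of our open-sourced models from the Model Zoo above\nname = 'ridger\u002FMMfreeLM-2.7B'\ntokenizer = AutoTokenizer.from_pretrained(name)\n# Load the model onto the GPU in half precision\nmodel = AutoModelForCausalLM.from_pretrained(name).cuda().half()\ninput_prompt = \"In a shocking finding, scientists discovered a herd of unicorns living in a remote, \"\ninput_ids = tokenizer(input_prompt, return_tensors=\"pt\").input_ids.cuda()\n# Sample a continuation; max_length counts the prompt tokens as well\noutputs = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.4, temperature=0.6)\nprint(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])\n```\n\n# Citation\nIf you use this repo in your work, please cite our preprint:\n```bib\n@article{zhu2024scalable,\n  title={Scalable MatMul-free Language Modeling},\n  author={Zhu, Rui-Jie and Zhang, Yu and Sifferman, Ethan and Sheaves, Tyler and Wang, Yiqiao and Richmond, Dustin and Zhou, Peng and Eshraghian, Jason K},\n  journal={arXiv preprint arXiv:2406.02528},\n  year={2024}\n}\n```\n","MatMul-Free LM is a language model architecture that requires no matrix multiplication operations. By eliminating conventional matrix multiplication, the project enables more efficient computation, and it is compatible with the 🤗 Transformers library, so users can easily use the pretrained models or build their own customizations. Its core features include pretrained models at several scales (from 370M to 2.7B parameters), and it uses modern tools such as Triton and einops to improve performance. The project is particularly suited to language processing applications that aim to reduce computational resource consumption while maintaining high performance, such as text generation and machine translation.",2,"2026-06-11 03:41:30","high_star"]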