[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72564":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},72564,"xlstm","NX-AI\u002Fxlstm","NX-AI","Official repository of the xLSTM.","https:\u002F\u002Fwww.nx-ai.com\u002F",null,"Python",2169,184,22,50,0,4,7,15,12,28.8,"Apache License 2.0",false,"main",[26,27,28,29,30,31],"deep-learning","deep-learning-architecture","llm","machine-learning","nlp","rnn","2026-06-12 02:03:05","\u003Cdiv align=\"center\">\n\n# xLSTM: Extended Long Short-Term Memory\n\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Paper&message=2405.04517&color=B31B1B&logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.04517)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fxlstm?color=blue)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fxlstm\u002F)\n[![PyPI Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fpersonalized-badge\u002Fxlstm?period=total&units=INTERNATIONAL_SYSTEM&left_color=GREY&right_color=BLUE&left_text=downloads)](https:\u002F\u002Fpepy.tech\u002Fprojects\u002Ftirex-ts)\n![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNX-AI\u002Fxlstm)\n[![License: Apache-2.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache--2.0-green.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n\n![xLSTM 
Figure](.\u002Fres\u002Fdesc_xlstm_overview.svg)\n\n\u003C\u002Fdiv>\n\n> **Paper:** https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.04517\n>\n> **Authors:** Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter\n\n## About\n\nxLSTM is a new Recurrent Neural Network architecture based on ideas of the original LSTM.\nThrough Exponential Gating with appropriate normalization and stabilization techniques and a new Matrix Memory, it overcomes the limitations of the original LSTM\nand shows promising performance on Language Modeling when compared to Transformers or State Space Models.\n\n:rotating_light: We trained a 7B parameter xLSTM Language Model on 2.3T tokens! :rotating_light:\n\nWe refer to the optimized architecture for our xLSTM 7B as xLSTM Large.\n\n## Minimal Installation\n\nCreate a conda environment from the file `environment_pt240cu124.yaml` (see Requirements below), then install only the model code (i.e. the module `xlstm`) as a package.\n\nTo use the xLSTM Large 7B model, first install [`mlstm_kernels`](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fmlstm_kernels) via:\n```bash\npip install mlstm_kernels\n```\nThen install the xlstm package via pip:\n```bash\npip install xlstm\n```\nOr clone the repository from GitHub and install it in editable mode:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fxlstm.git\ncd xlstm\npip install -e .\n```\n\n## Requirements\n\nThis package is based on PyTorch and was tested for versions `>=1.8`. 
For a well-tested environment, create a conda environment from `environment_pt240cu124.yaml`:\n```bash\nconda env create -n xlstm -f environment_pt240cu124.yaml\nconda activate xlstm\n```\n\nThe xLSTM Large 7B model requires our [`mlstm_kernels`](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fmlstm_kernels) package, which provides fast kernels for the xLSTM.\n\n\u003Cdiv align=\"center\">\n\n# xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Paper&message=2503.13427&color=B31B1B&logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.13427)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-xLSTM_7B-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002FNX-AI\u002FxLSTM-7b)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-nxai_community-green)](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Ftirex-internal\u002Fblob\u002Fmain\u002FLICENSE)\n\n\n> **Paper:** https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.13427\n>\n> **Authors:** Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick M. Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter\n\n![xLSTM Figure](.\u002Fres\u002Fxlstm_7b_poster.svg)\n\n\u003C\u002Fdiv>\n\nWe have optimized the xLSTM architecture in terms of training throughput and stability.\nThe code for the updated architecture is located in `xlstm\u002Fxlstm_large`.\n\nThe model weights are available on Hugging Face at https:\u002F\u002Fhuggingface.co\u002FNX-AI\u002FxLSTM-7b. 
\n\n## How to use the xLSTM Large 7B and its architecture\n\nWe provide a standalone, single-file implementation of the xLSTM Large architecture in [`xlstm\u002Fxlstm_large\u002Fmodel.py`](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fxlstm\u002Fblob\u002Fmain\u002Fxlstm\u002Fxlstm_large\u002Fmodel.py).\nThis implementation requires our [`mlstm_kernels`](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fmlstm_kernels) package but otherwise has no dependency on the NeurIPS xLSTM architecture implementation.\n\nFor a quick start, we provide a [`demo.ipynb`](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fxlstm\u002Fblob\u002Fmain\u002Fnotebooks\u002Fxlstm_large\u002Fdemo.ipynb) notebook for the xLSTM Large architecture at `notebooks\u002Fxlstm_large\u002Fdemo.ipynb`.\n\nIn this notebook we import our config and model class, initialize a model with random weights, and perform a forward pass, like so:\n\n```python\nimport torch\nfrom xlstm.xlstm_large.model import xLSTMLargeConfig, xLSTMLarge\n\n# configure the model with TFLA Triton kernels\nxlstm_config = xLSTMLargeConfig(\n    embedding_dim=512,\n    num_heads=4,\n    num_blocks=6,\n    vocab_size=2048,\n    return_last_states=True,\n    mode=\"inference\",\n    chunkwise_kernel=\"chunkwise--triton_xl_chunk\", # xl_chunk == TFLA kernels\n    sequence_kernel=\"native_sequence__triton\",\n    step_kernel=\"triton\",\n)\n# instantiate the model\nxlstm = xLSTMLarge(xlstm_config)\nxlstm = xlstm.to(\"cuda\")\n# create inputs\ninput = torch.randint(0, 2048, (3, 256)).to(\"cuda\")\n# run a forward pass\nout = xlstm(input)\nout.shape[1:] == (256, 2048)\n```\n\n## Recommendation for other hardware\n\nWe have tested our model mostly on NVIDIA GPUs; however, our Triton kernels should also run on AMD GPUs. 
\nFor other platforms, like Apple Metal, we recommend using the native PyTorch implementations for now:\n\n```python\nxlstm_config = xLSTMLargeConfig(\n    embedding_dim=512,\n    num_heads=4,\n    num_blocks=6,\n    vocab_size=2048,\n    return_last_states=True,\n    mode=\"inference\",\n    chunkwise_kernel=\"chunkwise--native_autograd\", # no Triton kernels\n    sequence_kernel=\"native_sequence__native\", # no Triton kernels\n    step_kernel=\"native\", # no Triton kernels\n)\n```\n\nIf you are working inside Apple's MLX ecosystem, check out the community-driven\n[xLSTM-metal](https:\u002F\u002Fgithub.com\u002FMLXPorts\u002FxLSTM-metal) port, which provides an\nMLX-native implementation of xLSTM targeting Apple Silicon.\n\n# Models from the xLSTM NeurIPS Paper\n\nThis section explains how to use the models from the xLSTM paper.\n\n## How to use the xLSTM architecture from our NeurIPS paper\n\nFor non-language applications, or for integration into other architectures, you can use the `xLSTMBlockStack`. For language modeling or other token-based applications, you can use the `xLSTMLMModel`.\n\n### Using the sLSTM CUDA kernels\n\nFor the CUDA version of sLSTM, you need Compute Capability >= 8.0, see [https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus). If you have problems with the compilation, please try the following (thanks to [@zia1138](https:\u002F\u002Fgithub.com\u002Fzia1138) for pointing this out):\n```bash\nexport TORCH_CUDA_ARCH_LIST=\"8.0;8.6;9.0\"\n```\n\nFor all kinds of custom setups with torch and CUDA, keep in mind that versions have to match. 
Also, to make sure the correct CUDA libraries are included, you can now use the `XLSTM_EXTRA_INCLUDE_PATHS` environment variable to inject additional include paths, e.g.:\n\n```bash\nexport XLSTM_EXTRA_INCLUDE_PATHS='\u002Fusr\u002Flocal\u002Finclude\u002Fcuda\u002F:\u002Fusr\u002Finclude\u002Fcuda\u002F'\n```\n\nor within Python:\n\n```python\nimport os\nos.environ['XLSTM_EXTRA_INCLUDE_PATHS'] = '\u002Fusr\u002Flocal\u002Finclude\u002Fcuda\u002F:\u002Fusr\u002Finclude\u002Fcuda\u002F'\n```\n\nFor standalone, even faster sLSTM kernels, feel free to use the [FlashRNN](https:\u002F\u002Fgithub.com\u002FNX-AI\u002Fflashrnn) library.\n\n### xLSTM Block Stack\n\nThe `xLSTMBlockStack` is meant for use as an alternative backbone in existing projects. It is similar to a stack of Transformer blocks, but uses xLSTM blocks:\n\n```python\nimport torch\n\nfrom xlstm import (\n    xLSTMBlockStack,\n    xLSTMBlockStackConfig,\n    mLSTMBlockConfig,\n    mLSTMLayerConfig,\n    sLSTMBlockConfig,\n    sLSTMLayerConfig,\n    FeedForwardConfig,\n)\n\ncfg = xLSTMBlockStackConfig(\n    mlstm_block=mLSTMBlockConfig(\n        mlstm=mLSTMLayerConfig(\n            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4\n        )\n    ),\n    slstm_block=sLSTMBlockConfig(\n        slstm=sLSTMLayerConfig(\n            backend=\"cuda\",\n            num_heads=4,\n            conv1d_kernel_size=4,\n            bias_init=\"powerlaw_blockdependent\",\n        ),\n        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn=\"gelu\"),\n    ),\n    context_length=256,\n    num_blocks=7,\n    embedding_dim=128,\n    slstm_at=[1],\n)\n\nxlstm_stack = xLSTMBlockStack(cfg)\n\nx = torch.randn(4, 256, 128).to(\"cuda\")\nxlstm_stack = xlstm_stack.to(\"cuda\")\ny = xlstm_stack(x)\ny.shape == x.shape\n```\n\nIf you are working with YAML strings or files for configuration, you can also use dacite to create the config dataclasses. 
This is equivalent to the snippet above:\n\n```python\nimport torch\nfrom omegaconf import OmegaConf\nfrom dacite import from_dict\nfrom dacite import Config as DaciteConfig\nfrom xlstm import xLSTMBlockStack, xLSTMBlockStackConfig\n\nxlstm_cfg = \"\"\"\nmlstm_block:\n  mlstm:\n    conv1d_kernel_size: 4\n    qkv_proj_blocksize: 4\n    num_heads: 4\nslstm_block:\n  slstm:\n    backend: cuda\n    num_heads: 4\n    conv1d_kernel_size: 4\n    bias_init: powerlaw_blockdependent\n  feedforward:\n    proj_factor: 1.3\n    act_fn: gelu\ncontext_length: 256\nnum_blocks: 7\nembedding_dim: 128\nslstm_at: [1]\n\"\"\"\ncfg = OmegaConf.create(xlstm_cfg)\ncfg = from_dict(data_class=xLSTMBlockStackConfig, data=OmegaConf.to_container(cfg), config=DaciteConfig(strict=True))\nxlstm_stack = xLSTMBlockStack(cfg)\n\nx = torch.randn(4, 256, 128).to(\"cuda\")\nxlstm_stack = xlstm_stack.to(\"cuda\")\ny = xlstm_stack(x)\ny.shape == x.shape\n```\n\n\n### xLSTM Language Model\n\nThe `xLSTMLMModel` is a wrapper around the `xLSTMBlockStack` that adds the token embedding and LM head.\n\n```python\nimport torch\nfrom omegaconf import OmegaConf\nfrom dacite import from_dict\nfrom dacite import Config as DaciteConfig\nfrom xlstm import xLSTMLMModel, xLSTMLMModelConfig\n\nxlstm_cfg = \"\"\"\nvocab_size: 50304\nmlstm_block:\n  mlstm:\n    conv1d_kernel_size: 4\n    qkv_proj_blocksize: 4\n    num_heads: 4\nslstm_block:\n  slstm:\n    backend: cuda\n    num_heads: 4\n    conv1d_kernel_size: 4\n    bias_init: powerlaw_blockdependent\n  feedforward:\n    proj_factor: 1.3\n    act_fn: gelu\ncontext_length: 256\nnum_blocks: 7\nembedding_dim: 128\nslstm_at: [1]\n\"\"\"\ncfg = OmegaConf.create(xlstm_cfg)\ncfg = from_dict(data_class=xLSTMLMModelConfig, data=OmegaConf.to_container(cfg), config=DaciteConfig(strict=True))\nxlstm_stack = xLSTMLMModel(cfg)\n\nx = torch.randint(0, 50304, size=(4, 256)).to(\"cuda\")\nxlstm_stack = xlstm_stack.to(\"cuda\")\ny = xlstm_stack(x)\ny.shape[1:] == (256, 50304)\n```\n\n\n## Experiments\n\nThe 
synthetic experiments that best showcase the relative strengths of sLSTM and mLSTM are the Parity task and the Multi-Query Associative Recall task. The Parity task can only be solved with the state-tracking capabilities provided by the memory mixing of sLSTM. The Multi-Query Associative Recall task measures memorization capabilities, where the matrix memory and state expansion of mLSTM are very beneficial.\nIn combination, they do well on both tasks.\n\nTo run an experiment, run `main.py` in the experiments folder with the corresponding config, for example:\n```bash\nPYTHONPATH=. python experiments\u002Fmain.py --config experiments\u002Fparity_xlstm01.yaml   # xLSTM[0:1], sLSTM only\nPYTHONPATH=. python experiments\u002Fmain.py --config experiments\u002Fparity_xlstm10.yaml   # xLSTM[1:0], mLSTM only\nPYTHONPATH=. python experiments\u002Fmain.py --config experiments\u002Fparity_xlstm11.yaml   # xLSTM[1:1], mLSTM and sLSTM\n```\n\nNote that the training loop does not contain early stopping or test evaluation.\n\n\n## Citation\n\nIf you use this codebase, or otherwise find our work valuable, please cite the xLSTM paper:\n```\n@inproceedings{beck:24xlstm,\n  title = {xLSTM: Extended Long Short-Term Memory}, \n  author = {Maximilian Beck and Korbinian Pöppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter},\n  booktitle = {Thirty-eighth Conference on Neural Information Processing Systems},\n  year = {2024},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.04517}, \n}\n\n@article{beck:25xlstm7b,\n  title = {{xLSTM 7B}: A Recurrent LLM for Fast and Efficient Inference},\n  author = {Maximilian Beck and Korbinian Pöppel and Phillip Lippe and Richard Kurle and Patrick M. 
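Blies and Günter Klambauer and Sebastian Böck and Sepp Hochreiter},\n  booktitle = {Forty-second International Conference on Machine Learning},\n  year = {2025},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.13427}\n}\n\n```\n","xLSTM is a new recurrent neural network architecture that improves on the LSTM. Through exponential gating, appropriate normalization and stabilization techniques, and a new matrix memory, it overcomes the limitations of the traditional LSTM. Its core features include enhanced memory capacity and more efficient sequence modeling, and it performs particularly well on language modeling tasks, where it is competitive with advanced models such as Transformers. The project is written in Python and depends on the PyTorch framework. It is suited to tasks that require efficient processing of long sequences, such as text generation and sentiment analysis in natural language processing.",2,"2026-06-11 03:42:37","high_star"]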