[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72611":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},72611,"blt","facebookresearch\u002Fblt","facebookresearch","Code for BLT research paper",null,"Python",2045,192,28,45,0,1,8,60.16,"Other",false,"main",[],"2026-06-12 04:01:06","# Byte Latent Transformer\n\nThis repository contains code for our paper: \"Byte Latent Transformer: Patches Scale Better Than Tokens\"\n\n- [Paper Link](https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fblt\u002FBLT__Patches_Scale_Better_Than_Tokens.pdf)\n\n## Abstract\n\nWe introduce the Byte Latent Transformer architecture (BLTs), a new byte-level LLM architecture that\nfor the first time, matches tokenization-based LLM performance at scale, with significant improvements\nin inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve\nas the primary units of computation. Patches are segmented dynamically based on the entropy of the\nnext byte, allocating more compute and model capacity where there is more data complexity. The BLT\narchitecture includes new attention mechanisms to maximize the information flow between byte and\npatch hidden representations and a new type of byte-sequence memory. 
We present the first scaling\nstudy of byte-level models up to 8B parameters and 8T training bytes, showing for the first time\nthat we can train a model end-to-end at scale from bytes with no tokenization or other preprocessing.\nScaling trends reveal training and inference efficiency benefits from dynamically selecting very long\npatches on average, along with qualitative improvements with reasoning and long tail generalization\nfrom modeling byte-sequences.\n\n![BLT Architecture Diagram](blt-figure.jpg)\n\n## Development Status\n\nWe are actively updating the blt code to make it easier to reproduce our results.\nPlease file an issue and\u002For be patient while we make more of our code public!\n\n## Quick start\n\nThere are three ways you can create your environment.\n\n### Option 1: conda + pip\n\nRun these commands in your terminal or a script:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fblt\ncd blt\nconda create -n blt python=3.12\nconda activate blt\npip install --pre torch --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu121\npip install ninja\npip install -v -U git+https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fxformers.git@de742ec3d64bd83b1184cc043e541f15d270c148\npip install -r requirements.txt\n```\n\n### Option 2: Slurm Job to Build Env\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fblt\ncd blt\n\nbash setup\u002Fcreate_env.sh\n# or if you have access to a SLURM cluster\nsbatch setup\u002Fcreate_env.sh\n```\n\nOnce that is done, you can activate the environment:\n\n```bash\nconda activate blt_\u003Cdate>\n```\n\n### Option 3 (experimental, reproducible): uv\n\nRun the following to install the env using [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F).\nThe main benefit of this method is that the build is reproducible since there is a lock file.\n\n```bash\nuv pip install --group pre_build --no-build-isolation\nuv pip install --group 
compile_xformers --no-build-isolation\nuv sync\nuv run python download_blt_weights.py\nuv run python demo.py \"A BLT has\"\n```\n\n## Downloading HF Model Weights and Generating Text\n\nWe have released weights on HF for the [BLT 1B Model](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fblt-1b) and [BLT 7B Model](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fblt-7b).\nWe are actively working with HF to make BLT available in [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fen\u002Findex) and will update this when it is.\nIn the meantime, you can follow these instructions to load model weights, initialize a model, and generate text.\nThese instructions have been tested on H100 GPUs, but we can only offer suggestions on running on other hardware.\n\n1. On the model weights HF page, create a HuggingFace account, request access to weights, and wait for approval.\n2. Log in with the HuggingFace CLI: `huggingface-cli login`\n\nFrom here there are two options: (1) load weights in our train script, or (2) load weights via HF hub to use for anything else.\n\n## Load Weights via HF Hub\n\nIn your terminal:\n\n```bash\npython -m bytelatent.hf load-transformers --entropy-repo facebook\u002Fblt-entropy --blt-repo facebook\u002Fblt-1b --prompt \"My test prompt\" hub\n```\n\nIn your own code:\n\n```python\nfrom bytelatent.transformer import LMTransformer\nfrom bytelatent.model.blt import ByteLatentTransformer\nfrom bytelatent.hf import BltTokenizerAndPatcher\n\nentropy_repo = \"facebook\u002Fblt-entropy\"\nblt_repo = \"facebook\u002Fblt-1b\"\nentropy_model = LMTransformer.from_pretrained(entropy_repo)\nblt_model = ByteLatentTransformer.from_pretrained(blt_repo)\ntok_and_patcher = BltTokenizerAndPatcher.from_pretrained(blt_repo)\ntokenizer = tok_and_patcher.tokenizer_args.build()\npatcher = tok_and_patcher.patcher_args.build()\n```\n\n## Load Weights for Running BLT Train Script\n\n1. 
Download the model weights with `python download_blt_weights.py`, which downloads them to `hf-weights`.\n2. Run the generate demo: `python demo.py \"A BLT has\"`.\n\nThe demo generates text, but is also a good starting point for loading BLT in your own code.\n\n## Downloading Training Data\n\nNote: The following instructions are not well tested in the BLT code, as it is based on the lingua code, which we have diverged from.\n\nUse the provided script to download and prepare data from HuggingFace (one of `fineweb_edu`, `fineweb_edu_10bt`, or `dclm_baseline_1.0`).\nThis command will download `fineweb_edu` and prepare it for training in the `.\u002Fdata` directory; `\u003CMEMORY>` specifies the amount of memory that `terashuf` (the tool used to shuffle samples) will be allocated. By default, the number of chunks (`nchunks`) is 32. If you are running on fewer than 32 GPUs, it is recommended to set `nchunks` to 1 or to match `nchunks` with the number of GPUs (`nchunks` = NGPUs). See [here](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Flingua\u002Fissues\u002F55#issuecomment-2483643076) for more details.\n\n```bash\npython setup\u002Fdownload_prepare_hf_data.py fineweb_edu \u003CMEMORY> --data_dir .\u002Fdata --seed 42 --nchunks \u003CNCHUNKS>\n```\n\nTo download the tokenizer (here llama3), use the following script:\n\n```bash\npython setup\u002Fdownload_tokenizer.py llama3 \u003CSAVE_PATH> --api_key \u003CHUGGINGFACE_TOKEN>\n```\n\nNow launch a debug job to check that everything works. 
**The provided configurations are templates; you need to adapt them before they will work (change `dump_dir`, `data.root_dir`, `data.tokenizer.path`, etc.)**\n\n```bash\n# stool stands for SLURM tool\npython -m bytelatent.stool script=bytelatent.train config=bytelatent\u002Fconfigs\u002Fdebug.yaml nodes=1 partition=\u003Cpartition>\n# if you want to launch locally you can use torchrun\ntorchrun --nproc-per-node 8 -m bytelatent.train config=bytelatent\u002Fconfigs\u002Fdebug.yaml\n# or you can also launch on 1 GPU\npython -m bytelatent.train config=bytelatent\u002Fconfigs\u002Fdebug.yaml\n```\n\nWhen using `stool`, if a job crashes, it can be relaunched using sbatch:\n\n```bash\nsbatch path\u002Fto\u002Fdump_dir\u002Fsubmit.slurm\n```\n\n## Linting\n\nTo lint, run the following command:\n\n```\nbash dev\u002Flint.sh\n```\n\n## Citation\n\nThe BLT is partially based on Meta Lingua, so consider citing it in addition to our BLT paper if you re-use our work.\n\nBLT Paper Citation (will be updated to arXiv soon)\n\n```\n@article{meta_blt,\n  author = {Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman†, Srinivasan Iyer},\n  title = {Byte Latent Transformer: Patches Scale Better Than Tokens},\n  url = {https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fblt},\n  year = {2024}\n}\n```\n\nLingua Code\n\n```\n@misc{meta_lingua,\n  author = {Mathurin Videau, Badr Youbi Idrissi, Daniel Haziza, Luca Wehrstedt, Jade Copet, Olivier Teytaud, David Lopez-Paz},\n  title = {{Meta Lingua}: A minimal {PyTorch LLM} training library},\n  url = {https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Flingua},\n  year = {2024}\n}\n```\n\n## License\n\nThe BLT code is partially based on Meta Lingua.\n\nMeta BLT is licensed under the CC-BY-NC-4.0 license. 
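Refer to the LICENSE file in the top-level directory.\n","Byte Latent Transformer (BLT) is a byte-level large language model architecture that, for the first time, matches the performance of tokenization-based language models at scale while significantly improving inference efficiency and robustness. The project encodes bytes into dynamically sized patches, which serve as the primary units of computation and are segmented dynamically based on the entropy of the next byte, allocating more compute and model capacity where data complexity is higher. In addition, BLT introduces new attention mechanisms to maximize the information flow between byte and patch hidden representations, and proposes a new type of byte-sequence memory. It is suited to applications that require end-to-end training directly from raw bytes without preprocessing, especially where long-tail generalization and inference efficiency are priorities.",2,"2026-06-11 03:42:46","high_star"]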