[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1317":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},1317,"BitNet","microsoft\u002FBitNet","microsoft","Official inference framework for 1-bit LLMs","",null,"Python",39284,3592,350,189,0,8,109,340,47,45,"MIT License",false,"main",true,[],"2026-06-12 02:00:26","# bitnet.cpp\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n![version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fversion-1.0-blue)\n\n[\u003Cimg src=\".\u002Fassets\u002Fheader_model_release.png\" alt=\"BitNet Model on Hugging Face\" width=\"800\"\u002F>](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FBitNet-b1.58-2B-4T)\n\nTry it out via this [demo](https:\u002F\u002Fdemo-bitnet-h0h8hcfqeqhrf5gf.canadacentral-01.azurewebsites.net\u002F), or build and run it on your own [CPU](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FBitNet?tab=readme-ov-file#build-from-source) or [GPU](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FBitNet\u002Fblob\u002Fmain\u002Fgpu\u002FREADME.md).\n\nbitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). 
It offers a suite of optimized kernels that support **fast** and **lossless** inference of 1.58-bit models on CPU and GPU (NPU support is coming next).\n\nThe first release of bitnet.cpp supports inference on CPUs. On ARM CPUs, bitnet.cpp achieves speedups of **1.37x** to **5.07x**, with larger models seeing greater gains, and reduces energy consumption by **55.4%** to **70.0%**. On x86 CPUs, speedups range from **2.37x** to **6.17x**, with energy reductions of **71.9%** to **82.2%**. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. Please refer to the [technical report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.16144) for more details.\n\nThe **latest optimization** introduces parallel kernel implementations with configurable tiling and embedding quantization support, achieving an additional **1.15x to 2.1x** speedup over the original implementation across different hardware platforms and workloads. 
For detailed technical information, see the [optimization guide](src\u002FREADME.md).\n\n\u003Cimg src=\".\u002Fassets\u002Fperformance.png\" alt=\"performance_comparison\" width=\"800\"\u002F>\n\n\n## Demo\n\nA demo of bitnet.cpp running a BitNet b1.58 3B model on Apple M2:\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7f46b736-edec-4828-b809-4be780a3e5b1\n\n## What's New:\n- 01\u002F15\u002F2026 [BitNet CPU Inference Optimization](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FBitNet\u002Fblob\u002Fmain\u002Fsrc\u002FREADME.md) ![NEW](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNEW-red)\n- 05\u002F20\u002F2025 [BitNet Official GPU inference kernel](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FBitNet\u002Fblob\u002Fmain\u002Fgpu\u002FREADME.md)\n- 04\u002F14\u002F2025 [BitNet Official 2B Parameter Model on Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FBitNet-b1.58-2B-4T)\n- 02\u002F18\u002F2025 [Bitnet.cpp: Efficient Edge Inference for Ternary LLMs](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11880)\n- 11\u002F08\u002F2024 [BitNet a4.8: 4-bit Activations for 1-bit LLMs](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.04965)\n- 10\u002F21\u002F2024 [1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.16144)\n- 10\u002F17\u002F2024 bitnet.cpp 1.0 released.\n- 03\u002F21\u002F2024 [The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm\u002Fblob\u002Fmaster\u002Fbitnet\u002FThe-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf)\n- 02\u002F27\u002F2024 [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.17764)\n- 10\u002F17\u002F2023 [BitNet: Scaling 1-bit Transformers for Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.11453)\n\n## Acknowledgements\n\nThis project is based on the 
[llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) framework. We would like to thank all the authors for their contributions to the open-source community. Also, bitnet.cpp's kernels are built on top of the Lookup Table methodologies pioneered in [T-MAC](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FT-MAC\u002F). For inference of general low-bit LLMs beyond ternary models, we recommend using T-MAC.\n\n## Official Models\n\u003Ctable>\n    \u003Ctr>\n        \u003Cth rowspan=\"2\">Model\u003C\u002Fth>\n        \u003Cth rowspan=\"2\">Parameters\u003C\u002Fth>\n        \u003Cth rowspan=\"2\">CPU\u003C\u002Fth>\n        \u003Cth colspan=\"3\">Kernel\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Cth>I2_S\u003C\u002Fth>\n        \u003Cth>TL1\u003C\u002Fth>\n        \u003Cth>TL2\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd rowspan=\"2\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FBitNet-b1.58-2B-4T\">BitNet-b1.58-2B-4T\u003C\u002Fa>\u003C\u002Ftd>\n        \u003Ctd rowspan=\"2\">2.4B\u003C\u002Ftd>\n        \u003Ctd>x86\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>ARM\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## Supported Models\n❗️**We use existing 1-bit LLMs available on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002F) to demonstrate the inference capabilities of bitnet.cpp. 
We hope the release of bitnet.cpp will inspire the development of 1-bit LLMs in large-scale settings in terms of model size and training tokens.**\n\n\u003Ctable>\n    \u003Ctr>\n        \u003Cth rowspan=\"2\">Model\u003C\u002Fth>\n        \u003Cth rowspan=\"2\">Parameters\u003C\u002Fth>\n        \u003Cth rowspan=\"2\">CPU\u003C\u002Fth>\n        \u003Cth colspan=\"3\">Kernel\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Cth>I2_S\u003C\u002Fth>\n        \u003Cth>TL1\u003C\u002Fth>\n        \u003Cth>TL2\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd rowspan=\"2\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F1bitLLM\u002Fbitnet_b1_58-large\">bitnet_b1_58-large\u003C\u002Fa>\u003C\u002Ftd>\n        \u003Ctd rowspan=\"2\">0.7B\u003C\u002Ftd>\n        \u003Ctd>x86\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>ARM\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd rowspan=\"2\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F1bitLLM\u002Fbitnet_b1_58-3B\">bitnet_b1_58-3B\u003C\u002Fa>\u003C\u002Ftd>\n        \u003Ctd rowspan=\"2\">3.3B\u003C\u002Ftd>\n        \u003Ctd>x86\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>ARM\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd rowspan=\"2\">\u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002FHF1BitLLM\u002FLlama3-8B-1.58-100B-tokens\">Llama3-8B-1.58-100B-tokens\u003C\u002Fa>\u003C\u002Ftd>\n        \u003Ctd rowspan=\"2\">8.0B\u003C\u002Ftd>\n        \u003Ctd>x86\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>ARM\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd rowspan=\"2\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftiiuae\u002Ffalcon3-67605ae03578be86e4e87026\">Falcon3 Family\u003C\u002Fa>\u003C\u002Ftd>\n        \u003Ctd rowspan=\"2\">1B-10B\u003C\u002Ftd>\n        \u003Ctd>x86\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>ARM\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd rowspan=\"2\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftiiuae\u002Ffalcon-edge-series-6804fd13344d6d8a8fa71130\">Falcon-E Family\u003C\u002Fa>\u003C\u002Ftd>\n        \u003Ctd rowspan=\"2\">1B-3B\u003C\u002Ftd>\n        \u003Ctd>x86\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>ARM\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#9989;\u003C\u002Ftd>\n        \u003Ctd>&#10060;\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n## Installation\n\n### Requirements\n- python>=3.9\n- cmake>=3.22\n- clang>=18\n    - For Windows 
users, install [Visual Studio 2022](https:\u002F\u002Fvisualstudio.microsoft.com\u002Fdownloads\u002F). In the installer, enable at least the following options (this also automatically installs required additional tools such as CMake):\n        -  Desktop development with C++\n        -  C++ CMake tools for Windows\n        -  Git for Windows\n        -  C++ Clang Compiler for Windows\n        -  MSBuild support for LLVM (clang-cl) toolset\n    - For Debian\u002FUbuntu users, you can install it with the [automatic installation script](https:\u002F\u002Fapt.llvm.org\u002F):\n\n        `bash -c \"$(wget -O - https:\u002F\u002Fapt.llvm.org\u002Fllvm.sh)\"`\n- conda (highly recommended)\n\n### Build from source\n\n> [!IMPORTANT]\n> If you are using Windows, always run the following commands from a Developer Command Prompt or Developer PowerShell for VS 2022. Please refer to the FAQs below if you run into any issues.\n\n1. Clone the repo\n```bash\ngit clone --recursive https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FBitNet.git\ncd BitNet\n```\n2. Install the dependencies\n```bash\n# (Recommended) Create a new conda environment\nconda create -n bitnet-cpp python=3.9\nconda activate bitnet-cpp\n\npip install -r requirements.txt\n```\n3. 
Build the project\n```bash\n# Manually download the model and run with local path\nhuggingface-cli download microsoft\u002FBitNet-b1.58-2B-4T-gguf --local-dir models\u002FBitNet-b1.58-2B-4T\npython setup_env.py -md models\u002FBitNet-b1.58-2B-4T -q i2_s\n\n```\n\u003Cpre>\nusage: setup_env.py [-h] [--hf-repo {1bitLLM\u002Fbitnet_b1_58-large,1bitLLM\u002Fbitnet_b1_58-3B,HF1BitLLM\u002FLlama3-8B-1.58-100B-tokens,tiiuae\u002FFalcon3-1B-Instruct-1.58bit,tiiuae\u002FFalcon3-3B-Instruct-1.58bit,tiiuae\u002FFalcon3-7B-Instruct-1.58bit,tiiuae\u002FFalcon3-10B-Instruct-1.58bit}] [--model-dir MODEL_DIR] [--log-dir LOG_DIR] [--quant-type {i2_s,tl1}] [--quant-embd]\n                    [--use-pretuned]\n\nSetup the environment for running inference\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --hf-repo {1bitLLM\u002Fbitnet_b1_58-large,1bitLLM\u002Fbitnet_b1_58-3B,HF1BitLLM\u002FLlama3-8B-1.58-100B-tokens,tiiuae\u002FFalcon3-1B-Instruct-1.58bit,tiiuae\u002FFalcon3-3B-Instruct-1.58bit,tiiuae\u002FFalcon3-7B-Instruct-1.58bit,tiiuae\u002FFalcon3-10B-Instruct-1.58bit}, -hr {1bitLLM\u002Fbitnet_b1_58-large,1bitLLM\u002Fbitnet_b1_58-3B,HF1BitLLM\u002FLlama3-8B-1.58-100B-tokens,tiiuae\u002FFalcon3-1B-Instruct-1.58bit,tiiuae\u002FFalcon3-3B-Instruct-1.58bit,tiiuae\u002FFalcon3-7B-Instruct-1.58bit,tiiuae\u002FFalcon3-10B-Instruct-1.58bit}\n                        Model used for inference\n  --model-dir MODEL_DIR, -md MODEL_DIR\n                        Directory to save\u002Fload the model\n  --log-dir LOG_DIR, -ld LOG_DIR\n                        Directory to save the logging info\n  --quant-type {i2_s,tl1}, -q {i2_s,tl1}\n                        Quantization type\n  --quant-embd          Quantize the embeddings to f16\n  --use-pretuned, -p    Use the pretuned kernel parameters\n\u003C\u002Fpre>\n## Usage\n### Basic usage\n```bash\n# Run inference with the quantized model\npython run_inference.py -m 
models\u002FBitNet-b1.58-2B-4T\u002Fggml-model-i2_s.gguf -p \"You are a helpful assistant\" -cnv\n```\n\u003Cpre>\nusage: run_inference.py [-h] [-m MODEL] [-n N_PREDICT] -p PROMPT [-t THREADS] [-c CTX_SIZE] [-temp TEMPERATURE] [-cnv]\n\nRun inference\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -m MODEL, --model MODEL\n                        Path to model file\n  -n N_PREDICT, --n-predict N_PREDICT\n                        Number of tokens to predict when generating text\n  -p PROMPT, --prompt PROMPT\n                        Prompt to generate text from\n  -t THREADS, --threads THREADS\n                        Number of threads to use\n  -c CTX_SIZE, --ctx-size CTX_SIZE\n                        Size of the prompt context\n  -temp TEMPERATURE, --temperature TEMPERATURE\n                        Temperature, a hyperparameter that controls the randomness of the generated text\n  -cnv, --conversation  Whether to enable chat mode or not (for instruct models.)\n                        (When this option is turned on, the prompt specified by -p will be used as the system prompt.)\n\u003C\u002Fpre>\n\n### Benchmark\nWe provide scripts to run an inference benchmark on a given model.\n\n```  \nusage: e2e_benchmark.py -m MODEL [-n N_TOKEN] [-p N_PROMPT] [-t THREADS]  \n   \nSetup the environment for running the inference  \n   \nrequired arguments:  \n  -m MODEL, --model MODEL  \n                        Path to the model file. \n   \noptional arguments:  \n  -h, --help  \n                        Show this help message and exit. \n  -n N_TOKEN, --n-token N_TOKEN  \n                        Number of generated tokens. \n  -p N_PROMPT, --n-prompt N_PROMPT  \n                        Prompt to generate text from. \n  -t THREADS, --threads THREADS  \n                        Number of threads to use. \n```  \n   \nHere's a brief explanation of each argument:  \n   \n- `-m`, `--model`: The path to the model file. 
This is a required argument that must be provided when running the script.  \n- `-n`, `--n-token`: The number of tokens to generate during inference. It is an optional argument with a default value of 128.  \n- `-p`, `--n-prompt`: The number of prompt tokens to use for generating text. This is an optional argument with a default value of 512.  \n- `-t`, `--threads`: The number of threads to use for running the inference. It is an optional argument with a default value of 2.  \n- `-h`, `--help`: Show the help message and exit. Use this argument to display usage information.  \n   \nFor example:  \n   \n```sh  \npython utils\u002Fe2e_benchmark.py -m \u002Fpath\u002Fto\u002Fmodel -n 200 -p 256 -t 4  \n```  \n   \nThis command runs the inference benchmark using the model located at `\u002Fpath\u002Fto\u002Fmodel`, generating 200 tokens from a 256-token prompt and using 4 threads.  \n\nFor model layouts that are not supported by any public model, we provide scripts to generate a dummy model with the given layout and run the benchmark on your machine:\n\n```bash\npython utils\u002Fgenerate-dummy-bitnet-model.py models\u002Fbitnet_b1_58-large --outfile models\u002Fdummy-bitnet-125m.tl1.gguf --outtype tl1 --model-size 125M\n\n# Run the benchmark with the generated model: -m specifies the model path, -p the number of prompt tokens to process, and -n the number of tokens to generate\npython utils\u002Fe2e_benchmark.py -m models\u002Fdummy-bitnet-125m.tl1.gguf -p 512 -n 128\n```\n\n### Convert from `.safetensors` Checkpoints\n\n```sh\n# Prepare the .safetensors model file\nhuggingface-cli download microsoft\u002Fbitnet-b1.58-2B-4T-bf16 --local-dir .\u002Fmodels\u002Fbitnet-b1.58-2B-4T-bf16\n\n# Convert to a GGUF model\npython .\u002Futils\u002Fconvert-helper-bitnet.py .\u002Fmodels\u002Fbitnet-b1.58-2B-4T-bf16\n```\n\n### FAQ (Frequently Asked Questions)📌 \n\n#### Q1: The build dies with errors building llama.cpp due to issues with std::chrono in 
log.cpp?\n\n**A:**\nThis issue was introduced in a recent version of llama.cpp. To fix it, refer to this [commit](https:\u002F\u002Fgithub.com\u002Ftinglou\u002Fllama.cpp\u002Fcommit\u002F4e3db1e3d78cc1bcd22bcb3af54bd2a4628dd323) in the [discussion](https:\u002F\u002Fgithub.com\u002Fabetlen\u002Fllama-cpp-python\u002Fissues\u002F1942).\n\n#### Q2: How do I build with clang in a conda environment on Windows?\n\n**A:** \nBefore building the project, verify your clang installation and access to Visual Studio tools by running:\n```\nclang -v\n```\n\nThis command checks that you are using the correct version of clang and that the Visual Studio tools are available. If you see an error message such as:\n```\n'clang' is not recognized as an internal or external command, operable program or batch file.\n```\n\nThis indicates that your command-line window is not properly initialized for Visual Studio tools.\n\n• If you are using Command Prompt, run:\n```\n\"C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\Common7\\Tools\\VsDevCmd.bat\" -startdir=none -arch=x64 -host_arch=x64\n```\n\n• If you are using Windows PowerShell, run the following commands:\n```\nImport-Module \"C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\Common7\\Tools\\Microsoft.VisualStudio.DevShell.dll\"\nEnter-VsDevShell 3f0e31ad -SkipAutomaticLocation -DevCmdArguments \"-arch=x64 -host_arch=x64\"\n```\n\nIf your Visual Studio edition is not Professional, replace `Professional` in these paths with your edition (for example, `Community`). In the PowerShell command, `3f0e31ad` is an installation-specific instance ID; replace it with the ID of your own installation.\n\nThese steps initialize your environment so that you can use the correct Visual Studio tools.\n","BitNet is Microsoft's official inference framework for 1-bit large language models. It provides optimized kernels that support fast and lossless inference of 1.58-bit models on CPU and GPU, with NPU support planned. Through a series of optimizations, the framework achieves significant speedups (up to 6.17x) and energy reductions (up to 82.2%) on ARM and x86 CPUs, and it can run models with up to 100B parameters on a single CPU. The latest version also introduces parallel kernel implementations with configurable tiling and embedding quantization support, further improving performance across hardware platforms. BitNet is particularly suited to applications that need to run large language models efficiently on edge devices or in resource-constrained environments.",2,"2026-06-11 02:43:01","top_all"]