[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-229":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":9,"totalLinesOfCode":9,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":9,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":4,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":9,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":16,"starSnapshotCount":16,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},229,"candle","huggingface\u002Fcandle","huggingface","Minimalist ML framework for Rust",null,"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle","Rust",20463,1600,162,466,0,5,53,35,44.61,false,"main","2026-06-12 02:00:10","# candle\n[![discord server](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002Fhugging-face-879548962464493619)](https:\u002F\u002Fdiscord.gg\u002Fhugging-face-879548962464493619)\n[![Latest version](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fcandle-core.svg)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcandle-core)\n[![Documentation](https:\u002F\u002Fdocs.rs\u002Fcandle-core\u002Fbadge.svg)](https:\u002F\u002Fdocs.rs\u002Fcandle-core)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fbase-org\u002Fnode?color=blue)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002Fmain\u002FLICENSE-MIT)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002Fmain\u002FLICENSE-APACHE)\n\nCandle is a minimalist ML framework for Rust with a focus on performance (including GPU support) \nand ease of use. 
Try our online demos: \n[whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper),\n[LLaMA2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2),\n[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm),\n[yolo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-yolo),\n[Segment\nAnything](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm).\n\n## Get started\n\nMake sure that you have [`candle-core`](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Ftree\u002Fmain\u002Fcandle-core) correctly installed as described in [**Installation**](https:\u002F\u002Fhuggingface.github.io\u002Fcandle\u002Fguide\u002Finstallation.html).\n\nLet's see how to run a simple matrix multiplication.\nWrite the following to your `myapp\u002Fsrc\u002Fmain.rs` file:\n```rust\nuse candle_core::{Device, Tensor};\n\nfn main() -> Result\u003C(), Box\u003Cdyn std::error::Error>> {\n    let device = Device::Cpu;\n\n    let a = Tensor::randn(0f32, 1., (2, 3), &device)?;\n    let b = Tensor::randn(0f32, 1., (3, 4), &device)?;\n\n    let c = a.matmul(&b)?;\n    println!(\"{c}\");\n    Ok(())\n}\n```\n\n`cargo run` should display a tensor of shape `Tensor[[2, 4], f32]`.\n\n\nHaving installed `candle` with Cuda support, simply define the `device` to be on GPU:\n\n```diff\n- let device = Device::Cpu;\n+ let device = Device::new_cuda(0)?;\n```\n\nFor more advanced examples, please have a look at the following section.\n\n## Check out our examples\n\nThese online demos run entirely in your browser:\n- [yolo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-yolo): pose estimation and\n  object recognition.\n- [whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper): speech recognition.\n- [LLaMA2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2): text generation.\n- 
[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm): text generation.\n- [Phi-1.5, and Phi-2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-Phi-1.5-Wasm): text generation.\n- [Segment Anything Model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm): Image segmentation.\n- [BLIP](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-BLIP-Image-Captioning): image captioning.\n\nWe also provide some command line based examples using state of the art models:\n\n- [LLaMA v1, v2, and v3](.\u002Fcandle-examples\u002Fexamples\u002Fllama\u002F): general LLM, includes\n  the SOLAR-10.7B variant.\n- [Falcon](.\u002Fcandle-examples\u002Fexamples\u002Ffalcon\u002F): general LLM.\n- [Codegeex4](.\u002Fcandle-examples\u002Fexamples\u002Fcodegeex4-9b\u002F): Code completion, code interpreter, web search, function calling, repository-level\n- [GLM4](.\u002Fcandle-examples\u002Fexamples\u002Fglm4\u002F): Open Multilingual Multimodal Chat LMs by THUDM\n- [Gemma v1 and v2](.\u002Fcandle-examples\u002Fexamples\u002Fgemma\u002F): 2b and 7b+\u002F9b general LLMs from Google Deepmind.\n- [RecurrentGemma](.\u002Fcandle-examples\u002Fexamples\u002Frecurrent-gemma\u002F): 2b and 7b\n  Griffin based models from Google that mix attention with a RNN like state.\n- [Phi-1, Phi-1.5, Phi-2, and Phi-3](.\u002Fcandle-examples\u002Fexamples\u002Fphi\u002F): 1.3b,\n  2.7b, and 3.8b general LLMs with performance on par with 7b models.\n- [StableLM-3B-4E1T](.\u002Fcandle-examples\u002Fexamples\u002Fstable-lm\u002F): a 3b general LLM\n  pre-trained on 1T tokens of English and code datasets. 
Also supports\n  StableLM-2, a 1.6b LLM trained on 2T tokens, as well as the code variants.\n- [Mamba](.\u002Fcandle-examples\u002Fexamples\u002Fmamba\u002F): an inference only\n  implementation of the Mamba state space model.\n- [Mistral7b-v0.1](.\u002Fcandle-examples\u002Fexamples\u002Fmistral\u002F): a 7b general LLM with\n  better performance than all publicly available 13b models as of 2023-09-28.\n- [Mixtral8x7b-v0.1](.\u002Fcandle-examples\u002Fexamples\u002Fmixtral\u002F): a sparse mixture of\n  experts 8x7b general LLM with better performance than a Llama 2 70B model with\n  much faster inference.\n- [StarCoder](.\u002Fcandle-examples\u002Fexamples\u002Fbigcode\u002F) and\n  [StarCoder2](.\u002Fcandle-examples\u002Fexamples\u002Fstarcoder2\u002F): LLM specialized to code generation.\n- [Qwen1.5](.\u002Fcandle-examples\u002Fexamples\u002Fqwen\u002F): Bilingual (English\u002FChinese) LLMs.\n- [RWKV v5 and v6](.\u002Fcandle-examples\u002Fexamples\u002Frwkv\u002F): An RNN with transformer level LLM\n  performance.\n- [Replit-code-v1.5](.\u002Fcandle-examples\u002Fexamples\u002Freplit-code\u002F): a 3.3b LLM specialized for code completion.\n- [Yi-6B \u002F Yi-34B](.\u002Fcandle-examples\u002Fexamples\u002Fyi\u002F): two bilingual\n  (English\u002FChinese) general LLMs with 6b and 34b parameters.\n- [Quantized LLaMA](.\u002Fcandle-examples\u002Fexamples\u002Fquantized\u002F): quantized version of\n  the LLaMA model using the same quantization techniques as\n  [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp).\n- [Quantized Qwen3 MoE](.\u002Fcandle-examples\u002Fexamples\u002Fquantized-qwen3-moe\u002F): support gguf quantized models of Qwen3 MoE models.\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fraw\u002Fmain\u002Fcandle-examples\u002Fexamples\u002Fquantized\u002Fassets\u002Faoc.gif\" width=\"600\">\n  \n- [Stable Diffusion](.\u002Fcandle-examples\u002Fexamples\u002Fstable-diffusion\u002F): text to\n  
image generative model, support for the 1.5, 2.1, SDXL 1.0 and Turbo versions.\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fraw\u002Fmain\u002Fcandle-examples\u002Fexamples\u002Fstable-diffusion\u002Fassets\u002Fstable-diffusion-xl.jpg\" width=\"200\">\n\n- [Wuerstchen](.\u002Fcandle-examples\u002Fexamples\u002Fwuerstchen\u002F): another text to\n  image generative model.\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fraw\u002Fmain\u002Fcandle-examples\u002Fexamples\u002Fwuerstchen\u002Fassets\u002Fcat.jpg\" width=\"200\">\n\n- [yolo-v3](.\u002Fcandle-examples\u002Fexamples\u002Fyolo-v3\u002F) and\n  [yolo-v8](.\u002Fcandle-examples\u002Fexamples\u002Fyolo-v8\u002F): object detection and pose\n  estimation models.\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fraw\u002Fmain\u002Fcandle-examples\u002Fexamples\u002Fyolo-v8\u002Fassets\u002Fbike.od.jpg\" width=\"200\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fraw\u002Fmain\u002Fcandle-examples\u002Fexamples\u002Fyolo-v8\u002Fassets\u002Fbike.pose.jpg\" width=\"200\">\n- [segment-anything](.\u002Fcandle-examples\u002Fexamples\u002Fsegment-anything\u002F): image\n  segmentation model with prompt.\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fraw\u002Fmain\u002Fcandle-examples\u002Fexamples\u002Fsegment-anything\u002Fassets\u002Fsam_merged.jpg\" width=\"200\">\n\n- [SegFormer](.\u002Fcandle-examples\u002Fexamples\u002Fsegformer\u002F): transformer based semantic segmentation model.\n- [Whisper](.\u002Fcandle-examples\u002Fexamples\u002Fwhisper\u002F): speech recognition model.\n- [EnCodec](.\u002Fcandle-examples\u002Fexamples\u002Fencodec\u002F): high-quality audio compression\n  model using residual vector quantization.\n- [MetaVoice](.\u002Fcandle-examples\u002Fexamples\u002Fmetavoice\u002F): foundational model for\n  text-to-speech.\n- 
[Parler-TTS](.\u002Fcandle-examples\u002Fexamples\u002Fparler-tts\u002F): large text-to-speech\n  model.\n- [T5](.\u002Fcandle-examples\u002Fexamples\u002Ft5), [Bert](.\u002Fcandle-examples\u002Fexamples\u002Fbert\u002F),\n  [JinaBert](.\u002Fcandle-examples\u002Fexamples\u002Fjina-bert\u002F): useful for sentence embeddings.\n- [DINOv2](.\u002Fcandle-examples\u002Fexamples\u002Fdinov2\u002F): computer vision model trained\n  using self-supervision (can be used for ImageNet classification, depth\n  evaluation, segmentation).\n- [VGG](.\u002Fcandle-examples\u002Fexamples\u002Fvgg\u002F),\n  [RepVGG](.\u002Fcandle-examples\u002Fexamples\u002Frepvgg): computer vision models.\n- [BLIP](.\u002Fcandle-examples\u002Fexamples\u002Fblip\u002F): image to text model, can be used to\n  generate captions for an image.\n- [CLIP](.\u002Fcandle-examples\u002Fexamples\u002Fclip\u002F): multi-modal vision and language\n  model.\n- [TrOCR](.\u002Fcandle-examples\u002Fexamples\u002Ftrocr\u002F): a transformer OCR model, with\n  dedicated submodels for handwriting and printed-text recognition.\n- [Marian-MT](.\u002Fcandle-examples\u002Fexamples\u002Fmarian-mt\u002F): neural machine translation\n  model, generates the translated text from the input text.\n- [Moondream](.\u002Fcandle-examples\u002Fexamples\u002Fmoondream\u002F): tiny computer-vision model \n  that can answer real-world questions about images.\n\nRun them using commands like:\n```\ncargo run --example quantized --release\n```\n\nTo use **CUDA**, add `--features cuda` to the example command line. If\nyou have cuDNN installed, use `--features cudnn` for further speedups.\n\nThere are also some wasm examples for whisper and\n[llama2.c](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fllama2.c). 
You can either build them with\n`trunk` or try them online:\n[whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper),\n[llama2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2),\n[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm),\n[Phi-1.5, and Phi-2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-Phi-1.5-Wasm),\n[Segment Anything Model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm).\n\nFor LLaMA2, run the following command to retrieve the weight files and start a\ntest server:\n```bash\n# install target platform 'wasm32-unknown-unknown'\nrustup target add wasm32-unknown-unknown\n\ncd candle-wasm-examples\u002Fllama2-c\nwget https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2\u002Fresolve\u002Fmain\u002Fmodel.bin\nwget https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2\u002Fresolve\u002Fmain\u002Ftokenizer.json\ntrunk serve --release --port 8081\n```\nAnd then head over to\n[http:\u002F\u002Flocalhost:8081\u002F](http:\u002F\u002Flocalhost:8081\u002F).\n\n\u003C!--- ANCHOR: useful_libraries --->\n\n## Useful External Resources\n- [`candle-tutorial`](https:\u002F\u002Fgithub.com\u002FToluClassics\u002Fcandle-tutorial): A\n  very detailed tutorial showing how to convert a PyTorch model to Candle.\n- [`candle-lora`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-lora): Efficient and\n  ergonomic LoRA implementation for Candle. 
`candle-lora` has      \n  out-of-the-box LoRA support for many models from Candle, which can be found\n  [here](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-lora\u002Ftree\u002Fmaster\u002Fcandle-lora-transformers\u002Fexamples).\n- [`candle-video`](https:\u002F\u002Fgithub.com\u002FFerrisMind\u002Fcandle-video): Rust library for text-to-video generation (LTX-Video and related models) built on Candle, focused on fast, Python-free inference.\n- [`optimisers`](https:\u002F\u002Fgithub.com\u002FKGrewal1\u002Foptimisers): A collection of optimisers\n  including SGD with momentum, AdaGrad, AdaDelta, AdaMax, NAdam, RAdam, and RMSprop.\n- [`candle-vllm`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-vllm): Efficient platform for inference and\n  serving local LLMs including an OpenAI compatible API server.\n- [`candle-ext`](https:\u002F\u002Fgithub.com\u002Fmokeyish\u002Fcandle-ext): An extension library to Candle that provides PyTorch functions not currently available in Candle.\n- [`candle-coursera-ml`](https:\u002F\u002Fgithub.com\u002Fvishpat\u002Fcandle-coursera-ml): Implementation of ML algorithms from Coursera's [Machine Learning Specialization](https:\u002F\u002Fwww.coursera.org\u002Fspecializations\u002Fmachine-learning-introduction) course.\n- [`kalosm`](https:\u002F\u002Fgithub.com\u002Ffloneum\u002Ffloneum\u002Ftree\u002Fmaster\u002Finterfaces\u002Fkalosm): A multi-modal meta-framework in Rust for interfacing with local pre-trained models with support for controlled generation, custom samplers, in-memory vector databases, audio transcription, and more.\n- [`candle-sampling`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-sampling): Sampling techniques for Candle.\n- [`gpt-from-scratch-rs`](https:\u002F\u002Fgithub.com\u002Fjeroenvlek\u002Fgpt-from-scratch-rs): A port of Andrej Karpathy's _Let's build GPT_ tutorial on YouTube showcasing the Candle API on a toy problem.\n- 
[`candle-einops`](https:\u002F\u002Fgithub.com\u002Ftomsanbear\u002Fcandle-einops): A pure rust implementation of the python [einops](https:\u002F\u002Fgithub.com\u002Farogozhnikov\u002Feinops) library.\n- [`atoma-infer`](https:\u002F\u002Fgithub.com\u002Fatoma-network\u002Fatoma-infer): A Rust library for fast inference at scale, leveraging FlashAttention2 for efficient attention computation, PagedAttention for efficient KV-cache memory management, and multi-GPU support. It is OpenAI api compatible.\n- [`llms-from-scratch-rs`](https:\u002F\u002Fgithub.com\u002Fnerdai\u002Fllms-from-scratch-rs): A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.\n- [`vllm.rs`](https:\u002F\u002Fgithub.com\u002Fguoqingbao\u002Fvllm.rs): A minimalist vLLM implementation in Rust based on Candle.\n\nIf you have an addition to this list, please submit a pull request.\n\n\u003C!--- ANCHOR_END: useful_libraries --->\n\n\u003C!--- ANCHOR: features --->\n\n## Features\n\n- Simple syntax, looks and feels like PyTorch.\n    - Model training.\n    - Embed user-defined ops\u002Fkernels, such as [flash-attention v2](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002F89ba005962495f2bfbda286e185e9c3c7f5300a3\u002Fcandle-flash-attn\u002Fsrc\u002Flib.rs#L152).\n- Backends.\n    - Optimized CPU backend with optional MKL support for x86 and Accelerate for macs.\n    - CUDA backend for efficiently running on GPUs, multiple GPU distribution via NCCL.\n    - WASM support, run your models in a browser.\n- Included models.\n    - Language Models.\n        - LLaMA v1, v2, and v3 with variants such as SOLAR-10.7B.\n        - Falcon.\n        - StarCoder, StarCoder2.\n        - Phi 1, 1.5, 2, and 3.\n        - Mamba, Minimal Mamba\n        - Gemma v1 2b and 7b+, v2 2b and 9b.\n        - Mistral 7b v0.1.\n        - Mixtral 8x7b v0.1.\n        - StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.\n        - Replit-code-v1.5-3B.\n        - 
Bert.\n        - Yi-6B and Yi-34B.\n        - Qwen1.5, Qwen1.5 MoE, Qwen3 MoE.\n        - RWKV v5 and v6.\n    - Quantized LLMs.\n        - Llama 7b, 13b, 70b, as well as the chat and code variants.\n        - Mistral 7b, and 7b instruct.\n        - Mixtral 8x7b.\n        - Zephyr 7b a and b (Mistral-7b based).\n        - OpenChat 3.5 (Mistral-7b based).\n        - Qwen3 MoE (16B-A3B, 32B-A3B)\n    - Text to text.\n        - T5 and its variants: FlanT5, UL2, MADLAD400 (translation), CoEdit (Grammar correction).\n        - Marian MT (Machine Translation).\n    - Text to image.\n        - Stable Diffusion v1.5, v2.1, XL v1.0.\n        - Wurstchen v2.\n    - Image to text.\n        - BLIP.\n        - TrOCR.\n    - Audio.\n        - Whisper, multi-lingual speech-to-text.\n        - EnCodec, audio compression model.\n        - MetaVoice-1B, text-to-speech model.\n        - Parler-TTS, text-to-speech model.\n    - Computer Vision Models.\n        - DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT,\n          ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.\n        - yolo-v3, yolo-v8.\n        - Segment-Anything Model (SAM).\n        - SegFormer.\n- File formats: load models from safetensors, npz, ggml, or PyTorch files.\n- Serverless (on CPU), small and fast deployments.\n- Quantization support using the llama.cpp quantized types.\n\n\u003C!--- ANCHOR_END: features --->\n\n## How to use\n\n\u003C!--- ANCHOR: cheatsheet --->\nCheatsheet:\n\n|            | Using PyTorch                            | Using Candle                                                     |\n|------------|------------------------------------------|------------------------------------------------------------------|\n| Creation   | `torch.Tensor([[1, 2], [3, 4]])`         | `Tensor::new(&[[1f32, 2.], [3., 4.]], &Device::Cpu)?`           |\n| Creation   | `torch.zeros((2, 2))`                    | `Tensor::zeros((2, 2), DType::F32, &Device::Cpu)?`             
  |\n| Indexing   | `tensor[:, :4]`                          | `tensor.i((.., ..4))?`                                           |\n| Operations | `tensor.view((2, 2))`                    | `tensor.reshape((2, 2))?`                                        |\n| Operations | `a.matmul(b)`                            | `a.matmul(&b)?`                                                  |\n| Arithmetic | `a + b`                                  | `&a + &b`                                                        |\n| Device     | `tensor.to(device=\"cuda\")`               | `tensor.to_device(&Device::new_cuda(0)?)?`                            |\n| Dtype      | `tensor.to(dtype=torch.float16)`         | `tensor.to_dtype(&DType::F16)?`                                  |\n| Saving     | `torch.save({\"A\": A}, \"model.bin\")`      | `candle::safetensors::save(&HashMap::from([(\"A\", A)]), \"model.safetensors\")?` |\n| Loading    | `weights = torch.load(\"model.bin\")`      | `candle::safetensors::load(\"model.safetensors\", &device)`        |\n\n\u003C!--- ANCHOR_END: cheatsheet --->\n\n\n## Structure\n\n- [candle-core](.\u002Fcandle-core): Core ops, devices, and `Tensor` struct definition\n- [candle-nn](.\u002Fcandle-nn\u002F): Tools to build real models\n- [candle-examples](.\u002Fcandle-examples\u002F): Examples of using the library in realistic settings\n- [candle-kernels](.\u002Fcandle-kernels\u002F): CUDA custom kernels\n- [candle-datasets](.\u002Fcandle-datasets\u002F): Datasets and data loaders.\n- [candle-transformers](.\u002Fcandle-transformers): transformers-related utilities.\n- [candle-flash-attn](.\u002Fcandle-flash-attn): Flash attention v2 layer.\n- [candle-onnx](.\u002Fcandle-onnx\u002F): ONNX model evaluation.\n\n## FAQ\n\n### Why should I use Candle?\n\n\u003C!--- ANCHOR: goals --->\n\nCandle's core goal is to *make serverless inference possible*. Full machine learning frameworks like PyTorch\nare very large, which makes creating instances on a cluster slow. 
Candle allows deployment of lightweight\nbinaries.\n\nSecondly, Candle lets you *remove Python* from production workloads. Python overhead can seriously hurt performance,\nand the [GIL](https:\u002F\u002Fwww.backblaze.com\u002Fblog\u002Fthe-python-gil-past-present-and-future\u002F) is a notorious source of headaches.\n\nFinally, Rust is cool! A lot of the HF ecosystem already has Rust crates, like [safetensors](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fsafetensors) and [tokenizers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftokenizers).\n\n\u003C!--- ANCHOR_END: goals --->\n\n### Other ML frameworks\n\n- [dfdx](https:\u002F\u002Fgithub.com\u002Fcoreylowman\u002Fdfdx) is a formidable crate, with shapes being included\n  in types. This prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat.\n  However, we found that some features still require nightly, and writing code can be a bit daunting for non rust experts.\n\n  We're leveraging and contributing to other core crates for the runtime so hopefully both crates can benefit from each\n  other.\n\n- [burn](https:\u002F\u002Fgithub.com\u002Fburn-rs\u002Fburn) is a general crate that can leverage multiple backends so you can choose the best\n  engine for your workload.\n\n- [tch-rs](https:\u002F\u002Fgithub.com\u002FLaurentMazare\u002Ftch-rs.git) Bindings to the torch library in Rust. Extremely versatile, but they \n  bring in the entire torch library into the runtime. The main contributor of `tch-rs` is also involved in the development\n  of `candle`.\n\n### Common Errors\n\n#### Missing symbols when compiling with the mkl feature.\n\nIf you get some missing symbols when compiling binaries\u002Ftests using the mkl\nor accelerate features, e.g. 
for mkl you get:\n```\n  = note: \u002Fusr\u002Fbin\u002Fld: (....o): in function `blas::sgemm':\n          ...\u002Fblas-0.22.0\u002Fsrc\u002Flib.rs:1944: undefined reference to `sgemm_' collect2: error: ld returned 1 exit status\n\n  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified\n  = note: use the `-l` flag to specify native libraries to link\n  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo\n```\nor for accelerate:\n```\nUndefined symbols for architecture arm64:\n            \"_dgemm_\", referenced from:\n                candle_core::accelerate::dgemm::h1b71a038552bcabe in libcandle_core...\n            \"_sgemm_\", referenced from:\n                candle_core::accelerate::sgemm::h2cf21c592cba3c47 in libcandle_core...\n          ld: symbol(s) not found for architecture arm64\n```\n\nThis is likely due to a missing linker flag that was needed to enable the mkl library. You\ncan try adding the following for mkl at the top of your binary:\n```rust\nextern crate intel_mkl_src;\n```\nor for accelerate:\n```rust\nextern crate accelerate_src;\n```\n\n#### Cannot run the LLaMA examples: access to source requires login credentials\n\n```\nError: request error: https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf\u002Fresolve\u002Fmain\u002Ftokenizer.json: status code 401\n```\n\nThis is likely because you're not permissioned for the LLaMA-v2 model. To fix\nthis, you have to register on the huggingface-hub, accept the [LLaMA-v2 model\nconditions](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf), and set up your\nauthentication token. 
See issue\n[#350](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F350) for more details.\n\n#### Docker build\n\nWhen building CUDA kernels inside a Dockerfile, `nvidia-smi` cannot be used to auto-detect compute capability.\n\nYou must explicitly set `CUDA_COMPUTE_CAP`, for example:\n\n```\nFROM nvidia\u002Fcuda:12.9.0-devel-ubuntu22.04\n\n# Install git and curl\nRUN set -eux; \\\n  apt-get update; \\\n  apt-get install -y curl git ca-certificates;\n\n# Install Rust\nRUN curl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fsh.rustup.rs | sh -s -- -y\n\n# Clone candle repo\nRUN git clone https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle.git\n\n# Set compute capability for the build\nARG CUDA_COMPUTE_CAP=90\nENV CUDA_COMPUTE_CAP=${CUDA_COMPUTE_CAP}\n\n# Build with explicit compute cap\nWORKDIR \u002Fapp\nCOPY . .\nRUN cargo build --release --features cuda\n```\n\n#### Compiling with flash-attention fails\n\n```\n\u002Fusr\u002Finclude\u002Fc++\u002F11\u002Fbits\u002Fstd_function.h:530:146: error: parameter packs not expanded with ‘...’:\n```\n\nThis is a bug in gcc-11 triggered by the CUDA compiler. 
To fix this, install a different, supported gcc version (for example gcc-10) and specify the path to the compiler in the `NVCC_CCBIN` environment variable.\n```\nenv NVCC_CCBIN=\u002Fusr\u002Flib\u002Fgcc\u002Fx86_64-linux-gnu\u002F10 cargo ...\n```\n\n#### Linking error on Windows when running rustdoc or mdbook tests\n\n```\nCouldn't compile the test.\n---- .\\candle-book\\src\\inference\\hub.md - Using_the_hub::Using_in_a_real_model_ (line 50) stdout ----\nerror: linking with `link.exe` failed: exit code: 1181\n\u002F\u002Fvery long chain of linking\n = note: LINK : fatal error LNK1181: cannot open input file 'windows.0.48.5.lib'\n```\n\nMake sure you link all native libraries that might be located outside a project target. For example, to run mdbook tests, run:\n\n```\nmdbook test candle-book -L .\\target\\debug\\deps\\ `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.42.2\\lib `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.48.5\\lib\n```\n\n#### Extremely slow model load time with WSL\n\nThis may be caused by the models being loaded from `\u002Fmnt\u002Fc`; see\n[Stack Overflow](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F68972448\u002Fwhy-is-wsl-extremely-slow-when-compared-with-native-windows-npm-yarn-processing) for more details.\n\n#### Tracking down errors\n\nYou can set `RUST_BACKTRACE=1` to get backtraces when a candle\nerror is generated.\n\n#### CudaRC error\n\nOn Windows, you may encounter an error like this one: `called `Result::unwrap()` on an `Err` value: LoadLibraryExW { source: Os { code: 126, kind: Uncategorized, message: \"The specified module could not be found.\" } }`. To fix it, copy and rename these 3 files (make sure they are in your path). 
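The paths depend on your CUDA version.\n`c:\\Windows\\System32\\nvcuda.dll` -> `cuda.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\cublas64_12.dll` -> `cublas.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\curand64_10.dll` -> `curand.dll`\n","Candle is a minimalist machine learning framework designed for Rust, with a focus on performance (including GPU support) and ease of use. Its core features include efficient matrix operations and support for multiple devices (such as CPU and GPU), and its concise API lets developers get started quickly and run complex models. Candle supports state-of-the-art models such as Whisper, LLaMA2, and T5, and these models can be run as demos directly in the browser. The framework suits applications that need high-performance computing and prefer Rust, such as natural language processing and image recognition and segmentation.",2,"2026-06-11 02:31:43","trending"]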