[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-5584":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},5584,"mistral.rs","EricLBuehler\u002Fmistral.rs","EricLBuehler","Fast, flexible LLM inference","",null,"Rust",7280,623,43,226,0,34,158,12,91.39,"MIT License",false,"master",true,[26,27,28],"llm","rust","uqff","2026-06-12 04:00:25","\u003Ca name=\"top\">\u003C\u002Fa>\n\u003C!--\n\u003Ch1 align=\"center\">\n  mistral.rs\n\u003C\u002Fh1>\n-->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Fres\u002Fbanner.png\" alt=\"mistral.rs\" width=\"100%\" style=\"max-width: 800px;\">\n\u003C\u002Fdiv>\n\n\u003Ch3 align=\"center\">\nFast, flexible LLM inference.\n\u003C\u002Fh3>\n\n\u003Cp align=\"center\">\n  | \u003Ca href=\"https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F\">\u003Cb>Documentation\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmistralrs\">\u003Cb>Rust SDK\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_SDK.html\">\u003Cb>Python SDK\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FSZrecqK8qw\">\u003Cb>Discord\u003C\u002Fb>\u003C\u002Fa> |\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fstargazers\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEricLBuehler\u002Fmistral.rs?style=social&label=Star\" alt=\"GitHub stars\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## Latest\n\n- **Gemma 4**: Full multimodal: text, image, video, and audio input. [Guide](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FGEMMA4.html) | [Video setup](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FVIDEO.html)\n- **MXFP4 ISQ quantization**: MXFP4 with optimized decode kernels for faster, smaller models. [Quantization docs](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FQUANTS.html)\n- **Qwen 3.5 model family**: Support for the Qwen 3.5 series including vision. [Guide](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FQWEN3_5.html)\n\n## Why mistral.rs?\n\n- **Any Hugging Face model, zero config**: Just `mistralrs run -m user\u002Fmodel`.\n- **True multimodality**: Text, vision, video, and audio, speech generation, image generation, and embeddings in one engine.\n- **Full quantization control**: Choose the precise quantization you want to use, or make your own UQFF with `mistralrs quantize`.\n- **Built-in web UI**: `mistralrs serve --ui` gives you a web interface instantly.\n- **Hardware-aware**: `mistralrs tune` benchmarks your system and picks optimal quantization + device mapping.\n- **Flexible SDKs**: Python package and Rust crate to build your projects.\n- **[Agentic features](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FAGENTS.html)** — server-side tool loop, web search, MCP client, and HTTP tool dispatch\n\n## Quick Start\n\n### Install\n\n**Linux\u002FmacOS:**\n```bash\ncurl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.sh | sh\n```\n\n**Windows (PowerShell):**\n```powershell\nirm 
https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.ps1 | iex\n```\n\n[Manual installation & other platforms](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FINSTALLATION.html)\n\n### Run Your First Model\n\n```bash\n# Interactive chat\nmistralrs run -m Qwen\u002FQwen3-4B\n\n# One-shot prompt (no interactive session)\nmistralrs run -m Qwen\u002FQwen3-4B -i \"What is the capital of France?\"\n\n# One-shot with an image\nmistralrs run -m google\u002Fgemma-4-E4B-it --image photo.jpg -i \"Describe this image\"\n\n# Or start a server with web UI\nmistralrs serve --ui -m google\u002Fgemma-4-E4B-it\n```\n\nThen visit `http:\u002F\u002Flocalhost:1234\u002Fui` for the web chat interface.\n\n### The `mistralrs` CLI\n\nThe CLI is designed to be **zero-config**: just point it at a model and go.\n\n- **Auto-detection**: Automatically detects model architecture, quantization format, and chat template\n- **All-in-one**: Single binary for chat, server, benchmarks, and web UI (`run`, `serve`, `bench`)\n- **Hardware tuning**: Run `mistralrs tune` to automatically benchmark and configure optimal settings for your hardware\n- **Format-agnostic**: Works with Hugging Face models, GGUF files, and [UQFF quantizations](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FUQFF.html) seamlessly\n\n```bash\n# Auto-tune for your hardware and emit a config file\nmistralrs tune -m Qwen\u002FQwen3-4B --emit-config config.toml\n\n# Run using the generated config\nmistralrs from-config -f config.toml\n\n# Diagnose system issues (CUDA, Metal, HuggingFace connectivity)\nmistralrs doctor\n```\n\n[Full CLI documentation](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FCLI.html)\n\n\u003Cdetails open>\n  \u003Csummary>\u003Cb>Web Chat Demo\u003C\u002Fb>\u003C\u002Fsummary>\n  \u003Cbr>\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Fres\u002Fchat.gif\" 
alt=\"Web Chat UI Demo\" \u002F>\n\u003C\u002Fdetails>\n\n## What Makes It Fast\n\n**Performance**\n- Continuous batching support by default on all devices.\n- CUDA with [FlashAttention](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FFLASH_ATTENTION.html) V2\u002FV3, Metal, [multi-GPU tensor parallelism](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FDISTRIBUTED\u002FDISTRIBUTED.html)\n- [PagedAttention](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPAGED_ATTENTION.html) for high throughput continuous batching on CUDA or Apple Silicon, prefix caching (including multimodal)\n\n**Quantization** ([full docs](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FQUANTS.html))\n- [In-situ quantization (ISQ)](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FISQ.html) of any Hugging Face model\n- GGUF (2-8 bit), GPTQ, AWQ, HQQ, FP8, BNB support\n- ⭐ [Per-layer topology](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FTOPOLOGY.html): Fine-tune quantization per layer for optimal quality\u002Fspeed\n- ⭐ Auto-select fastest quant method for your hardware\n\n**Flexibility**\n- [LoRA & X-LoRA](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FADAPTER_MODELS.html) with weight merging\n- [AnyMoE](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FANYMOE.html): Create mixture-of-experts on any base model\n- [Multiple models](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002Fmulti_model\u002Foverview.html): Load\u002Funload at runtime\n\n**Agentic Features**\n- Integrated [tool calling](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FTOOL_CALLING.html) with grammar enforcement and strict schema mode\n- ⭐ Server-side [agentic loop](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FTOOL_CALLING.html#agentic-loop): auto-execute tools and feed results back\n- ⭐ [Tool dispatch 
URL](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FTOOL_CALLING.html#tool-dispatch-url): POST tool calls to your own endpoint\n- ⭐ [Web search integration](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FWEB_SEARCH.html) with embedding-based ranking\n- ⭐ [MCP client](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FMCP\u002Fclient.html): Connect to external tools via Process, HTTP, or WebSocket\n- Python\u002FRust [tool callbacks](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FTOOL_CALLING.html#tool-callbacks) for custom execution\n\n[Full feature documentation](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F)\n\n## Supported Models\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Text Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Granite 4.0\n- SmolLM 3\n- DeepSeek V3\n- GPT-OSS\n- DeepSeek V2\n- Qwen 3 Next\n- Qwen 3 MoE\n- Phi 3.5 MoE\n- Qwen 3\n- GLM 4\n- GLM-4.7-Flash\n- GLM-4.7 (MoE)\n- Gemma 2\n- Qwen 2\n- Starcoder 2\n- Phi 3\n- Mixtral\n- Phi 2\n- Gemma\n- Llama\n- Mistral\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Multimodal Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Qwen 3.5\n- Qwen 3.5 MoE\n- Qwen 3-VL\n- Qwen 3-VL MoE\n- Gemma 3n\n- Llama 4\n- Gemma 3\n- Mistral 3\n- Phi 4 multimodal\n- Qwen 2.5-VL\n- MiniCPM-O\n- Llama 3.2 Vision\n- Qwen 2-VL\n- Idefics 3\n- Idefics 2\n- LLaVA Next\n- LLaVA\n- Phi 3V\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Speech Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Voxtral (ASR\u002Fspeech-to-text)\n- Dia\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Image Generation Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- FLUX\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Embedding Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Embedding Gemma\n- Qwen 3 Embedding\n\u003C\u002Fdetails>\n\n[Request a new model](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F156) 
| [Full compatibility tables](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FSUPPORTED_MODELS.html)\n\n## Python SDK\n\n```bash\npip install mistralrs  # or mistralrs-cuda, mistralrs-metal, mistralrs-mkl, mistralrs-accelerate\n```\n\n```python\nfrom mistralrs import Runner, Which, ChatCompletionRequest\n\nrunner = Runner(\n    which=Which.Plain(model_id=\"Qwen\u002FQwen3-4B\"),\n    in_situ_quant=\"4\",\n)\n\nres = runner.send_chat_completion_request(\n    ChatCompletionRequest(\n        model=\"default\",\n        messages=[{\"role\": \"user\", \"content\": \"Hello!\"}],\n        max_tokens=256,\n    )\n)\nprint(res.choices[0].message.content)\n```\n\n[Python SDK](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_SDK.html) | [Installation](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_INSTALLATION.html) | [Examples](examples\u002Fpython) | [Cookbook](examples\u002Fpython\u002Fcookbook.ipynb)\n\n## Rust SDK\n\n```bash\ncargo add mistralrs\n```\n\n```rust\nuse anyhow::Result;\nuse mistralrs::{IsqType, TextMessageRole, TextMessages, MultimodalModelBuilder};\n\n#[tokio::main]\nasync fn main() -> Result\u003C()> {\n    let model = MultimodalModelBuilder::new(\"google\u002Fgemma-4-E4B-it\")\n        .with_isq(IsqType::Q4K)\n        .with_logging()\n        .build()\n        .await?;\n\n    let messages = TextMessages::new().add_message(\n        TextMessageRole::User,\n        \"Hello!\",\n    );\n\n    let response = model.send_chat_request(messages).await?;\n\n    println!(\"{:?}\", response.choices[0].message.content);\n\n    Ok(())\n}\n```\n\n[API Docs](https:\u002F\u002Fdocs.rs\u002Fmistralrs) | [Crate](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmistralrs) | [Examples](mistralrs\u002Fexamples)\n\n## Docker\n\nFor quick containerized deployment:\n\n```bash\ndocker pull ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest\ndocker run --gpus all -p 1234:1234 ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest \\\n  
serve -m Qwen\u002FQwen3-4B\n```\n\n[Docker images](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpkgs\u002Fcontainer\u002Fmistral.rs)\n\n> For production use, we recommend installing the CLI directly for maximum flexibility.\n\n## Documentation\n\nFor complete documentation, see the **[Documentation](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F)**.\n\n**Quick Links:**\n- [CLI Reference](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FCLI.html) - All commands and options\n- [HTTP API](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FHTTP.html) - OpenAI-compatible endpoints\n- [Quantization](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FQUANTS.html) - ISQ, GGUF, GPTQ, and more\n- [Device Mapping](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FDEVICE_MAPPING.html) - Multi-GPU and CPU offloading\n- [MCP Integration](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FMCP\u002Fclient.html) - MCP integration documentation\n- [Troubleshooting](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FTROUBLESHOOTING.html) - Common issues and solutions\n- [Configuration](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FCONFIGURATION.html) - Environment variables for configuration\n\n## Contributing\n\nContributions welcome! Please [open an issue](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues) to discuss new features or report bugs. If you want to add a new model, please contact us via an issue and we can coordinate.\n\n## Credits\n\nThis project would not be possible without the excellent work at [Candle](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle). 
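Thank you to all [contributors](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fgraphs\u002Fcontributors)!\n\nmistral.rs is not affiliated with Mistral AI.\n\n\u003Cp align=\"right\">\n  \u003Ca href=\"#top\">Back to Top\u003C\u002Fa>\n\u003C\u002Fp>\n","mistral.rs is a project for fast, flexible large language model (LLM) inference. It supports any Hugging Face model with zero configuration and offers true multimodal processing, covering input and output of text, images, video, and audio. Written in Rust, it provides full quantization control, letting users choose or customize a quantization scheme to optimize model performance. mistral.rs also includes a built-in web UI, automatic tuning for different hardware environments, and Python and Rust SDKs for developers building their own applications. It suits use cases that need high-performance, low-latency language model inference, such as chatbots, content generation, and multimodal data processing.",2,"2026-06-11 03:04:13","top_language"]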