[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81852":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},81852,"Camelid","timtoole02\u002FCamelid","timtoole02","Camelid: a Rust-native local inference backend with evidence-gated model compatibility.",null,"Rust",63,10,4,0,7,36,38,21,3.12,"MIT License",false,"main",true,[],"2026-06-12 02:04:20","# 🐫 Camelid\n\n[![CI][ci-badge]][ci-workflow]\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](LICENSE)\n[![Rust: 1.87+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRust-1.87%2B-orange.svg)](rust-toolchain.toml)\n\n**Camelid** is a state-of-the-art, Rust-native local inference engine for GGUF language models. It is designed for maximum speed, strict trust guarantees, and hardware saturation on modern platforms, especially Apple Silicon macOS. 
\n\nInstead of optimistic compatibility claims, Camelid matches trusted inference baselines with 1:1 mathematical token parity and enforces an exact-row evidence-backed supported matrix.\n\n---\n\n## ⚡ Hardware-Saturated Performance\n\nCamelid bypasses high-level compilation abstractions to write bare-metal math kernels directly in AArch64 assembly and vectorized NEON SIMD.\n\n```text\n               ┌────────────────────────────────────────────────────────┐\n               │              Camelid AArch64 SIMD Engine               │\n               └────────────────────────────────────────────────────────┘\n                                    │\n         ┌──────────────────────────┼──────────────────────────┐\n         ▼                          ▼                          ▼\n ┌───────────────┐          ┌───────────────┐          ┌───────────────┐\n │   i8mm GEMM   │          │  DotProd GEMV │          │   NEON Core   │\n │ (smmla, 4x8)  │          │ (sdot, 4x4)   │          │ (vadd, vmul)  │\n └───────────────┘          └───────────────┘          └───────────────┘\n   Prefill Phase             Decode Phase               Activation\u002FNorm\n   64 MACs\u002Fcycle             4 MACs\u002Fcycle               128-bit Vectors\n```\n\n*   **Bare-Metal AArch64 Matrix Multiplication (`smmla`)**: Emits matrix multiply-accumulate instructions in hardware, performing 64 8-bit MAC operations per clock cycle to saturate prefill throughput.\n*   **Vectorized Decode Kernels (`sdot`)**: Performs high-speed row-major matrix-vector dot-products directly on repacked Q8_0 weights, eliminating dequantization overhead on Apple Silicon.\n*   **128-bit NEON Element-Wise & Reduction Pipelines**: Vectorizes input block quantization, RMS Normalization (`rms_norm`), SiLU activations, and element-wise additions\u002Fmultiplications into parallel SIMD register operations.\n*   **Performance-Core Thread Scheduling**: Binds Rayon multi-threaded loops strictly to physical Performance (P) Cores, 
avoiding Efficiency core synchronization delays that derail inference latency.\n\n---\n\n## 🌐 High-Speed Distributed Clustering\n\nScale model inference across multiple machines (such as Mac minis) using high-speed interfaces like direct **Thunderbolt 4 Bridge Networking** (IP-over-Thunderbolt) for microsecond-scale bus latency.\n\n*   **Split-Model Parallelization**: Partition layers dynamically across nodes (e.g., Coordinator evaluates layers $0..K$ locally, Worker evaluates layers $K..N$).\n*   **Zero dequantization\u002Frepack overhead**: Coordinator and Worker communicate raw activations over a custom high-performance socket transport.\n*   **TCP Bus Telemetry**: Includes a built-in `bench-network` tool to physically measure link round-trip time (RTT) and throughput.\n\n---\n\n## 💎 Google Gemini-Style Frontend\n\nCamelid includes a built-in React\u002FVite web interface inside [frontend\u002F](frontend\u002F) that replicates a premium Google Gemini experience:\n*   **Curated Aesthetics**: Harmonious dark mode palettes, subtle glowing gradients, glassmorphic layout rails, and micro-animations.\n*   **Honest Readiness Signals**: Interaction panels dynamically reflect loaded GGUF capabilities, locking inputs until a model is fully validated.\n*   **Intuitive Chat Surface**: Product-forward conversation area that emphasizes responsiveness.\n\n---\n\n## 📋 Exact-Row Support Matrix\n\nCamelid enforces a strict evidence boundary. Support is exact-row only; neighboring sizes, formats, or tokenizers must have their own verification benchmarks before public readiness.\n\n| Exact Model Row | Public Status | Current Evidence Boundary |\n| :--- | :--- | :--- |\n| **TinyLlama 1.1B Chat Q8_0** | **Verified Support** | End-to-end generation, 50-token reference parity, and checked 512-context. |\n| **Llama 3.2 1B Instruct Q8_0** | **Verified Bounded Support** | Load, OpenAI API, WebUI chat, 1:1 token parity, and checked 512\u002F1024 context. 
|\n| **Llama 3.2 3B Instruct Q8_0** | **Smoke Supported** | Compact\u002Fbroader 50-token reference parity, API\u002FWebUI smoke, and 2048 context. |\n| **Llama 3 8B Instruct Q8_0** | **Verified Bounded Support** | Completions, chat completions, WebUI validation, and lazy-Q8 read hot-paths. |\n| **Mistral-7B-Instruct-v0.3 Q8_0** | **Smoke Supported** | Exact-row load, tokenizer, 50-token parity, and checked 4096 context. |\n| **Mixtral-8x7B-Instruct-v0.1** | **Backend Support Only** | One-token MoE backend execution. Later-generation divergence remains blocked. |\n\n---\n\n## 🚀 Quickstart\n\nVerify that Camelid builds cleanly, starts the backend, and serves a live API endpoint locally.\n\n### 1) Build and Run the Server\n\nEnsure you are using Rust 1.87+ and target your CPU's native instruction set during compilation:\n\n```bash\n# Build optimized for your CPU\nRUSTFLAGS=\"-C target-cpu=native\" cargo build --release\n\n# Serve a local GGUF model\n.\u002Ftarget\u002Frelease\u002Fcamelid serve --model \u002Fpath\u002Fto\u002FLlama-3.2-3B-Instruct-Q8_0.gguf --threads 4\n```\n*Note: Set `--threads` to exactly your machine's physical Performance Core count (e.g. `sysctl -a | grep hw.perflevel0.physicalcpu` on macOS) so that worker threads are not scheduled onto Efficiency cores.*\n\n### 2) Verify Capabilities API\n\nVerify that the local capabilities discovery endpoint is reachable:\n\n```bash\ncurl -s http:\u002F\u002F127.0.0.1:8181\u002Fapi\u002Fcapabilities\n```\n\n### 3) Start the Gemini WebUI\n\nRun the React\u002FVite development server locally to access the premium front-end chat surface:\n\n```bash\ncd frontend\nnpm ci\nnpm run dev\n```\n\n---\n\n## 🤝 High-Speed Cluster Commands\n\nTo scale across two Mac minis using direct IP-over-Thunderbolt bridges:\n\n#### 1. On Worker Mini (`192.168.0.2`):\n```bash\n.\u002Ftarget\u002Frelease\u002Fcamelid serve-distributed \\\n    --role worker \\\n    --addr 192.168.0.2:8089 \\\n    --layer-range 16..32 \\\n    --model \u002Fpath\u002Fto\u002Fmodel.gguf\n```\n\n#### 2. 
On Coordinator Mini (`192.168.0.1`):\n```bash\n.\u002Ftarget\u002Frelease\u002Fcamelid serve-distributed \\\n    --role coordinator \\\n    --addr 192.168.0.1:8181 \\\n    --worker-addr 192.168.0.2:8089 \\\n    --layer-range 0..16 \\\n    --model \u002Fpath\u002Fto\u002Fmodel.gguf\n```\n\n---\n\n## 🧪 Validation & Formatting\n\nKeep the repository formatted, lint-clean, and fully tested before committing changes:\n\n```bash\ncargo fmt --all -- --check\ncargo clippy --all-targets --all-features -- -D warnings\ncargo test --all-targets --all-features\n```\n\nFor front-end development, confirm that the build compiles:\n\n```bash\ncd frontend\nnpm run build\n```\n\n---\n\n## 🗺️ Documentation Map\n\n*   [`COMPATIBILITY.md`](COMPATIBILITY.md) — Exact support criteria and ledgers\n*   [`STATUS.md`](STATUS.md) — Active blockers, evidence checkpoints, and benchmarks\n*   [`ROADMAP.md`](ROADMAP.md) — Engineering sequencing and delivery goals\n*   [`ARCHITECTURE.md`](ARCHITECTURE.md) — Deep architectural module layouts\n\n---\n\n## 📜 License and Reference Credits\n\nCamelid is open-source and licensed under the [MIT License](LICENSE).\n\nCamelid's tokenizer, reference compatibility layouts, and validation benchmarks are inspired by and checked against [`llama.cpp`](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp) (Copyright (c) 2023-2026 The ggml authors, MIT License). 
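Camelid maintains its original Rust-native codebase while proudly crediting the extraordinary reference work of the broader `ggml` ecosystem.\n\n[ci-badge]: https:\u002F\u002Fgithub.com\u002Ftimtoole02\u002FCamelid\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\n[ci-workflow]: https:\u002F\u002Fgithub.com\u002Ftimtoole02\u002FCamelid\u002Factions\u002Fworkflows\u002Fci.yml\n","Camelid is a Rust-based local inference engine designed for GGUF language models. Its core features include hardware-saturated performance, achieved by writing low-level math kernels directly in AArch64 assembly and vectorized NEON SIMD, with the aim of maximum speed and strict trust guarantees on modern platforms, especially Apple Silicon macOS. The project is particularly suited to applications that need high-performance local inference, such as running large language models on Macs with Apple M-series chips. It also supports distributed cluster scaling over high-speed interfaces such as Thunderbolt 4 bridge networking, further increasing processing capacity.",2,"2026-06-11 04:06:56","CREATED_QUERY"]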