[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73317":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":15,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},73317,"cake","evilsocket\u002Fcake","evilsocket","Distributed inference for mobile, desktop and server.","",null,"Rust",3081,195,45,12,0,4,9,53,28.88,"Other",false,"main",true,[],"2026-06-12 02:03:11","\u003Cdiv align=\"center\">\n\n# `cake`\n\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-blue)](https:\u002F\u002Fgithub.com\u002Fevilsocket\u002Fcake\u002Fblob\u002Fmain\u002Fdocs\u002Findex.md)\n[![Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fevilsocket\u002Fcake.svg?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fevilsocket\u002Fcake\u002Freleases\u002Flatest)\n[![Rust 
Report](https:\u002F\u002Frust-reportcard.xuri.me\u002Fbadge\u002Fgithub.com\u002Fevilsocket\u002Fcake)](https:\u002F\u002Frust-reportcard.xuri.me\u002Freport\u002Fgithub.com\u002Fevilsocket\u002Fcake)\n[![CI](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fevilsocket\u002Fcake\u002Fci.yml)](https:\u002F\u002Fgithub.com\u002Fevilsocket\u002Fcake\u002Factions\u002Fworkflows\u002Fci.yml)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-FAIR%20v1.0.0-blue.svg?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fevilsocket\u002Fcake\u002Fblob\u002Fmain\u002FLICENSE)\n\n\u003C\u002Fdiv>\n\nCake is a **multimodal AI inference server** written in Rust that can run models as a single node, or shard them across a heterogeneous cluster of devices — iOS, Android, macOS, Linux, Windows — to run workloads that wouldn't fit on a single GPU, effectively leveraging [planned obsolescence](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPlanned_obsolescence) to make AI more accessible and democratic.\n\n\u003Cp align=\"center\">\n  \u003Cstrong>\n  This is experimental code that's being actively developed and changed very quickly.\n  \u003C\u002Fstrong>\n\u003C\u002Fp>\n\n## Key Features\n\n- **Multi Modal** — [Text generation](docs\u002Fmodels.md), [image generation](docs\u002Fimage_generation.md) (Stable Diffusion, FLUX), and [voice synthesis](docs\u002Fvoice_generation.md) (VibeVoice TTS with voice cloning).\n- **Multi Model** — [15 text model families](docs\u002Fmodels.md), 6 image model variants, and 2 TTS models. Architecture auto-detected from HuggingFace checkpoints.\n- **Multi Platform** — CUDA, Metal, Vulkan, and CPU backends across [Linux, macOS, Windows, iOS, and Android](docs\u002Finstall.md).\n- **Multi Node** — Shard transformer blocks across devices with [zero-config mDNS clustering](docs\u002Fclustering.md) or manual topology. 
Also runs entirely on a single machine.\n- **OpenAI-Compatible API** — REST API with streaming, plus a [built-in web UI and TUI chat client](docs\u002Fusage.md#web-ui).\n- **Docker** — [Container builds](docs\u002Fdocker.md) for Linux\u002FNVIDIA with docker-compose cluster support.\n\n## Quick Start\n\n### Build\n\n```sh\ncargo build --release --features cuda        # Linux (NVIDIA)\ncargo build --release --features metal       # macOS (Apple Silicon GPU)\ncargo build --release --features accelerate  # macOS (Apple Silicon CPU, F32 models)\ncargo build --release --features vulkan      # Linux (AMD\u002FIntel\u002FSteam Deck)\ncargo build --release                        # CPU only (portable)\n```\n\n### Models\n\nDownload models from HuggingFace with `cake pull`. Models are stored in the standard HuggingFace cache directory (`~\u002F.cache\u002Fhuggingface\u002Fhub\u002F`) and are shared with any other tools that use the same cache (transformers, huggingface-cli, etc.).\n\n```sh\ncake pull evilsocket\u002FQwen3-0.6B             # text model (600M params)\ncake pull evilsocket\u002Fflux1-dev               # image model (FLUX.1-dev FP8)\ncake pull evilsocket\u002FVibeVoice-1.5B          # voice synthesis model\n\ncake list                                    # show all locally available models\ncake rm evilsocket\u002FQwen3-0.6B               # delete a cached model\n```\n\nModels are also downloaded automatically on first use if not already cached.\n\n### Single Node\n\nRun any model locally on a single machine — architecture is auto-detected from the model's `config.json`:\n\n```sh\n# Text generation\ncake run evilsocket\u002FQwen3-0.6B \"Explain quantum computing in simple terms\"\n\n# Interactive TUI chat\ncake chat Qwen\u002FQwen3-0.6B\n\n# Start an API server + web UI\ncake serve evilsocket\u002FQwen3-0.6B\n\n# Image generation (FLUX.1-dev FP8)\ncake run evilsocket\u002Fflux1-dev --model-type image-model --image-model-arch flux1 \\\n  \"a cyberpunk cityscape at 
night\"\n\n# Voice synthesis with voice cloning\ncake run evilsocket\u002FVibeVoice-1.5B --model-type audio-model \\\n  --voice-prompt voice.wav \"Hello world\"\n```\n\n### Distributed\n\nShard a model across multiple machines using `--cluster-key`. Workers don't need the model data — the master automatically streams the required tensor weights over the network (compressed with zstd, verified with CRC32 checksums). Workers cache received data locally for subsequent runs.\n\n```sh\n# Start workers on any machines (no model needed)\ncake run --cluster-key mysecret --name gpu-server-1    # machine A\ncake run --cluster-key mysecret --name macbook          # machine B\n\n# Run inference from the master (has the model)\ncake run evilsocket\u002FQwen3-0.6B \"Hello\" --cluster-key mysecret\n\n# Or start an API server as the master\ncake serve evilsocket\u002FQwen3-0.6B --cluster-key mysecret\n```\n\nThe master discovers workers via mDNS, assigns layers proportionally to each device's VRAM\u002Fcompute, and pushes only the required weight shards. See the [clustering documentation](docs\u002Fclustering.md) for manual topology files and advanced configuration.\n\nFor the full usage guide and API reference, [check the project documentation](docs\u002Findex.md).\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=evilsocket\u002Fcake&type=Timeline)](https:\u002F\u002Fstar-history.com\u002F#evilsocket\u002Fcake&Timeline)\n\n## License\n\nReleased under the [FAIR License (Free for Attribution and Individual Rights) v1.0.0](LICENSE).\n\n- **Non-commercial use** (personal, educational, research, non-profit) is freely permitted under the terms of the license.\n- **Commercial use** (SaaS, paid apps, any monetization) requires visible attribution to the project and its author. See the [license](LICENSE) for details.\n- **Business use** (any use by or on behalf of a business entity) requires a signed commercial agreement with the author. 
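Contact `evilsocket@gmail.com` for inquiries.\n\nTo see the licenses of the project dependencies, install cargo license with `cargo install cargo-license` and then run `cargo license`.","Cake is a multimodal AI inference server written in Rust that runs models on a single node or across a heterogeneous cluster of devices. Its core features are text generation, image generation, and voice synthesis, and it supports multiple platforms (Linux, macOS, Windows, iOS, and Android) and hardware backends (CUDA, Metal, Vulkan, and CPU). Cake also provides an OpenAI-compatible REST API plus a built-in web UI and TUI chat client for interactive use. It can shard transformer blocks across devices using zero-config mDNS clustering or a manual multi-node topology, so that large workloads run efficiently across several devices. The project suits applications that need cross-platform, multimodal AI inference, especially developers and businesses that want to extend their AI compute capacity with existing hardware.",2,"2026-06-11 03:44:59","high_star"]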