[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74125":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":14,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":46,"readmeContent":47,"aiSummary":48,"trendingCount":16,"starSnapshotCount":16,"syncStatus":49,"lastSyncTime":50,"discoverSource":51},74125,"claude-code-local","nicedreamzapp\u002Fclaude-code-local","nicedreamzapp","Run Claude Code 100% on-device with local AI on Apple Silicon. MLX-native Anthropic-API server, 65 tok\u002Fs Qwen 3.5 122B, Llama 3.3 70B, Gemma 4 31B. Private, offline, airgap-ready. Built for NDA \u002F legal \u002F healthcare workflows.","https:\u002F\u002Fnicedreamzwholesale.com\u002Fairgap\u002F",null,"Python",2736,522,24,1,0,8,30,155,101.16,"MIT License",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45],"abliterated","ai-privacy","airgap","ambient-computing","anthropic","apple-silicon","browser-agent","claude-code","gemma","llama","local-ai","local-llm","macos","mlx","mlx-lm","offline-ai","on-device-ai","private-ai","qwen","voice-ai","2026-06-12 04:01:13","\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">🧠⚡ Claude Code Local — The Lineup\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    \u003Cstrong>Three local AI brains. Four modes. One MacBook. 
Zero cloud.\u003Cbr>Pick your fighter and run Claude Code 100% on-device.\u003Cbr>📍 Now with \u003Ca href=\"#-new-deepseek-v4-flash-via-ds4\">DeepSeek V4 Flash · 1M-token context · via Antirez's \u003Ccode>ds4\u003C\u002Fcode> engine\u003C\u002Fa>.\u003C\u002Fstrong>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fnicedreamzapp\u002Fclaude-code-local?style=for-the-badge&logo=github&color=f5c542&labelColor=1f2328\" alt=\"GitHub stars\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fnicedreamzapp\u002Fclaude-code-local?style=for-the-badge&logo=github&color=4c9a2a&labelColor=1f2328\" alt=\"GitHub forks\">\u003C\u002Fa>\n    \u003Ca href=\"#-the-lineup--pick-your-fighter\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🥊_Lineup-3_Models-red?style=for-the-badge\" alt=\"3 Models\">\u003C\u002Fa>\n    \u003Ca href=\"#-the-modes\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🎮_Modes-4-purple?style=for-the-badge\" alt=\"4 Modes\">\u003C\u002Fa>\n    \u003Ca href=\"#-benchmarks\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F⚡_Qwen_3.5-65_tok%2Fs-brightgreen?style=for-the-badge\" alt=\"Qwen 3.5 speed\">\u003C\u002Fa>\n    \u003Ca href=\"#-benchmarks\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🚀_Claude_Code_Task-17.6s_(Qwen)-blue?style=for-the-badge\" alt=\"Claude Code task time\">\u003C\u002Fa>\n    \u003Ca href=\"#-safety--how-the-data-flows\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🔒_Privacy-100%25_Local-success?style=for-the-badge\" alt=\"100% Local\">\u003C\u002Fa>\n    \u003Ca 
href=\"#-hands-free-voice-mode--the-whole-loop-on-device\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🎤_Voice-Hands_Free-orange?style=for-the-badge\" alt=\"Hands-Free Voice\">\u003C\u002Fa>\n    \u003Ca href=\"#-the-complete-local-first-stack\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🪴_Ambient-Computing-ff69b4?style=for-the-badge\" alt=\"Ambient Computing\">\u003C\u002Fa>\n    \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📜_License-MIT-yellow?style=for-the-badge\" alt=\"MIT\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FZdSqgAxUW\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1497121921580404818?label=NiceDreamzApps&logo=discord&color=5865F2&style=for-the-badge\" alt=\"Join the NiceDreamzApps Discord\">\u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Cem>Built by \u003Ca href=\"https:\u002F\u002Fx.com\u002FNiceDreamzApps\">Matt Macosko\u003C\u002Fa> in Arcata, CA. Started with a chicken problem. 
Still figuring it out.\u003C\u002Fem>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"#-watch-the-demo--airgap-ai\">🎬 Demo\u003C\u002Fa> ·\n    \u003Ca href=\"#-the-lineup--pick-your-fighter\">🥊 Lineup\u003C\u002Fa> ·\n    \u003Ca href=\"#-the-modes\">🎮 Modes\u003C\u002Fa> ·\n    \u003Ca href=\"#-quick-start-3-commands\">🚀 Quick Start\u003C\u002Fa> ·\n    \u003Ca href=\"#-benchmarks\">📊 Benchmarks\u003C\u002Fa> ·\n    \u003Ca href=\"#-safety--how-the-data-flows\">🔒 Safety\u003C\u002Fa> ·\n    \u003Ca href=\"#-hands-free-voice-mode--the-whole-loop-on-device\">🎤 Voice\u003C\u002Fa> ·\n    \u003Ca href=\"#-the-complete-local-first-stack\">🧩 The Stack\u003C\u002Fa> ·\n    \u003Ca href=\"#-whats-next\">🛣️ Roadmap\u003C\u002Fa> ·\n    \u003Ca href=\"#-contributing--ideas\">🤝 Contribute\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ch2 align=\"center\">🎬 WATCH THE DEMO — AirGap AI\u003C\u002Fh2>\n  \u003Cp align=\"center\">\n    \u003Cstrong>A real NDA. Llama 3.3 70B. Wi-Fi physically OFF. 
\u003Ccode>lsof\u003C\u002Fcode> running live.\u003Cbr>\n    Watch a 70-billion-parameter model audit a confidential legal document, on-device, with the receipts on screen.\u003C\u002Fstrong>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=V_J1LpNGwmY\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.youtube.com\u002Fvi\u002FV_J1LpNGwmY\u002Fmaxresdefault.jpg\" width=\"720\" alt=\"AirGap AI — Wi-Fi OFF NDA Demo\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=V_J1LpNGwmY\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F▶_Watch_on_YouTube-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Watch on YouTube\">\n    \u003C\u002Fa>\n    &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@nicedreamzapps\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSubscribe-@nicedreamzapps-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Subscribe\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Cem>Built for lawyers, accountants, doctors, therapists, contractors — anyone handling other people's private stuff.\u003C\u002Fem>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fnicedreamzwholesale.com\u002Fairgap\u002F\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📞_Need_this_for_your_firm%3F-Book_a_15--min_call-black?style=for-the-badge\" alt=\"Book a call\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ch2 align=\"center\">🏁 HEXAGON SHOOTOUT — Free AI vs $100\u002Fmo Claude Code\u003C\u002Fh2>\n  \u003Cp align=\"center\">\n    \u003Cstrong>Three AIs. One laptop. Same prompt. 
Live counters.\u003Cbr>\n    Watch Gemma 31B local, Llama 70B local, and Claude cloud race the same HTML physics prompt on a MacBook.\u003C\u002Fstrong>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2KeTDDodE0A\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.youtube.com\u002Fvi\u002F2KeTDDodE0A\u002Fmaxresdefault.jpg\" width=\"720\" alt=\"Hexagon Shootout — 3 AIs, 1 laptop, same prompt, live\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2KeTDDodE0A\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F▶_Watch_on_YouTube-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Watch on YouTube\">\n    \u003C\u002Fa>\n    &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@nicedreamzapps\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSubscribe-@nicedreamzapps-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Subscribe\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Cem>Gemma 31B: 56s · Claude cloud: 22s · Llama 70B: 2:17 — two of three ran with zero cloud calls.\u003C\u002Fem>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ch3 align=\"center\">🎤 Also on the channel — NarrateClaude (Hands-Free Ambient AI)\u003C\u002Fh3>\n  \u003Cp align=\"center\">\n    \u003Cem>Speak to Claude Code, hear replies in a cloned voice — 100% on-device. 
2:31.\u003C\u002Fem>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4ETqEjjopUk\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.youtube.com\u002Fvi\u002F4ETqEjjopUk\u002Fmaxresdefault.jpg\" width=\"540\" alt=\"NarrateClaude Hands-Free Ambient AI Demo\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4ETqEjjopUk\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F▶_Watch_on_YouTube-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Watch\">\u003C\u002Fa>\n    &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@nicedreamzapps\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSubscribe-@nicedreamzapps-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Subscribe\">\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ch3 align=\"center\">🏠 New — My Mac mini at home is the AI. I just talk to it from any browser.\u003C\u002Fh3>\n  \u003Cp align=\"center\">\n    \u003Cem>Open any browser on any phone — chat with the Mac mini at home, hear it reply in your own cloned voice. 
0:50.\u003C\u002Fem>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=PLbV4QtFmFY\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.youtube.com\u002Fvi\u002FPLbV4QtFmFY\u002Fmaxresdefault.jpg\" width=\"540\" alt=\"My Mac mini at home is the AI — browser-anywhere demo\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=PLbV4QtFmFY\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F▶_Watch_on_YouTube-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Watch\">\u003C\u002Fa>\n    &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@nicedreamzapps\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSubscribe-@nicedreamzapps-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Subscribe\">\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n---\n\n> ## 🧩 This repo is the **BRAIN** of a 4-part local-first ambient-computing stack\n>\n> Brain (here) · 🎤 Ears+Mouth · 🌐 Hands · 📱 Phone. Each repo stands alone; together they take Claude Code off the keyboard and off the screen. **[Jump to the stack diagram →](#-the-complete-local-first-stack)**\n>\n> 🖥️ **More of my open-source software:** [**nicedreamzwholesale.com\u002Fsoftware**](https:\u002F\u002Fnicedreamzwholesale.com\u002Fsoftware\u002F)\n\n---\n\n## 🥊 The Lineup — Pick Your Fighter\n\nWe started with one model. Now we ship a **roster**. 
Same MLX server, same Anthropic API, swap one env var and you swap the brain — plus the brand-new `ds4` engine for DeepSeek V4 Flash slotted in via its own native Metal runtime.\n\n| | 🟢 **Gemma 4 31B** | 🔵 **Qwen 3.5 122B** | 🐳 **DeepSeek V4 Flash** ⭐ |\n|---|:---:|:---:|:---:|\n| Nickname | The Quick One | The Beast | The 1M-Context Whale |\n| Build | 4-bit IT abliterated | 4-bit MoE (A10B) | 2-bit asymmetric (ds4 GGUF) |\n| Speed | ~15 tok\u002Fs | **65 tok\u002Fs** 🚀 | ~32 tok\u002Fs |\n| Params | 31 B dense | 122 B \u002F 10 B active | **284 B \u002F 37 B active** |\n| Context | 128 K | 256 K | **1 M tokens** |\n| RAM | ~18 GB | ~75 GB | ~81 GB |\n| Disk | 18 GB | 65 GB | 81 GB (+ disk KV cache) |\n| Best at | Daily coding, fits 64 GB Mac | Max throughput, active sparsity | Long context, agentic loops |\n| Engine | MLX Native | MLX Native | [`antirez\u002Fds4`](https:\u002F\u002Fgithub.com\u002Fantirez\u002Fds4) |\n| Launcher | `Gemma 4 Code.command` | `Claude Local.command` | `DeepSeek V4 Flash.app` |\n| Min RAM to run | 32 GB | 96 GB | 128 GB |\n\n> 💡 **Fun fact:** Qwen wins raw speed because it's an MoE — only 10B of 122B params activate per token. DeepSeek V4 Flash is even bigger (284B) but only ~37B active per token, *and* it ships with on-disk KV cache so a 25k-token Claude Code system prompt prefills exactly once, ever.\n\n### 🐳 New: DeepSeek V4 Flash via `ds4`\n\nWe tested it the day Antirez (the Redis guy) shipped `ds4`. 
**Local DeepSeek beat cloud Claude on wall-clock time** on the same MacBook with the same prompt.\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002F7l8-s8xkpms\" target=\"_blank\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.youtube.com\u002Fvi\u002F7l8-s8xkpms\u002Fmaxresdefault.jpg\" alt=\"Three-way local AI comparison — DeepSeek V4 Flash vs Cloud Claude vs Gemma 4 31B\" width=\"640\">\n  \u003C\u002Fa>\n  \u003Cbr>\n  \u003Cem>▶ Watch on YouTube — DeepSeek V4 Flash vs Cloud Claude vs Gemma 4 31B\u003Cbr>same prompt · three completely different auroras · one MacBook\u003C\u002Fem>\n\u003C\u002Fp>\n\n| | |\n|---|---|\n| 🧠 **Engine** | [`antirez\u002Fds4`](https:\u002F\u002Fgithub.com\u002Fantirez\u002Fds4) — pure C + Metal kernels, a few thousand lines |\n| 🤗 **Weights** | [`antirez\u002Fdeepseek-v4-gguf`](https:\u002F\u002Fhuggingface.co\u002Fantirez\u002Fdeepseek-v4-gguf) (q2: 81 GB, q4: 153 GB) |\n| 📦 **Server wrapper** | `~\u002F.local\u002Fbin\u002Fds4-server-up` (boots on demand) |\n| 🚀 **Claude Code wrapper** | `~\u002F.local\u002Fbin\u002Fclaude-ds4` (drop-in replacement for `claude`) |\n| 📏 **Context** | 1 M tokens; 200 K is sane for most agent runs |\n| 💾 **Disk KV cache** | Persists across restarts — first prefill is the only one that ever happens |\n\n### ⭐ Our Own Abliterated Upload\n\nThe Llama 3.3 70B in this lineup isn't from a generic mirror — **we packaged and uploaded our own 8-bit MLX abliterated build** to HuggingFace so anyone running this repo can pull it with one command:\n\n```bash\nMLX_MODEL=divinetribe\u002FLlama-3.3-70B-Instruct-abliterated-8bit-mlx \\\n  bash scripts\u002Fstart-mlx-server.sh\n```\n\n| | |\n|---|---|\n| 🤗 **HuggingFace** | [`divinetribe\u002FLlama-3.3-70B-Instruct-abliterated-8bit-mlx`](https:\u002F\u002Fhuggingface.co\u002Fdivinetribe\u002FLlama-3.3-70B-Instruct-abliterated-8bit-mlx) |\n| 📐 **Quant** | 8-bit affine, group size 64 |\n| 💾 **Disk** | ~75 GB (15 safetensors shards) |\n| 🧠 **Params** | 71 
B dense |\n| 📏 **Context** | 128 K tokens |\n| 🔓 **Abliteration base** | [huihui-ai abliterated build](https:\u002F\u002Fhuggingface.co\u002Fhuihui-ai) of Meta's Llama 3.3 70B Instruct ([what abliteration means](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Fmlabonne\u002Fabliteration)) |\n| 🍎 **MLX conversion + 8-bit pack** | by us — chosen to preserve quality over minimal footprint |\n\n> ⚠️ **Use it responsibly.** \"Abliterated\" suppresses the model's built-in refusal direction so it doesn't refuse benign-but-edgy requests. It is **not** a general capability upgrade, and you remain bound by the upstream Llama 3.3 license.\n\n---\n\n## 🎮 The Modes\n\nFour ways to run the lineup. Each one is a double-clickable launcher in `launchers\u002F`.\n\n| Mode | What it does | Launcher |\n|---|---|---|\n| 🤖 **Code** | Run Claude Code with a local model — same UX, no API key | `Claude Local.command`, `Gemma 4 Code.command`, `Llama 70B.command` |\n| 🌐 **Browser** | Local AI controls real Brave browser via Chrome DevTools | `Browser Agent.command` |\n| 🎤 **Hands-Free Voice** | Speak in, hear replies in your cloned voice — full loop, 100% on-device | `Narrative Gemma.command` + NarrateClaude |\n| 📱 **Phone** | iMessage in → text\u002Fimage\u002Fvideo out, full pipeline | `~\u002F.claude\u002Fimessage-*.sh` |\n\n---\n\n## 🤔 What Is This?\n\nYour MacBook has a powerful GPU built right into the chip. 
This project uses that GPU to run **massive AI models — the same kind that power ChatGPT and Claude — entirely on your computer**.\n\n🚫 No internet needed\n💰 No monthly subscription\n🔒 No one sees your code or data\n✅ Full Claude Code experience — write code, edit files, manage projects, control your browser, or run a **full hands-free voice session** where you speak every question and hear every reply in your own cloned voice (both directions on-device)\n\n```\n         📱 You (Mac or Phone)\n          │\n     🤖 Claude Code           ← the AI coding tool you know\n          │\n     ⚡ MLX Native Server      ← our server (~1000 lines of Python)\n          │\n     🥊 Pick your fighter     ← Gemma 4 31B · Llama 3.3 70B · Qwen 3.5 122B\n          │\n     🖥️  Apple Silicon GPU    ← your M-series chip does all the work\n```\n\n---\n\n## 🔒 Safety + How the Data Flows\n\nThis is the part we're proudest of. **Your code never leaves your Mac.** Not for a model call. Not for telemetry. Not for \"anonymous analytics\". 
Not ever.\n\n### 🛡️ The Data-Flow Diagram\n\n```\n   ┌─────────────────────────────────────────────────────────────┐\n   │                    🖥️  YOUR MACBOOK                          │\n   │                                                             │\n   │   📝 Your code         ┌────────────────────┐               │\n   │       │                │  🤖 Claude Code     │               │\n   │       └───────────────▶│  (CLI on your Mac)  │               │\n   │                        └────────┬───────────┘               │\n   │                                 │  HTTP localhost:4000       │\n   │                                 ▼                            │\n   │                        ┌────────────────────┐               │\n   │                        │  ⚡ MLX Server      │               │\n   │                        │  (Python, ours)    │               │\n   │                        └────────┬───────────┘               │\n   │                                 │  Metal API                 │\n   │                                 ▼                            │\n   │                        ┌────────────────────┐               │\n   │                        │  🧠 Local model     │               │\n   │                        │  (Gemma·Llama·Qwen)│               │\n   │                        └────────┬───────────┘               │\n   │                                 │                            │\n   │                                 ▼                            │\n   │                        ┌────────────────────┐               │\n   │                        │  🖥️  Apple GPU      │               │\n   │                        │  (unified memory)  │               │\n   │                        └────────────────────┘               │\n   │                                                             │\n   │             🚫 ZERO outbound network calls                  │\n   │             🚫 ZERO telemetry                               │\n   │             🚫 ZERO phone-home      
                        │\n   └─────────────────────────────────────────────────────────────┘\n                   │\n                   ✗  ←  Nothing from *our* code crosses this line.\n                   │\n   ┌─────────────────────────────────────────────────────────────┐\n   │                    ☁️  THE INTERNET                          │\n   │                  (your code never goes here)                 │\n   └─────────────────────────────────────────────────────────────┘\n```\n\n### 🔍 What We Audited (Every Component)\n\n| Component | Source | Outbound calls | Verdict |\n|-----------|--------|:---:|:---:|\n| **server.py** (ours) | We wrote it line by line | **0** | ✅ Safe |\n| **browser agent** (separate repo) | [nicedreamzapp\u002Fbrowser-agent](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fbrowser-agent) — we wrote it | **0** (talks to localhost CDP only) | ✅ Safe |\n| **mlx-lm** | Apple ML team | **0** | ✅ Safe |\n| **MLX framework** | Apple | **0** | ✅ Safe |\n| **Model weights** | HuggingFace verified mlx-community repos | **0** at runtime | ✅ Safe |\n| **iMessage scripts** | Pure shell + AppleScript | localhost only (Studio Record port 17494) | ✅ Safe |\n| **Claude Code CLI** | Anthropic (closed-source binary) | **0** with our launchers — `lsof`-verified, only `localhost:4000` | ✅ Safe |\n\n> ✅ **Verified offline (as of v0.1.0).** Claude Code 2.1's own binary previously reached out to `api.anthropic.com` on startup for telemetry, statsig feature flags, marketplace auto-install, and the autoupdater — even with `ANTHROPIC_BASE_URL` set. 
PR #32 (thanks [@tadrianonet](https:\u002F\u002Fgithub.com\u002Ftadrianonet)) plugs all four channels via documented Anthropic env vars, and the new launchers set them automatically:\n>\n> ```bash\n> CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1\n> DISABLE_AUTOUPDATER=1\n> CLAUDE_CODE_DISABLE_OFFICIAL_MARKETPLACE_AUTOINSTALL=1\n> CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1\n> ```\n>\n> Run `lsof -p $(pgrep -f claude)` while a session is active — you'll see only `localhost:4000`. Your prompts, code, and completions never leave the machine. Our code (server.py, launchers, scripts) has always made **zero** outbound connections; the Claude Code CLI now matches.\n\n### 🚫 What We Ripped Out\n\n> ⚠️ We **[removed LiteLLM](https:\u002F\u002Fx.com\u002FTahseen_Rahman\u002Fstatus\u002F2035501506242240520)** after supply-chain attack concerns. Every dependency was re-audited from scratch. If a package had unexplained network calls, it didn't ship.\n\n### ✅ What This Means in Practice\n\n| Scenario | Cloud Claude | This Repo |\n|---|:---:|:---:|\n| Working with NDA \u002F proprietary code | ❌ Risky | ✅ Air-gapped (`lsof`-verified) |\n| Coding on a plane (no wifi) | ❌ Doesn't work | ✅ Works |\n| Running on a kill-switch firewall | ❌ Blocked | ✅ Works |\n| Healthcare \u002F legal \u002F finance review | ⚠️ Compliance burden | ✅ Stays on-device |\n| Worry about training-data leakage | ⚠️ Trust required | ✅ Mathematically impossible |\n\n> 🔒 **The math is simple:** if there are no outbound HTTP calls, your data cannot leak. We grep'd every file for `requests`, `urllib`, `urlopen`, `httpx`, `socket.connect` — the only network calls in the entire codebase are to `localhost`. Run `lsof -i -P` while it's running. You'll see nothing leaving your Mac.\n\n---\n\n## 📊 Benchmarks\n\nThree generations of optimization. 
Each one got faster.\n\n### ⚡ Speed Comparison\n\n| Generation | Approach | Speed |\n|---|---|---:|\n| 🐌 Gen 1 | Ollama | 30 tok\u002Fs |\n| 🏃 Gen 2 | llama.cpp | 41 tok\u002Fs |\n| 🚀 Gen 3 | **MLX Native (ours)** | **65 tok\u002Fs** |\n\n### ⏱️ Real-World Claude Code Task\n\nTime for Claude Code to write a single function:\n\n| Setup | Time |\n|---|---:|\n| 😴 Ollama + Proxy | 133 s |\n| 😐 llama.cpp + Proxy | 133 s |\n| 🔥 **MLX Native (no proxy)** | **17.6 s** |\n\n> **7.5× faster ⚡** — one change (killing the proxy) produced the entire delta. ~1000 lines of Python, no C++ fork, no generic inference backend.\n\n### 🥊 Lineup Comparison\n\n| Model | tok\u002Fs | RAM | Best For |\n|---|:---:|:---:|---|\n| 🟢 Gemma 4 31B Abliterated | ~15 | ~18 GB | Daily coding on a 64 GB Mac |\n| 🟠 Llama 3.3 70B Abliterated | ~7 | ~70 GB | Hardest reasoning, 8-bit quant |\n| 🔵 **Qwen 3.5 122B-A10B** | **65** | ~75 GB | Maximum throughput, MoE sparsity |\n\n> Qwen 122B numbers are measured on M5 Max 128 GB. Gemma and Llama are observed real-world approximations. Full benchmarks for all three are pending — see [BENCHMARKS.md](docs\u002FBENCHMARKS.md).\n\n### ☁️ vs Cloud APIs\n\n| | 🖥️ **Our Local Setup** | ☁️ Claude Sonnet | ☁️ Claude Opus |\n|---|:---:|:---:|:---:|\n| Speed | 65 tok\u002Fs (Qwen 3.5 122B) | ~80 tok\u002Fs | ~40 tok\u002Fs |\n| Monthly cost | **$0** 🎉 | $20-100+ | $20-100+ |\n| Privacy | **100% local** 🔒 | Cloud | Cloud |\n| Works offline | **Yes** ✈️ | No | No |\n| Data leaves your Mac | **Never** | Always | Always |\n\n> 💡 Our local setup **beats cloud Opus on raw speed** (65 vs ~40 tok\u002Fs) at $0\u002Fmonth.\n\n---\n\n## 🔧 Tool-Call Reliability (v2 — March 2026)\n\nLocal models don't format tool calls perfectly. They *want* to call a tool but mix XML and JSON syntax. Claude Code sees no valid tool call, re-prompts, and the model does it again. The result: **infinite loops where the AI says \"let me do that\" but never actually does anything.**\n\nWe fixed this. 
Here's what was happening and what we did about it.\n\n### 🐛 The Problem\n\nThe model was generating garbled tool calls like this:\n```\n\u003Ctool_call>\n\u003Cfunction=Bash>\u003Cparameter=command>rm -rf \u002Ftmp\u002Fold\u003C\u002Fparameter>\u003C\u002Ffunction>\n\u003C\u002Ftool_call>\n```\n\nInstead of the JSON-inside-`\u003Ctool_call>` format the parser expects:\n```\n\u003Ctool_call>\n{\"name\": \"Bash\", \"arguments\": {\"command\": \"rm -rf \u002Ftmp\u002Fold\"}}\n\u003C\u002Ftool_call>\n```\n\nThe JSON parser choked, Claude Code saw no tool call and re-prompted the model, and the model garbled it the exact same way again — creating an infinite loop.\n\n### ✅ The Fix (4 changes to `server.py`)\n\n| Change | What | Why |\n|--------|------|-----|\n| **KV Cache** | 4-bit → 8-bit, quantization starts at token 1024 | Model retains conversation context instead of \"forgetting\" earlier messages |\n| **Temperature** | 0.7 → 0.2 | Less randomness = more consistent tool formatting |\n| **Garbled Recovery** | New `recover_garbled_tool_json()` function | Catches XML-in-JSON hybrids, `\u003Cfunction=X>\u003Cparameter=Y>` inside `\u003Ctool_call>` tags, and infers tool names from parameter keys |\n| **Retry Logic** | Up to 2 retries when tool intent is detected but parsing fails | Re-prompts with explicit formatting instructions before giving up |\n\n### 🧪 Test Results\n\nWe built an automated test suite (`scripts\u002Ftest_mlx_server.py`) that sends real Anthropic API requests to the server, simulating multi-step tasks — the exact kind that were failing before.\n\n```\nTest Suite: 14 tests per run\n─────────────────────────────\n  ✅ Simple Bash commands\n  ✅ Directory creation (mkdir -p)\n  ✅ File reading (Read tool)\n  ✅ Complex Bash with pipes\n  ✅ File editing (Edit tool with find\u002Freplace)\n  ✅ Multi-tool sequences (Glob → Read)\n  ✅ 5 rapid-fire sequential commands\n  ✅ Multi-step calendar scenario (create → delete → verify)\n```\n\n**Results: 98\u002F98 tests passed across 
7 consecutive runs. Zero failures.**\n\nThe multi-step calendar scenario — create 12 month folders, delete all but September, verify — was the exact task that triggered infinite loops before the fix. Now it passes every time.\n\n```bash\n# Run the test suite yourself:\npython3 scripts\u002Ftest_mlx_server.py\n```\n\n### ⚙️ Tuning\n\nYou can override defaults with environment variables:\n\n| Variable | Default | What It Does |\n|----------|---------|-------------|\n| `MLX_MODEL` | `divinetribe\u002Fgemma-4-31b-it-abliterated-4bit-mlx` | Pick which fighter to load |\n| `MLX_KV_BITS` | `8` | KV cache quantization bits (4 saves memory, 8 improves coherence) |\n| `MLX_KV_QUANT_START` | `1024` | Token position where KV quantization begins |\n| `MLX_TOOL_RETRIES` | `2` | Max retries when a garbled tool call is detected |\n| `MLX_MAX_TOKENS` | `8192` | Max output tokens per response |\n| `MLX_SUPPRESS_THINKING` | `1` | Pre-fill an empty thinking block so Gemma 4 skips its reasoning chain entirely. Saves ~1 min\u002Frequest. Set to `0` if you want the model to reason before responding. |\n\n---\n\n## 📱 Control From Your Phone — Full Media Pipeline\n\nYou don't have to be at your Mac to use this. 
Text a command, get back a full video.\n\n```\n📱 Your iPhone                    💻 Your Mac\n     │                                │\n     │── \"find me an article  ──────>│── imessage-receive.sh reads it\n     │    and send me a video\"        │── local model plans the task\n     │                                │── Brave browser finds the article\n     │                                │── speak narrates in your voice\n     │                                │── Studio Record captures it all\n     │                                │── build_production_video.py edits it\n     │\u003C── 🎥 video in iMessage ──────│── imessage-send-video.sh ships it\n     │                                │\n   🛋️  From your couch            🖥️  At your desk\n```\n\n**Everything works — text, images, and video:**\n\n| Command | What happens | You get |\n|---|---|---|\n| \"summarize this article\" | Local model reads + replies | 💬 Text |\n| \"send me a screenshot of X\" | Claude screenshots | 📸 Image in iMessage |\n| \"screen record you doing Y\" | Records + sends | 🎥 Video in iMessage |\n| \"make me a produced video\" | Full edit pipeline | 🎬 Title card + subs |\n\n**Full pipeline repo:** [nicedreamzapp\u002Fclaude-screen-to-phone](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-screen-to-phone)\n→ Clone it, run `setup.sh`, fill in your phone number. Works with this local AI stack or Claude cloud.\n\nWe built this **before** Anthropic shipped their Dispatch feature. Same concept, but ours uses iMessage, runs on your local model, and can send back media — not just text.\n\n> 💡 **Pro tip:** Anthropic's Dispatch doesn't read your CLAUDE.md. Mention it in your message or it'll miss your custom setup. Our iMessage system doesn't have this problem.\n\n---\n\n## 💡 How We Got Here\n\nMost people trying to run Claude Code locally hit the same wall:\n\n> Claude Code speaks **Anthropic API**. Local models speak **OpenAI API**. Different languages. 
🤷\n\nSo everyone builds a **proxy** to translate between them. That proxy adds latency and complexity, and it breaks things.\n\n**We took a different approach:**\n\n| 🐌 What everyone else does | 🚀 What we did |\n|---|---|\n| Claude Code → **Proxy** → Ollama → Model | Claude Code → **Our Server** → Model |\n| 3 processes, 2 API translations | **1 process, 0 translations** |\n| 133 seconds per task | **17.6 seconds per task** |\n\n> 🎯 That one change — **eliminating the proxy** — made it **7.5× faster**.\n\n---\n\n## 💻 What You Need\n\n| Your Mac | RAM | What You Can Run |\n|----------|-----|-------------------|\n| M1\u002FM2\u002FM3\u002FM4 (base) | 8-16 GB | 🟡 Small models (4B) |\n| M1\u002FM2\u002FM3\u002FM4 Pro | 18-36 GB | 🟠 Gemma 4 31B (tight) |\n| M2\u002FM3\u002FM4\u002FM5 Max | 64-128 GB | 🟢 **Gemma 4 31B** + 🔵 Qwen 3.5 122B |\n| M2\u002FM3\u002FM4 Ultra | 128-192 GB | 🔵 Multiple large models, all three fighters |\n\nYou also need:\n- 🐍 **Python 3.12+** (for MLX)\n- 🤖 **Claude Code** (`npm install -g @anthropic-ai\u002Fclaude-code`)\n\n---\n\n## 🚀 Quick Start (3 Commands)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local\ncd claude-code-local\nbash setup.sh\n```\n\n`setup.sh` auto-detects your RAM, picks a model from the lineup, downloads it, installs the MLX server, and creates a `Claude Local.command` launcher on your Desktop.\n\n**Then double-click `Claude Local.command`.** You're coding locally.\n\n> 🐛 **If the launcher asks you to sign in to a Claude account:** your `claude` CLI is too old. The launchers pass `--bare` to force local-only API-key auth, but older versions of the CLI don't support that flag and fall through to the Anthropic login prompt. 
Fix:\n> ```bash\n> npm install -g @anthropic-ai\u002Fclaude-code\n> claude --version   # should print a recent version\n> ```\n\n> 🛠️ **Note for contributors \u002F hackers:** `setup.sh` installs the server as a **symlink** at `~\u002F.local\u002Fmlx-native-server\u002Fserver.py` pointing back at this repo's `proxy\u002Fserver.py`. Edit the file in the repo, restart the MLX server, done — no re-running `setup.sh`, no copying, no silent drift between \"what I committed\" and \"what's actually running.\" There is one source of truth for the server, and it's `proxy\u002Fserver.py` in the repo.\n\n### Or do it manually\n\n```bash\n# 1. Set up the MLX virtualenv\npython3.12 -m venv ~\u002F.local\u002Fmlx-server\n~\u002F.local\u002Fmlx-server\u002Fbin\u002Fpip install mlx-lm\n\n# 2. Pick a fighter and download (one time, ~18-75 GB)\nbash scripts\u002Fdownload-and-import.sh gemma   # or 'llama' or 'qwen'\n\n# 3. Start the server\nMLX_MODEL=divinetribe\u002Fgemma-4-31b-it-abliterated-4bit-mlx \\\n  bash scripts\u002Fstart-mlx-server.sh\n\n# 4. Launch Claude Code\nANTHROPIC_BASE_URL=http:\u002F\u002Flocalhost:4000 \\\nANTHROPIC_API_KEY=sk-local \\\nclaude --model claude-sonnet-4-6\n```\n\n> 💡 **Or just double-click a launcher** in `launchers\u002F`. 
They do all of this automatically.\n\n---\n\n## 🔧 How It Works\n\n```\n┌──────────────────────────────────────────────────┐\n│              Your MacBook (M5 Max)               │\n│                                                  │\n│  📝 You type ──> 🤖 Claude Code                  │\n│                      │                           │\n│                      ▼                           │\n│                 ⚡ MLX Server (port 4000)        │\n│                      │                           │\n│                      ▼                           │\n│                 🥊 Local model ──> 🖥️  GPU        │\n│                 (Gemma·Llama·Qwen)               │\n│                      │                           │\n│                      ▼                           │\n│  📝 Answer \u003C─── ✨ Clean response                │\n│                                                  │\n│         🔒 Nothing leaves this box. Ever.        │\n└──────────────────────────────────────────────────┘\n```\n\nThe server (`proxy\u002Fserver.py`) is **one file, ~1000 lines**. It does six things:\n\n1. 📦 **Loads the model** — Apple's MLX framework, native Metal GPU, unified memory. Handles Gemma's `RotatingKVCache` quirk automatically so sliding-window models don't crash on the first request.\n2. 🔌 **Speaks Anthropic API** — Claude Code thinks it's talking to Anthropic's cloud. It's not.\n3. 🔧 **Translates tool use** — Three different tool-call formats in and out: Gemma 4 native (`\u003C|tool_call>call:Name{...}\u003Ctool_call|>`), Llama 3.3 raw JSON (`{\"type\":\"function\",...}`), and HuggingFace `\u003Ctool_call>` JSON (Qwen and others). All converted ↔ Anthropic `tool_use` blocks, with garbled-output recovery for small models.\n4. 🧹 **Cleans the output** — Local models think out loud in `\u003Cthink>` \u002F `\u003C|channel>thought` tags, emit stop markers (`\u003Cturn|>`, `\u003C|python_tag|>`), and sometimes drop in reasoning preamble. 
A real-time `ThinkingFilter` strips thinking blocks token-by-token during generation — before they accumulate in the buffer — then `clean_response` handles the rest.\n5. ⚡ **Reuses prompt caches across requests** — so Claude Code's 4K-token system prompt doesn't get re-prefilled on every turn. Huge speedup for short questions.\n6. 🎯 **Code mode** — auto-detects Claude Code coding sessions (any of Bash\u002FRead\u002FEdit\u002FWrite\u002FGrep\u002FGlob in the tools list), swaps Claude Code's ~10K-token harness prompt for a slim ~150-token one tuned for local models, **and strips verbose tool descriptions down to name + parameter types**. In practice: 35 tools with full descriptions = ~5 600 prompt tokens; after code mode, ~200 tokens — a **28× reduction** that cuts prefill time from ~60 s to ~2 s on Gemma 4 31B. Also stops models from refusing with \"I am not able to execute this task.\"\n\n---\n\n## 🔌 MCP Servers — Claude Code's plugin ecosystem, 100% local\n\n> **The only way to run Claude Code's full MCP plugin ecosystem 100% local on Apple Silicon.**\n\nClaude Code talks to the world through **MCP servers** — Anthropic's plugin protocol. There's a fast-growing ecosystem of them: filesystem, GitHub, Postgres, Slack, web search, Apple Notes, Notion, Chrome DevTools, and a couple hundred more. They're how Claude Code reads your files, browses the web, queries your databases, controls your browser.\n\nMost local-LLM proxies break MCP. They strip the tool definitions, mangle the `tool_use` blocks, or refuse to forward the streaming format Claude Code expects. So even if you swap in a \"Claude alternative,\" your plugins stop working.\n\n`claude-code-local` doesn't break MCP. The proxy passes tool definitions through to your local model and translates the model's `tool_use` blocks back into Anthropic's format — across all three model families (Gemma 4 native, Llama 3.3 raw JSON, Qwen `\u003Ctool_call>` JSON), with garbled-output recovery for small models. 
From Claude Code's perspective, it's talking to Anthropic. From your MCP server's perspective, the same Claude Code is calling it. Nothing in the middle changes.\n\n### How to plug in a server\n\nWire MCP servers up the normal Claude Code way (`~\u002F.claude.json` or per-project `.mcp.json`). Make sure your `ANTHROPIC_BASE_URL` is pointed at the local proxy, then add the server. Three quick examples:\n\n**1. Filesystem — let the local model read\u002Fwrite a folder**\n\n```bash\n# Anthropic's reference filesystem MCP server\nclaude mcp add filesystem -- npx -y @modelcontextprotocol\u002Fserver-filesystem ~\u002Fprojects\n```\n\nNow you can launch Claude Code (any of the launchers in `launchers\u002F`) and ask it to *\"summarize every README in `~\u002Fprojects`\"* — it'll call the filesystem MCP server, which streams files back to your local Gemma\u002FQwen, which writes the summary. Zero cloud round-trips.\n\n**2. GitHub — issues, PRs, code search, all local**\n\n```bash\n# The reference GitHub MCP server reads its token from GITHUB_PERSONAL_ACCESS_TOKEN\nclaude mcp add github --env GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_TOKEN -- npx -y @modelcontextprotocol\u002Fserver-github\n```\n\nNow your local model can read GitHub issues, draft PRs, and search code across repos. The model still runs on your Mac; only the GitHub API calls leave the building (which is fine — that's GitHub's data, not yours).\n\n**3. Web search — for when the local model needs fresh info**\n\n```bash\n# Brave Search MCP (free tier, no PII)\nclaude mcp add brave-search --env BRAVE_API_KEY=$BRAVE_API_KEY -- npx -y @modelcontextprotocol\u002Fserver-brave-search\n```\n\nNow your local Gemma can answer *\"what's the latest version of MLX?\"* from live search results instead of guessing.\n\n### `MLX_BROWSER_MODE` — optimized for chrome-devtools MCP\n\nClaude Code's chrome-devtools MCP integration sends a 30+ tool list and a 10K-token system prompt with every request. That's fine for cloud Claude. 
It's death for a local model.\n\nSet `MLX_BROWSER_MODE=1` when starting the proxy and it auto-detects Claude Code MCP browser sessions (by looking for `mcp__chrome-devtools__*` tool registrations), strips the bloat, and keeps only the 9 essential browser-control tools. Same browser automation, ~99% fewer tokens to chew through.\n\n```bash\nMLX_BROWSER_MODE=1 .\u002Fscripts\u002Fstart-mlx-server.sh\n```\n\nDirect clients (anything that brings its own system prompt + tools) are passed through untouched — only Claude Code MCP sessions get the optimization.\n\n### What this unlocks\n\nThe whole MCP ecosystem becomes available to you. Every tool a developer on cloud Claude Code has — filesystem, GitHub, web search, browser automation, database access, calendar, anything someone has shipped an MCP server for — works the same way against your local Gemma or Qwen. The 200+ tool universe is yours, just running on your machine instead of someone else's.\n\n---\n\n## 🌐 Browser Agent\n\nA standalone browser agent that controls your **real Brave browser** via Chrome DevTools Protocol — powered entirely by local AI. No Claude Code wrapper needed.\n\n> 🧭 **The browser agent lives in its own repo:** [`nicedreamzapp\u002Fbrowser-agent`](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fbrowser-agent). It's not bundled inside this repo. The `Browser Agent.command` launcher here points at the installed location (`~\u002F.local\u002Fbrowser-agent\u002Fagent.py`) that you get from cloning the browser-agent repo separately. 
Keeping it in its own project keeps both repos focused and stops \"edit the wrong file\" drift between a vendored copy and the real source of truth.\n\n```\n         📝 Your task\n          │\n     🤖 agent.py              ← autonomous browser agent (separate repo)\n          │\n     ⚡ MLX Server             ← local AI decides what to do\n     (Gemma · Llama · Qwen)\n          │\n     🌐 Brave (CDP port 9222) ← clicks, types, navigates your real browser\n          │\n     📊 Context Meter          ← shows memory usage so you know its limits\n```\n\n**Context memory pipeline** — the agent doesn't forget what it's doing:\n\n| | 🐌 Old Behavior | 🚀 New Pipeline |\n|---|---|---|\n| **Memory** | Hard drop after 5 steps | Smart trim at 60% of 32K budget |\n| **When trimming** | Deletes old steps entirely | Compresses into summary |\n| **Original task** | Lost after step 6+ | Re-injected every cycle |\n| **Visibility** | None — flying blind | Color-coded context meter |\n| **Response tokens** | 1,024 | 2,048 |\n\nThe context meter shows green\u002Fyellow\u002Fred after each step:\n```\n  Step 5 snapshot() 2.2s\n         → [101] heading \"The Best Coffee Cake Recipe\"...\n  [Context: 18% ████░░░░░░░░░░░░░░░░ 6K\u002F32K tokens]    ← green = plenty of room\n```\n\n> 💡 **Double-click** `Browser Agent.command` to launch. It starts the MLX server, opens Brave with remote debugging, and drops you into the agent.\n\n---\n\n## 🎤 Hands-Free Voice Mode — The Whole Loop On-Device\n\nTalk to your Mac. It talks back in your own cloned voice. **Nothing touches the internet in either direction.**\n\nThis is the feature I'm proudest of in the whole stack, and the one I haven't seen anyone else demo publicly. Most \"AI voice\" demos use cloud STT (Whisper API, Deepgram, Google Cloud Speech) and cloud TTS (ElevenLabs cloud, OpenAI, Azure) — so your voice hits someone else's server before you see a word of transcript, and every reply makes another cloud round-trip back as audio. This doesn't. 
**Both sides of the loop run fully on your Mac, end to end.**\n\n### The full voice loop\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│                     YOUR MACBOOK (M-series)                     │\n│                                                                 │\n│    🎙️  Your voice                                               │\n│         │                                                       │\n│         ▼                                                       │\n│    🎧 listen  (custom Swift binary)                             │\n│       • Apple SFSpeechRecognizer — on-device engine             │\n│       • Continuous listening, stability-based utterance end     │\n│       • Auto-pauses during playback to stop feedback loops      │\n│       • Wedge-detection watchdog, preventive 10-min recycle     │\n│         │                                                       │\n│         ▼                                                       │\n│    📬 dispatch  (bash watchdog + router)                        │\n│         │                                                       │\n│         ▼                                                       │\n│    ⌨️  inject  (AppleScript → target Terminal window by id)     │\n│         │                                                       │\n│         ▼                                                       │\n│    🤖 claude  (narration persona loaded from CLAUDE.md)         │\n│         │                                                       │\n│         ▼                                                       │\n│    ⚡ MLX Server → 🥊 Gemma 4 31B  (local, 4-bit, ~15 tok\u002Fs)    │\n│         │                                                       │\n│         ▼                                                       │\n│    🔊 ~\u002F.local\u002Fbin\u002Fspeak  \"naturally phrased reply\"             │\n│       • Pocket TTS with your own cloned voice                   │\n│       • Or any TTS 
that takes text + plays audio                │\n│         │                                                       │\n│         ▼                                                       │\n│    🎵 afplay  (listen pauses itself during this so the          │\n│                model's own voice doesn't feed back in)          │\n│         │                                                       │\n│         ▼                                                       │\n│    👂 You hear it                                               │\n│         │                                                       │\n│         └──────────────► and you keep talking                   │\n│                                                                 │\n│           🔒 Your voice never leaves this box. Ever.            │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### What makes this actually work\n\n- 🎙️ **Speech-in** — a compiled Swift binary wraps Apple's `SFSpeechRecognizer` (the same on-device engine that powers macOS Dictation) in a *continuous* listening loop rather than the usual Fn-Fn toggle. End of utterance is detected via **partial-result stability**: if the transcribed text stops changing for 2.5 seconds, the recognizer finalizes that sentence. That's way more robust than silence\u002FRMS heuristics against background noise, fans, or music.\n- 🔊 **Speech-out** — a CLI at `~\u002F.local\u002Fbin\u002Fspeak` wraps **Pocket TTS** driving a cloned copy of Matt's own voice. Any TTS that accepts a string and plays audio slots in — macOS `say`, Piper, local ElevenLabs, your choice.\n- 🔁 **Feedback-loop prevention** — the listener auto-pauses while `afplay` is running, so the TTS output of one turn never gets picked up as input for the next. No \"the model talking to itself\" loops.\n- 🧠 **Speak-every-turn is enforced via system prompt** — `NarrativeGemma\u002FCLAUDE.md` is loaded as the narration persona. 
It tells Gemma to narrate every tool call, every reasoning step, every result, *before* it writes the text reply. You're never staring at a silent terminal wondering if it's thinking.\n- 🛡️ **Real production hardening** — 10-minute preventive process recycle (dodges a known `SFSpeech` daemon wedge), queue-backlog detection with a non-zero exit code when the listener is stuck. Not a demo script — a tool that has to run unattended for hours.\n\n### Why it matters\n\n\"Voice-controlled AI\" is everywhere right now, but under the hood almost every public demo is a cloud pipeline wearing a local-looking coat. If the network drops, the demo dies. If your client's laptop blocks outbound connections, the demo dies. If you're on a plane, in a Faraday cage, or debugging on a disconnected-by-policy machine, the demo dies.\n\nThis setup doesn't die. **Apple's on-device speech engine is a fully local model that already ships with the OS**, and accessing it via `SFSpeechRecognizer` is a first-class macOS API — it's just that almost nobody wraps it in a continuous-listen daemon with production hardening and plumbs it to a local LLM with a cloned-voice reply stream. Now there's one.\n\n### How to wire it up\n\n> 🛠️ **The listening stack lives in its own repo.** The `Listen.swift` binary, the `dictation` \u002F `dispatch` \u002F `inject` scripts, and the `narrative-claude.sh` launcher are **a sibling project**: [`nicedreamzapp\u002FNarrateClaude`](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002FNarrateClaude). 
Same design as the browser agent: one repo per focused tool, so edits don't drift between a vendored copy and the real source of truth.\n\n### The two halves of the loop, and where each half lives\n\n**🗣️ The speak-and-think half (this repo, `claude-code-local`):**\n- `launchers\u002FNarrative Gemma.command` — boots the MLX server with Gemma 4 31B and injects the narration persona via `MLX_APPEND_SYSTEM_PROMPT_FILE` so Gemma narrates every turn\n- `NarrativeGemma\u002FCLAUDE.md` — the narration persona itself (opt-in, sanitized, generic)\n- `~\u002F.local\u002Fbin\u002Fspeak` — your chosen TTS CLI (Matt uses Pocket TTS with a cloned voice; `say \"$@\"` works as a three-line stub if you don't have a fancier setup)\n\n**🎧 The listen-and-inject half ([`NarrateClaude`](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002FNarrateClaude), sibling repo):**\n- A compiled Swift binary wrapping Apple's `SFSpeechRecognizer` in continuous-listen mode with stability-based end-of-utterance detection and wedge-recovery\n- A bash dispatch pipeline that respawns the listener, watches the target Terminal window, and tears everything down cleanly when you close the session\n- An AppleScript injector that writes transcribed utterances straight into the bound Terminal tab by window ID\n- A `narrative-claude.sh` one-click launcher that opens the Terminal, starts Claude Code, captures the window ID, and starts the listener\n\n### Running the full hands-free loop\n\n```bash\n# 1. Install this repo (claude-code-local) — gives you the MLX server + Narrative launcher\ngit clone https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local.git \"$HOME\u002FDesktop\u002FLocal AI Setup\"\ncd \"$HOME\u002FDesktop\u002FLocal AI Setup\" && bash setup.sh\n\n# 2. 
Install the sibling NarrateClaude repo — gives you the listening pipeline\ngit clone https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002FNarrateClaude.git ~\u002FNarrateClaude\ncd ~\u002FNarrateClaude && chmod +x dictation\u002Fbin\u002F* narrative-claude.sh\n.\u002Fdictation\u002Fbin\u002Fdictation setup   # compiles the Swift listener + grants permissions\n\n# 3. Launch the full loop\nbash ~\u002FNarrateClaude\u002Fnarrative-claude.sh\n```\n\n> 💡 **Double-click** `Narrative Gemma.command` from this repo to run the model-and-speak side standalone (keyboard in, voice out — useful when you don't want to be on mic). **Run `narrative-claude.sh`** from the NarrateClaude repo to launch the full hands-free loop (voice in, voice out, no keyboard at all).\n\n---\n\n## ✈️ When To Use This\n\n| Situation | Use This? | Why |\n|-----------|:---------:|-----|\n| On a plane | ✅ | Full AI coding, no internet needed |\n| Sensitive client code | ✅ | Nothing leaves your machine |\n| Don't want API fees | ✅ | $0\u002Fmonth forever |\n| Want fastest possible | ☁️ | Cloud Sonnet is still slightly faster |\n| Need Claude-level reasoning | ☁️ | Local models are good, not Claude-level |\n| Controlling from phone | ✅ | iMessage pipeline works offline |\n| Healthcare \u002F legal \u002F finance review | ✅ | 100% on-device, audit-friendly |\n\n---\n\n## 📁 What's In This Repo\n\n```\n📦 claude-code-local\u002F\n ├── ⚡ proxy\u002F\n │   └── server.py              ← MLX Native Anthropic Server with tool-call recovery (~1000 lines)\n ├── 🚀 launchers\u002F\n │   ├── Claude Local.command    ← Default fighter — Claude Code + local model\n │   ├── Gemma 4 Code.command    ← 🟢 THE QUICK ONE\n │   ├── Llama 70B.command       ← 🟠 THE WISE ONE\n │   ├── Browser Agent.command   ← 🌐 Autonomous Brave browser control\n │   ├── Narrative Gemma.command ← 🎭 Auto-narration mode\n │   └── lib\u002Fclaude-local-common.sh ← Shared: model-aware restart, local-cache resolver, health-wait\n ├── 🎭 
NarrativeGemma\u002F\n │   └── CLAUDE.md              ← Narration persona (sanitized, generic, opt-in)\n ├── 🛠️  scripts\u002F\n │   ├── download-and-import.sh ← Download a fighter (`gemma` \u002F `llama` \u002F `qwen`)\n │   ├── persistent-download.sh ← Auto-retry downloader for big models\n │   ├── start-mlx-server.sh    ← Server start helper\n │   ├── test_mlx_server.py     ← Tool-call reliability test suite\n │   └── upload-mlx-quant.sh    ← Publish your own MLX-quantized uploads to HF\n ├── 📊 docs\u002F\n │   ├── BENCHMARKS.md          ← Detailed speed comparisons\n │   └── TWITTER-THREAD.md      ← Social media content\n ├── 📱 IMESSAGE_MEDIA_PIPELINE.md ← Phone control + media sending docs\n └── setup.sh                    ← One-command installer\n```\n\n---\n\n## 🛤️ The Journey\n\nWe didn't start here. We went through three generations in one night, and the lineup became the fourth:\n\n| Gen | What We Tried | Speed | 💡 What We Learned |\n|:---:|---|:---:|---|\n| 1️⃣ | Ollama + custom proxy | 30 tok\u002Fs | Ollama works but Claude Code can't talk to it directly |\n| 2️⃣ | llama.cpp TurboQuant + proxy | 41 tok\u002Fs | TurboQuant compresses KV cache 4.9x, but the proxy is the bottleneck |\n| 3️⃣ | **MLX native server** | **65 tok\u002Fs** | **Kill the proxy. Speak Anthropic API directly. 7.5x faster.** |\n| 4️⃣ | **The lineup** | 65 \u002F 15 \u002F 7 tok\u002Fs | Three brains, one server. Same MLX, same Anthropic API — swap one env var to change the fighter. |\n\n> 🎯 Each generation taught us something. Killing the proxy made it fast. Adding the lineup made it flexible.\n\n---\n\n## 🧩 The Complete Local-First Stack\n\n`claude-code-local` is the **brain** — MLX Anthropic server, launcher lineup, tool-call translation. It pairs with three sibling repos to form a **local-first ambient computing stack** that never sends a keystroke, a voice clip, or a page load to the cloud. 
Each repo stands alone.\n\n#### 🤖 claude-code-local — **Brain** *(you are here)*\nMLX + Gemma 31B \u002F Llama 70B \u002F Qwen 122B · Anthropic API server · tool-call parsing · prompt cache. Zero cloud, 65 tok\u002Fs on Apple Silicon.\n\n#### 🎤 [NarrateClaude](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002FNarrateClaude) — **Ears + Mouth**\nTalk to Claude, hear replies in your cloned voice — both directions on-device. Fully hands-free loop using Apple SFSpeech + cloned-voice TTS.\n\n#### 🌐 [browser-agent](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fbrowser-agent) — **Hands**\nDrives a real Brave browser via Chrome DevTools Protocol. Handles iframes, Shadow DOM, ProseMirror.\n\n#### 📱 [claude-screen-to-phone](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-screen-to-phone) — **Remote**\nTurns your iPhone into a full Claude Code terminal. Text any command — git, shell, file edits, deploys, browser tasks — and get back whatever Claude produces (text, screenshots, screen recordings, produced videos) right in Messages. Works over iMessage — no bots, no third-party apps, no cloud relay.\n\n**Pair any combination. All four = ambient computing on one Mac, nothing in the cloud.**\n\n### 🪴 Why this matters — the ambient-computing angle\n\nThe real goal isn't \"a faster Claude Code\" — it's **getting off screens and mice**. Hunched-over-screen computing is breaking our bodies: carpal tunnel, curved spines, $1,500 ergonomic chairs bought to patch the damage the rest of the desk is doing. That era is ending. These four repos are pieces of what comes next — **computing that's around you instead of in front of you**. 
Screens become optional, typing becomes optional, sitting still becomes optional, but your data and your voice never leave your house.\n\n👉 **For the full manifesto, see the \"[Why I Built This — Ambient Computing Starts Here](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002FNarrateClaude#-why-i-built-this--ambient-computing-starts-here)\" section in the NarrateClaude README.** That's where the philosophy lives; the repos are just the first implementations.\n\n---\n\n## 🛣️ What's Next\n\nWe ship fast and in public. Rough direction for the next few weeks — if any of these excite you, hit **Watch** (top-right of the repo) to get the release ping.\n\n- 🟡 **Full Qwen 3.5 122B benchmark suite** — reliability, tool-call pass rate, cold-start vs warm, long-context behavior vs Gemma\n- 🟡 **Fully-local Whisper fallback** — drop-in alternative to the Apple `SFSpeechRecognizer` path for older Macs and non-English voices\n- 🟡 **One-click DMG installer** — double-click-to-run setup for folks who just want Claude Code + local AI without a terminal\n- 🟡 **`MLX_MODEL=\u003Chf-url>`** — point at any HuggingFace repo and have the lineup auto-register a new fighter\n- 🟡 **More fighters** — open to PRs adding launchers for DeepSeek, Mistral, Phi, anything MLX-compatible\n\n> 💡 Want something that's not on this list? [**Open an issue →**](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local\u002Fissues\u002Fnew). 
Every serious request gets read and usually replied to within 24h.\n\n---\n\n## 🤝 Contributing & Ideas\n\nA lot has changed since this repo was one night of \"can I run Claude Code on Ollama.\" It's now a full local-AI stack: a ~1000-line MLX-native Anthropic server, prompt-cache reuse, Gemma \u002F Llama \u002F Qwen native tool-call parsing, code mode (auto-strips Claude Code's 10K-token harness prompt for local models), the browser agent, narration mode, an iMessage pipeline, model-aware launcher restart, and — the piece I think is the biggest deal — a **fully on-device hands-free voice loop** (Apple `SFSpeechRecognizer` + cloned-voice TTS) that lives in the sibling NarrateClaude project. That's way past what \"The Journey\" table above covers.\n\nI built this because it solves *my* workflow end to end. Coding on planes, sensitive client work, drafting from my phone, handing off to local models when I don't want cloud latency or cloud bills, and (the thing I come back to most) running actual coding sessions hands-free — speak a request, listen to Gemma narrate the plan, hear it confirm the result, keep talking. No keyboard, no screen-watching. The whole loop is in place today. I'd love to hear how others could use it.\n\n**If you have ideas, bug reports, a new launcher for a model I don't run, a better code-mode prompt, or a workflow this doesn't cover — open an issue or a PR.** I read them all. 
Especially interested in hearing from:\n\n- 🧠 People on older Apple Silicon (M1 \u002F M2, 16–36 GB) who know which models actually fit and still do useful coding work\n- 🎤 Anyone who wants to stress-test the hands-free voice loop on different hardware, different TTS voices, or different dictation accents — we're currently running it on one M5 Max with one cloned voice\n- 🔊 TTS recipes beyond Pocket TTS — Piper, local ElevenLabs, MLX-TTS, Kyutai Moshi, or anything else that slots cleanly into `~\u002F.local\u002Fbin\u002Fspeak`\n- 🔌 Folks with workflows this doesn't touch yet — what would *you* want from a local Claude Code?\n- 🐛 Anyone who runs into edge cases I'll never hit on an M5 Max with 128 GB\n\nSmall PRs welcome, huge PRs welcome, issues with no PR welcome. The whole point is that it's yours to bend.\n\n---\n\n## 🙏 Credits\n\nBuilt on the shoulders of giants:\n\n| Project | What It Does | By |\n|---------|-------------|-----|\n| 🤖 [Claude Code](https:\u002F\u002Fclaude.ai\u002Fclaude-code) | AI coding agent | Anthropic |\n| 🍎 [MLX](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx) | Apple Silicon ML framework | Apple |\n| 📦 [mlx-lm](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx-examples) | Model loading + inference | Apple |\n| 🟢 [Gemma](https:\u002F\u002Fblog.google\u002Ftechnology\u002Fdevelopers\u002Fgemma-open-models\u002F) | The 31B fighter (base weights) | Google DeepMind |\n| ⭐ [Gemma 4 31B Abliterated 4-bit MLX](https:\u002F\u002Fhuggingface.co\u002Fdivinetribe\u002Fgemma-4-31b-it-abliterated-4bit-mlx) | **Our own MLX-packed abliterated upload** — THE QUICK ONE in the lineup | divinetribe (us) |\n| 🟠 [Llama](https:\u002F\u002Fllama.meta.com\u002F) | The 70B fighter (base weights) | Meta |\n| ⭐ [Llama 3.3 70B Abliterated 8-bit MLX](https:\u002F\u002Fhuggingface.co\u002Fdivinetribe\u002FLlama-3.3-70B-Instruct-abliterated-8bit-mlx) | **Our own MLX-packed abliterated upload** — THE WISE ONE in the lineup | divinetribe (us) |\n| 🔧 
[huihui-ai](https:\u002F\u002Fhuggingface.co\u002Fhuihui-ai) | Original abliteration of Llama 3.3 70B Instruct | huihui-ai |\n| 📖 [Abliteration explained](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Fmlabonne\u002Fabliteration) | The technique we built on | Maxime Labonne |\n| 🔵 [Qwen 3.5](https:\u002F\u002Fqwenlm.github.io\u002F) | The 122B fighter | Alibaba |\n| ⚡ [TurboQuant](https:\u002F\u002Fresearch.google\u002Fblog\u002Fturboquant-redefining-ai-efficiency-with-extreme-compression\u002F) | KV cache compression research | Google Research |\n\nTested on **Apple M5 Max** with **128 GB unified memory**.\n\n---\n\n## 👋 Who built this\n\nBuilt by Matt Macosko in Arcata, CA. All of this is part of the **[Nice Dreamz LLC](https:\u002F\u002Fnicedreamzwholesale.com)** umbrella — the consulting + open-source side of what I do day-to-day.\n\nIf this repo is useful to you, here's the rest of the work:\n\n| | |\n|---|---|\n| 🔒 **[AirGap AI](https:\u002F\u002Fnicedreamzwholesale.com\u002Fairgap\u002F)** | Private, on-device AI consulting for law firms, accountants, doctors, therapists — anyone handling other people's confidential work. Book a 15-min call. |\n| 🖥️ **[Nice Dreamz Software](https:\u002F\u002Fnicedreamzwholesale.com\u002Fsoftware\u002F)** | The rest of the open-source lineup — NarrateClaude, the browser agent, CemaniHomesteadRobot, VisionBuilder, and more. |\n| 🌿 **[Divine Tribe](https:\u002F\u002Fineedhemp.com)** | The hardware side — Core XL, V5, Ruby Twist. 13 years of building physical products. |\n| 📰 **[Marijuana Union](https:\u002F\u002Fmarijuanaunion.com)** | Community + news site. Where the long-form writing lives. |\n| 🌱 **[Tribe Seed Bank](https:\u002F\u002Ftribeseedbank.com)** | Seeds marketplace. 
|\n\nFind me:\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002FNiceDreamzApps\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-@NiceDreamzApps-000000?style=flat-square&logo=x&logoColor=white\" alt=\"X\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fdivine-tribe-vaporizers\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-Divine_Tribe-0A66C2?style=flat-square&logo=linkedin&logoColor=white\" alt=\"LinkedIn\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@nicedreamzapps\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-@nicedreamzapps-FF0000?style=flat-square&logo=youtube&logoColor=white\" alt=\"YouTube\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-@nicedreamzapp-181717?style=flat-square&logo=github&logoColor=white\" alt=\"GitHub\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.instagram.com\u002Fdivinetribevaporizers\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FInstagram-@divinetribevaporizers-E4405F?style=flat-square&logo=instagram&logoColor=white\" alt=\"Instagram\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n## 🛟 Related: `claude-failover`\n\nIf you'd rather **keep using Claude as primary** but want a local backstop for when your Max plan limit pinches or Anthropic has an outage, see the sibling project:\n\n👉 **[github.com\u002Fnicedreamzapp\u002Fclaude-failover](https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-failover)** — one-command flip between Claude and a local mlx-lm model. Lazy-loads, zero RAM cost when not in failover. 
Different angle from this repo: Claude stays default, local kicks in only when you flip the switch.\n\n---\n\n## 💬 Community\n\nA Discord for builders running, contributing to, or hacking on `claude-code-local`, `NarrateClaude`, and `browser-agent`. Share what you're building, ask questions, swap MLX tips. Quiet, builder-tone, no bots.\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FZdSqgAxUW\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1497121921580404818?label=Join%20NiceDreamzApps%20on%20Discord&logo=discord&color=5865F2&style=for-the-badge\" alt=\"Join the NiceDreamzApps Discord\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n👉 **[discord.gg\u002FZdSqgAxUW](https:\u002F\u002Fdiscord.gg\u002FZdSqgAxUW)**\n\n---\n\n\u003Cp align=\"center\">\n  \u003Cstrong>📜 MIT License\u003C\u002Fstrong> — Use it however you want.\u003Cbr>\u003Cbr>\n  ⭐ \u003Cstrong>Star this repo if it helped you!\u003C\u002Fstrong> ⭐\u003Cbr>\u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fnicedreamzapp\u002Fclaude-code-local?style=for-the-badge&logo=github&color=f5c542&labelColor=1f2328\" alt=\"GitHub stars\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnicedreamzapp\u002Fclaude-code-local\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fnicedreamzapp\u002Fclaude-code-local?style=for-the-badge&logo=github&color=4c9a2a&labelColor=1f2328\" alt=\"GitHub forks\">\u003C\u002Fa>\n\u003C\u002Fp>\n","Claude Code Local is a project that lets users run Claude Code entirely locally on Apple Silicon devices. It supports several AI models, including Qwen 3.5 (122B), Llama 3.3 (70B), and Gemma 4 (31B), delivers speeds of up to 65 tok\u002Fs, and can operate in four different modes. The project puts particular emphasis on privacy and offline availability: every feature works without a cloud connection, which makes it well suited to NDA, legal, and healthcare workflows that demand strict confidentiality. It is written in Python and released as open source under the MIT License.",2,"2026-06-11 03:48:55","high_star"]