[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84059":3},{"id":4,"name":5,"fullName":6,"owner":5,"repo":5,"description":7,"homepage":8,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":12,"stars30d":12,"stars90d":15,"forks30d":15,"starsTrendScore":12,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":29,"readmeContent":30,"aiSummary":9,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":31,"discoverSource":32},84059,"llamastash","llamastash\u002Fllamastash","A fast terminal native app (TUI) and CLI with init wizard for launching local LLMs with zero overhead","https:\u002F\u002Fllamastash.dev",null,"Rust",56,4,52,2,0,44.5,"MIT License",false,"main",true,[22,23,24,25,26,27,28],"ai","gguf","llamacpp","llm","lmstudio","local-ai","ollama","2026-06-12 04:01:42","# LlamaStash\n\n![ci](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)\n![release](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Factions\u002Fworkflows\u002Frelease.yml\u002Fbadge.svg)\n[![crates.io](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fllamastash.svg)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fllamastash)\n[![Crate downloads](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fd\u002Fllamastash?label=Crate%20downloads)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fllamastash)\n[![GitHub Downloads](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fdownloads\u002Fllamastash\u002Fllamastash\u002Ftotal.svg?label=GitHub%20downloads)](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Freleases)\n[![GitHub 
Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fllamastash\u002Fllamastash?color=%23c694ff)](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Freleases)\n![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blueviolet.svg)\n![Code size](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flanguages\u002Fcode-size\u002Fllamastash\u002Fllamastash)\n[![Coverage](https:\u002F\u002Fcoveralls.io\u002Frepos\u002Fgithub\u002Fllamastash\u002Fllamastash\u002Fbadge.svg?branch=main)](https:\u002F\u002Fcoveralls.io\u002Fgithub\u002Fllamastash\u002Fllamastash?branch=main)\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fllamastash\u002Fllamastash?style=social)](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash)\n\n[![Follow Deepu K Sasidharan on Bluesky](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBluesky-Follow%20deepu105-1185fe?logo=bluesky&logoColor=white)](https:\u002F\u002Fbsky.app\u002Fprofile\u002Fdeepu105.bsky.social)\n\n![Logo](https:\u002F\u002Fraw.githubusercontent.com\u002Fllamastash\u002Fllamastash\u002Fmain\u002Fassets\u002Flogo-h.jpg)\n\n**Zero-overhead, terminal-native `llama.cpp` launcher.**\n\nA fast TUI **and** CLI with init wizard for launching local LLMs via [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp). One Rust binary that's a TUI, a CLI, a daemon, and an OpenAI-compatible proxy. Zero overhead vs raw `llama-server`. See [benchmarks](docs\u002Fbenchmarks.md).\n\n## Why\n\nHeavy abstractions (Ollama, LM Studio) hide llama.cpp; raw `llama-server` use is tedious. LlamaStash is a fast, transparent launcher that is also a first-class shell-tool surface for agents — one binary, daemon on demand, same primitives in the TUI and the CLI.\n\n> **AI agents installing this for a user:** jump to [`INSTALL.md` § For AI agents](INSTALL.md#for-ai-agents). 
The non-interactive install + verify contract, and exit-code branching live there.\n\n![TUI Gif](https:\u002F\u002Fraw.githubusercontent.com\u002Fllamastash\u002Fllamastash\u002Fmain\u002Fassets\u002Ftui.gif)\n\n## Install\n\nPick whichever channel you prefer — all install the same binary. Full per-platform notes, troubleshooting, and the agent-friendly non-interactive path live in [`INSTALL.md`](INSTALL.md).\n\n```bash\n# macOS + Linux, one-shot\ncurl -fsSL https:\u002F\u002Fllamastash.dev\u002Finstall.sh | sh\n\n# Windows 11 (PowerShell, no admin elevation)\nirm https:\u002F\u002Fllamastash.dev\u002Finstall.ps1 | iex\n\n# Homebrew (macOS + Linuxbrew)\nbrew install llamastash\u002Fllamastash\u002Fllamastash\n\n# Arch Linux (AUR — pick one)\nyay -S llamastash       # source build\n\n# From crates.io (any platform with a Rust toolchain)\ncargo install llamastash\n\n# Windows via Scoop bucket\nscoop bucket add llamastash https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fscoop-llamastash && scoop install llamastash\n```\n\n_Windows: 64-bit Windows 10 1809+ \u002F Windows 11, PowerShell 5.1+, [Windows Terminal](https:\u002F\u002Faka.ms\u002Fterminal) recommended for the TUI; the bundled `llama-server` needs the VC++ 2015–2022 Redistributable (x64). See [INSTALL.md](INSTALL.md#platform-notes)._\n\nThen run `llamastash init` — the interactive wizard installs `llama-server` for your hardware, downloads a starter GGUF, writes a tuned config, and smoke-launches it.\n\n## Quickstart\n\n```bash\n# Open the TUI. Scans default caches; daemon auto-spawns on demand.\nllamastash\n\n# List discovered models. TTY → padded + table; piped or\n# `--no-colors` → TSV bytes. 
`--json` is the agent contract.\nllamastash list\nllamastash list --json | jq\n\n# Launch a model by name, name substring, path, or canonical id.\nllamastash start qwen-coder --ctx 16384 --reasoning on\n\n# Drive a smoke-test request against the running endpoint.\ncurl -s http:\u002F\u002F127.0.0.1:41100\u002Fv1\u002Fchat\u002Fcompletions \\\n  -H 'Content-Type: application\u002Fjson' \\\n  -d '{\"model\": \"qwen-coder\", \"messages\": [{\"role\": \"user\", \"content\": \"hi\"}]}'\n\n# Stop it.\nllamastash stop qwen-coder\n```\n\n**Tip — mouse focus.** Mouse capture is off by default so the terminal keeps native click-and-drag text selection. To opt in on every TUI run, alias the binary in your shell rc:\n\n```bash\n# bash \u002F zsh\nalias llamastash='llamastash --mouse-focus'\n\n# fish\nalias llamastash 'llamastash --mouse-focus'\n```\n\nOr set it permanently in `config.yaml`:\n\n```yaml\nmouse_focus: true\n```\n\nEither source flips on click-to-focus for the Models list, the right pane, and the tab labels (`Settings`\u002F`Logs`\u002F`Chat`\u002F`Embed`\u002F`Rerank`). Most terminals still expose a bypass modifier (Shift on iTerm2 \u002F Alacritty \u002F foot \u002F wezterm, Option on Apple Terminal) so ad-hoc selection stays reachable.\n\nFull subcommand reference: [`docs\u002Fusage.md`](docs\u002Fusage.md). Proxy client setup (including an OpenCode example): [`docs\u002Fusage.md#opencode-setup`](docs\u002Fusage.md#opencode-setup). Prefer a Vulkan `llama-server` build on AMD\u002FNVIDIA hosts: [`docs\u002Fusage.md#preferring-a-vulkan-llama-server-build`](docs\u002Fusage.md#preferring-a-vulkan-llama-server-build). Architecture and IPC contract: [`docs\u002Farchitecture.md`](docs\u002Farchitecture.md). 
When things go wrong: [`docs\u002Ftroubleshooting.md`](docs\u002Ftroubleshooting.md).\n\n## Agent Skills\n\nThe CLI ships with an [Agent Skills](https:\u002F\u002Fagentskills.io) manifest so supported agents can load repo-specific instructions for using `llamastash` as a local model-management CLI.\n\n- Canonical skill bundle: [`skills\u002Fllamastash\u002F`](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Ftree\u002Fmain\u002Fskills\u002Fllamastash)\n\n**Claude Code plugin marketplace:** install the repo as a plugin, then install the bundled skill:\n\n```text\n\u002Fplugin marketplace add llamastash\u002Fllamastash\n\u002Fplugin install llamastash@llamastash\n\u002Freload-plugins\n```\n\nManual install examples:\n\n```bash\n# OpenClaw\nmkdir -p ~\u002F.openclaw\u002Fskills && cp -r skills\u002Fllamastash ~\u002F.openclaw\u002Fskills\u002F\n\n# OpenCode\nmkdir -p ~\u002F.config\u002Fopencode\u002Fskills && cp -r skills\u002Fllamastash ~\u002F.config\u002Fopencode\u002Fskills\u002F\n```\n\nThe skill teaches agents to prefer `--json`, branch on LlamaStash's documented exit codes, reuse exact discovered model names, and read `status --json` `proxy.listen` before configuring an OpenAI-compatible client.\n\n## Features\n\nFull detail per feature in [`FEATURES.md`](FEATURES.md) — including trade-offs, contracts, and links into [`docs\u002Fusage.md`](docs\u002Fusage.md).\n\n### [Zero-to-chat in one command](FEATURES.md#zero-to-chat-in-one-command)\n\n- [`llamastash init` — first-run wizard](FEATURES.md#llamastash-init--first-run-wizard) that detects hardware, installs `llama-server`, picks + downloads a starter GGUF, writes a tuned config, and smoke-launches.\n- [Hardware-aware model recommender](FEATURES.md#hardware-aware-model-recommender) with a VRAM-fit filter + composite ranking over a CI-refreshed benchmark snapshot.\n- [`llamastash doctor`](FEATURES.md#llamastash-doctor--read-only-health-check) — typed, agent-branchable findings; always exits 
`0`.\n\n### [Discovers what you already have](FEATURES.md#discovers-what-you-already-have)\n\n- [Auto-scans HuggingFace, Ollama, and LM Studio caches](FEATURES.md#auto-scans-huggingface-ollama-and-lm-studio-caches), plus user paths.\n- [Rich GGUF intelligence](FEATURES.md#rich-gguf-intelligence) — architecture, params, quant, native context, chat template, KV-cache-aware memory estimates.\n- [Smart deduplication](FEATURES.md#smart-deduplication) — symlinks collapsed, split GGUFs unified, Ollama blobs named.\n- [Live filesystem watching](FEATURES.md#live-filesystem-watching) — new GGUFs appear without a restart.\n\n### [Launches anything, supervises everything](FEATURES.md#launches-anything-supervises-everything)\n\n- [Daemon-on-demand](FEATURES.md#daemon-on-demand) — one binary as TUI + CLI + daemon; running models survive TUI close.\n- [Multi-model concurrency](FEATURES.md#multi-model-concurrency) — per-model port from a configurable range, `\u002Fhealth`-probed state machine.\n- [GPU-aware built-in arch defaults](FEATURES.md#gpu-aware-built-in-arch-defaults) — sensible flags per `(architecture, gpu_backend)` with zero YAML.\n- [Intelligent context auto-fit](FEATURES.md#intelligent-context-auto-fit) — when `ctx` is unset, llamastash picks the largest context that fits free VRAM (or RAM, CPU-only) from the GGUF attention geometry. 
Sidesteps llama.cpp `--fit`'s 4096 collapse on Linux 7+ iGPUs (AMD Strix Halo) where unified-memory free space is mis-reported.\n- [Typed launch-knob editor](FEATURES.md#typed-launch-knob-editor) with `(source)` chips and a layered preset → last-params → arch-defaults → built-ins resolver.\n- [Multi-GPU device selection](FEATURES.md#typed-launch-knob-editor) — pin a model to one card (`--device`) instead of letting llama.cpp split it across every GPU; the picker lists exactly what `llama-server --list-devices` reports.\n- [Named presets, favorites, last-params recall](FEATURES.md#named-presets-favorites-last-params-recall).\n\n### [A TUI that doesn't get in your way](FEATURES.md#a-tui-that-doesnt-get-in-your-way)\n\n- [Keyboard-driven everywhere](FEATURES.md#keyboard-driven-everywhere) — vim-style `hjkl` + `Ctrl+F`\u002F`Ctrl+B` paging, `0`\u002F`$` top\u002Fbottom, `gt`\u002F`gT` tab cycling; `\u002F` filter, `u`\u002F`c`\u002F`p` yank, `?` help.\n- [Right pane is your smoke test](FEATURES.md#right-pane-is-your-smoke-test) — Logs \u002F Chat \u002F Embed \u002F Rerank over the same OpenAI-compatible endpoints external clients use.\n- [In-TUI HuggingFace browser](FEATURES.md#in-tui-huggingface-browser) — search, sort, paginate, per-file hardware fit, download strip with cancel.\n- [Theming and rebinding](FEATURES.md#theming-and-rebinding) — five themes + custom palette; every action rebindable.\n- [Accessible by default](FEATURES.md#accessible-by-default) — dual-encoded status (color + glyph), readable on mono terminals.\n- [Adaptive layout — works from 60 cells up](FEATURES.md#adaptive-layout--works-from-60-cells-up) — below 100 cells the right pane goes drill-in-only; list columns and hint chips drop by priority rank as the pane shrinks so the model name stays readable.\n\n### [First-class CLI for agents and scripts](FEATURES.md#first-class-cli-for-agents-and-scripts)\n\n- [Subcommands cover every TUI 
capability](FEATURES.md#subcommands-cover-every-tui-capability) with `--json` as the stable agent contract.\n- [Documented exit codes per failure class](FEATURES.md#documented-exit-codes-per-failure-class) — pin numbers, not message text.\n- [Colored TTY output, byte-stable TSV when piped](FEATURES.md#colored-tty-output-byte-stable-tsv-when-piped) — existing `awk` \u002F `column` pipelines keep working.\n- [`llamastash pull \u003Chf-repo>`](FEATURES.md#llamastash-pull-hf-repo--standalone-hf-fetch) — same primitive as the wizard, with disk-space prechecks.\n- [`llamastash recommend`](FEATURES.md#llamastash-recommend--hardware-aware-picks-in-your-shell) — the recommender on its own, agent-friendly.\n- [Reproducible pulls via `--revision \u003CSHA>`](FEATURES.md#reproducible-pulls-via---revision-sha).\n\n### [Drop-in OpenAI + Ollama proxy](FEATURES.md#drop-in-openai--ollama-proxy)\n\n- [OpenAI-compatible endpoint](FEATURES.md#openai-compatible-endpoint) at `http:\u002F\u002F127.0.0.1:11435\u002Fv1` by default (or the next free port up to `11440`) — drives every discovered model through one URL; OpenCode, Pi (pi.dev), Cline, llm-cli, the OpenAI SDKs all work as-is. Auto-starts the requested model; falls back to a Ready peer with audit headers (`x-llamastash-served-by`, `x-llamastash-fallback-reason`) when launch fails. 
The default port is `11435` (one above Ollama's well-known `11434`) so a llamastash daemon and an Ollama install can co-exist without a port collision.\n- [Ollama discovery surface](FEATURES.md#ollama-discovery-surface) — `GET \u002Fapi\u002Ftags` \u002F `\u002Fapi\u002Fversion` \u002F `\u002Fapi\u002Fps`, `POST \u002Fapi\u002Fshow` so tools that auto-detect Ollama via `OLLAMA_HOST` recognise llamastash and fall through to the OpenAI-compat endpoints for inference.\n- **Ollama drop-in mode is opt-in.** Enable with `--ollama-compat` (or `proxy.ollama_compat: true` \u002F `LLAMASTASH_OLLAMA_COMPAT=1`) and the proxy claims port `11434`, answers `GET \u002F` with the byte-exact `\"Ollama is running\"` handshake string, and works as a transparent replacement for the official `ollama` CLI and other Ollama-Go-based clients. Leaving compat off keeps the safe coexistence default (port `11435`, `\"LlamaStash is running\"` identity).\n- [Loopback-only, no authentication](FEATURES.md#auth-posture) — single-user local threat model; the proxy refuses LAN binds.\n\n### [Built to be safe to run](FEATURES.md#built-to-be-safe-to-run)\n\n- [Bearer-token loopback control plane (`runtime.json` `0600`)](FEATURES.md#bearer-token-control-plane) — the per-daemon token + URL live in `$XDG_STATE_HOME\u002Fllamastash\u002Fruntime.json`; same-UID trust, no network exposure.\n- [Hardened fetch substrate](FEATURES.md#hardened-fetch-substrate) — HTTPS-only, host allowlist, redirect\u002Fbody-size caps, IP-literal refusal.\n- [Archive-bomb defenses on installers](FEATURES.md#archive-bomb-defenses-on-installers) — entry\u002Fsize\u002Fratio caps; SHA-256 verified before extract.\n- [Atomic, mode-checked config + state writes](FEATURES.md#atomic-mode-checked-config--state-writes) — `0600` final mode; corrupt state quarantined, not fatal.\n- [Side-by-side daemons](FEATURES.md#side-by-side-daemons) — isolated instances via `LLAMASTASH_*_DIR` (state \u002F config \u002F cache); each daemon publishes 
its own `runtime.json`.\n\n_**Note**: This is beta software. Rough edges are to be expected. Windows and macOS are less thoroughly tested than Linux, and the same goes for non-AMD GPUs. Please report issues if you hit them. The `llama-server` builds are unmodified upstream binaries; any bugs in them are out of scope for LlamaStash._\n\n## Benchmarks\n\nLlamaStash spawns the unmodified upstream `llama-server`. Three suites track what that means in practice — **Suite A** asserts the wrapper adds no measurable overhead vs raw `llama-server`, **Suite B** compares LlamaStash-as-shipped against Ollama + LM Studio on the same hardware through their OpenAI-compatible endpoints, and **Suite C** measures the proxy hop vs hitting `llama-server` directly (TTFT p50 +0.45 ms, decode unchanged). Full write-up + per-workload tables: [`docs\u002Fbenchmarks.md`](docs\u002Fbenchmarks.md).\n\nEach cell below is **decode tok\u002Fs \u002F TTFT ms** on the `chat_turn` workload (50-token prompt → 64 tokens decode). LlamaStash matches raw `llama-server` within ≤1% in normalized mode on every platform. Re-run on your hardware: `make bench-end-to-end` (Suite B) or `make bench-overhead` (Suite A).\n\n![LlamaStash vs raw llama-server vs LM Studio vs Ollama — decode tok\u002Fs across AMD APU, Apple M1, NVIDIA RTX 3050 Ti (defaults mode, log scale)](https:\u002F\u002Fraw.githubusercontent.com\u002Fllamastash\u002Fllamastash\u002Fmain\u002Fassets\u002Fcharts\u002F00-hero-all-hardware.png)\n\n### AMD APU - Linux (Ryzen AI Max+ 395 \u002F Radeon 8060S 128GB unified, system ROCm 7.2.3, llama.cpp build `b9440 (e6123e208)`)\n\n`chat_turn` `normalized` mode, decode tok\u002Fs \u002F TTFT ms. 
One bench JSON per row (no averaging).\n\n| Tool               | small (E2B Q4) |  mid (31B Q4) | large_dense (27B Q8) | large_moe (35B-A3B Q8) | Engine notes                           |\n| ------------------ | -------------: | ------------: | -------------------: | ---------------------: | -------------------------------------- |\n| **LlamaStash**     |  **82.1 \u002F 51** | **9.9 \u002F 468** |        **7.5 \u002F 406** |         **42.3 \u002F 178** | local HIP\u002FROCm                         |\n| raw `llama-server` |      81.0 \u002F 51 |     9.9 \u002F 466 |            7.5 \u002F 406 |             43.1 \u002F 185 | local HIP\u002FROCm                         |\n| LM Studio 2.18.0   |     91.1 \u002F 187 |    — (crash¹) |           — (crash¹) |             — (crash¹) | bundled ROCm 6.4 vendor (see footnote) |\n| Ollama 0.24.0      |     50.8 \u002F 224 |    4.8 \u002F 1096 |           2.6 \u002F 1750 |             12.2 \u002F 484 | bundled                                |\n\n¹ LM Studio's bundled ROCm vendor libraries (v6.4) abort in `ggml_cuda_error` during backend init on `gfx1151` (Strix Halo) across all LMS-shipped runtimes. System ROCm 7.2.3 loads the same models cleanly via raw `llama-server`, so this is an LMS vendor-bundle limitation. 
LMS Vulkan numbers are in the [benchmark blog](https:\u002F\u002Fdeepu.tech\u002Fbenchmarking-llamastash) and in the [final report](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Fblob\u002Fmain\u002Fdocs\u002Fbenchmarks\u002Flinux-amd-apu-final-report.md).\n\n#### AMD APU — Vulkan addendum (LlamaStash vs LM Studio, 2026-06-01)\n\n| Tool             | small (E2B Q4) |   mid (31B Q4) | large_dense (27B Q8) | large_moe (35B-A3B Q8) |\n| ---------------- | -------------: | -------------: | -------------------: | ---------------------: |\n| **LlamaStash**   | **101.2 \u002F 55** | **10.8 \u002F 671** |            7.5 \u002F 196 |          **50.7 \u002F 72** |\n| LM Studio 2.18.0 |     93.6 \u002F 191 |     7.1 \u002F 2307 |        **8.0 \u002F 801** |             38.4 \u002F 227 |\n\nSame backend (Vulkan b9440 \u002F `vulkan-avx2@2.18.0`), same GGUF bytes. Raw `llama-server` and Ollama are omitted: the wrapper-overhead claim is already covered by the HIP table, and Ollama mainline has no Vulkan support.\n\n### NVIDIA - Linux (RTX 3050 Ti, 4 GiB VRAM, llama.cpp build `b9360`)\n\n| Tool               | CUDA (gemma-3-4b Q3 `defaults`) | Vulkan (gemma-3-4b Q3 `defaults`) |\n| ------------------ | ------------------------------: | --------------------------------: |\n| **LlamaStash**     |                 **41.1 \u002F 74** ✦ |                    **42.0 \u002F 113** |\n| raw `llama-server` |                      36.6 \u002F 110 |                        37.5 \u002F 148 |\n| LM Studio 2.16.0   |                  **48.7 \u002F 318** |                    **48.3 \u002F 308** |\n| Ollama 0.24.0      |                      40.7 \u002F 120 |                        42.0 \u002F 115 |\n\n✦ LlamaStash leads raw `llama-server` in defaults mode (12–16% decode, 33–49% TTFT) via the hardware-aware defaults overlay; normalized mode: within ≤0.5 tok\u002Fs. Vulkan decode ≥ CUDA on this hardware in 26 of 28 cells (median +5%). 
Curated report with six findings: [`linux-nvidia-final.md`](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Fblob\u002Fmain\u002Fdocs\u002Fbenchmarks\u002Flinux-nvidia-final.md).\n\n### Apple M1 - macOS (16 GB unified memory, Metal, llama.cpp build `9330 (328874d05)`)\n\n| Tool               | small (Qwen2.5-0.5B Q4) |\n| ------------------ | ----------------------: |\n| **LlamaStash**     |         **95.6 \u002F 18** ✦ |\n| raw `llama-server` |               91.9 \u002F 20 |\n| LM Studio          |               88.4 \u002F 68 |\n| Ollama 0.24.0      |              79.6 \u002F 102 |\n\n✦ LlamaStash leads raw `llama-server` on M1 in `defaults` mode (99.0 vs 92.3 tok\u002Fs, 15 vs 19 ms TTFT) because its Metal defaults overlay injects hardware-optimal knobs at startup. Normalized mode: 92.2 vs 91.5 — within 1%. Curated report: [`macos-m1-final-report.md`](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Fblob\u002Fmain\u002Fdocs\u002Fbenchmarks\u002Fmacos-m1-final-report.md).\n\n## Screenshots\n\n![Init](https:\u002F\u002Fraw.githubusercontent.com\u002Fllamastash\u002Fllamastash\u002Fmain\u002Fassets\u002Finit.gif)\n\n![TUI 1](https:\u002F\u002Fraw.githubusercontent.com\u002Fllamastash\u002Fllamastash\u002Fmain\u002Fassets\u002Ftui_3.png)\n![TUI 2](https:\u002F\u002Fraw.githubusercontent.com\u002Fllamastash\u002Fllamastash\u002Fmain\u002Fassets\u002Ftui_2.png)\n\n## Configuration\n\nLlamaStash reads `$XDG_CONFIG_HOME\u002Fllamastash\u002Fconfig.yaml` on Linux (fallback `~\u002F.config\u002Fllamastash\u002Fconfig.yaml`), `~\u002FLibrary\u002FApplication Support\u002Fllamastash\u002Fconfig.yaml` on macOS, and `%APPDATA%\\llamastash\\config\\config.yaml` on Windows. A fully-annotated sample lives at [`config.example.yaml`](config.example.yaml) — copy it to the path above and edit. 
The full schema reference is in [`docs\u002Fusage.md`](docs\u002Fusage.md#configuration).\n\nQuick tour of the top-level keys:\n\n| Key                           | What it controls                                                                                                                                                          |\n| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `theme`                       | Built-in palette: `macchiato` (default), `latte`, `gruvbox-dark`, `solarized-dark`, `mono`. Set to `custom` to use the `custom_theme` block. Cycle live with `t:theme`.   |\n| `custom_theme`                | User-defined palette. Inherits unspecified slots from `base:` (default macchiato). Accepts `#RRGGBB` hex or ANSI names. Once defined, `Custom` joins the `t:theme` cycle. |\n| `model_paths`                 | Extra directories to scan for `.gguf` files. Merged with `-p\u002F--model-path` and `LLAMASTASH_MODEL_PATHS`.                                                                  |\n| `disable_default_cache_paths` | Per-bucket toggles (`huggingface`, `ollama`, `lm_studio`) for the auto-walked caches.                                                                                     |\n| `disable_scan`                | Skip filesystem scanning entirely. Same as `--no-scan` \u002F `LLAMASTASH_NO_SCAN=1`.                                                                                          |\n| `port_range`                  | Inclusive `{start, end}` TCP range the supervisor picks from. Default `41100..=41300`.                                                                                    |\n| `llama_server_path`           | Absolute path to `llama-server`. Overridable by `--llama-server` and `LLAMASTASH_LLAMA_SERVER`.                                                                           
|\n| `probe_timeout_secs`          | Health-probe deadline per launch. Default `120`. Bump for 70B+ on slow disks.                                                                                             |\n| `keybindings`                 | Action-name → key-spec overrides. Kdash-style dialect (`ctrl+q`, `shift+tab`, `f1`, …).                                                                                   |\n\nEnvironment variables:\n\n| Variable                  | Purpose                                                                                                                                                                                                                                                                                                        |\n| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `LLAMASTASH_CONFIG`       | Override config-file path                                                                                                                                                                                                                                                                                      |\n| `LLAMASTASH_LLAMA_SERVER` | Path to `llama-server`                                                                                                                                                                                                                                                                                         |\n| `LLAMASTASH_NO_SCAN`      | Skip filesystem scanning                                                                                                                                                             
                                                                                                                          |\n| `LLAMASTASH_IPC_URL`      | Point a CLI\u002FTUI at a non-default daemon control plane (verbatim URL, e.g. `http:\u002F\u002F127.0.0.1:48134`). Must be set together with `LLAMASTASH_IPC_TOKEN`.                                                                                                                                                         |\n| `LLAMASTASH_IPC_TOKEN`    | Bearer token for the control-plane URL. See `LLAMASTASH_IPC_URL`.                                                                                                                                                                                                                                              |\n| `LLAMASTASH_OFFLINE`      | Disable outbound network for `init`, `pull`, and `doctor` fetch paths. Accepts `true` \u002F `false` when bound via clap's `--offline` flag; the runtime `fetch::offline_requested` check also accepts `1` \u002F `yes` for compatibility with scripts that follow the `XDG`\u002F`gh` convention. Equivalent to `--offline`. |\n| `HF_TOKEN`                | HuggingFace API token. Read by `init` and `pull` only; never propagated into spawned `llama-server` children. Cache-file (`~\u002F.cache\u002Fhuggingface\u002Ftoken`) source is refused if its mode is group\u002Fworld-readable.                                                                                                 |\n| `HF_ENDPOINT`             | Override the HuggingFace API endpoint host. Must be `https:\u002F\u002F` and on the HF-allowlist (`huggingface.co` and its LFS CDN); non-allowlisted values are refused. Default: `https:\u002F\u002Fhuggingface.co`.                                                                                                              |\n\n### Default scan paths\n\nWhen `model_paths` and `--model-path` are empty, LlamaStash walks these caches automatically. 
Each bucket is independently toggleable via `disable_default_cache_paths.\u003Cbucket>: true` in `config.yaml`, or globally via `--no-scan` \u002F `LLAMASTASH_NO_SCAN=1`.\n\n| Bucket      | Linux                                             | macOS                                                    |\n| ----------- | ------------------------------------------------- | -------------------------------------------------------- |\n| HuggingFace | `~\u002F.cache\u002Fhuggingface\u002Fhub`                        | `~\u002FLibrary\u002FCaches\u002Fhuggingface\u002Fhub`                       |\n| Ollama      | `~\u002F.ollama\u002Fmodels`                                | `~\u002F.ollama\u002Fmodels`                                       |\n| LM Studio   | `~\u002F.lmstudio\u002Fmodels`, `~\u002F.cache\u002Flm-studio\u002Fmodels` | `~\u002FLibrary\u002FCaches\u002FLMStudio\u002Fmodels`, `~\u002F.lmstudio\u002Fmodels` |\n\nFiles anywhere under these roots that end in `.gguf` (and aren't `.gguf.part`) get parsed and added to the catalog.\n\n## CLI exit codes\n\nEvery non-interactive subcommand returns a documented exit code so agent scripts can branch on failure class. 
Pin against numbers, not message text — they are the public contract.\n\n| Code | Meaning                                                                                                                                                                                                        |\n| ---- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `0`  | Success                                                                                                                                                                                                        |\n| `64` | Usage error (missing required arg, invalid combination — clap-emitted)                                                                                                                                         |\n| `65` | Daemon unreachable (`runtime.json` absent, control plane refused connection, timeout)                                                                                                                          |\n| `66` | Model reference matched zero or multiple models (stderr lists candidates)                                                                                                                                      |\n| `67` | `start_model` failed at the supervisor (probe timeout, port allocation failure)                                                                                                                                |\n| `68` | `stop_model` \u002F `stop_all` failed                                                                                                                                                                               |\n| `69` | `pull` download failed (transport, checksum, or HF cache write)                                                                                                  
                                              |\n| `70` | `llama-server` binary not found (`--llama-server`, `LLAMASTASH_LLAMA_SERVER`, or `$PATH`)                                                                                                                      |\n| `71` | Unexpected error (catch-all)                                                                                                                                                                                   |\n| `72` | `init` aborted before substantive work — failed precondition, integrity check, or rate-limited GH API. Safe to re-run.                                                                                         |\n| `73` | `init` download failed mid-step — disk space, transport, or HF cache write. Partial state recorded; re-run picks up where it stopped.                                                                          |\n| `74` | `init` smoke-launch failed — phase-1 dry-run exceeded VRAM ceiling, or `--version` probe returned non-zero. Binary is installed; re-run smoke with `init --only smoke` or use `llamastash doctor` to diagnose. |\n\n> **Note on sysexits.h**: the numbers above are deliberately reused from `\u003Csysexits.h>` for familiarity, but LlamaStash's _meanings_ diverge from the standard ones. Scripts that import `EX_NOHOST` (68) expecting \"host unreachable\" will get our \"stop failed\"; `EX_DATAERR` (65) is reused for \"daemon unreachable\", not \"data error\". Branch on LlamaStash's table above, not the libc constants.\n\n## Platforms\n\nLinux (x86_64, aarch64), macOS (Apple Silicon, Intel), and Windows 11 (x86_64). One binary, one TUI, one CLI — the daemon's control plane is bearer-token-authed HTTP loopback on every platform, and the supervisor uses the OS's native process-group semantics (POSIX `setsid` + signals, Windows Job Objects + CTRL+BREAK). 
Windows AMD GPU detection and `aarch64-pc-windows-msvc` are on the roadmap.\n\n## Roadmap\n\nTracked in detail in [`TODO.md`](https:\u002F\u002Fgithub.com\u002Fllamastash\u002Fllamastash\u002Fblob\u002Fmain\u002FTODO.md). The headline items on deck:\n\n- **llama.cpp version pinning** — prevent silent drift \u002F incompatibility on `brew upgrade`.\n- **MCP and LAN-exposed HTTP surfaces** — Model Context Protocol, plus auth + TLS + LAN binding for the proxy. The loopback OpenAI-compatible proxy ships today (see [Drop-in OpenAI + Ollama proxy](#drop-in-openai--ollama-proxy)); the rest of the v1 R34 deferral (Anthropic compat, MCP, network exposure) stays on the roadmap.\n- **Anthropic API compatibility** — `\u002Fv1\u002Fmessages` shim on top of the existing OpenAI-compatible endpoints.\n- **Per-PID VRAM attribution** via NVML's `nvmlDeviceGetComputeRunningProcesses`. Today the right pane shows per-model RAM + CPU%; VRAM is reported only at the host level.\n- **GPU\u002FCPU offload split UI** — first-class control over which layers go where.\n- **Windows AMD GPU detection** — pick a probe path (DXGI \u002F WMI \u002F ADLX). 0.0.2 shows \"GPU detection unavailable\" on Windows AMD hosts.\n- **`aarch64-pc-windows-msvc`** — Snapdragon X \u002F Surface Pro coverage. Deferred from 0.0.2.\n- **MLX and vLLM backends** — if the surface area lands cheaply alongside llama.cpp.\n- **Docker-ready packaging** — official images plus a documented `docker run` path.\n\n## Contributing\n\nBug reports, design discussion, and PRs welcome. 
Start with [`CONTRIBUTING.md`](CONTRIBUTING.md).\n\n## AI Usage\n\nMultiple AI Coding Harnesses and LLMs were heavily used to create this project.\n\n## License\n\nMIT © Deepu K Sasidharan\n\n## Terms of use\n\n- The Software shall be used for Good, not Evil.\n- This software shall not be used for any military purposes including intelligence agencies.\n\n## Related projects\n\n- [`kdash`](https:\u002F\u002Fgithub.com\u002Fkdash-rs\u002Fkdash) — Kubernetes dashboard TUI by the same author. LlamaStash borrows its engineering and release scaffolding from kdash: the org layout (`llamastash\u002Fllamastash`, `llamastash\u002Fhomebrew-llamastash`, `llamastash\u002Fllamastash.github.io`), the brew-tap structure, the `cli.rs` subdomain setup, and the release-on-tag workflow shape. The product itself is independent.\n- [`jwt-ui`](https:\u002F\u002Fgithub.com\u002Fjwt-rs\u002Fjwt-ui) — JWT decoder \u002F encoder TUI by the same author.\n\n## Star History\n\nIf LlamaStash is useful to you, a star helps other people find it.\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=llamastash\u002Fllamastash&type=date&legend=top-left)](https:\u002F\u002Fwww.star-history.com\u002F?type=date&repos=llamastash%2Fllamastash)\n","2026-06-11 04:12:13","CREATED_QUERY"]