[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-5376":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},5376,"llmfit","AlexsJones\u002Fllmfit","AlexsJones","Hundreds of models & providers. One command to find what runs on your hardware.","",null,"Rust",27759,1696,80,49,0,27,367,1965,220,119.69,"MIT License",false,"main",true,[27,28,29,30,31,32],"gguf","llm","localai","mlx","skill","unsloth","2026-06-12 04:00:25","# llmfit\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Ficon.svg\" alt=\"llmfit icon\" width=\"128\" height=\"128\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cb>English\u003C\u002Fb> ·\n  \u003Ca href=\"README.zh.md\">中文\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAlexsJones\u002Fllmfit\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FAlexsJones\u002Fllmfit\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\" alt=\"CI\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fllmfit\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fllmfit.svg\" alt=\"Crates.io\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg\" alt=\"License\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fabout.signpath.io\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSignPath-signed-brightgreen?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxNiIgaGVpZ2h0PSIxNiIgZmlsbD0id2hpdGUiIHZpZXdCb3g9IjAgMCAxNiAxNiI+PHBhdGggZD0iTTEwLjA2NyA0LjU2N2wtNC43MzQgNC43MzMtMS40LTEuNGExIDEgMCAwIDAtMS40MTQgMS40MTRsMi4xIDIuMWExIDEgMCAwIDAgMS40MTQgMGw1LjQ0LTUuNDRhMSAxIDAgMCAwLTEuNDE0LTEuNDE0eiIvPjwvc3ZnPg==\" alt=\"Signed with SignPath\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n> **New: [Community Benchmarks](#community-benchmarks-b)** — Browse real-world performance data from actual users. Press `b` to see measured tok\u002Fs, TTFT, and VRAM for any GPU — not just yours. Pick from 27+ hardware presets (RTX 5090 to Apple M1) with `H` to compare real numbers before you buy or build.\n\n**Hundreds of models & providers. One command to find what runs on your hardware.**\n\nA terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. Detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will actually run well on your machine.\n\nShips with an interactive TUI (default) and a classic CLI mode. Supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).\n\n**New: [Community Benchmarks](#community-benchmarks-b) (`b`)** — See real-world tok\u002Fs, TTFT, and VRAM usage from other users running the same hardware as you. Powered by [localmaxxing.com](https:\u002F\u002Flocalmaxxing.com), this bridges the gap between estimated and actual performance.\n\nAlso: [Download Manager](#download-manager-d) (`D`), [Advanced Configuration](#advanced-configuration-a) (`A`), and [Hardware Simulation](#hardware-simulation-s) — Press `D` to manage downloads, view history, delete models, and configure the download directory. 
Press `A` to tune TPS efficiency, run mode factors, and scoring weights. Press `S` to simulate different hardware.\n\n> **Sister projects:**\n> - [sympozium](https:\u002F\u002Fgithub.com\u002Fsympozium-ai\u002Fsympozium\u002F) — managing agents in Kubernetes.\n> - [llmserve](https:\u002F\u002Fgithub.com\u002FAlexsJones\u002Fllmserve) — a simple TUI for serving local LLM models. Pick a model, pick a backend, serve it.\n> - [llama-panel](https:\u002F\u002Fgithub.com\u002FAlexsJones\u002Fllama-panel) — a native macOS app for managing local llama-server instances.\n\n![demo](assets\u002Fdemo.gif)\n\n---\n\n## Install\n\n### Windows\n```sh\nscoop install llmfit\n```\n\nIf Scoop is not installed, follow the [Scoop installation guide](https:\u002F\u002Fscoop.sh\u002F).\n\n### macOS \u002F Linux\n\n#### Homebrew\n```sh\nbrew install llmfit\n```\n\n#### MacPorts\n```sh\nport install llmfit\n```\n\n#### Quick install\n```sh\ncurl -fsSL https:\u002F\u002Fllmfit.axjns.dev\u002Finstall.sh | sh\n```\n\nDownloads the latest release binary from GitHub and installs it to `\u002Fusr\u002Flocal\u002Fbin` (or to `~\u002F.local\u002Fbin` if sudo is unavailable).\n\n**Install to `~\u002F.local\u002Fbin` without sudo:**\n```sh\ncurl -fsSL https:\u002F\u002Fllmfit.axjns.dev\u002Finstall.sh | sh -s -- --local\n```\n\n### uv \u002F pip\nTo install or update llmfit:\n```sh\nuv tool install -U llmfit\n```\n\nTo run without installing:\n```sh\nuvx llmfit\n```\n\nYou can also install llmfit as a Python package in the normal way with tools such as pip or uv.\n\n### Docker \u002F Podman\n```sh\ndocker run ghcr.io\u002Falexsjones\u002Fllmfit\n```\nThis prints JSON from the `llmfit recommend` command. 
The JSON could be further queried with `jq`.\n```\npodman run ghcr.io\u002Falexsjones\u002Fllmfit recommend --use-case coding | jq '.models[].name'\n```\n\n### From source\n```sh\ngit clone https:\u002F\u002Fgithub.com\u002FAlexsJones\u002Fllmfit.git\ncd llmfit\ncargo build --release\n# binary is at target\u002Frelease\u002Fllmfit\n```\n\n---\n\n## Usage\n\n### TUI (default)\n\n```sh\nllmfit\n```\n\nLaunches the interactive terminal UI. Your system specs (CPU, RAM, GPU name, VRAM, backend) are shown at the top. Models are listed in a scrollable table sorted by composite score. Each row shows the model's score, estimated tok\u002Fs, best quantization for your hardware, run mode, memory usage, and use-case category.\n\n| Key                        | Action                                                                |\n|----------------------------|-----------------------------------------------------------------------|\n| `Up` \u002F `Down` or `j` \u002F `k` | Navigate models                                                       |\n| `\u002F`                        | Enter search mode (partial match on name, provider, params, use case) |\n| `Esc` or `Enter`           | Exit search mode                                                      |\n| `Ctrl-U`                   | Clear search                                                          |\n| `f`                        | Cycle fit filter: All, Runnable, Perfect, Good, Marginal              |\n| `a`                        | Cycle availability filter: All, GGUF Avail, Installed                 |\n| `s`                        | Cycle sort column: Score, Params, Mem%, Ctx, Date, Use Case           |\n| `v`                        | Enter Visual mode (select multiple models)                            |\n| `V`                        | Enter Select mode (column-based filtering)                            |\n| `t`                        | Cycle color theme (saved automatically)                               |\n| `p`     
                   | Open Plan mode for selected model (hardware planning)                 |\n| `P`                        | Open provider filter popup                                            |\n| `U`                        | Open use-case filter popup                                            |\n| `C`                        | Open capability filter popup                                          |\n| `L`                        | Open license filter popup                                             |\n| `R`                        | Open runtime\u002Fbackend filter popup (llama.cpp, MLX, vLLM)             |\n| `S`                        | Open hardware simulation popup (override RAM\u002FVRAM\u002FCPU)                |\n| `A`                        | Open advanced configuration popup (tune efficiency, run mode factors) |\n| `b`                        | Open community benchmarks view (localmaxxing.com)                     |\n| `h`                        | Open help popup (all key bindings)                                    |\n| `m`                        | Mark selected model for compare                                       |\n| `c`                        | Open compare view (marked vs selected)                                |\n| `x`                        | Clear compare mark                                                    |\n| `i`                        | Toggle installed-first sorting (any detected runtime provider)        |\n| `d`                        | Download selected model (provider picker when multiple are available) |\n| `D`                        | Open Download Manager (history, deletion, config)                     |\n| `r`                        | Refresh installed models from runtime providers                       |\n| `Enter`                    | Toggle detail view for selected model                                 |\n| `PgUp` \u002F `PgDn`            | Scroll by 10                                                          |\n| `g` \u002F 
`G`                  | Jump to top \u002F bottom                                                  |\n| `q`                        | Quit                                                                  |\n\n### Vim-like modes\n\nThe TUI uses Vim-inspired modes shown in the bottom-left status bar. The current mode determines which keys are active.\n\n#### Normal mode\n\nThe default mode. Navigate, search, filter, and open views. All keys in the table above apply here.\n\n#### Visual mode (`v`)\n\nSelect a contiguous range of models for bulk comparison. Press `v` to anchor at the current row, then navigate with `j`\u002F`k` or arrow keys to extend the selection. Selected rows are highlighted.\n\n| Key                 | Action                                                 |\n|---------------------|--------------------------------------------------------|\n| `j` \u002F `k` or arrows | Extend selection up\u002Fdown                               |\n| `c`                 | Compare all selected models (opens multi-compare view) |\n| `m`                 | Mark current model for two-model compare               |\n| `Esc` or `v`        | Exit Visual mode                                       |\n\nThe multi-compare view displays a table where rows are attributes (Score, tok\u002Fs, Fit, Mem%, Params, Mode, Context, Quant, etc.) and columns are models. Best values are highlighted. Use `h`\u002F`l` or arrow keys to scroll horizontally if more models are selected than fit on screen.\n\n#### Select mode (`V`)\n\nColumn-based actions. Press `V` (shift-v) to enter Select mode, then use `h`\u002F`l` or arrow keys to move between column headers. The active column is visually highlighted. 
Press `Enter` or `Space` to trigger that column's current action.\n\n| Column                        | Filter action                                                             |\n|-------------------------------|---------------------------------------------------------------------------|\n| Inst                          | Cycle availability filter                                                 |\n| Model                         | Enter search mode                                                         |\n| Provider                      | Open provider popup                                                       |\n| Params                        | Open parameter-size bucket popup (\u003C3B, 3-7B, 7-14B, 14-30B, 30-70B, 70B+) |\n| Score, tok\u002Fs, Mem%, Ctx, Date | Sort by that column                                                       |\n| Quant                         | Open quantization popup                                                   |\n| Mode                          | Open run-mode popup (GPU, MoE, CPU+GPU, CPU)                              |\n| Fit                           | Cycle fit filter                                                          |\n| Use Case                      | Open use-case popup                                                       |\n\nRow navigation still works in Select mode so you can see the effect of actions as you apply them: `j`\u002F`k`, arrow keys, `Ctrl-U`, `Ctrl-D`, `PageUp`, `PageDown`, `Home`, and `End`. 
Press `Esc` to return to Normal mode.\n\n### TUI Plan mode (`p`)\n\nPlan mode inverts normal fit analysis: instead of asking \"what fits my hardware?\", it estimates \"what hardware is needed for this model config?\".\n\nUse `p` on a selected row, then:\n\n| Key                    | Action                                                    |\n|------------------------|-----------------------------------------------------------|\n| `Tab` \u002F `j` \u002F `k`      | Move between editable fields (Context, Quant, Target TPS) |\n| `Left` \u002F `Right`       | Move cursor in current field                              |\n| Type                   | Edit current field                                        |\n| `Backspace` \u002F `Delete` | Remove characters                                         |\n| `Ctrl-U`               | Clear current field                                       |\n| `Esc` or `q`           | Exit Plan mode                                            |\n\nPlan mode shows estimates for:\n- minimum and recommended VRAM\u002FRAM\u002FCPU cores\n- feasible run paths (GPU, CPU offload, CPU-only)\n- upgrade deltas to reach better fit targets\n\n### Hardware Simulation (`S`)\n\nPress `S` to open the hardware simulation popup. Override RAM, VRAM, and CPU core count to see which models would fit on different target hardware. 
All model scores, fit levels, and speed estimates are recalculated instantly against the simulated specs.\n\n![Hardware Simulation](assets\u002Fsimulation.png)\n\n| Key                    | Action                                  |\n|------------------------|-----------------------------------------|\n| `Tab` \u002F `j` \u002F `k`      | Switch between RAM, VRAM, CPU fields    |\n| Type digits            | Edit the selected field                 |\n| `Enter`               | Apply simulation                        |\n| `Ctrl-R`              | Reset to real detected hardware         |\n| `Esc`                 | Cancel and close                        |\n\nWhen simulation is active, a `SIM` badge appears in the system bar and status bar. The entire model table reflects the simulated hardware until you reset.\n\n### Advanced Configuration (`A`)\n\nPress `A` to open the Advanced Configuration popup. This panel lets you tune the parameters behind TPS estimation, run mode penalties, and composite scoring — addressing [issue #449](https:\u002F\u002Fgithub.com\u002FAlexsJones\u002Fllmfit\u002Fissues\u002F449) where tok\u002Fs was overestimated for certain models (e.g., Qwen3 30B).\n\nAll changes are applied immediately and the model table is recalculated. Close with `Esc` to accept or `Ctrl-R` to reset to defaults.\n\n| Field              | Description                                                             | Default |\n|--------------------|-------------------------------------------------------------------------|---------|\n| **Efficiency**     | Global efficiency factor for bandwidth-based TPS. 
Accounts for overhead | `0.55`  |\n| **GPU factor**     | Speed multiplier for pure GPU inference                                 | `1.0`   |\n| **CPU Offload**    | Speed multiplier when weights spill to system RAM                       | `0.5`   |\n| **MoE Offload**    | Speed multiplier for Mixture-of-Experts expert switching                | `0.8`   |\n| **Tensor Par**     | Speed multiplier for tensor-parallel inference                          | `0.9`   |\n| **CPU Only**       | Speed multiplier for CPU-only execution                                 | `0.3`   |\n| **Context cap**    | Max context length used for memory estimation (leave blank for default) | `auto`  |\n\n| Key                    | Action                                  |\n|------------------------|-----------------------------------------|\n| `Tab` \u002F `j` \u002F `k`      | Switch between fields                   |\n| Type digits \u002F `.`      | Edit the selected field                 |\n| `Left` \u002F `Right`       | Move cursor within the field            |\n| `Backspace` \u002F `Delete` | Remove characters                       |\n| `Ctrl-U`               | Clear the current field                 |\n| `Enter`                | Apply changes and recalculate all scores|\n| `Esc` \u002F `q`            | Close without applying                  |\n\n### Download Manager (`D`)\n\nPress `D` to open the Download Manager view. This full-screen view replaces the main model table and provides three sections:\n\n- **Active Download** — shows the current download in progress with a progress bar, model name, and status message.\n- **Config** — displays (and allows editing) the GGUF models directory. The configured path persists across sessions.\n- **History** — a navigable list of past downloads (newest first) with model name, provider, status, and date. 
Failed downloads can be removed from history, and successful downloads can be deleted from the provider.\n\nUse `Tab` \u002F `Shift-Tab` to cycle focus between sections.\n\n| Key                    | Action                                           |\n|------------------------|--------------------------------------------------|\n| `Tab` \u002F `Shift-Tab`   | Cycle focus: Active → Config → History           |\n| `j` \u002F `k` or arrows   | Navigate the history list (when History focused)  |\n| `x`                   | Delete selected model (prompts for confirmation)  |\n| `y` \u002F `n`             | Confirm or cancel deletion                        |\n| `e`                   | Edit download directory (when Config focused)     |\n| `Enter`               | Confirm directory edit                            |\n| `Esc` \u002F `D` \u002F `q`    | Close and return to the model table               |\n\nFor failed downloads (e.g. 404 errors), `x` removes the entry from history. For successful downloads, it deletes the model from the provider (supported for Ollama and llama.cpp).\n\n### Community Benchmarks (`b`)\n\nPress `b` to open the Community Benchmarks view. Instead of relying solely on llmfit's theoretical speed estimates, this view shows **real-world performance data** from other users with the same hardware — actual measured tok\u002Fs, time-to-first-token, and peak VRAM usage.\n\n![Community Benchmarks](assets\u002Fbenchmark.jpeg)\n\nData is sourced from [localmaxxing.com](https:\u002F\u002Flocalmaxxing.com), a community benchmark database. 
When you open the view, llmfit auto-detects your hardware (GPU model, VRAM tier, Apple Silicon chip family, OS) and queries for matching results.\n\n| Column       | Description                                              |\n|--------------|----------------------------------------------------------|\n| **Model**    | HuggingFace model ID                                     |\n| **Engine**   | Inference runtime used (llama.cpp, vLLM, Ollama, MLX...) |\n| **Quant**    | Quantization format (Q4_K_M, Q8_0, etc.)                |\n| **tok\u002Fs**    | Measured output token generation speed                   |\n| **Total t\u002Fs**| Total throughput (prompt + generation)                   |\n| **TTFT**     | Time to first token (latency)                            |\n| **VRAM**     | Peak memory usage during inference                       |\n| **Ctx**      | Context length used in the benchmark                     |\n| **User**     | Submitter (verified users marked with `*`)               |\n\n| Key                    | Action                                  |\n|------------------------|-----------------------------------------|\n| `j` \u002F `k` or arrows   | Navigate results                        |\n| `H`                    | Open hardware picker (browse any GPU)   |\n| `r`                    | Refresh \u002F re-fetch from API             |\n| `b` \u002F `q` \u002F `Esc`     | Close and return to model table         |\n\nPress `H` to open the hardware picker — a scrollable list of 27 popular GPUs and chips (RTX 5090 through CPU-only, plus Apple Silicon M1–M4 variants, AMD RX\u002FMI series, and NVIDIA datacenter cards). Select one to instantly load benchmarks for that hardware, even if it's not what you're running on. Select \"My Hardware (auto-detect)\" to go back to your own system.\n\n#### API key setup\n\nPublic benchmarks work without authentication. 
For full access, provide your [localmaxxing.com](https:\u002F\u002Flocalmaxxing.com) API key:\n\n```sh\n# Via environment variable (recommended)\nexport LOCALMAXXING_API_KEY=\"bhk_your_key_here\"\nllmfit\n\n# Or via CLI flag\nllmfit --api-key \"bhk_your_key_here\"\n```\n\n| Variable | Description |\n|---|---|\n| `LOCALMAXXING_API_KEY` | Bearer token for localmaxxing.com API |\n\n### Themes\n\nPress `t` to cycle through 10 built-in color themes. Your selection is saved automatically to `~\u002F.config\u002Fllmfit\u002Ftheme` and restored on next launch.\n\n| Theme                    | Description                                       |\n|--------------------------|---------------------------------------------------|\n| **Default**              | Original llmfit colors                            |\n| **Dracula**              | Dark purple background with pastel accents        |\n| **Solarized**            | Ethan Schoonover's Solarized Dark palette         |\n| **Nord**                 | Arctic, cool blue-gray tones                      |\n| **Monokai**              | Monokai Pro warm syntax colors                    |\n| **Gruvbox**              | Retro groove palette with warm earth tones        |\n| **Catppuccin Latte**     | 🌻 Light theme — harmonious pastel inversion      |\n| **Catppuccin Frappé**    | 🪴 Low-contrast dark — muted, subdued aesthetic   |\n| **Catppuccin Macchiato** | 🌺 Medium-contrast dark — gentle, soothing tones  |\n| **Catppuccin Mocha**     | 🌿 Darkest variant — cozy with color-rich accents |\n\n### Web dashboard\n\nWhen you run `llmfit` in non-JSON mode, it automatically starts a background web dashboard on `0.0.0.0:8787`. 
Open it in any browser on the same network:\n\n```\nhttp:\u002F\u002F\u003Cyour-machine-ip>:8787\n```\n\nOverride the host or port with environment variables:\n\n```sh\nLLMFIT_DASHBOARD_HOST=0.0.0.0 LLMFIT_DASHBOARD_PORT=9000 llmfit\n```\n\n| Variable | Default | Description |\n|---|---|---|\n| `LLMFIT_DASHBOARD_HOST` | `0.0.0.0` | Interface to bind the dashboard server |\n| `LLMFIT_DASHBOARD_PORT` | `8787` | Port to bind the dashboard server |\n\nTo disable the auto-started dashboard, pass `--no-dashboard`:\n\n```sh\nllmfit --no-dashboard\n```\n\n### CLI mode\n\nUse `--cli` or any subcommand to get classic table output:\n\n```sh\n# Table of all models ranked by fit\nllmfit --cli\n\n# Only perfectly fitting models, top 5\nllmfit fit --perfect -n 5\n\n# Show detected system specs\nllmfit system\n\n# List all models in the database\nllmfit list\n\n# Search by name, provider, or size\nllmfit search \"llama 8b\"\n\n# Detailed view of a single model\nllmfit info \"Mistral-7B\"\n\n# Top 5 recommendations (JSON, for agent\u002Fscript consumption)\nllmfit recommend --json --limit 5\n\n# Recommendations filtered by use case\nllmfit recommend --json --use-case coding --limit 3\n\n# Force a specific runtime (bypass automatic MLX selection on Apple Silicon)\nllmfit recommend --force-runtime llamacpp\nllmfit recommend --force-runtime llamacpp --use-case coding --limit 3\n\n# Plan required hardware for a specific model configuration\nllmfit plan \"Qwen\u002FQwen3-4B-MLX-4bit\" --context 8192\nllmfit plan \"Qwen\u002FQwen3-4B-MLX-4bit\" --context 8192 --quant mlx-4bit\nllmfit plan \"Qwen\u002FQwen3-4B-MLX-4bit\" --context 8192 --target-tps 25 --json\n\n# Run as a node-level REST API (for cluster schedulers \u002F aggregators)\nllmfit serve --host 0.0.0.0 --port 8787\n```\n\n### REST API (`llmfit serve`)\n\n`llmfit serve` starts an HTTP API that exposes the same fit\u002Fscoring data used by TUI\u002FCLI, including filtering and top-model selection for a node.\n\n```sh\n# 
Liveness\ncurl http:\u002F\u002Flocalhost:8787\u002Fhealth\n\n# Node hardware info\ncurl http:\u002F\u002Flocalhost:8787\u002Fapi\u002Fv1\u002Fsystem\n\n# Full fit list with filters\ncurl \"http:\u002F\u002Flocalhost:8787\u002Fapi\u002Fv1\u002Fmodels?min_fit=marginal&runtime=llamacpp&sort=score&limit=20\"\n\n# Key scheduling endpoint: top runnable models for this node\ncurl \"http:\u002F\u002Flocalhost:8787\u002Fapi\u002Fv1\u002Fmodels\u002Ftop?limit=5&min_fit=good&use_case=coding\"\n\n# Search by model name\u002Fprovider text\ncurl \"http:\u002F\u002Flocalhost:8787\u002Fapi\u002Fv1\u002Fmodels\u002FMistral?runtime=any\"\n```\n\nSupported query params for `models`\u002F`models\u002Ftop`:\n\n- `limit` (or `n`): max number of rows returned\n- `perfect`: `true|false` (forces perfect-only when `true`)\n- `min_fit`: `perfect|good|marginal|too_tight`\n- `runtime`: `any|mlx|llamacpp`\n- `use_case`: `general|coding|reasoning|chat|multimodal|embedding`\n- `provider`: provider text filter (substring)\n- `search`: free-text filter across name\u002Fprovider\u002Fsize\u002Fuse-case\n- `sort`: `score|tps|params|mem|ctx|date|use_case`\n- `include_too_tight`: include non-runnable rows (default `false` on `\u002Ftop`, `true` on `\u002Fmodels`)\n- `max_context`: per-request context cap for memory estimation\n- `force_runtime`: `mlx|llamacpp|vllm` — override automatic runtime selection during analysis\n\nValidate API behavior locally:\n\n```sh\n# spawn server automatically and run endpoint\u002Fschema\u002Ffilter assertions\npython3 scripts\u002Ftest_api.py --spawn\n\n# or test an already-running server\npython3 scripts\u002Ftest_api.py --base-url http:\u002F\u002F127.0.0.1:8787\n```\n\n### Hardware overrides\n\nHardware autodetection can fail on some systems (e.g. broken `nvidia-smi`, VMs, passthrough setups), or you may want to evaluate model fit against different target hardware. 
Use `--memory`, `--ram`, and `--cpu-cores` to override detected values:\n\n```sh\n# Override GPU VRAM\nllmfit --memory=32G\n\n# Override system RAM\nllmfit --ram=128G\n\n# Override CPU core count\nllmfit --cpu-cores=16\n\n# Combine overrides to simulate target hardware\nllmfit --memory=24G --ram=64G --cpu-cores=8 fit\nllmfit --memory=24G --ram=64G system --json\n\n# Works with all modes: TUI, CLI, and subcommands\nllmfit --memory=24G --cli\nllmfit --memory=24G fit --perfect -n 5\nllmfit --ram=64G recommend --json\n```\n\nAccepted suffixes for `--memory` and `--ram`: `G`\u002F`GB`\u002F`GiB` (gigabytes), `M`\u002F`MB`\u002F`MiB` (megabytes), `T`\u002F`TB`\u002F`TiB` (terabytes). Case-insensitive. If no GPU was detected, `--memory` creates a synthetic GPU entry so models are scored for GPU inference. On unified-memory systems (Apple Silicon), `--ram` also updates VRAM; use `--memory` to override VRAM independently.\n\n### Context-length cap for estimation\n\nUse `--max-context` to cap context length used for memory estimation (without changing each model's advertised maximum context):\n\n```sh\n# Estimate memory fit at 4K context\nllmfit --max-context 4096 --cli\n\n# Works with subcommands\nllmfit --max-context 8192 fit --perfect -n 5\nllmfit --max-context 16384 recommend --json --limit 5\n```\n\nIf `--max-context` is not set, llmfit will use `OLLAMA_CONTEXT_LENGTH` when available.\n\n### JSON output\n\nAdd `--json` to any subcommand for machine-readable output:\n\n```sh\nllmfit --json system     # Hardware specs as JSON\nllmfit --json fit -n 10  # Top 10 fits as JSON\nllmfit recommend --json  # Top 5 recommendations (JSON is default for recommend)\nllmfit plan \"Qwen\u002FQwen2.5-Coder-0.5B-Instruct\" --context 8192 --json\n```\n\n`plan` JSON includes stable fields for:\n- request (`context`, `quantization`, `target_tps`)\n- estimated minimum\u002Frecommended hardware\n- per-path feasibility (`gpu`, `cpu_offload`, `cpu_only`)\n- upgrade deltas\n\n---\n\n## How it 
works\n\n1. **Hardware detection** -- Reads total\u002Favailable RAM via `sysinfo`, counts CPU cores, and probes for GPUs:\n   - **NVIDIA** -- Multi-GPU support via `nvidia-smi`. Aggregates VRAM across all detected GPUs. Falls back to VRAM estimation from GPU model name if reporting fails.\n   - **AMD** -- Detected via `rocm-smi`.\n   - **Intel Arc** -- Discrete VRAM via sysfs, integrated via `lspci`.\n   - **Apple Silicon** -- Unified memory via `system_profiler`. VRAM = system RAM.\n   - **Ascend** -- Detected via `npu-smi`.\n   - **Backend detection** -- Automatically identifies the acceleration backend (CUDA, Metal, ROCm, SYCL, CPU ARM, CPU x86, Ascend) for speed estimation.\n\n2. **Model database** -- Hundreds of models sourced from the HuggingFace API, stored in `data\u002Fhf_models.json` and embedded at compile time. Memory requirements are computed from parameter counts across a quantization hierarchy (Q8_0 through Q2_K). VRAM is the primary constraint for GPU inference; system RAM is the fallback for CPU-only execution.\n\n   **MoE support** -- Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2\u002FV3) are detected automatically. Only a subset of experts is active per token, so the effective VRAM requirement is much lower than the total parameter count suggests. For example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token, reducing VRAM from 23.9 GB to ~6.6 GB with expert offloading.\n\n3. **Dynamic quantization** -- Instead of assuming a fixed quantization, llmfit selects the highest-quality quantization that fits your hardware. It walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed), picking the highest quality that fits in available memory. If nothing fits at full context, it tries again at half context.\n\n4. 
**Multi-dimensional scoring** -- Each model is scored across four dimensions (0–100 each):\n\n   | Dimension   | What it measures                                                               |\n   |-------------|--------------------------------------------------------------------------------|\n   | **Quality** | Parameter count, model family reputation, quantization penalty, task alignment |\n   | **Speed**   | Estimated tokens\u002Fsec based on backend, params, and quantization                |\n   | **Fit**     | Memory utilization efficiency (sweet spot: 50–80% of available memory)         |\n   | **Context** | Context window capability vs target for the use case                           |\n\n   Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding). For example, Chat weights Speed higher (0.35) while Reasoning weights Quality higher (0.55). Models are ranked by composite score, with unrunnable models (Too Tight) always at the bottom.\n\n5. **Speed estimation** -- Token generation in LLM inference is memory-bandwidth-bound: each token requires reading the full model weights once from VRAM. When the GPU model is recognized, llmfit uses its actual memory bandwidth to estimate throughput:\n\n   Formula: `(bandwidth_GB_s \u002F model_size_GB) × efficiency_factor`\n\n   The efficiency factor (0.55) and per-mode speed multipliers are tunable via the Advanced Configuration popup (`A` in the TUI). The defaults account for kernel overhead, KV-cache reads, and memory controller effects. 
This approach is validated against published benchmarks from llama.cpp ([Apple Silicon](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp\u002Fdiscussions\u002F4167), [NVIDIA T4](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp\u002Fdiscussions\u002F4225)) and real-world measurements.\n\n   The bandwidth lookup table covers ~80 GPUs across NVIDIA (consumer + datacenter), AMD (RDNA + CDNA), and Apple Silicon families.\n\n   For unrecognized GPUs, llmfit falls back to per-backend speed constants:\n\n   | Backend      | Speed constant |\n   |--------------|----------------|\n   | CUDA         | 220            |\n   | Metal        | 160            |\n   | ROCm         | 180            |\n   | SYCL         | 100            |\n   | CPU (ARM)    | 90             |\n   | CPU (x86)    | 70             |\n   | NPU (Ascend) | 390            |\n\n   Fallback formula: `K \u002F params_b × quant_speed_multiplier`, with per-mode penalties tunable via the Advanced Configuration popup (`A` in the TUI).\n\n6. **Fit analysis** -- Each model is evaluated for memory compatibility:\n\n   **Run modes:**\n   - **GPU** -- Model fits in VRAM. Fast inference.\n   - **MoE** -- Mixture-of-Experts with expert offloading. Active experts in VRAM, inactive in RAM.\n   - **CPU+GPU** -- VRAM insufficient, spills to system RAM with partial GPU offload.\n   - **CPU** -- No GPU. Model loaded entirely into system RAM.\n\n   **Fit levels:**\n   - **Perfect** -- Recommended memory met on GPU. Requires GPU acceleration.\n   - **Good** -- Fits with headroom. Best achievable for MoE offload or CPU+GPU.\n   - **Marginal** -- Tight fit, or CPU-only (CPU-only always caps here).\n   - **Too Tight** -- Not enough VRAM or system RAM anywhere.\n\n---\n\n## Model database\n\nThe model list is generated by `scripts\u002Fscrape_hf_models.py`, a standalone Python script (stdlib only, no pip dependencies) that queries the HuggingFace REST API. 
The database covers hundreds of models and providers, including Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, IBM Granite, Allen Institute OLMo, xAI Grok, Cohere, BigCode, 01.ai, Upstage, TII Falcon, HuggingFace, Zhipu GLM, Moonshot Kimi, Baidu ERNIE, and more. The scraper automatically detects MoE architectures via model config (`num_local_experts`, `num_experts_per_tok`) and known architecture mappings.\n\nModel categories span general purpose, coding (CodeLlama, StarCoder2, WizardCoder, Qwen2.5-Coder, Qwen3-Coder), reasoning (DeepSeek-R1, Orca-2), multimodal\u002Fvision (Llama 3.2 Vision, Llama 4 Scout\u002FMaverick, Qwen2.5-VL), chat, enterprise (IBM Granite), and embedding (nomic-embed, bge).\n\nSee [MODELS.md](MODELS.md) for the full list.\n\nThe model database is embedded at compile time, so **end users** get updates by upgrading llmfit itself (`brew upgrade llmfit`, `scoop update llmfit`, or downloading a newer release). The commands below are for **contributors** refreshing the database from source:\n\n```sh\n# Automated update (recommended)\nmake update-models\n\n# Or run the script directly\n.\u002Fscripts\u002Fupdate_models.sh\n\n# Or manually\npython3 scripts\u002Fscrape_hf_models.py\ncargo build --release\n```\n\nThe scraper writes `data\u002Fhf_models.json`, which is baked into the binary via `include_str!`. The automated update script backs up existing data, validates JSON output, and rebuilds the binary.\n\nBy default, the scraper enriches models with known GGUF download sources from providers like [unsloth](https:\u002F\u002Fhuggingface.co\u002Funsloth) and [bartowski](https:\u002F\u002Fhuggingface.co\u002Fbartowski). Results are cached in `data\u002Fgguf_sources_cache.json` (7-day TTL) to avoid repeated API calls. 
Use `--no-gguf-sources` to skip enrichment for a faster scrape.\n\n---\n\n## Project structure\n\n```\nsrc\u002F\n  main.rs         -- CLI argument parsing, entrypoint, TUI launch\n  hardware.rs     -- System RAM\u002FCPU\u002FGPU detection (multi-GPU, backend identification)\n  models.rs       -- Model database, quantization hierarchy, dynamic quant selection\n  fit.rs          -- Multi-dimensional scoring (Q\u002FS\u002FF\u002FC), speed estimation, MoE offloading\n  providers.rs    -- Runtime provider integration (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio), install detection, pull\u002Fdownload\n  display.rs      -- Classic CLI table rendering + JSON output\n  tui_app.rs      -- TUI application state, filters, navigation\n  tui_ui.rs       -- TUI rendering (ratatui)\n  tui_events.rs   -- TUI keyboard event handling (crossterm)\ndata\u002F\n  hf_models.json  -- Model database (206 models)\nskills\u002F\n  llmfit-advisor\u002F -- OpenClaw skill for hardware-aware model recommendations\nscripts\u002F\n  scrape_hf_models.py        -- HuggingFace API scraper\n  update_models.sh            -- Automated database update script\n  install-openclaw-skill.sh   -- Install the OpenClaw skill\nMakefile           -- Build and maintenance commands\n```\n\n---\n\n## Publishing to crates.io\n\nThe `Cargo.toml` already includes the required metadata (description, license, repository). To publish:\n\n```sh\n# Dry run first to catch issues\ncargo publish --dry-run\n\n# Publish for real (requires a crates.io API token)\ncargo login\ncargo publish\n```\n\nBefore publishing, make sure:\n\n- The version in `Cargo.toml` is correct (bump with each release).\n- A `LICENSE` file exists in the repo root. Create one if missing:\n\n```sh\n# For MIT license:\ncurl -sL https:\u002F\u002Fopensource.org\u002Flicense\u002FMIT -o LICENSE\n# Or write your own. The Cargo.toml declares license = \"MIT\".\n```\n\n- `data\u002Fhf_models.json` is committed. 
It is embedded at compile time and must be present in the published crate.\n\nTo publish updates:\n\n```sh\n# Bump version\n# Edit Cargo.toml: version = \"0.2.0\"\ncargo publish\n```\n\n---\n\n## Dependencies\n\n| Crate                  | Purpose                                          |\n|------------------------|--------------------------------------------------|\n| `clap`                 | CLI argument parsing with derive macros          |\n| `sysinfo`              | Cross-platform RAM and CPU detection             |\n| `serde` \u002F `serde_json` | JSON deserialization for model database          |\n| `tabled`               | CLI table formatting                             |\n| `colored`              | CLI colored output                               |\n| `ureq`                 | HTTP client for runtime\u002Fprovider API integration |\n| `ratatui`              | Terminal UI framework                            |\n| `crossterm`            | Terminal input\u002Foutput backend for ratatui        |\n\n---\n\n## Runtime provider integration\n\nllmfit supports multiple local runtime providers:\n\n- **Ollama** (daemon\u002FAPI based pulls)\n- **llama.cpp** (direct GGUF downloads from Hugging Face + local cache detection)\n- **MLX** (Apple Silicon \u002F mlx-community model cache + optional server) — MLX downloads map to `mlx-community\u002F*` repos on HuggingFace, not the original model publisher\n- **Docker Model Runner** (Docker Desktop's built-in model serving)\n- **LM Studio** (local model server with REST API for model management + downloads)\n\nWhen more than one compatible provider is available for a model, pressing `d` in the TUI opens a provider picker modal.\n\n### Ollama integration\n\nllmfit integrates with [Ollama](https:\u002F\u002Follama.com) to detect which models you already have installed and to download new ones directly from the TUI.\n\n### Requirements\n\n- **Ollama must be installed and running** (`ollama serve` or the Ollama desktop app)\n- 
llmfit connects to `http:\u002F\u002Flocalhost:11434` (Ollama's default API port)\n- No configuration needed — if Ollama is running, llmfit detects it automatically\n\n### Remote Ollama instances\n\nTo connect to Ollama running on a different machine or port, set the `OLLAMA_HOST` environment variable:\n\n```sh\n# Connect to Ollama on a specific IP and port\nOLLAMA_HOST=\"http:\u002F\u002F192.168.1.100:11434\" llmfit\n\n# Connect via hostname\nOLLAMA_HOST=\"http:\u002F\u002Follama-server:666\" llmfit\n\n# Works with all TUI and CLI commands\nOLLAMA_HOST=\"http:\u002F\u002F192.168.1.100:11434\" llmfit --cli\nOLLAMA_HOST=\"http:\u002F\u002F192.168.1.100:11434\" llmfit fit --perfect -n 5\n```\n\nThis is useful for:\n- Running llmfit on one machine while Ollama serves from another (e.g., GPU server + laptop client)\n- Connecting to Ollama running in Docker containers with custom ports\n- Using Ollama behind reverse proxies or load balancers\n\n### How it works\n\nOn startup, llmfit queries `GET \u002Fapi\u002Ftags` to list your installed Ollama models. Each installed model gets a green **✓** in the **Inst** column of the TUI. The system bar shows `Ollama: ✓ (N installed)`.\n\nWhen you press `d` on a model, llmfit sends `POST \u002Fapi\u002Fpull` to Ollama to download it. The row is highlighted with an animated indicator that shows download progress in real time. 
Once complete, the model is immediately available for use with Ollama.\n\nIf Ollama is not running, Ollama-specific operations are skipped; the TUI still supports other providers like llama.cpp where available.\n\n### llama.cpp integration\n\nllmfit integrates with [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp) as a runtime\u002Fdownload provider in both TUI and CLI.\n\nRequirements:\n\n- `llama-cli` or `llama-server` available in `PATH` (for runtime detection)\n- network access to Hugging Face for GGUF downloads\n\nHow it works:\n\n- llmfit maps HF models to known GGUF repos (with heuristic fallbacks)\n- downloads GGUF files into the local llama.cpp model cache\n- marks models installed when matching GGUF files are present locally\n\n#### Environment variables\n\n| Variable | Default | Description |\n|---|---|---|\n| `LLAMA_CPP_PATH` | *(none)* | Directory containing llama.cpp binaries (`llama-cli`, `llama-server`). Checked before `PATH` lookup. |\n| `LLAMA_SERVER_PORT` | `8080` | Port used when probing a running `llama-server` health endpoint for runtime detection. 
|\n\nIf llama.cpp is installed in a non-standard location, set `LLAMA_CPP_PATH` so llmfit can find it without requiring it in your `PATH`.\n\n### Docker Model Runner integration\n\nllmfit integrates with [Docker Model Runner](https:\u002F\u002Fdocs.docker.com\u002Fdesktop\u002Ffeatures\u002Fmodel-runner\u002F), Docker Desktop's built-in model serving feature.\n\nRequirements:\n\n- Docker Desktop with Model Runner enabled\n- Default endpoint: `http:\u002F\u002Flocalhost:12434`\n\nHow it works:\n\n- llmfit queries `GET \u002Fengines` to list models available in Docker Model Runner\n- models are matched to the HF database using Ollama-style tag mapping (Docker Model Runner uses `ai\u002F\u003Ctag>` naming)\n- pressing `d` in the TUI pulls via `docker model pull`\n\n### Remote Docker Model Runner instances\n\nTo connect to Docker Model Runner on a different host or port, set the `DOCKER_MODEL_RUNNER_HOST` environment variable:\n\n```sh\nDOCKER_MODEL_RUNNER_HOST=\"http:\u002F\u002F192.168.1.100:12434\" llmfit\n```\n\n### LM Studio integration\n\nllmfit integrates with [LM Studio](https:\u002F\u002Flmstudio.ai) as a local model server with built-in model download capabilities.\n\nRequirements:\n\n- LM Studio must be running with its local server enabled\n- Default endpoint: `http:\u002F\u002F127.0.0.1:1234`\n\nHow it works:\n\n- llmfit queries `GET \u002Fv1\u002Fmodels` to list models available in LM Studio\n- pressing `d` in the TUI triggers a download via `POST \u002Fapi\u002Fv1\u002Fmodels\u002Fdownload`\n- download progress is tracked by polling `GET \u002Fapi\u002Fv1\u002Fmodels\u002Fdownload-status`\n- LM Studio accepts HuggingFace model names directly, so no name mapping is needed\n\n### Remote LM Studio instances\n\nTo connect to LM Studio on a different host or port, set the `LMSTUDIO_HOST` environment variable:\n\n```sh\nLMSTUDIO_HOST=\"http:\u002F\u002F192.168.1.100:1234\" llmfit\n```\n\n### Model name mapping\n\nllmfit's database uses HuggingFace model names 
(e.g. `Qwen\u002FQwen2.5-Coder-14B-Instruct`) while Ollama uses its own naming scheme (e.g. `qwen2.5-coder:14b`). llmfit maintains an accurate mapping table between the two so that install detection and pulls resolve to the correct model. Each mapping is exact — `qwen2.5-coder:14b` maps to the Coder model, not the base `qwen2.5:14b`.\n\n---\n\n## Platform support\n\n- **Linux** -- Full support. GPU detection via `nvidia-smi` (NVIDIA), `rocm-smi` (AMD), sysfs\u002F`lspci` (Intel Arc) and `npu-smi` (Ascend).\n- **macOS (Apple Silicon)** -- Full support. Detects unified memory via `system_profiler`. VRAM = system RAM (shared pool). Models run via Metal GPU acceleration.\n- **macOS (Intel)** -- RAM and CPU detection works. Discrete GPU detection if `nvidia-smi` available.\n- **Windows** -- RAM and CPU detection works. NVIDIA GPU detection via `nvidia-smi` if installed.\n- **Android \u002F Termux \u002F PRoot** -- CPU and RAM detection usually work, but GPU autodetection is not currently supported. 
Mobile GPUs such as Adreno typically are not visible through the desktop\u002Fserver probing interfaces llmfit uses.\n\n### GPU support\n\n| Vendor                 | Detection method              | VRAM reporting                 |\n|------------------------|-------------------------------|--------------------------------|\n| NVIDIA                 | `nvidia-smi`                  | Exact dedicated VRAM           |\n| AMD                    | `rocm-smi`                    | Detected (VRAM may be unknown) |\n| Intel Arc (discrete)   | sysfs (`mem_info_vram_total`) | Exact dedicated VRAM           |\n| Intel Arc (integrated) | `lspci`                       | Shared system memory           |\n| Apple Silicon          | `system_profiler`             | Unified memory (= system RAM)  |\n| Ascend                 | `npu-smi`                     | Detected (VRAM may be unknown) |\n\nIf autodetection fails or reports incorrect values, use `--memory`, `--ram`, or `--cpu-cores` to override (see [Hardware overrides](#hardware-overrides) above).\n\n### Android \u002F Termux note\n\nOn Android setups such as **Termux + PRoot**, llmfit usually cannot see mobile GPUs through the standard Linux detection paths (`nvidia-smi`, `rocm-smi`, DRM\u002Fsysfs, `lspci`, etc.). In those environments, \"no GPU detected\" is expected with the current implementation.\n\nIf you still want GPU-style recommendations on a unified-memory phone or tablet, use a manual memory override:\n\n```sh\nllmfit --memory=8G fit -n 20\nllmfit recommend --json --memory=8G --limit 10\n```\n\nThis is a workaround for recommendation\u002Fscoring only; it does not provide true Android GPU runtime detection.\n\n---\n\n## Contributing\n\nContributions are welcome, especially new models.\n\n### Before submitting a PR\n\nPlease run `cargo fmt` before pushing your changes. Most CI check failures are caused by unformatted code:\n\n```sh\ncargo fmt\n```\n\n### Adding a model\n\n1. 
Add the model's HuggingFace repo ID (e.g., `meta-llama\u002FLlama-3.1-8B`) to the `TARGET_MODELS` list in `scripts\u002Fscrape_hf_models.py`.\n2. If the model is gated (requires HuggingFace authentication to access metadata), add a fallback entry to the `FALLBACKS` list in the same script with the parameter count and context length.\n3. Run the automated update script:\n   ```sh\n   make update-models\n   # or: .\u002Fscripts\u002Fupdate_models.sh\n   ```\n4. Verify the updated model list: `.\u002Ftarget\u002Frelease\u002Fllmfit list`\n5. Update [MODELS.md](MODELS.md) by running: `python3 \u003C\u003C 'EOF' \u003C scripts\u002F...` (see commit history for the generator script)\n6. Open a pull request.\n\nSee [MODELS.md](MODELS.md) for the current list and [AGENTS.md](AGENTS.md) for architecture details.\n\n---\n\n## OpenClaw integration\n\nllmfit ships as an [OpenClaw](https:\u002F\u002Fgithub.com\u002Fopenclaw\u002Fopenclaw) skill that lets the agent recommend hardware-appropriate local models and auto-configure Ollama\u002FvLLM\u002FLM Studio providers.\n\n### Install the skill\n\n```sh\n# From the llmfit repo\n.\u002Fscripts\u002Finstall-openclaw-skill.sh\n\n# Or manually\ncp -r skills\u002Fllmfit-advisor ~\u002F.openclaw\u002Fskills\u002F\n```\n\nOnce installed, ask your OpenClaw agent things like:\n\n- \"What local models can I run?\"\n- \"Recommend a coding model for my hardware\"\n- \"Set up Ollama with the best models for my GPU\"\n\nThe agent will call `llmfit recommend --json` under the hood, interpret the results, and offer to configure your `openclaw.json` with optimal model choices.\n\n### How it works\n\nThe skill teaches the OpenClaw agent to:\n\n1. Detect your hardware via `llmfit --json system`\n2. Get ranked recommendations via `llmfit recommend --json`\n3. Map HuggingFace model names to Ollama\u002FvLLM\u002FLM Studio tags\n4. 
Configure `models.providers.ollama.models` in `openclaw.json`\n\nSee [skills\u002Fllmfit-advisor\u002FSKILL.md](skills\u002Fllmfit-advisor\u002FSKILL.md) for the full skill definition.\n\n---\n\n## Alternatives\n\nIf you're looking for a different approach, check out [llm-checker](https:\u002F\u002Fgithub.com\u002FPavelevich\u002Fllm-checker) -- a Node.js CLI tool with Ollama integration that can pull and benchmark models directly. It takes a more hands-on approach by actually running models on your hardware via Ollama, rather than estimating from specs. Good if you already have Ollama installed and want to test real-world performance. Note that it doesn't support MoE (Mixture-of-Experts) architectures -- all models are treated as dense, so memory estimates for models like Mixtral or DeepSeek-V3 will reflect total parameter count rather than the smaller active subset.\n\n---\n\n## License\n\nMIT\n","llmfit is a terminal tool that helps users choose the large language models best suited to their hardware (RAM, CPU, and GPU). Written in Rust, it supports hundreds of models and providers. A single command detects the user's hardware configuration, evaluates each model on quality, speed, fit, and context handling, and recommends the most suitable ones. It provides an interactive TUI and a classic CLI mode, and supports multi-GPU setups, MoE architectures, dynamic quantization selection, and more. llmfit also introduces a community benchmarks feature that lets users view real-world performance data from others running the same hardware. The tool is well suited to developers and researchers who need to make efficient use of local resources for large language model training or inference.",2,"2026-06-11 03:02:56","top_language"]