[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70539":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":26,"discoverSource":27},70539,"pi-llamacpp","mitsuhiko\u002Fpi-llamacpp","mitsuhiko","An experimental pi extension that runs and manages qwen with llama.cpp",null,"TypeScript",144,12,102,2,0,30,3.34,"MIT License",false,"main",true,[],"2026-06-12 02:02:34","# pi-llamacpp\n\nPi provider extension for running local, self-managed\n[llama.cpp](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp) inference in Pi.\n\nThe extension registers Qwen3.6 GGUF models under the `llamacpp` provider.\nOn first use it downloads or builds a matching llama.cpp runtime, downloads the\nselected GGUF, and starts `llama-server`; it stops the server automatically when\nPi shuts down.\n\n## Models\n\nCurrently registered:\n\n- `llamacpp\u002Fqwen-3.6-dense-2bit` (27B dense)\n- `llamacpp\u002Fqwen-3.6-dense-4bit` (27B dense)\n- `llamacpp\u002Fqwen-3.6-dense-8bit` (27B dense)\n- `llamacpp\u002Fqwen-3.6-moe-2bit` (35B-A3B MoE)\n- `llamacpp\u002Fqwen-3.6-moe-4bit` (35B-A3B MoE)\n- `llamacpp\u002Fqwen-3.6-moe-8bit` (35B-A3B MoE)\n\nThe model names describe the architecture:\n\n- `dense` is the Qwen3.6 27B dense model. All parameters participate in every\n  token, which makes compute and memory use more direct and predictable.\n- `moe` is the Qwen3.6 35B-A3B Mixture-of-Experts model. 
It has about 35B total\n  parameters, but routes each token through only a small active subset of\n  experts (about 3B active parameters). MoE can offer more total capacity for a\n  similar amount of active compute, but the full expert weights still need to be\n  stored and loaded.\n\nThe `moe` (35B-A3B) models are downloaded from\n[`havenoammo\u002FQwen3.6-35B-A3B-MTP-GGUF`](https:\u002F\u002Fhuggingface.co\u002Fhavenoammo\u002FQwen3.6-35B-A3B-MTP-GGUF)\nat revision `44ce525026e7e7d0e0915dc1bf83a783c813e75a`, and the `dense`\n(27B) models are downloaded from\n[`froggeric\u002FQwen3.6-27B-MTP-GGUF`](https:\u002F\u002Fhuggingface.co\u002Ffroggeric\u002FQwen3.6-27B-MTP-GGUF)\nat revision `431204640c8511573e61a7964a12cc452114a223`. Pinning the\nrevisions keeps downloads reproducible if upstream `main` moves; set\n`LLAMACPP_QWEN_35B_A3B_REVISION`, `LLAMACPP_QWEN_27B_REVISION`, or\n`LLAMACPP_QWEN_REVISION` to override.\nThese files need llama.cpp MTP\u002FNextN support, so the default runtime path builds\na pinned snapshot of [llama.cpp pull request #22673](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp\u002Fpull\u002F22673)\ninstead of using the stock binary release.\n\n## Install\n\n```sh\npi install https:\u002F\u002Fgithub.com\u002Fmitsuhiko\u002Fpi-llamacpp\n```\n\nFor local development from this checkout:\n\n```sh\n.\u002Finstall-pi-extension-local.sh\n```\n\nThen restart Pi or run `\u002Freload`.\n\n## Runtime layout\n\nRuntime state is kept under `~\u002F.pi\u002Fllamacpp`:\n\n- `source\u002F`: pinned llama.cpp source snapshots built locally (default: [PR #22673](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp\u002Fpull\u002F22673) snapshot for MTP\u002FNextN support)\n- `runtime\u002F`: extracted llama.cpp release archives when `LLAMACPP_RUNTIME_KIND=release`\n- `downloads\u002F`: release archives and resumable `.part` files\n- `models\u002Fhavenoammo\u002FQwen3.6-35B-A3B-MTP-GGUF\u002F`: cached `moe` (35B-A3B) GGUF model files\n- 
`models\u002Ffroggeric\u002FQwen3.6-27B-MTP-GGUF\u002F`: cached `dense` (27B) GGUF model files\n- `clients\u002F`: active Pi process leases\n- `server.json`: managed `llama-server` state\n- `log`: download\u002Fextract\u002Fserver\u002Fwatchdog log\n\nThe managed server binds to a random localhost port by default and records the\nactive endpoint in `server.json`. Set `LLAMACPP_PORT` only if you explicitly\nwant a fixed port.\n\n## Debugging\n\nUse `\u002Fllamacpp` inside Pi to show the live llama.cpp log, `\u002Fllamacpp status` for\npaths\u002Fstatus, and `\u002Fllamacpp stop` to stop the managed server when no other\nleases are active.\n","pi-llamacpp is an experimental Pi extension for running and managing Qwen models locally with llama.cpp. It registers Qwen3.6 GGUF models, automatically downloads or builds a matching llama.cpp runtime, downloads the selected GGUF model, and starts a `llama-server` instance, which it stops automatically when Pi shuts down. It supports dense (27B) and Mixture-of-Experts (35B-A3B MoE) models at several quantization levels (2-, 4-, and 8-bit) to fit different compute budgets. It suits applications that need to run capable language models locally for inference, such as text generation and dialogue systems.","2026-06-11 03:32:42","CREATED_QUERY"]