[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2943":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":13,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":10,"rankLanguage":10,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":14,"starSnapshotCount":14,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},2943,"QuantClaw-plugin","SparkEngineAI\u002FQuantClaw-plugin","SparkEngineAI","QuantClaw is a plug-and-play task-type routing quantization plugin for OpenClaw.","https:\u002F\u002Fsparkengineai.github.io\u002FQuantClaw\u002F",null,"TypeScript",116,1,0,11,42.5,"MIT License",false,"main",true,[22,23,24,25,26,27,28],"agents","claude","codex","harness","llm","openclaw","quantization","2026-06-12 04:00:16","\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Ffigs\u002Ffavicon.png\" alt=\"QuantClaw logo\" width=\"240\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">QuantClaw: Precision Where It Matters for OpenClaw\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\".\u002FREADME_zh.md\">中文文档\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fclawhub.ai\u002Fplugins\u002F%40sparkengineai%2Fquantclaw\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpenClaw-Plugin-0f172a\" alt=\"OpenClaw Plugin\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fsparkengineai.github.io\u002FQuantClaw\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-Live-0ea5e9\" alt=\"Blog\">\u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.22577\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-f97316\" alt=\"Paper arXiv\">\u003C\u002Fa>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRouting-4bit%20%7C%208bit%20%7C%2016bit-2563eb\" alt=\"Routing tiers\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-16a34a\" alt=\"MIT License\">\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n![QuantClaw overview](.\u002Ffigs\u002Foverview.png)\n\nQuantClaw is a plug-and-play task-type routing quantization plugin for OpenClaw. It classifies each incoming request, maps it to a precision tier (`4bit`, `8bit`, or `16bit`), and routes the request to the right model target so you can balance quality, latency, and cost without asking users to choose precision manually.\n\n## 🔍 About QuantClaw\n\nQuantClaw is built from quantization studies on OpenClaw workloads rather than from fixed intuition. We evaluate quantized and high-precision models across 24 task types, 104 tasks, 6 models, and scales from 9B to 744B.\n\nResults on Claw-Eval (release v0.0.0):\n\n\u003Cdiv align=\"center\">\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth align=\"left\">Model\u003C\u002Fth>\n      \u003Cth align=\"center\">Params (B)\u003C\u002Fth>\n      \u003Cth align=\"center\">BF16 \u002F FP8\u003C\u002Fth>\n      \u003Cth align=\"center\">NVFP4\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>\u003Cstrong>GLM-4.7-Flash\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Ctd align=\"center\">30\u003C\u002Ftd>\n      \u003Ctd align=\"center\">0.6370\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Cstrong>0.6034\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Cstrong>GLM-5\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Ctd align=\"center\">744\u003C\u002Ftd>\n      \u003Ctd 
align=\"center\">0.7130\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Cstrong>0.7229\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Cstrong>MiniMax-M2.5\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Ctd align=\"center\">229\u003C\u002Ftd>\n      \u003Ctd align=\"center\">0.6760\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Cstrong>0.6823\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Cstrong>Qwen3.5-9B\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Ctd align=\"center\">9\u003C\u002Ftd>\n      \u003Ctd align=\"center\">0.4267\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Cstrong>0.4107\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Cstrong>Qwen3.5-35B-A3B\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Ctd align=\"center\">35\u003C\u002Ftd>\n      \u003Ctd align=\"center\">0.6686\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Cstrong>0.6549\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Cstrong>Qwen3.5-397B-A17B\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Ctd align=\"center\">397\u003C\u002Ftd>\n      \u003Ctd align=\"center\">0.7048\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Cstrong>0.6937\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\u003C\u002Fdiv>\n\n- High-sensitivity tasks such as coding, safety, and complex workflows benefit from higher precision.\n- Low-sensitivity tasks such as research, multimodal understanding, comprehension, knowledge lookup, office QA, and data analysis can often run well on lower precision.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Ffigs\u002Fsensitivity_chart.png\" alt=\"sensitivity chart\" width=\"600\">\n\u003C\u002Fp>\n\n## ✨ Key Features\n\n\u003Ctable align=\"center\">\n  \u003Ctr align=\"center\">\n    \u003Cth>\u003Cp align=\"center\"> Automatic 
Adaptation\u003C\u002Fp>\u003C\u002Fth>\n    \u003Cth>\u003Cp align=\"center\"> Intelligent Routing\u003C\u002Fp>\u003C\u002Fth>\n    \u003Cth>\u003Cp align=\"center\"> Full Customizability\u003C\u002Fp>\u003C\u002Fth>\n    \u003Cth>\u003Cp align=\"center\"> Built-in Observability\u003C\u002Fp>\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\u003Cp align=\"center\">\u003Cimg src=\"figs\u002FruleDetector.png\" width=\"400\" height=\"250\">\u003C\u002Fp>\u003C\u002Ftd>\n    \u003Ctd align=\"center\">\u003Cp align=\"center\">\u003Cimg src=\"figs\u002Fsession.png\" width=\"400\" height=\"250\">\u003C\u002Fp>\u003C\u002Ftd>\n    \u003Ctd align=\"center\">\u003Cp align=\"center\">\u003Cimg src=\"figs\u002Fconfig.png\" width=\"400\" height=\"250\">\u003C\u002Fp>\u003C\u002Ftd>\n    \u003Ctd align=\"center\">\u003Cp align=\"center\">\u003Cimg src=\"figs\u002Fdashboard.png\" width=\"400\" height=\"250\">\u003C\u002Fp>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">Rules first, then a judge model for requests.\u003C\u002Ftd>\n    \u003Ctd align=\"center\">Map each query to 4bit, 8bit, or 16bit targets.\u003C\u002Ftd>\n    \u003Ctd align=\"center\">Tune task types, patterns, targets, pricing, and backends.\u003C\u002Ftd>\n    \u003Ctd align=\"center\">Track routing, tokens, cost, sessions, and live config changes.\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## 🚀 Quick Start\n\n**Install**\n\n```bash\n# Prerequisite: OpenClaw is already installed.\n\n# Install from Clawhub (recommended)\nopenclaw plugins install clawhub:@sparkengineai\u002Fquantclaw\n\n# If OpenClaw is running from a source checkout and the CLI is not on PATH:\ncd \u002Fpath\u002Fto\u002Fopenclaw\nnode openclaw.mjs plugins install @sparkengineai\u002Fquantclaw\n\n# Or install from source\ngit clone https:\u002F\u002Fgithub.com\u002FSparkEngineAI\u002FQuantClaw-plugin.git .\u002Fquantclaw\nopenclaw plugins install .\u002Fquantclaw\n\n# 
If the OpenClaw CLI is not on PATH:\ncd \u002Fpath\u002Fto\u002Fopenclaw\nnode openclaw.mjs plugins install \u002Fpath\u002Fto\u002Fquantclaw\n```\n\n**Create or bootstrap the runtime config**\n\nQuantClaw reads its runtime config from:\n\n```text\n~\u002F.openclaw\u002Fquantclaw.json\n```\n\nIf the file does not exist, starting OpenClaw with the plugin enabled will generate a default `quantclaw.json`. If you are working from this repository directly, you can also start from the provided example:\n\n```bash\ncp config.example.json ~\u002F.openclaw\u002Fquantclaw.json\n```\n\n**Edit the detector chain and targets**\n\n```json\n{\n  \"quant\": {\n    \"enabled\": true,\n    \"detectors\": [\"ruleDetector\", \"loadModelDetector\"],\n    \"judge\": {\n      \"endpoint\": \"http:\u002F\u002F127.0.0.1:8000\",\n      \"model\": \"BAAI\u002Fbge-m3\",\n      \"providerType\": \"openai-compatible\",\n      \"apiKey\": \"\",\n      \"cacheTtlMs\": 300000\n    }\n  }\n}\n```\n\n**Start OpenClaw and open the dashboard**\n\n```text\nhttp:\u002F\u002F127.0.0.1:18789\u002Fplugins\u002Fquantclaw\u002Fstats\n```\n\n\n## ⚙️ Configuration Notes\n\nThe runtime schema supports:\n\n- ordered detectors: `ruleDetector`, `loadModelDetector`\n- per-task-type `id`, `description`, `precision`, `keywords`, and `patterns`\n- per-tier model targets with independent provider, model, endpoint, api key, and pricing\n- model-level pricing overrides for cost reporting\n- hot reload when `~\u002F.openclaw\u002Fquantclaw.json` changes\n\nExample `taskTypes` config:\n\n```json\n{\n  \"taskTypes\": [\n    {\n      \"id\": \"coding\",\n      \"precision\": \"16bit\",\n      \"description\": \"code review, bug analysis, implementation, debugging, kernels, async behavior, web development\",\n      \"keywords\": [\"code\", \"debug\", \"bug\", \"Python\", \"CUDA\", \"编程\", \"代码\"],\n      \"patterns\": [\n        \"fix the bug in this repository\",\n        
\"(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*\"\n      ]\n    }\n  ],\n  \"defaultTaskType\": \"standard\"\n}\n```\n\nExample `targets` config:\n\n```json\n{\n  \"targets\": {\n    \"4bit\": {\n      \"provider\": \"quantclaw-4bit\",\n      \"model\": \"glm-4.7-flash-int4-autoround\",\n      \"endpoint\": \"https:\u002F\u002Fapi.example.com\u002Fv1\",\n      \"apiKey\": \"${QC_4BIT_API_KEY}\",\n      \"displayName\": \"4-bit Target\",\n      \"pricing\": {\n        \"inputPer1M\": 0.051,\n        \"outputPer1M\": 0.34\n      }\n    },\n    \"16bit\": {\n      \"provider\": \"quantclaw-16bit\",\n      \"model\": \"glm-4.7-flash\",\n      \"endpoint\": \"https:\u002F\u002Fapi.openai.com\u002Fv1\",\n      \"apiKey\": \"${QC_16BIT_API_KEY}\",\n      \"displayName\": \"16-bit Target\",\n      \"pricing\": {\n        \"inputPer1M\": 0.06,\n        \"outputPer1M\": 0.4\n      }\n    }\n  }\n}\n```\n\nExample `modelPricing` overrides:\n\n```json\n{\n  \"modelPricing\": {\n    \"glm-4.7-flash\": {\n      \"inputPer1M\": 0.06,\n      \"outputPer1M\": 0.4\n    },\n    \"glm-4.7-flash-int4-autoround\": {\n      \"inputPer1M\": 0.051,\n      \"outputPer1M\": 0.34\n    }\n  }\n}\n```\n\nTarget-level `pricing` is used first for that precision tier. 
If it is absent, QuantClaw falls back to `modelPricing` for cost reporting.\n\n## 🧠 `loadModelDetector` Backends\n\n`loadModelDetector` supports either a local embedding-based router exposed through an OpenAI-compatible API or a regular OpenAI-compatible LLM judge.\n\nBuild a local embedding router index:\n\n```bash\npython router\u002Fembedding_task_router.py --model-name BAAI\u002Fbge-m3 --device cuda --config-path ~\u002F.openclaw\u002Fquantclaw.json --output-dir .\u002Fembedding_router_index-bge-m3 build --print-summary\n```\n\nServe that router as an OpenAI-compatible endpoint:\n\n```bash\npython router\u002Fembedding_task_router_server.py --model-name BAAI\u002Fbge-m3 --device cuda --output-dir .\u002Fembedding_router_index-bge-m3 --port 8012\n```\n\nIf your machine does not have a GPU, change `--device cuda` to `--device cpu`.\n\nIf you do not want to run the local embedding router, you can point `quant.judge.endpoint` at any OpenAI-compatible LLM endpoint instead.\n\n## 🙏 Acknowledgements\n\nWe especially acknowledge:\n\n- [Claw-Eval](https:\u002F\u002Fgithub.com\u002Fclaw-eval\u002Fclaw-eval)\n- [PinchBench](https:\u002F\u002Fgithub.com\u002Fpinchbench\u002Fskill)\n- [WildClawBench](https:\u002F\u002Fgithub.com\u002FInternLM\u002FWildClawBench)\n- [ClawXRouter](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FClawXRouter\u002Ftree\u002Fmain)\n\n## 👥 Core Contributors\n[Manyi Zhang](https:\u002F\u002Fopenreview.net\u002Fprofile?id=%7EManyi_Zhang2), [Ji-Fu Li*](https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Ji-Fu_Li1), [Zhongao Sun](https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Zhongao_Sun1), [Xiaohao Liu](https:\u002F\u002Fxiaohao-liu.github.io), [Zhenhua Dong](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=JeePtHEAAAAJ&hl=en), [Xianzhi Yu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=tGnJRYQAAAAJ&hl=en), [Haoli Bai](https:\u002F\u002Fhaolibai.github.io\u002F) (Project Lead), [Xiaobo 
Xia](https:\u002F\u002Fxiaoboxia.github.io\u002F)\n\n*Follow SparkEngineAI on WeChat. We hope to share cutting-edge progress in AI Infra, light up stars in the AI field, and help everyone learn and draw inspiration.*\n\n\u003Cp align=\"left\">\n  \u003Cimg src=\".\u002Ffigs\u002FSparkEngineAI.jpg\" alt=\"SparkEngineAI official account\" width=\"240\">\n\u003C\u002Fp>\n\n## 📖 Citation\n\nIf QuantClaw helps your research, engineering work, or benchmark studies, please cite:\n\n```bibtex\n@article{zhang2026quantclaw,\n  title={QuantClaw: Precision Where It Matters for OpenClaw},\n  author={Zhang, Manyi and Li, Ji-Fu and Sun, Zhongao and Liu, Xiaohao and Dong, Zhenhua and Yu, Xianzhi and Bai, Haoli and Xia, Xiaobo},\n  journal={arXiv preprint arXiv:2604.22577},\n  year={2026}\n}\n```\n","QuantClaw is a plug-and-play task-type routing quantization plugin designed for OpenClaw. It automatically classifies each incoming request, maps it to an appropriate precision tier (4-bit, 8-bit, or 16-bit), and routes the request to the right model target, balancing quality, latency, and cost without requiring users to choose a precision manually. The project is built on quantization studies of OpenClaw workloads and optimizes performance by evaluating quantized and high-precision models across different task types. It suits scenarios that must handle diverse tasks efficiently and are sensitive to compute resources, such as code generation, safety checks, and complex workflow management.",2,"2026-06-11 02:51:51","CREATED_QUERY"]