[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81348":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":11,"contributorsCount":11,"subscribersCount":11,"size":11,"stars1d":11,"stars7d":11,"stars30d":11,"stars90d":11,"forks30d":11,"starsTrendScore":11,"compositeScore":11,"rankGlobal":8,"rankLanguage":8,"license":13,"archived":14,"fork":14,"defaultBranch":15,"hasWiki":16,"hasPages":14,"topics":17,"createdAt":8,"pushedAt":8,"updatedAt":18,"readmeContent":19,"aiSummary":20,"trendingCount":11,"starSnapshotCount":11,"syncStatus":21,"lastSyncTime":22,"discoverSource":23},81348,"tokenward","cryptosaras\u002Ftokenward","cryptosaras",null,"TypeScript",4,0,41,"MIT License",false,"main",true,[],"2026-06-12 02:04:14","# tokenwarden\n\n**ccusage tells you what you spent. tokenwarden stops you from overspending.**\n\n> **In a hurry?** See [QUICKSTART.md](QUICKSTART.md) — install + use in 30 seconds.\n\ntokenwarden is the only tool in the Claude Code ecosystem that **acts** on cost rather than reporting it. Every other tool in this space — ccusage, Claude-Code-Usage-Monitor, cc-budget — tells you what happened or warns you what might happen. tokenwarden hooks directly into Claude Code's event system and enforces hard budgets, intercepts expensive calls, caps argument bloat, and coaches compaction. It is designed to be complementary to ccusage: **track with ccusage, enforce with tokenwarden.**\n\n![tokenwarden blocking an expensive Opus call](docs\u002Fdemo.gif)\n\u003C!-- TODO: record demo.gif before launch -->\n\n---\n\n## Quick start\n\n```bash\nnpx tokenwarden install\n```\n\nThis writes hook entries into your Claude Code `settings.json` (user scope by default; use `--project` to install into the project scope instead) and points them at the installed binary. The install is idempotent (safe to re-run) and fully reversible.\n\nBy default tokenwarden starts in **observe mode**: it logs every intervention it *would* make without actually blocking anything. This lets you see what it would enforce for a full session before it starts refusing calls.\n\nTo switch to enforcement:\n\n```bash\n# In your project directory\nnpx tokenwarden init   # creates .tokenwarden.json with your settings\n# edit .tokenwarden.json: set \"mode\": \"enforce\"\n```\n\nOr flip it globally:\n\n```bash\necho '{\"mode\":\"enforce\"}' > ~\u002F.tokenwarden.json\n```\n\n---\n\n## Features\n\n### 1. Hard budget gate\n\nPer-session, per-day, per-project, and per-model **USD caps** enforced via `PreToolUse` and `UserPromptSubmit` hooks. When cumulative spend reaches a cap, the hook returns a block decision and the reason is shown to the model and user. (v0.1 enforces dollar caps; the reliable live signal is `cost.total_cost_usd`, not a cumulative token count.)\n\n```\ntokenwarden: session budget reached — spent $8.50 of $5.00. Run \u002Fcompact, start a\nfresh session, or set TOKENWARDEN_FORCE=1 to override.\n```\n\nSubagent spend (`isSidechain:true`) counts toward the parent session\u002Fproject budget — this is the configuration that addresses the runaway fan-out scenario.\n\n### 2. Compaction coach\n\nAt 80% context usage, `UserPromptSubmit` injects an `additionalContext` nudge prompting the user to run `\u002Fcompact`. The `PreCompact` hook can veto wasteful auto-compaction that fires below 50% context fill.\n\n**What tokenwarden does not do:** it cannot trigger `\u002Fcompact` programmatically. Hooks have no mechanism to issue commands. The compaction coach nudges; the human runs `\u002Fcompact`.\n\n### 3. Expensive-call escalation\n\nWhen an expensive model (Opus by default) is about to make a **genuinely expensive call** — chiefly a subagent spawn (`Task`\u002F`Agent`), the documented runaway-cost vector — `PreToolUse` returns an `ask` decision so the call needs your confirmation. The reason is calibrated to a dollar estimate:\n\n```\ntokenwarden: Est. $0.60 on claude-opus-4-7 for a subagent spawn — genuinely\nexpensive on this model. Switch to a cheaper model, narrow the request, or set\nTOKENWARDEN_FORCE=1 to override.\n```\n\nCalls whose estimate falls below `escalation.trivialThresholdUsd` pass **silently** — ordinary Reads, Edits, and scoped searches never prompt, even on Opus. Friction is proportional to cost (and an oversized Read is silently capped by the bloat rule below, not escalated).\n\n**What tokenwarden does not do:** it cannot silently downgrade the model. The model field is not reliably available in hook input, and hooks have no model-selection control. Tokenwarden escalates to a human decision instead.\n\n### 4. Argument-bloat prevention\n\n`PreToolUse` uses `modifiedInput` to rewrite tool arguments before they execute:\n\n- `Read.limit` capped to 2000 lines (prevents dumping entire large files into context)\n- `Grep` and `Glob` refused (`ask`) when called without path\u002Fscope constraints\n- `Bash` timeout capped to 600 000 ms to prevent runaway background commands\n\n**What tokenwarden does not do:** it cannot prune or modify existing conversation history. It can only prevent future bloat at the point of the tool call, before output enters the context window.\n\n### 5. Subscription usage windows (Pro \u002F Max 5x \u002F Max 20x)\n\nIf you're on a flat-fee plan, dollars aren't your constraint — you don't pay per token. You throttle against **usage windows**: a rolling **5-hour** window and a rolling **7-day** window. tokenwarden tracks your consumption against those windows so you can see how close you are to a throttle, and (in `enforce` mode) gate before you burn the rest of your quota on low-value work.\n\nSet your plan and the windows turn on:\n\n```json\n{ \"subscription\": { \"plan\": \"max5x\" } }\n```\n\n```\ntokenwarden: max5x 5-hour usage window — ~$24.10 API-equivalent used of an est.\n$25.00 ceiling. On a flat-fee plan this is throttle risk, not a bill (ceiling is\nan estimate). Slow down, switch to a cheaper model, start a fresh session, or set\nTOKENWARDEN_FORCE=1 to override.\n```\n\nThere's no dollar budget here — the figure is **API-equivalent**: tokenwarden prices your token consumption at public API rates (the same number Claude Code's statusline reports for subscription accounts) and measures it within each rolling window. `tokenwarden status` shows both windows when a plan is set.\n\n**Two honest caveats, stated plainly:**\n\n- **The ceilings are estimates.** Anthropic does not publish hard 5-hour\u002Fweekly numbers, and they drift. The built-in per-plan ceilings are deliberately conservative starting points — **calibrate them to your own throttling** by setting `subscription.fiveHour.usd` \u002F `subscription.weekly.usd` once you learn where you actually hit the wall.\n- **It's a local lower-bound.** tokenwarden only counts windows it observed on this machine via its statusline capture. It cannot see usage from other machines, claude.ai, or sessions before you installed it. Treat it as a floor, not a precise meter.\n\n`plan: null` (the default) keeps this off — API \u002F pay-as-you-go users want the dollar `budgets` above instead.\n\n### Cost-telemetry layer\n\nTokenwarden reads the same `~\u002F.claude\u002Fprojects\u002F**\u002F*.jsonl` transcripts that ccusage reads, deduplicated by `requestId`. It also captures the live statusline JSON Claude Code emits (the more reliable real-time cost signal). All dollar figures are **estimates**, always labeled as such.\n\n---\n\n## How it works\n\nTokenwarden registers handlers for these Claude Code hook events:\n\n| Event | What tokenwarden does |\n|---|---|\n| `SessionStart` | Captures the active model id; initializes the session ledger entry |\n| `UserPromptSubmit` | Checks budget caps before the prompt is processed; injects compaction-coach nudge at 80% context |\n| `PreToolUse` | Enforces budget gate, escalation, and argument-bloat caps; returns `modifiedInput` to rewrite args |\n| `PreCompact` | Vetoes auto-compaction that fires below the wasteful threshold |\n| `Stop` | Records a session-finalize bookkeeping event (never blocks) |\n| `SessionEnd` | Reconciles final session totals against the ledger |\n\n**Operational guarantees:**\n\n- **Fail-open:** any internal error in a hook exits 0 (proceed). Tokenwarden never blocks your work because of its own bug.\n- **Hot-path performance:** hooks complete in well under 100 ms in practice (~50 ms measured cold-start from a single bundled file), comfortably inside the \u003C250 ms budget. A hard 2 s self-timeout guarantees a hung hook still fails open. The ledger is a small local JSON file read in O(1) — no JSONL re-scanning on each hook call. Note this is per tool call, so a turn with many tool calls adds roughly `calls × ~50 ms` of overhead (e.g. ~1 s across 20 calls) — small relative to model latency, but real.\n- **Observe-only first session:** the default `mode: \"observe\"` logs what would be blocked without blocking it. You opt into `mode: \"enforce\"` after seeing the logs.\n- **Idempotent and reversible install:** `tokenwarden install` does not duplicate hook entries; `tokenwarden uninstall` removes exactly what it added.\n\n### Config precedence\n\nProject `.tokenwarden.json` overrides user `~\u002F.tokenwarden.json`, which overrides built-in defaults. All three layers merge; missing keys fall back to the defaults.\n\n---\n\n## Config\n\nCreate `.tokenwarden.json` in your project (or `~\u002F.tokenwarden.json` for user-wide settings). All fields are optional and fall back to defaults.\n\n```json\n{\n  \"mode\": \"observe\",\n  \"budgets\": {\n    \"session\": { \"usd\": 5 },\n    \"daily\": { \"usd\": 25 },\n    \"project\": { \"usd\": null },\n    \"perModel\": {\n      \"opus\": { \"usd\": 10 }\n    }\n  },\n  \"subscription\": {\n    \"plan\": null,\n    \"fiveHour\": { \"usd\": null },\n    \"weekly\": { \"usd\": null }\n  },\n  \"compaction\": {\n    \"coachAtPercent\": 80,\n    \"vetoAutoBelowPercent\": 50\n  },\n  \"escalation\": {\n    \"enabled\": true,\n    \"expensiveModels\": [\"opus\"],\n    \"trivialThresholdUsd\": 0.5\n  },\n  \"bloat\": {\n    \"readMaxLines\": 2000,\n    \"refuseUnscopedSearch\": true,\n    \"bashMaxTimeoutMs\": 600000\n  },\n  \"pricingOverrides\": {}\n}\n```\n\n**Field reference:**\n\n| Field | Default | Description |\n|---|---|---|\n| `mode` | `\"observe\"` | `\"observe\"` logs interventions; `\"enforce\"` blocks them |\n| `budgets.session.usd` | `5` | Hard USD cap per Claude Code session |\n| `budgets.daily.usd` | `25` | Hard USD cap per calendar day |\n| `budgets.project.usd` | `null` | Hard USD cap for this project directory |\n| `budgets.perModel` | `{}` | Per-model caps keyed by model id substring |\n| `subscription.plan` | `null` | Flat-fee plan: `\"pro\"`, `\"max5x\"`, `\"max20x\"`, `\"custom\"`, or `null` (off) |\n| `subscription.fiveHour.usd` | `null` | Rolling 5-hour ceiling (API-equiv USD); `null` → built-in estimate for the plan |\n| `subscription.weekly.usd` | `null` | Rolling 7-day ceiling (API-equiv USD); `null` → built-in estimate for the plan |\n| `compaction.coachAtPercent` | `80` | Inject \u002Fcompact nudge at this context-fill % |\n| `compaction.vetoAutoBelowPercent` | `50` | Block auto-compaction that fires below this % |\n| `escalation.expensiveModels` | `[\"opus\"]` | Model id substrings triggering escalation |\n| `escalation.trivialThresholdUsd` | `0.5` | Calls below this pass silently |\n| `bloat.readMaxLines` | `2000` | Cap Read.limit via modifiedInput |\n| `bloat.refuseUnscopedSearch` | `true` | Refuse Grep\u002FGlob with no path constraint |\n| `bloat.bashMaxTimeoutMs` | `600000` | Cap Bash timeout |\n\n> Subagent (`isSidechain`) spend always counts toward the parent session\u002Fproject budget — this is inherent to the live `cost.total_cost_usd` signal and is what makes the runaway fan-out scenario enforceable.\n\n> The `subscription` ceilings are **estimates** — Anthropic publishes no hard 5-hour\u002Fweekly numbers. Leave `usd` as `null` to use the conservative built-in per-plan defaults, then set explicit values once you learn where you actually throttle. See [Subscription usage windows](#5-subscription-usage-windows-pro--max-5x--max-20x).\n\n---\n\n## Commands\n\n```\nnpx tokenwarden \u003Ccommand>\n```\n\n| Command | What it does |\n|---|---|\n| `install` | Writes hook entries into the chosen settings.json scope; idempotent |\n| `uninstall` | Removes exactly the hook entries tokenwarden added; no other changes |\n| `status` | Shows mode, active budgets, current session\u002Fday spend, and (if a plan is set) subscription usage windows |\n| `report` | Prints a spend summary parsed from JSONL transcripts (same data source as ccusage) |\n| `inspect` | Shows the raw ledger state and recent enforcement events |\n| `doctor` | Checks that hook entries are wired correctly and the binary is reachable |\n| `init` | Scaffolds a `.tokenwarden.json` in the current directory with the defaults |\n\n---\n\n## Data accuracy and estimates\n\nAll cost figures tokenwarden shows are **estimates**. Two reasons:\n\n**JSONL is a known lower bound.** ccusage issue #866 documents that Claude Code writes token counts from early streaming events and never updates them to finals. Input tokens can be undercounted by 100–174x, output by 10–17x in affected cases. Tokenwarden reads the same JSONL, deduplicated by `requestId`, and inherits this limitation. JSONL is used for retrospective reporting and trend signals.\n\n**Statusline cost is the reliable live signal.** Claude Code pipes a live JSON object to the configured statusline command including `cost.total_cost_usd` and context-window metrics. Tokenwarden uses this for real-time enforcement caps. This is more reliable than JSONL for hard-cap decisions.\n\n**Pricing is a versioned snapshot.** The pricing table was verified against platform.claude.com on 2026-05-21. Rates can change; tokenwarden shows \"pricing updated YYYY-MM-DD\" in the report output. You can override per-model rates via `pricingOverrides` in your config (useful for negotiated enterprise rates).\n\n---\n\n## Benchmarks\n\n> **Status: harness shipped, numbers pending.** tokenwarden ships the reproducible\n> measurement harness in [`benchmarks\u002F`](benchmarks\u002F) — but it does **not** yet\n> publish verified savings figures. We will not quote a savings percentage until\n> real runs back it (the same discipline that makes a flat \"60–80%\" claim\n> indefensible). `benchmarks\u002Fresults\u002F` is populated by running the harness; it is\n> intentionally empty in the repo until verified runs land before launch.\n\nThe harness measures token spend **with and without** tokenwarden across a fixed\nsuite of representative tasks (bugfix, feature-add, refactor, subagent fan-out,\nMCP-heavy, long session) at N ≥ 6 runs each, same model and prompts, pinned\nClaude Code version, measured from the JSONL logs (the same source ccusage reads),\nreported as **median + range per scenario** with raw logs published.\n\n**Targets we expect the data to support (to be validated, not yet measured):**\n\n- 30–50% lower token spend on typical sessions\n- up to ~70% on MCP-heavy \u002F long-running agentic sessions, where argument-bloat\n  prevention has the largest effect\n- the qualitative win that needs no benchmark: **a hard cap stops a runaway\n  unattended session before it costs $1,000s**\n\nSee [`benchmarks\u002FREADME.md`](benchmarks\u002FREADME.md) for the methodology and how to\nreproduce.\n\n---\n\n## Comparison\n\n| Tool | What it does | Acts on cost? |\n|---|---|---|\n| [ccusage](https:\u002F\u002Fgithub.com\u002Fryoppippi\u002Fccusage) | Reads JSONL; reports daily\u002Fweekly\u002Fsession spend | No — read-only reporting |\n| Claude-Code-Usage-Monitor | Live burn-rate display and spend predictions | No — warns, does not block |\n| cc-budget | Budget alerts | No — alerts only, no blocking |\n| Anthropic Console spend limits | Hard server-side org\u002Fworkspace caps | Org-level only — no per-session, per-tool, or per-model control |\n| **tokenwarden** | **Enforces per-session\u002Fday\u002Fproject\u002Fmodel caps via hooks** | **Yes — blocks, rewrites, and escalates** |\n\nTokenwarden is **complementary to ccusage**, not a replacement. Use ccusage for historical reporting and dashboards; use tokenwarden to enforce the budgets those reports reveal you need.\n\n---\n\n## Why now\n\nThe sharpest version of Claude Code cost pain in 2026 is not \"it's a bit expensive\" — it is **runaway unattended agent spend**. A few verified examples:\n\n- Uber burned its entire 2026 AI budget in approximately four months, primarily on Claude Code.\n- Indie developer Jenny Ouyang woke up to an unexpected $1,600 bill from invisible context accumulation.\n- Anthropic's own docs baseline individual developer costs at roughly $13\u002Factive day and $150–250\u002Fmonth — before subagent fan-out.\n\nThe tools that exist today tell you what you spent after the fact. Tokenwarden is the first tool in this niche that refuses a call before it runs.\n\n---\n\n## Requirements\n\n- Node.js >= 18\n- Claude Code (any recent version)\n- No other runtime dependencies\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n\n\u003C!-- ![Star History](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=OWNER\u002Ftokenwarden&type=Date) -->\n\n---\n\n**GitHub topics:** `claude-code` `claude-usage` `claude-code-hooks` `cost-optimization` `cli` `ai-agents`\n","tokenwarden 是一个用于控制 Claude Code 生态系统中成本的工具。它通过直接介入事件系统来执行硬性预算限制、拦截昂贵调用、限制参数膨胀并指导压缩，从而防止过度消费。项目采用 TypeScript 编写，并以 MIT License 发布。适用于需要严格控制 AI 模型使用成本的企业和个人开发者场景，特别是当与 ccusage 等成本跟踪工具结合使用时，能够形成从监测到执行的完整闭环。",2,"2026-06-11 04:04:43","CREATED_QUERY"]