[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11156":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":31,"discoverSource":32},11156,"CodexSaver","fendouai\u002FCodexSaver","fendouai","Make Codex cheaper without making it dumber with DeepSeek.","",null,"Python",590,43,12,1,0,2,22,169,6,8.93,false,"main",true,[26,27],"codex","deepseek","2026-06-12 02:02:29","# CodexSaver\n\n> Make Codex cheaper without making it dumber.\n\n\u003Cp align=\"center\">\n  \u003Ca href=\".\u002FREADME_zh.md\">\u003Cstrong>中文文档\u003C\u002Fstrong>\u003C\u002Fa>\n\u003C\u002Fp>\n\n![CodexSaver](.\u002FCodexSaver.png)\n\nCodexSaver is an MCP tool that turns Codex into a cost-aware router.\nIt pushes low-risk development work to a cheaper worker LLM, keeps high-risk\njudgment in Codex, and returns enough interaction detail that you can feel when\nthe tool is active.\n\n- Lower-cost execution for tests, docs, search, and explanation work\n- Codex stays responsible for architecture, security, protected domains, and final review\n- Global-by-default Codex install, so every workspace can use the same MCP tool\n- DeepSeek by default, with presets for OpenAI, Anthropic, Gemini, Qwen, Ollama, LM Studio, and more\n- One-time local provider setup in `~\u002F.codexsaver\u002Fconfig.json`\n- Optional worker-output compression for shorter, review-friendly delegated results\n- Verified with tests, real 
DeepSeek calls, and end-to-end MCP launcher checks\n\n---\n\n## At A Glance\n\nCodexSaver is not trying to replace Codex.\nIt is trying to **shrink the amount of expensive Codex work** without losing engineering judgment.\n\nCurrent repo-local evidence:\n\n| Dimension | What CodexSaver already proves |\n|---|---|\n| Cost | v2 benchmark reached `45%` to `100%` estimated savings on 5 bounded tasks |\n| Speed | v2 5-task run completed successful tasks in `0.03s` to `14.95s`; v3 readonly swarm succeeded in `6.45s` |\n| Quality | v2 bounded work packets passed verifier gates; v3 readonly swarm produced `10` findings with `0.75` quality score |\n| Safety | protected paths, allowlisted commands, sandboxed patch apply, and Codex fallback are built in |\n\nThe important nuance:\n\n- v2 is the mature lane for single bounded tasks\n- v3 is the emerging lane for orchestrated specialists\n- the first clearly established v3 win is **readonly specialist orchestration**\n\n---\n\n## Where CodexSaver Wins\n\nCodexSaver is strongest in work that is:\n\n- low-risk\n- repetitive\n- easy to verify\n- parallelizable\n- expensive for Codex but cheap for a smaller worker\n\nIn practice, that means CodexSaver is best at:\n\n- code explanation\n- repository scanning\n- performance hinting\n- docstrings and README maintenance\n- bounded test generation\n- small bounded refactors with explicit file scope\n\nCodexSaver is not strongest at:\n\n- auth, security, payment, permissions\n- destructive migrations\n- ambiguous architecture decisions\n- multi-file behavioral changes with weak verification\n- anything that still needs Codex-level judgment at every step\n\nThe current product thesis is simple:\n\n```text\nCodexSaver wins first on readonly specialist orchestration.\nCodexSaver wins second on bounded, verifiable patch work.\nCodex remains the judge for everything risky or unclear.\n```\n\nThat ordering matters. 
It matches the current implementation and the current benchmark data.\n\n---\n\n## Why It Works\n\nCodexSaver improves cost, speed, and quality through a very specific technical split:\n\n### 1. Lower Cost\n\nCodex is used for judgment, not repetitive throughput work.\nCheap worker models handle:\n\n- explanation\n- docs\n- tests\n- small bounded implementation tasks\n\nThat means the expensive model is no longer paying for every routine step.\n\n### 2. Higher Speed\n\nWhen the task is decomposable, CodexSaver can parallelize specialist work:\n\n- one specialist explains\n- one specialist reviews performance\n- one specialist writes docs or tests\n\nFor these tasks, total latency starts to look like:\n\n```text\nmax(single specialist runtime) + orchestration overhead\n```\n\ninstead of:\n\n```text\nsum(all subtask runtimes)\n```\n\n### 3. Better Quality\n\nCodexSaver does not trust worker output blindly.\nIt improves output quality with hard boundaries:\n\n- router decides whether the task is safe enough\n- work packet limits the write scope\n- sandbox applies patch in isolation\n- verifier checks changed files, diff size, commands, and failures\n- Codex still reviews the result\n\nThat is why CodexSaver can be cheaper **without** becoming a \"YOLO auto-edit bot.\"\n\n---\n\n## Why This Exists\n\nMost coding sessions contain two very different kinds of work:\n\n- expensive thinking\n- cheap execution\n\nCodex is excellent at the first one. 
It is overqualified for much of the second.\n\nCodexSaver splits the flow on purpose:\n\n- `Codex` handles reasoning, ambiguity, protected domains, and approval\n- a configured worker provider handles low-risk throughput work\n\nThat gives you a practical pattern:\n\n```text\nUse the expensive model for judgment.\nUse the cheaper model for volume.\nNever confuse the two.\n```\n\n---\n\n## What It Feels Like\n\nWhen CodexSaver is active, tool responses are not silent blobs of JSON.\nThey include an `interaction` block that makes the routing decision visible:\n\n```json\n{\n  \"interaction\": {\n    \"tool\": \"codexsaver.delegate_task\",\n    \"mode\": \"delegated_execution\",\n    \"headline\": \"CodexSaver delegated this task to the configured worker provider.\",\n    \"route_label\": \"[CodexSaver] route=deepseek task_type=write_tests risk=low\",\n    \"next_step\": \"Review the worker result and apply it only if the patch looks safe.\"\n  }\n}\n```\n\nThree states matter:\n\n- `preview`: routing preview only, no external model call\n- `delegated_execution`: delegated run completed\n- `codex_takeover`: task stayed with Codex because risk was too high or the task was ambiguous\n\nWhen worker-output compression is enabled, the same `interaction` block includes\nthe active compression level so Codex can see why delegated replies are terser.\n\n---\n\n## V2: Bounded Work Packets\n\nCodexSaver v2 adds a stricter delegation lane for work that should be safe but\nstill deserves proof. Instead of asking the worker to \"just do the task\", Codex\nhands it a bounded work packet:\n\n- exact goal\n- allowed files or globs\n- forbidden paths\n- acceptance criteria\n- allowlisted commands\n- maximum iterations and diff size\n\nThe worker can propose patches, but CodexSaver applies them only inside a\ntemporary sandbox. The patch is accepted only when it stays within policy and\nthe allowlisted checks pass. 
If the task is already satisfied, v2 returns a\n`preflight_satisfied=true` result without spending a worker model call.\n\nCLI example:\n\n```bash\ncodexsaver work-packet \\\n  \"Create docs\u002Fv2-smoke.md with one sentence.\" \\\n  --files README.md \\\n  --allowed-file docs\u002Fv2-smoke.md \\\n  --acceptance \"docs\u002Fv2-smoke.md exists in sandbox\" \\\n  --allowed-command \"python -c \\\"from pathlib import Path; assert Path('docs\u002Fv2-smoke.md').exists()\\\"\" \\\n  --workspace .\n```\n\nMCP tool:\n\n```text\ncodexsaver.delegate_work_packet\n```\n\n---\n\n## V3: Orchestrated Specialists\n\nCodexSaver v3 extends v2 from a single bounded worker into a small orchestrated specialist system.\nThe important shift is architectural:\n\n- Codex still owns judgment and final review\n- CodexSaver plans a work graph\n- readonly specialists can run in parallel\n- bounded patch specialists reuse the v2 sandbox + verifier path\n- patch aggregation is conservative and falls back to Codex on overlap\n\nCurrent v3 status in this repo:\n\n- `explainer` and `perf_reviewer` can execute as a real `readonly_swarm`\n- mixed graphs can execute bounded patch nodes through the v2 work-packet runtime\n- overlapping `changed_files` across the same patch batch return `needs_codex`\n- v3.4 adds action-level risk policy, partial handoff, and DeepSeek participation metrics\n- v3 is implemented as a CodexSaver-owned orchestration layer, not as a fragile Codex-native subagent config dependency\n\nPrimary references:\n\n- [v3 spec](.\u002Fdocs\u002FSPEC_v3.md)\n- [v3 task list](.\u002Fdocs\u002FV3_TASKS.md)\n- [v3 benchmark, 2026-05-14](.\u002Fdocs\u002Fbenchmarks\u002Fv3-benchmark-2026-05-14.md)\n- [v3 project benchmark, 2026-05-15](.\u002Fdocs\u002Fbenchmarks\u002Fv3-project-benchmark-2026-05-15.md)\n- [v3.4 SWE-style benchmark, 2026-05-17](.\u002Fdocs\u002Fbenchmarks\u002Fv34-swe-benchmark-2026-05-17.md)\n\nCurrent benchmark status:\n\n- `readonly_swarm`: exercised, but still saw 
real-provider fallback in the 2026-05-14 fixture run\n- `impl + tests`: exercised, but still conservative and may return `needs_codex`\n- `impl + docs + explain`: completed successfully in the 2026-05-14 fixture run\n\nThis means v3 is already real and testable, but still in an honest early stage rather than a full replacement for every v2 workflow.\n\n### V3.4: Action-Level Delegation And Handoff\n\nv3.4 changes the router from \"does this task contain a risky word?\" to \"which actions inside this task are safe to delegate?\"\n\nExamples:\n\n- `schema + readonly inspection` can go to DeepSeek\n- `schema + dry-run validation plan` can go to DeepSeek\n- `schema + execute migration` stays in Codex\n- `database + destructive rebuild` is split into safe prep nodes plus blocked Codex-only actions\n\nThis is what lets DeepSeek carry more of the work without crossing the line into writes, migrations, secrets, auth, payment, or deployment execution. CodexSaver now returns a `handoff` object with delegated work done, blocked actions, and Codex next actions, so Codex can continue smoothly instead of starting over.\n\n### Core Selling Point: Readonly Specialist Orchestration Works\n\nThe most important v3 claim is no longer theoretical. 
In the project benchmark run on\n2026-05-15, the readonly specialist lane succeeded on the current CodexSaver codebase:\n\n- Task: `Explain installer flow and review performance`\n- Route: `deepseek`\n- Status: `success`\n- Savings: `52%`\n- Latency: `6.45s`\n- Quality score: `0.75`\n- Readonly findings: `10`\n\nThat is the current v3 core value:\n\n- Codex delegates explanation and performance analysis cheaply\n- specialists run in parallel\n- no patch is required\n- verification remains strict\n- Codex still reviews the result\n\nThis is the first domain where v3 is clearly better than \"just ask Codex to do everything.\"\n\n### What The 5-Task Project Benchmark Says\n\nThe project benchmark ran 5 typical tasks against a temporary copy of the current repository:\n\n- 2 \u002F 5 tasks succeeded in v3\n- both successful tasks were in CodexSaver's strongest current domains\n- one success was pure readonly orchestration\n- one success was a docs + explain mixed flow\n- 3 \u002F 5 patch-heavy tasks conservatively returned `needs_codex`\n\nInterpretation:\n\n- readonly specialist orchestration is already a real product advantage\n- bounded patch orchestration is promising but still less mature than the readonly lane\n- test-writer aggregation and patch verification are the main remaining bottlenecks\n\nIf you want the shortest honest description of v3 today, it is this:\n\n```text\nReadonly orchestration is established.\nSingle bounded patches are usable.\nComplex patch orchestration is still maturing.\n```\n\nCLI examples:\n\n```bash\ncodexsaver orchestrate \"Explain config loader logic and review performance\" --files codexsaver\u002Fconfig.py\ncodexsaver orchestrate \"Implement login and add tests\" --files src\u002Fuser_auth.py --dry-run\ncodexsaver specialist explainer \"Explain this module\" --files codexsaver\u002Fconfig.py\n```\n\nOptional project guidance install:\n\n```bash\n# low-intrusion: only add a managed CodexSaver block to AGENTS.md\ncodexsaver 
superpower install --profile basic --workspace .\n\n# more invasive: also add .codex\u002Fhooks.json, a prompt hook script, and local codex_hooks enablement\ncodexsaver superpower install --profile full --workspace .\n```\n\nProfile guidance:\n\n- `basic`: safest default, project-local AGENTS guidance only\n- `full`: AGENTS guidance + optional hook scaffolding + local `.codex\u002Fconfig.toml` feature flag\n\nThe goal is to bias Codex toward CodexSaver for low-risk work without silently mutating global config.\n\n---\n\n## Benchmarks\n\nCodexSaver now has two benchmark stories, and both matter:\n\n### v2: Mature Single-Task Lane\n\nReference:\n\n- [v2 benchmark, 2026-05-12](.\u002Fdocs\u002Fbenchmarks\u002Fv2-benchmark-2026-05-12.md)\n\nHeadline result:\n\n- `5 \u002F 5` successful bounded tasks\n- successful tasks landed at `45%` estimated savings\n- one already-satisfied task returned `preflight` with `100%` savings and `0.03s` latency\n\nThis is the current strongest production-ready lane:\n\n- bounded docs\n- bounded tests\n- small single-target implementation\n\n### v3: Real Project-Oriented Orchestration\n\nReference:\n\n- [v3 project benchmark, 2026-05-15](.\u002Fdocs\u002Fbenchmarks\u002Fv3-project-benchmark-2026-05-15.md)\n\nHeadline result on the current CodexSaver repository:\n\n- `2 \u002F 5` tasks succeeded\n- both successful tasks were in CodexSaver's strongest current domains\n- the cleanest success was `readonly_swarm`\n- patch-heavy orchestration still falls back conservatively\n\n### v3.4: SWE-Style Participation Benchmark\n\nReference:\n\n- [v3.4 SWE-style benchmark, 2026-05-17](.\u002Fdocs\u002Fbenchmarks\u002Fv34-swe-benchmark-2026-05-17.md)\n\nHeadline result on six local SWE-style tasks:\n\n- average DeepSeek participation reached `55.7%`\n- `5 \u002F 6` tasks reached at least `50%` DeepSeek participation\n- `2 \u002F 6` tasks completed successfully end-to-end\n- fallback tasks still preserved partial worker output through handoff\n\nSummary 
table:\n\n| Lane | Best current use | Status |\n|---|---|---|\n| v2 | single bounded patch tasks | mature |\n| v3 readonly | explain + scan + perf hint specialists | established |\n| v3.4 action-level orchestration | safe prep, dry-run planning, partial handoff | established enough to exceed 50% worker participation |\n| v3 patch orchestration | docs\u002Ftests\u002Fimpl mixed graphs | promising but still maturing |\n\nIf you are evaluating CodexSaver today, the right mental model is:\n\n- use v2 when you want reliable bounded implementation\n- use v3 when you want Codex to cheaply orchestrate readonly specialists\n- use v3.4 when a larger SWE task contains safe prep work plus blocked high-risk actions\n- treat multi-patch v3 graphs as an advancing frontier, not solved magic\n\n---\n\n## Quick Start\n\n### Recommended Global Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffendouai\u002FCodexSaver\ncd CodexSaver\n\npython -m pip install -e .\ncodexsaver auth set --provider deepseek --api-key YOUR_API_KEY\ncodexsaver install\ncodexsaver doctor --workspace .\n```\n\nThat is it. 
`codexsaver install` writes a global Codex MCP entry to\n`~\u002F.codex\u002Fconfig.toml` and points it at a stable launcher:\n`~\u002F.codexsaver\u002Fcodexsaver_mcp.py`.\n\nAfter that, every Codex workspace can call:\n\n```text\ncodexsaver.delegate_task\n```\n\nUse `--project` only when you want a repository-local `.codex\u002Fconfig.toml`:\n\n```bash\ncodexsaver install --project\n```\n\n### Provider Setup\n\nDeepSeek is the default because it is inexpensive and exposes an OpenAI-compatible API.\nSwitching providers is just one flag:\n\n```bash\ncodexsaver auth set --provider openai --api-key YOUR_API_KEY --model gpt-4o-mini\ncodexsaver auth set --provider anthropic --api-key YOUR_API_KEY --model claude-3-5-haiku-latest\ncodexsaver auth set --provider gemini --api-key YOUR_API_KEY --model gemini-2.0-flash\ncodexsaver auth set --provider qwen --api-key YOUR_API_KEY --model qwen-plus\ncodexsaver auth set --provider opencode-go --api-key YOUR_API_KEY --model deepseek-v4-flash\n```\n\nOpenCode Go uses `https:\u002F\u002Fopencode.ai\u002Fzen\u002Fgo\u002Fv1\u002Fchat\u002Fcompletions` and is useful\nwhen you want CodexSaver's worker lane to run through its low-cost DeepSeek V4\nFlash or Pro models. The default preset uses `deepseek-v4-flash`; switch to\n`deepseek-v4-pro` if you want the stronger Go model.\n\nFor local models:\n\n```bash\ncodexsaver auth set --provider ollama --model llama3.1\ncodexsaver auth set --provider lmstudio --model local-model\n```\n\nFor any custom OpenAI-compatible endpoint:\n\n```bash\ncodexsaver auth set \\\n  --provider custom \\\n  --api-key YOUR_API_KEY \\\n  --base-url https:\u002F\u002Fexample.com\u002Fv1\u002Fchat\u002Fcompletions \\\n  --model your-model\n```\n\nSee built-in presets:\n\n```bash\ncodexsaver auth providers\n```\n\n### Worker Output Compression\n\nCompression only affects delegated worker calls. It does not change Codex's own\nfinal answer. 
It is useful when you want cheaper workers to return shorter,\nmore reviewable summaries, findings, or patch notes.\n\n```bash\ncodexsaver compression show\ncodexsaver compression set --enabled true --level full\ncodexsaver compression set --enabled false\n```\n\nLevels:\n\n- `lite`: concise, keeps technical terms and exact details\n- `full`: compressed fragments, no greetings or filler, preserves code and errors\n- `ultra`: telegraphic, essential facts and identifiers only\n- `wenyan`: terse classical Chinese style for Chinese workflows\n\nDefault is disabled. The setting is persisted in `~\u002F.codexsaver\u002Fconfig.json`.\n\nIf you prefer a temporary one-shell-session setup instead of saving the key locally:\n\n```bash\nexport CODEXSAVER_PROVIDER=deepseek\nexport CODEXSAVER_API_KEY=YOUR_API_KEY\ncodexsaver install\ncodexsaver doctor --workspace .\n```\n\n### One Message To Codex\n\nIf Codex is already open in this repository, you can just say:\n\n```text\nSave my worker provider API key for CodexSaver, run `codexsaver auth set --provider deepseek --api-key ...`, then run `codexsaver install` and `codexsaver doctor --workspace .`, and tell me whether it is ready.\n```\n\nFor repo-local setup:\n\n```text\nSave my worker provider API key for CodexSaver, install CodexSaver only for this repo, run `codexsaver auth set --provider deepseek --api-key ...`, `codexsaver install --project`, then `codexsaver doctor --workspace .`, and summarize the result.\n```\n\nReady means:\n\n- `~\u002F.codex\u002Fconfig.toml` contains the global `codexsaver` MCP server, or `.codex\u002Fconfig.toml` exists in the repo\n- `~\u002F.codexsaver\u002Fcodexsaver_mcp.py` exists for global installs\n- provider settings are available from env vars or `~\u002F.codexsaver\u002Fconfig.json`\n- compression settings are available from `~\u002F.codexsaver\u002Fconfig.json`\n- `codexsaver doctor --workspace .` reports `CodexSaver is ready`\n\n---\n\n## 60-Second Demo\n\nGlobal MCP config created by 
`codexsaver install`:\n\n```toml\n[mcp_servers.codexsaver]\ncommand = \"python\"\nargs = [\"\u002FUsers\u002Fyou\u002F.codexsaver\u002Fcodexsaver_mcp.py\"]\nstartup_timeout_sec = 10\ntool_timeout_sec = 120\n```\n\nThen tell Codex:\n\n```text\nUse CodexSaver for safe low-risk tasks.\nAdd unit tests for user service.\n```\n\nOr call the CLI directly:\n\n```bash\ncodexsaver delegate \"Explain the routing logic briefly\" --files codexsaver\u002Frouter.py --workspace .\n```\n\nDry run:\n\n```bash\ncodexsaver delegate \"add unit tests for user service\" --files src\u002Fuser\u002Fservice.ts --workspace . --dry-run\n```\n\nReal run:\n\n```bash\ncodexsaver delegate \"add unit tests for user service\" --files src\u002Fuser\u002Fservice.ts --workspace .\n```\n\n---\n\n## Verified V2 Setup Flow\n\nMeasured on May 12, 2026 with the editable install, global launcher, and local-key workflow:\n\n| Check | Command | Result |\n|---|---|---|\n| Editable install | `python -m pip install -e .` | installed `codexsaver-0.2.0` |\n| Full test suite | `PYTHONDONTWRITEBYTECODE=1 python -m pytest -q -p no:cacheprovider` | `97 passed in 0.41s` |\n| Global install | `codexsaver install --workspace .` | global config points at `~\u002F.codexsaver\u002Fcodexsaver_mcp.py` |\n| Local provider persistence | `codexsaver auth set --provider deepseek --api-key ...` | saved to `~\u002F.codexsaver\u002Fconfig.json` |\n| Compression config | `codexsaver compression set --enabled true --level full` | saved to `~\u002F.codexsaver\u002Fconfig.json` |\n| Workspace doctor | `codexsaver doctor --workspace .` | `provider_api_key_source=local_config:deepseek`, workspace ready |\n| Global launcher check | `python ~\u002F.codexsaver\u002Fcodexsaver_mcp.py` with MCP `initialize` | returned `serverInfo.version=0.2.0` |\n| V2 MCP tool check | MCP `tools\u002Flist` | includes `delegate_work_packet` |\n| V2 preflight check | MCP `tools\u002Fcall delegate_work_packet` | returned `preflight_satisfied=true` |\n\nThis is 
the intended workflow:\n\n1. Save the key once\n2. Install the editable package and global launcher\n3. Confirm readiness with `doctor`\n4. Restart\u002Freload any already-open MCP process if it was started before installation\n5. Use real delegated calls without re-exporting API keys\n\nIf an already-open Codex window was using an older MCP process, stop or reload\nthat MCP server. The global launcher is the source of truth for v2 and returns\n`serverInfo.version=0.2.0`.\n\n---\n\n## Provider Matrix\n\nBuilt-in presets cover the common hosted and local routes:\n\n| Provider | Style | Default model | API key |\n|---|---|---|---|\n| `deepseek` | OpenAI-compatible | `deepseek-chat` | required |\n| `openai` | OpenAI | `gpt-4o-mini` | required |\n| `anthropic` | native Messages API | `claude-3-5-haiku-latest` | required |\n| `opencode-go` | OpenAI-compatible | `deepseek-v4-flash` | required |\n| `gemini` | OpenAI-compatible endpoint | `gemini-2.0-flash` | required |\n| `qwen` | OpenAI-compatible endpoint | `qwen-plus` | required |\n| `ollama` | local OpenAI-compatible endpoint | `llama3.1` | not required |\n| `lmstudio` | local OpenAI-compatible endpoint | `local-model` | not required |\n\nRun `codexsaver auth providers` for the complete list.\n\n---\n\n## Post-Setup Usage Ratio\n\nAfter setup completed, I measured the actual routed tasks in this working session.\nI only counted tasks that truly entered model routing, not local commands like `pytest`,\n`git`, `install`, `doctor`, or README editing.\n\nResult:\n\n- `DeepSeek`: `7 \u002F 8 = 87.5%`\n- `Codex`: `1 \u002F 8 = 12.5%`\n\nWhy not 100%?\n\nOne test-writing prompt originally included the phrase `production logic`.\nThat triggered the router's intentional high-risk keyword guard and returned the task to Codex.\nThis was not a failure. 
It was the protection logic working as designed.\n\nIf you only count the later standardized five-task benchmark with natural low-risk phrasing,\nthe delegation ratio was:\n\n- `DeepSeek`: `5 \u002F 5 = 100%`\n- `Codex`: `0 \u002F 5 = 0%`\n\nTakeaway:\n\n- In real usage, CodexSaver defaulted to DeepSeek for most low-risk work\n- It still preserved a strict fallback path for risky wording and protected domains\n\n---\n\n## Five-Task A\u002FB Benchmark\n\nLatest v2 reports:\n\n- [v2 restart confirmation, 2026-05-12](.\u002Fdocs\u002Fbenchmarks\u002Fv2-restart-confirmation-2026-05-12.md)\n- [v2 benchmark, 2026-05-12](.\u002Fdocs\u002Fbenchmarks\u002Fv2-benchmark-2026-05-12.md)\n\nThe May 12 run was performed after stopping the older in-memory MCP process and\nverifying the global launcher returned `serverInfo.version=0.2.0`.\n\nMethod:\n\n- **A** = counterfactual `Codex-only` baseline with normalized cost index fixed at `1.00`\n- **B** = `CodexSaver` mode with the live router and DeepSeek worker\n- latency is wall-clock time for the real CodexSaver execution\n- savings come from the current `CostEstimator`, so this is a reproducible routing benchmark, not invoice-grade billing data\n\nV2 bounded work-packet summary:\n\n- `5 \u002F 5` tasks succeeded\n- `4 \u002F 5` used the DeepSeek worker path\n- `1 \u002F 5` used v2 preflight because the task was already satisfied\n- average normalized cost index was `0.44`\n- average estimated savings were `56%`\n\nSummary:\n\n- All 5 tasks were typical low-risk development chores: explanation, docs, tests, and README maintenance\n- All 5 delegated successfully after using natural low-risk phrasing\n- Average live latency was `6.18s`\n- Average estimated savings were `48.4%`\n- Average normalized cost moved from `1.00` to `0.52`\n- Estimated relative reduction was `48.0%`\n\n| Task | Type | Route | Latency | A: Codex-only Cost Index | B: CodexSaver Cost Index | Estimated Savings | Output Shape 
|\n|---|---|---|---:|---:|---:|---:|---|\n| Explain router logic | `explain` | `deepseek` | `2.13s` | `1.00` | `0.55` | `45%` | read-only summary |\n| Document router module | `docs` | `deepseek` | `3.13s` | `1.00` | `0.55` | `45%` | 1-file patch |\n| Add cost tests | `write_tests` | `deepseek` | `9.29s` | `1.00` | `0.55` | `45%` | test patch |\n| Explain verifier flow | `explain` | `deepseek` | `2.30s` | `1.00` | `0.55` | `45%` | read-only summary |\n| Update install docs | `docs` | `deepseek` | `14.06s` | `1.00` | `0.38` | `62%` | README patch |\n\n![Five-task benchmark](.\u002Fassets\u002Fab-test-benchmark.svg)\n\nFigure:\nGray bars are the `Codex-only` baseline fixed at `100`.\nGreen bars are the `CodexSaver` cost index for the same task.\nLower bars mean lower estimated Codex spend.\n\nInterpretation:\n\n- Read-only explain tasks were the fastest, cleanest wins\n- Small docs edits delegated well and returned compact, reviewable patches\n- Test generation had higher latency than explanation, but still stayed in the low-risk savings band\n- Larger-context documentation work produced the biggest estimated savings because the Codex-only context cost would be higher\n\n---\n\n## Routing Rules\n\n### Good Tasks To Delegate\n\n- repo scanning and code search\n- code explanation and summarization\n- writing unit tests\n- fixing lint or type errors\n- documentation updates\n- boilerplate generation\n- small localized refactors\n\n### Tasks Kept In Codex\n\n- architecture decisions\n- auth, security, payment, billing, or permissions logic\n- database migrations\n- deployment and production operations\n- ambiguous product requests\n- final review before applying changes\n\n### Why Some Medium-Risk Tasks Still Delegate\n\nCodexSaver does not just ask:\n\n```text\nIs this code work?\n```\n\nIt asks:\n\n```text\nIs this code work cheap enough to delegate without losing judgment quality?\n```\n\nThat creates a deliberate asymmetry:\n\n- read-only understanding can be 
cheap\n- writes in sensitive domains are expensive in risk even if the diff is small\n- ambiguity defaults to Codex, not delegation\n\nThat is why `Explain auth code` may still delegate while `Refactor auth service` stays in Codex.\n\n---\n\n## How It Works\n\n```text\nUser\n  ↓\nCodex\n  ↓ MCP tool call\nCodexSaver\n  ├─ Router\n  ├─ Context Packer\n  ├─ Worker LLM Provider\n  ├─ Verifier\n  └─ Cost Estimator\n  ↓\nCodex review \u002F apply \u002F finalize\n```\n\nCore modules:\n\n- `Router`: classify tasks and assign risk\n- `ContextPacker`: bound file context before delegation\n- `ProviderClient`: call the configured worker model\n- `Verifier`: validate output shape, protected paths, and suggested commands\n- `CostEstimator`: estimate relative savings bands\n- `WorkPacketRuntime`: apply worker patches in a sandbox and run allowlisted checks\n\n---\n\n## Security And Persistence\n\n- `codexsaver auth set --provider ... --api-key ...` saves provider settings to `~\u002F.codexsaver\u002Fconfig.json`\n- `codexsaver compression set ...` saves optional worker-output compression in the same local config\n- the config file is written with local-user-only permissions\n- `doctor` shows whether the key comes from the environment or local config, and only prints a masked preview\n- live calls use local config automatically if no env key is exported\n- if verification fails, CodexSaver falls back to `needs_codex`\n\n---\n\n## Troubleshooting\n\n### Windows TOML Unicode Escape Error\n\nIf Codex shows an error like this after installation:\n\n```text\nfailed to read configuration layers ...\\.codex\\config.toml:21:14:\ntoo few unicode value digits, expected unicode hexadecimal value\n```\n\nthe Codex config contains an unescaped Windows path such as:\n\n```toml\nargs = [\"C:\\Users\\admin\\.codexsaver\\codexsaver_mcp.py\"]\n```\n\nTOML treats `\\U` as the start of a unicode escape. 
Fix it by upgrading to the\nlatest CodexSaver and reinstalling:\n\n```bash\npython -m pip install -e .\ncodexsaver install\ncodexsaver doctor --workspace .\n```\n\nOr repair the file manually by escaping backslashes:\n\n```toml\nargs = [\"C:\\\\Users\\\\admin\\\\.codexsaver\\\\codexsaver_mcp.py\"]\n```\n\nForward slashes also work on Windows:\n\n```toml\nargs = [\"C:\u002FUsers\u002Fadmin\u002F.codexsaver\u002Fcodexsaver_mcp.py\"]\n```\n\n---\n\n## Commands\n\n```bash\ncodexsaver auth providers\ncodexsaver auth set --provider deepseek --api-key YOUR_API_KEY\ncodexsaver compression show\ncodexsaver compression set --enabled true --level full\ncodexsaver install\ncodexsaver install --project\ncodexsaver doctor --workspace .\ncodexsaver delegate \"Explain the routing logic briefly\" --files codexsaver\u002Frouter.py --workspace .\ncodexsaver work-packet \"Create docs\u002Fexample.md with one sentence.\" --files README.md --allowed-file docs\u002Fexample.md --workspace .\ncodexsaver orchestrate \"Explain config loader logic and review performance\" --files codexsaver\u002Fconfig.py\ncodexsaver specialist explainer \"Explain this module\" --files codexsaver\u002Fconfig.py\n```\n\n---\n\n## Roadmap\n\n- [x] MCP server\n- [x] rule-based routing\n- [x] bounded context packing\n- [x] DeepSeek default worker integration\n- [x] multi-provider OpenAI-compatible worker support\n- [x] local API key persistence\n- [x] worker output compression toggles and provider prompt injection\n- [x] interaction-aware tool responses\n- [x] end-to-end verification flow\n- [x] v2 bounded work packets with sandboxed patch verification\n- [x] v2 preflight for already-satisfied work packets\n- [x] v3 readonly specialist orchestration\n- [x] v3 bounded patch nodes via the v2 sandbox runtime\n- [x] v3 conflict fallback on overlapping patch outputs\n- [ ] v3 node-level ownership enforcement\n- [ ] v3 durable ledger and adaptive routing\n\n---\n\n## If This Saves You Money\n\nStar the 
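repo.\n","CodexSaver is a multi-model collaboration platform that uses DeepSeek to lower the cost of using Codex without sacrificing its intelligence. Its core function is to route low-risk, highly repetitive development tasks to a cheaper language model while keeping high-risk decisions with Codex, which significantly reduces spend without compromising engineering quality. It supports multiple language model providers such as OpenAI and Anthropic, and offers one-time local configuration plus optional result compression to optimize output. CodexSaver is particularly suited to code explanation, documentation maintenance, performance hint generation, and bounded test-case writing; tasks involving security-sensitive work or complex architectural changes still rely on Codex for final review.","2026-06-11 03:31:16","CREATED_QUERY"]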