[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80842":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},80842,"telos-sdk","learningCatHD\u002Ftelos-sdk","learningCatHD","TELOS SDK: a cache-aware prompt protocol and gateway for portable agent context.","",null,"Python",360,21,11,1,0,101,167,325,303,94.03,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:30","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"assets\u002Flogo.svg\" alt=\"TELOS — Portable Agent Context\" width=\"460\"\u002F>\n\n### Context is yours &nbsp;·&nbsp; Agents are hired\n\n**No rewrite. No compression. 
90% token billing saving.**\n\n\u003Csub>One canonical IR — tools, system, turns, and memory — runs unchanged across Anthropic · OpenAI · DeepSeek · vLLM · SGLang\u003Cbr\u002F>Real 6-turn session −92.3% · Cost reported in absolute $\u002Fquery-resolved — ratios can be gamed; dollars can't\u003C\u002Fsub>\n\n\u003Csub>LEAP Lab @ Tsinghua University — a research group focused on machine learning, multimodal learning, and embodied intelligence · \u003Ca href=\"https:\u002F\u002Fwww.leaplab.ai\u002F\">leaplab.ai\u003C\u002Fa>\u003C\u002Fsub>\n\n\u003Cbr\u002F>\n\n[![Core](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcore-Apache%202.0-2C5F66?style=flat-square)](LICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10%2B-4FB3BF?style=flat-square)](pyproject.toml)\n[![Status](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstatus-Beta-d8851f?style=flat-square)](CHANGELOG.md)\n[![Protocol](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fprotocol-TELOS%20IR-7FD8E0?style=flat-square)](docs\u002F2026-05-06-telos-protocol.md)\n\n[**Quickstart**](#quickstart) · [**Support Matrix**](#support-matrix) · [**Why**](#why-telos) · [**Benchmark**](#benchmark) · [**Protocol**](#protocol) · [**Roadmap**](#roadmap) · [**Citation**](#citation)\n\n\u003Csub>📖 &nbsp;**English** · [Simplified Chinese](README.zh-CN.md)\u003C\u002Fsub>\n\n\u003C\u002Fdiv>\n\n---\n\n\u003Ca id=\"problem\">\u003C\u002Fa>\n\n## ⬢ &nbsp;2 a.m. — where did all the money go?\n\n2 a.m., agent still running. The counter in the bottom-right corner climbs to 2,847,103 — you convert it to dollars and your stomach drops. Worse: the line above reads `cache_read: 0`. 
All night long, every turn fed the same 4,000-token system prompt **from scratch** to the model, billed at full price.\n\nTake the exact same **6-turn** real conversation, drop it into openclaw, flip two switches:\n\n| Mode | raw input tokens | cache_read | Cost for 6 turns |\n|---|:--:|:--:|:--:|\n| passthrough (today's default) | 24,151 | 0 | **$0.3623** |\n| with TELOS | 0 | 18,701 | **$0.0281 (−92.3%)** |\n\nScale to 1,000 sessions: **$362 → $26**. In a controlled A\u002FB\u002FC\u002FD run (`showcase\u002Fdashboard.html`, 2026-05-19) — 48 calls across 4 sessions, counterfactual bill **$5.90**, actual **$3.74** — net saved **$2.16 (−36.6%)**. One dev machine, one afternoon. Multiply by team scale, and that's a real server invoice every month.\n\n**Stop measuring in \"X× fewer tokens.\"** In 2026, the pricing gap between tiers of the same model family already spans **80×–150×**. Anyone can inflate a ratio by stuffing the cheapest tier in the denominator — absolute dollars are the only number that doesn't lie.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002F01-waste.en.svg\" alt=\"Today's agent token efficiency is only 25%\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\n\u003Ca id=\"quickstart\">\u003C\u002Fa>\n\n## ⬢ &nbsp;3-step to save 90%\n\n#### ❶ &nbsp;Install\n\n```bash\n# Linux \u002F macOS \u002F WSL2 \u002F Android (Termux)\ncurl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002FlearningCatHD\u002Ftelos-sdk\u002Fmain\u002Fscripts\u002Finstall.sh | bash\n```\n\n\u003Csub>Prefer pip? &nbsp;`pip install -U telos-sdk`\u003C\u002Fsub>\n\n#### ❷ &nbsp;Connect\n\n```bash\ntelos init\n```\n\nAuto-detects **claude-code \u002F codex \u002F openclaw \u002F hermes** on this machine, injects config into each, and starts the local gateway in the background (state written to `~\u002F.telos\u002Fgateway.json`). 
No changes to your agent code.\n\n#### ❸ &nbsp;Observe\n\n```bash\ntelos dashboard\n```\n\nOpens an offline HTML dashboard in your browser showing savings per call in absolute dollars. Every invocation is automatically appended to `~\u002F.telos\u002Fusage.jsonl` and aggregated in real time.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002F05-dashboard.png\" alt=\"TELOS savings dashboard — absolute dollars broken down by harness \u002F model \u002F session\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Csub>\u003Cstrong>Every saving pinned to an absolute dollar figure\u003C\u002Fstrong> · No cloud server required · Opens offline · \u003Ccode>~\u002F.telos\u002Fusage.jsonl\u003C\u002Fcode> fed directly into a single-file HTML page\u003C\u002Fsub>\u003C\u002Fp>\n\n\n**TELOS is open source. Run it on your own workflow — see whether that 92% is real, or just another \"X× tokens\" claim.**\n\n---\n\n\u003Ca id=\"support-matrix\">\u003C\u002Fa>\n\n## ⬢ &nbsp;Support Matrix\n\n### Harness support\n\n| Harness | Typical usage | `telos init` auto-connect | Status |\n|---|---|:---:|:---:|\n| Claude Code | Anthropic-native coding agent workflow | ✅ | 🟢 First-class |\n| OpenClaw | Open-source agent runtime with TELOS parser integration | ✅ | 🟢 First-class |\n| Hermes | Multi-agent orchestration with independent sub-IR handling | ✅ | 🟢 First-class |\n| Codex | OpenAI-style coding workflow via local gateway injection | ✅ | 🟢 Supported |\n\n### Frontier model support\n\n| Model family | Provider | Through TELOS engine adapter | Notes |\n|---|---|:---:|---|\n| Claude (4.x \u002F 4.6+) | Anthropic | ✅ | Explicit breakpoints and prewarm path |\n| GPT (4+\u002F5.x) | OpenAI | ✅ | Uses `prompt_cache_key` routing strategy |\n| DeepSeek (V3+) | DeepSeek | ✅ | Deterministic byte-stable prefix behavior |\n\n### Inference framework support\n\n| Framework | Deployment style | Through TELOS | Cache-aware capabilities |\n|---|---|:---:|---|\n| vLLM | 
Self-hosted OpenAI-compatible serving | ✅ | Explicit anchors, prewarm, cache probe\u002Fevict, partial fork-and-replace |\n| SGLang | Self-hosted high-throughput serving | ✅ | Explicit anchors, prewarm, cache probe\u002Fevict, full fork-and-replace |\n\n\u003Csub>Need another harness or model backend? TELOS is adapter-driven: keep the same IR and add an engine\u002Fharness adapter without rewriting your agent logic.\u003C\u002Fsub>\n\n---\n\n\u003Ca id=\"why-telos\">\u003C\u002Fa>\n\n## ⬢ &nbsp;TELOS solves exactly two things\n\n**① Push token efficiency to the limit.** 6-turn real session **−92.3%**; controlled 48-call run **−36.6% (net −$2.16)**. Every cent accounted for in absolute $\u002Fquery-resolved — ratios can be faked; dollars can't.\n\n**② Return context sovereignty to you.** `TelosIR` is an engine-agnostic, serializable, portable context representation. Your persona, your tools, your 20-turn mid-session thread — everything packed into the same **stone tablet**. Hand it to Claude today, move it to DeepSeek tomorrow, run it on a local vLLM tonight. **Your context; agents are just hired help.**\n\n---\n\n\u003Ca id=\"benchmark\">\u003C\u002Fa>\n\n## ⬢ &nbsp;SWE-bench Verified — TELOS does not regress task correctness\n\nToken savings are only useful if the agent still solves the problem. We ran a pre-registered A\u002FB on **SWE-bench Verified** with the Hermes harness and `deepseek\u002Fdeepseek-v4-flash` — 100 instances per arm, seeded sample across 8 repos (sphinx, matplotlib, xarray, pytest, requests, pylint, seaborn, flask). 
**99 instances per arm were graded under the official Docker harness** (1 instance excluded due to a missing per-instance docker image upstream).\n\n#### Resolved rate (docker-graded, n=99\u002Farm, paired)\n\n| Arm | Resolved | Rate | 95% Wilson CI |\n|---|---:|---:|---|\n| **TELOS** | 45 \u002F 99 | **45.5%** | [36.0%, 55.2%] |\n| Vanilla | 42 \u002F 99 | 42.4% | [33.2%, 52.3%] |\n\nPaired 2×2 on the same 99 instances: both resolved 33; TELOS-only 12; vanilla-only 9; neither 45. Exact McNemar two-sided **p = 0.66** — the +3 pp absolute gap is **not statistically significant**, i.e. TELOS does not regress resolved rate at this sample size.\n\n#### Token efficiency (agent-side, n=99\u002Farm, same instances)\n\n| Per-task | TELOS | Vanilla | Δ |\n|---|---:|---:|---:|\n| **new_input** (post-cache, billed) | 93,712 | 198,706 | **−52.8%** |\n| prompt_tokens (raw + cache) | 352,400 | 515,953 | −31.7% |\n| output_tokens | 24,975 | 25,218 | −1.0% |\n| api_calls | 32.6 | 32.1 | +1.4% |\n| **cache_share** | **73.4%** | 61.5% | **+11.9 pp** |\n| reported cost (USD) | $2.29 | $3.85 | **−40.5%** |\n\n**Read this honestly.** The 99-instance subset gives a Wilson CI of roughly ±10 pp on each arm. This run can rule out an absolute regression worse than ~6 pp at 95% confidence (the lower bound of the paired difference), but cannot pin Δ to ±2 pp. What it shows with high confidence is the **input-token bill is roughly halved, and end-to-end cost drops ~40%, at the same correctness band**. A larger run (n ≥ 400\u002Farm) is on the roadmap to tighten the resolved-rate confidence interval further.\n\n\u003Csub>Raw outputs: agent runs in [`\u002Ftmp\u002Ftelos-ab-n100\u002F{with,without}\u002F`](\u002Ftmp\u002Ftelos-ab-n100\u002F), docker-graded reports in [`\u002Ftmp\u002Ftelos-ab-n100\u002Fdocker-eval\u002F`](\u002Ftmp\u002Ftelos-ab-n100\u002Fdocker-eval\u002F). Reproduce: `scripts\u002Frun_swebench_batch.py -n 100 --seed 7`. 
Full technical report (pre-registered design, statistical detail, related work): [docs\u002F2026-05-26-swebench-ab.md](docs\u002F2026-05-26-swebench-ab.md).\u003C\u002Fsub>\n\n---\n\n\u003Ca id=\"protocol\">\u003C\u002Fa>\n\n## ⬢ &nbsp;The protocol: not compression, but never breaking the prefix\n\nMost agent frameworks treat KV-cache as a runtime gift the inference engine may or may not give you. TELOS inverts this:\n\n> **Cache reuse is a structural property of the prompt itself, not a matter of runtime luck. If you never touch bytes already submitted, the cache *cannot* be invalidated.**\n\nThat principle materializes in three interlocking ideas.\n\n### Three-color bands\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002F03-banding.en.svg\" alt=\"PIN \u002F FOLD \u002F DROP bands\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\nEvery content block declares its cache lifetime **at birth** — not post-hoc heuristics, not LLM guessing, but first-class structural annotation:\n\n| Band | Color | Semantics | Cache behavior |\n|---|:---:|---|---|\n| **PIN** | 🟢 | Tool defs · system prompt · current question | Permanent. Never evicted. The immutable base of every request's prefix hash |\n| **FOLD** | 🟡 | Conversation history · tool results · large docs | Cacheable, compactable. Under pressure, replaced by a summary — PIN prefix bytes stay untouched |\n| **DROP** | 🔴 | Timestamps · CWD · git status · PIDs | Ephemeral. **Excluded entirely from the prefix hash.** Must follow all BPs; never contaminates upstream bytes |\n\nThe ordering invariant is absolute: **PIN\\* → FOLD\\* → DROP\\*** — within each message, across the full prompt, at every layer. This is the **only** structural rule that wins the cache — everything else is implementation detail.\n\n### Monotonic append\n\nThe prompt is an **append-only stream**. New turns only add blocks to the tail — **no mutation of already-submitted bytes**. 
A \"modification\" is expressed as a new block (summary, redaction), never an in-place rewrite.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002F04-append.en.svg\" alt=\"Monotonic append: cache hit rate is monotonically non-decreasing with session length\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\nBecause earlier blocks are immutable and bytes are identical across turns, the inference engine's prefix-matching algorithm finds the longest common prefix on **every** request — not by luck, but by construction. **Cache hit rate is therefore a monotonically non-decreasing function of session length: longer sessions, more reuse, never regression.**\n\n---\n\n\u003Ca id=\"roadmap\">\u003C\u002Fa>\n\n## ⬢ &nbsp;Roadmap\n\nTELOS makes exactly one claim: **context is yours, agents are hired.** The current roadmap stays entirely within the *cost-saving gateway* narrative, with the seed of *trajectory as a portable asset* planted only in the last phase. **Anything that can be checked off goes on the roadmap; anything that can't, doesn't.**\n\n| Phase | Thesis |\n|---|---|\n| **Phase 1** · Protocol correctness hardening | Turn \"cache cannot be invalidated\" from a slogan into a CI red\u002Fgreen light |\n| **Phase 2** · Production reliability & observability | Make the gateway safe to leave on someone else's prod traffic |\n| **Phase 3** · Take over the call chain | Go from prompt rewriter to the agent's traffic plane |\n| **Phase 4** · Context becomes an asset | Trajectories are no longer logs — they're forkable code |\n\n---\n\n\u003Ca id=\"citation\">\u003C\u002Fa>\n\n## Citation\n\nCore contributors: Zheng Wang, Shenzhi Wang, Yue Wu, Shiji Song, Gao Huang\n\n```\n@misc{wang2026telos-agent,\n  title        = {Telos: A Cost-Aware Inference Infrastructure for AI Agent},\n  author       = {Zheng Wang, Shenzhi Wang, HongTao Zhong, Shiji Song, Gao Huang},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FlearningCatHD\u002Ftelos-sdk.git}},\n  year         = 
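{2026}\n}\n```\n\n---\n\n\u003Cdiv align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FlearningCatHD\u002Ftelos-sdk\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F⭐%20Star%20on%20GitHub-learningCatHD%2Ftelos--sdk-1F4A50?style=for-the-badge&logo=github&logoColor=white\" alt=\"Star on GitHub\"\u002F>\u003C\u002Fa>\n","TELOS SDK is a cache-aware prompt protocol and gateway for managing portable agent context. It sharply reduces token billing by avoiding repeated computation of the same system prompt, and reportedly saves up to 90% of cost. The SDK runs a unified intermediate representation (IR) across multiple AI model providers (such as Anthropic and OpenAI), keeping behavior consistent and efficient across platforms. It suits cost-sensitive applications that call language models frequently, such as large-scale conversational systems or AI-based services. It is written in Python, is currently in Beta, and is released under the Apache License 2.0.",2,"2026-06-11 04:02:31","CREATED_QUERY"]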