[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74138":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":22,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":43,"readmeContent":44,"aiSummary":45,"trendingCount":16,"starSnapshotCount":16,"syncStatus":46,"lastSyncTime":47,"discoverSource":48},74138,"claw-compactor","open-compress\u002Fclaw-compactor","open-compress","14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.","https:\u002F\u002Fwww.opencompress.ai\u002F",null,"Python",2197,210,133,9,0,1,28.97,"MIT License",false,"main",true,[24,25,26,5,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42],"ai-agent-tools","ai-infrastructure","ast-code-analysis","context-compression","context-pruning","context-window-optimization","developer-tools","fusion-pipeline","llm-compression","llm-context-compression","llm-cost-reduction","llm-token-compression","llm-tools","openclaw","prompt-compression","python","reversible-compression","token-compression","tree-sitter","2026-06-12 02:03:22","\u003C!--\n\u003Cscript type=\"application\u002Fld+json\">\n{\n  \"@context\": \"https:\u002F\u002Fschema.org\",\n  \"@type\": \"SoftwareApplication\",\n  \"name\": \"Claw Compactor\",\n  \"description\": \"14-stage Fusion Pipeline for LLM token compression with reversible compression, AST-aware code analysis, and intelligent content routing\",\n  \"applicationCategory\": \"DeveloperApplication\",\n  \"operatingSystem\": \"Cross-platform\",\n  
\"softwareVersion\": \"7.0.0\",\n  \"license\": \"https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT\",\n  \"url\": \"https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor\",\n  \"downloadUrl\": \"https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor\",\n  \"author\": {\n    \"@type\": \"Organization\",\n    \"name\": \"OpenClaw\",\n    \"url\": \"https:\u002F\u002Fopenclaw.ai\"\n  },\n  \"offers\": {\n    \"@type\": \"Offer\",\n    \"price\": \"0\",\n    \"priceCurrency\": \"USD\"\n  },\n  \"keywords\": \"token compression, LLM, AI agent, fusion pipeline, reversible compression, AST code analysis, context window optimization\"\n}\n\u003C\u002Fscript>\n-->\n\n\u003Cdiv align=\"center\">\n\n# Claw Compactor\n\n### 14-Stage Fusion Pipeline for LLM Token Compression\n\n![Claw Compactor Banner](assets\u002Fbanner.png)\n\n[![CI](https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor\u002Factions)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopen-compress\u002Fclaw-compactor\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopen-compress\u002Fclaw-compactor)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.9%2B-blue)](https:\u002F\u002Fpython.org)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-purple)](LICENSE)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fclaw-compactor?color=blue&label=PyPI)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fclaw-compactor\u002F)\n[![Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fclaw-compactor?color=green&label=downloads)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fclaw-compactor\u002F)\n[![Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopen-compress\u002Fclaw-compactor?style=social)](https:\u002F\u002Fgithub.com\u002Fopen-compress\u00
2Fclaw-compactor)\n\n**15–82% compression depending on content &middot; Zero LLM inference cost &middot; Reversible &middot; 1600+ tests**\n\n[Documentation](https:\u002F\u002Fopen-compress.github.io\u002Fclaw-compactor) &middot; [Architecture](ARCHITECTURE.md) &middot; [Benchmarks](#benchmarks) &middot; [Quick Start](#quick-start) &middot; [API](#api)\n\n\u003C\u002Fdiv>\n\n---\n\n## What is Claw Compactor?\n\nClaw Compactor is an open-source **LLM token compression engine** built around a 14-stage **Fusion Pipeline**. Each stage is a specialized compressor — from AST-aware code analysis to JSON statistical sampling to simhash-based deduplication — chained through an immutable data flow architecture where each stage's output feeds the next.\n\n### Demo\n\n```\n$ claw-compactor benchmark .\u002Fmy-workspace\n\n  Claw Compactor v7.0 — Fusion Pipeline Benchmark\n  ─────────────────────────────────────────────────\n\n  Scanning workspace... 47 files, 234,891 tokens\n\n  Stage Results:\n  ┌──────────────────┬──────────┬───────────┬──────────┐\n  │ Stage            │ Applied  │ Reduction │ Time     │\n  ├──────────────────┼──────────┼───────────┼──────────┤\n  │ Cortex           │ 47\u002F47    │ —         │ 12ms     │\n  │ Photon           │ 3\u002F47     │ 2.1%      │ 4ms      │\n  │ RLE              │ 41\u002F47    │ 8.3%      │ 6ms      │\n  │ SemanticDedup    │ 47\u002F47    │ 12.7%     │ 18ms     │\n  │ Ionizer          │ 8\u002F47     │ 71.2%     │ 9ms      │\n  │ Neurosyntax      │ 23\u002F47    │ 18.4%     │ 31ms     │\n  │ TokenOpt         │ 47\u002F47    │ 4.1%      │ 3ms      │\n  │ Abbrev           │ 12\u002F47    │ 6.8%      │ 5ms      │\n  └──────────────────┴──────────┴───────────┴──────────┘\n\n  Summary:\n    Before:  234,891 tokens ($2.35 at GPT-4 rates)\n    After:   108,250 tokens ($1.08)\n    Saved:   126,641 tokens (53.9%) — $1.27\u002Frun\n    Time:    88ms total\n\n  Estimated monthly savings at 100 runs\u002Fday: $3,810\n```\n\n---\n\n## How It 
Compares\n\n| Feature | Claw Compactor | LLMLingua-2 | SelectiveContext | gzip + base64 |\n|:--------|:-:|:-:|:-:|:-:|\n| Compression rate | 15–82% | 30–70% | 10–40% | 60–80% |\n| ROUGE-L @ 0.3 | **0.653** | 0.346 | ~0.4 | N\u002FA |\n| ROUGE-L @ 0.5 | **0.723** | 0.570 | ~0.6 | N\u002FA |\n| LLM inference cost | **$0** | ~$0.02\u002Fcall | **$0** | **$0** |\n| Latency | **\u003C50ms** | ~300ms | ~200ms | \u003C10ms |\n| Reversible | **Yes** | No | No | Yes (manual) |\n| Content-aware routing | **14 stages** | 1 (perplexity) | 1 (self-info) | None |\n| AST-aware code handling | **Yes** (tree-sitter) | No | No | No |\n| JSON schema sampling | **Yes** | No | No | No |\n| Log\u002Fdiff\u002Fsearch stages | **Yes** | No | No | No |\n| Required dependencies | **0** | torch, transformers | torch | zlib |\n| LLM-readable output | **Yes** | Partial | Partial | **No** |\n\n**Why Claw Compactor wins:** LLMLingua-2 drops tokens by perplexity score — effective for natural language, but destroys code identifiers, JSON keys, and log patterns. 
Claw Compactor uses content-type-aware stages that understand the structure of what they're compressing.\n\n---\n\n```\nInput\n  |\n  v\n┌─────────────────────────────────────────────────────────────────────────┐\n│                         FUSION PIPELINE                                 │\n│                                                                         │\n│  QuantumLock ─> Cortex ─> Photon ─> RLE ─> SemanticDedup ─> Ionizer    │\n│       |            |         |        |          |              |        │\n│   KV-cache    auto-detect  base64   path     simhash       JSON         │\n│   alignment   16 languages  strip  shorten   dedup        sampling      │\n│                                                                         │\n│  ─> LogCrunch ─> SearchCrunch ─> DiffCrunch ─> StructuralCollapse      │\n│        |              |              |                |                  │\n│    log folding    result dedup   context fold    import merge            │\n│                                                                         │\n│  ─> Neurosyntax ─> Nexus ─> TokenOpt ─> Abbrev ─────────> Output       │\n│        |             |          |           |                            │\n│    AST compress   ML token   format     NL shorten                      │\n│    (tree-sitter)  classify   optimize   (text only)                     │\n│                                                                         │\n│  [ RewindStore ] ── hash-addressed LRU for reversible retrieval         │\n└─────────────────────────────────────────────────────────────────────────┘\n```\n\nKey design principles:\n\n- **Immutable data flow** — `FusionContext` is a frozen dataclass. Every stage produces a new `FusionResult`; nothing is mutated in-place.\n- **Gate-before-compress** — Each stage has `should_apply()` that inspects context type, language, and role before doing any work. 
Stages that don't apply are skipped at zero cost.\n- **Content-aware routing** — Cortex auto-detects content type (code, JSON, logs, diffs, search results) and language (Python, Go, Rust, TypeScript, etc.), then downstream stages make type-aware compression decisions.\n- **Reversible compression** — Ionizer stores originals in a hash-addressed `RewindStore`. The LLM can call a tool to retrieve any compressed section by its marker ID.\n\n---\n\n## Benchmarks\n\n### Real-World Compression (FusionEngine v7 vs Legacy Regex)\n\n| Content Type | Legacy | FusionEngine | Improvement |\n|:-------------|-------:|-------------:|:-----------:|\n| Python source | 7.3% | **25.0%** | 3.4x |\n| JSON (100 items) | 12.6% | **81.9%** | 6.5x |\n| Build logs | 5.5% | **24.1%** | 4.4x |\n| Agent conversation | 5.7% | **31.0%** | 5.4x |\n| Git diff | 6.2% | **15.0%** | 2.4x |\n| Search results | 5.3% | **40.7%** | 7.7x |\n| **Weighted average** | **9.2%** | **36.3%** | **3.9x** |\n\n### SWE-bench Real Tasks\n\nTested on real SWE-bench instances with actual repository code:\n\n| Instance | Size | Compression |\n|:---------|-----:|------------:|\n| django__django-11620 | 4.5K | **14.5%** |\n| sympy__sympy-14396 | 5.5K | **19.1%** |\n| scikit-learn-25747 | 11.8K | **15.9%** |\n| scikit-learn-13554 | 73K | **11.8%** |\n| scikit-learn-25308 | 81K | **14.4%** |\n\n### vs LLMLingua-2 (ROUGE-L Fidelity)\n\n| Compression Rate | Claw Compactor | LLMLingua-2 | Delta |\n|:-----------------|---------------:|------------:|------:|\n| 0.3 (aggressive) | **0.653** | 0.346 | +88.2% |\n| 0.5 (balanced) | **0.723** | 0.570 | +26.8% |\n\nClaw Compactor preserves more semantic content at the same compression ratio, with zero LLM inference cost.\n\n---\n\n## Quick Start\n\n### Install from PyPI\n\n```bash\npip install claw-compactor\n```\n\n### Or clone from source\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor.git\ncd claw-compactor\npip install -e .\n```\n\n### 
Run\n\n```bash\n# Benchmark your workspace (non-destructive)\nclaw-compactor benchmark \u002Fpath\u002Fto\u002Fworkspace\n\n# Full compression pipeline\nclaw-compactor compress \u002Fpath\u002Fto\u002Fworkspace\n```\n\n**Requirements:** Python 3.9+. Optional: `pip install claw-compactor[accurate]` for exact token counts via tiktoken.\n\n---\n\n## API\n\n### FusionEngine — Single Text\n\n```python\nfrom scripts.lib.fusion.engine import FusionEngine\n\nengine = FusionEngine()\n\nresult = engine.compress(\n    text=\"def hello():\\n    # greeting function\\n    print('hello')\",\n    content_type=\"code\",    # or let Cortex auto-detect\n    language=\"python\",      # optional hint\n)\n\nprint(result[\"compressed\"])     # compressed output\nprint(result[\"stats\"])          # per-stage timing + token counts\nprint(result[\"markers\"])        # Rewind markers for reversibility\n```\n\n### FusionEngine — Chat Messages\n\n```python\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a coding assistant...\"},\n    {\"role\": \"user\", \"content\": \"Fix the auth bug in login.py\"},\n    {\"role\": \"assistant\", \"content\": \"I found the issue. 
Here's the fix:\\n```python\\n...\"},\n    {\"role\": \"tool\", \"content\": '{\"results\": [{\"file\": \"login.py\", ...}, ...]}'},\n]\n\nresult = engine.compress_messages(messages)\n\n# Cross-message dedup runs first, then per-message pipeline\nprint(result[\"stats\"][\"reduction_pct\"])   # aggregate compression %\nprint(result[\"per_message\"])              # per-message breakdown\n```\n\n### Rewind — Reversible Retrieval\n\n```python\nengine = FusionEngine(enable_rewind=True)\nresult = engine.compress(large_json, content_type=\"json\")\n\n# LLM sees compressed output with markers like [rewind:abc123...]\n# When the LLM needs the original, it calls the Rewind tool:\noriginal = engine.rewind_store.retrieve(\"abc123def456...\")\n```\n\n### Custom Stage\n\n```python\nfrom scripts.lib.fusion.base import FusionStage, FusionContext, FusionResult\n\nclass MyStage(FusionStage):\n    name = \"my_compressor\"\n    order = 22  # runs between StructuralCollapse (20) and Neurosyntax (25)\n\n    def should_apply(self, ctx: FusionContext) -> bool:\n        return ctx.content_type == \"log\"\n\n    def apply(self, ctx: FusionContext) -> FusionResult:\n        compressed = my_compression_logic(ctx.content)\n        return FusionResult(\n            content=compressed,\n            original_tokens=estimate_tokens(ctx.content),\n            compressed_tokens=estimate_tokens(compressed),\n        )\n\n# Add to pipeline\npipeline = engine.pipeline.add(MyStage())\n```\n\n---\n\n## The 14 Stages\n\n| # | Stage | Order | Purpose | Applies To |\n|:-:|:------|:-----:|:--------|:-----------|\n| 1 | **QuantumLock** | 3 | Isolates dynamic content in system prompts to maximize KV-cache hit rate | system messages |\n| 2 | **Cortex** | 5 | Auto-detects content type and programming language (16 languages) | untyped content |\n| 3 | **Photon** | 8 | Detects and compresses base64-encoded images | all |\n| 4 | **RLE** | 10 | Path shorthand (`$WS`), IP prefix compression, enum compaction | all 
|\n| 5 | **SemanticDedup** | 12 | SimHash fingerprint deduplication across content blocks | all |\n| 6 | **Ionizer** | 15 | JSON array statistical sampling with schema discovery + error preservation | json |\n| 7 | **LogCrunch** | 16 | Folds repeated log lines with occurrence counts | log |\n| 8 | **SearchCrunch** | 17 | Deduplicates search\u002Fgrep results | search |\n| 9 | **DiffCrunch** | 18 | Folds unchanged context lines in git diffs | diff |\n| 10 | **StructuralCollapse** | 20 | Merges import blocks, collapses repeated assertions\u002Fpatterns | code |\n| 11 | **Neurosyntax** | 25 | AST-aware code compression via tree-sitter (safe regex fallback). Never shortens identifiers. | code |\n| 12 | **Nexus** | 35 | ML token-level compression (stopword removal fallback without model) | text |\n| 13 | **TokenOpt** | 40 | Tokenizer format optimization — strips bold\u002Fitalic markers, normalizes whitespace | all |\n| 14 | **Abbrev** | 45 | Natural language abbreviation. Only fires on text — never touches code, JSON, or structured data. | text |\n\nEach stage is independent and stateless. 
Stages communicate only through the immutable `FusionContext` that flows forward through the pipeline.\n\n---\n\n## Workspace Commands\n\n```bash\npython3 scripts\u002Fmem_compress.py \u003Cworkspace> \u003Ccommand> [options]\n```\n\n| Command | Description |\n|:--------|:-----------|\n| `full` | Run complete compression pipeline |\n| `benchmark` | Dry-run compression report |\n| `compress` | Rule-based compression only |\n| `dict` | Dictionary encoding with auto-learned codebook |\n| `observe` | Session transcript JSONL to structured observations |\n| `tiers` | Generate L0\u002FL1\u002FL2 tiered summaries |\n| `dedup` | Cross-file duplicate detection |\n| `estimate` | Token count report |\n| `audit` | Workspace health check |\n| `optimize` | Tokenizer-level format optimization |\n| `auto` | Watch mode — compress on file changes |\n\nOptions: `--json`, `--dry-run`, `--since YYYY-MM-DD`, `--quiet`\n\n---\n\n## Architecture\n\nSee [ARCHITECTURE.md](ARCHITECTURE.md) for the full technical deep-dive:\n- Immutable data flow design\n- Stage execution model and gating\n- Rewind reversible compression protocol\n- Cross-message semantic deduplication\n- How to extend the pipeline\n\n```\n12,000+ lines Python  ·  1,600+ tests  ·  14 fusion stages  ·  0 external ML dependencies\n```\n\n---\n\n## Installation\n\n```bash\n# Clone\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor.git\ncd claw-compactor\n\n# Optional: exact token counting\npip install tiktoken\n\n# Optional: AST-aware code compression (Neurosyntax)\npip install tree-sitter-language-pack\n\n# Development\npip install -e \".[dev,accurate]\"\n```\n\n**Zero required dependencies.** tiktoken and tree-sitter are optional enhancements — the pipeline runs with built-in heuristic fallbacks for both.\n\n---\n\n## Who Uses This\n\n| Project | How |\n|:--------|:----|\n| [OpenClaw](https:\u002F\u002Fopenclaw.ai) | Built-in skill for all OpenClaw AI agents — compresses workspace context before 
every LLM call |\n| [OpenCompress](https:\u002F\u002Fopencompress.ai) | Production compression engine powering the OpenCompress API |\n\nUsing Claw Compactor? [Open a PR](https:\u002F\u002Fgithub.com\u002Fopen-compress\u002Fclaw-compactor\u002Fpulls) to add yourself here.\n\n---\n\n## Project Stats\n\n| Metric | Value |\n|:-------|:------|\n| Tests | 1,600+ passed |\n| Python source | 12,000+ lines |\n| Fusion stages | 14 |\n| Languages detected | 16 |\n| Required dependencies | 0 |\n| Compression (code) | 15–25% |\n| Compression (JSON peak) | 81.9% |\n| ROUGE-L @ 0.3 rate | 0.653 |\n| License | MIT |\n\n---\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on:\n- Setting up the development environment\n- Adding new Fusion stages\n- Running the test suite\n- Submitting PRs\n\n---\n\n## Related\n\n- [OpenClaw](https:\u002F\u002Fopenclaw.ai) — AI agent platform\n- [ClawhubAI](https:\u002F\u002Fclawhub.com) — Agent skills marketplace\n- [OpenClaw Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fclawd) — Community\n- [OpenClaw Docs](https:\u002F\u002Fdocs.openclaw.ai) — Documentation\n- [Full Documentation](https:\u002F\u002Fopen-compress.github.io\u002Fclaw-compactor) — GitHub Pages docs\n\n---\n\n`token-compression` `llm-tools` `fusion-pipeline` `reversible-compression` `ast-code-analysis` `context-compression` `ai-agent` `openclaw` `python` `developer-tools`\n\n## License\n\n[MIT](LICENSE)\n","Claw Compactor is a token compression engine designed for large language models (LLMs). Its 14-stage Fusion Pipeline provides reversible compression, AST-aware code analysis, and intelligent content routing. Core techniques include AST code analysis, JSON statistical sampling, and simhash-based deduplication, chained in an immutable data flow architecture that refines the input stage by stage and significantly reduces LLM inference cost. The project suits applications that process large volumes of text and need to lower compute consumption, such as developer tools, AI agent tools, or any system that depends on efficient context management. It is written in Python and open-sourced under the MIT License.",2,"2026-06-11 03:48:58","high_star"]