[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1662":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":15,"starSnapshotCount":15,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},1662,"context-engine","Emmimal\u002Fcontext-engine","Emmimal","A pure-Python context management layer for LLM systems — retrieval, re-ranking, memory decay, and token-budget enforcement in one pipeline.","",null,"Python",191,28,5,0,6,4.39,"MIT License",false,"main",true,[23,24,25,26,27],"context-engineering","large-language-models","llm","prompt-engineering","rag","2026-06-12 02:00:31","# context-engine\nA pure-Python context management layer for LLM systems — retrieval, re-ranking, memory decay, and token-budget enforcement in one pipeline.\n\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10%2B-blue)](https:\u002F\u002Fwww.python.org\u002F)\n[![Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fversion-1.1.0-green)]()\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-lightgrey)]()\n\nMost RAG tutorials stop at: retrieve documents, stuff them into a prompt, call the model.\nThis library handles what comes next — deciding *what* the model actually sees, *how much*\nof it, and in *what order*, under real token constraints.\n\nRead 
the full write-up on Towards Data Science → **[RAG Isn’t Enough — I Built the Missing Layer That Makes LLM Systems Work](https:\u002F\u002Ftowardsdatascience.com\u002Frag-isnt-enough-i-built-the-missing-context-layer-that-makes-llm-systems-work\u002F)**\n\n---\n\n## What It Does\n\n```\nDocuments → Retriever → Re-ranker → Compressor → TokenBudget → ContextPacket → LLM\n                                         ↑\n                                      Memory\n```\n\nFive components, one `build()` call:\n\n| Component     | Job                                                              |\n|---------------|------------------------------------------------------------------|\n| `Retriever`   | keyword \u002F TF-IDF \u002F hybrid (embedding + TF-IDF) retrieval        |\n| Re-ranker     | tag-weighted score blending to promote domain-relevant docs      |\n| `Memory`      | exponential decay, auto-importance scoring, deduplication        |\n| `Compressor`  | truncate \u002F sentence \u002F extractive query-aware compression         |\n| `TokenBudget` | slot-based budget enforcer (system → history → docs)             |\n\n---\n\n## Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FEmmimal\u002Fcontext-engine.git\ncd context-engine\npip install numpy                        # required\npip install sentence-transformers        # optional — enables hybrid retrieval\n```\n\nNo other dependencies. 
All core functionality runs on the Python standard library + numpy.\nIf `sentence-transformers` is not installed, hybrid mode falls back to random embeddings\nwith a warning — useful for development and testing.\n\n---\n\n## Quick Start\n\n```python\nfrom context_engineering import ContextEngine, Document\n\ndocs = [\n    Document(id=\"doc-1\", content=\"RAG grounds models in external knowledge.\", tags=[\"rag\"]),\n    Document(id=\"doc-2\", content=\"Memory decay prevents context bloat.\", tags=[\"memory\"]),\n]\n\nengine = ContextEngine(\n    documents=docs,\n    total_token_budget=800,\n    retrieval_mode=\"hybrid\",          # \"keyword\" | \"tfidf\" | \"hybrid\"\n    compression_strategy=\"extractive\" # \"truncate\" | \"sentence\" | \"extractive\"\n)\n\n# First turn\npacket = engine.build(\"How does memory decay work?\")\nprint(packet.to_prompt_string())\nengine.remember(\"user\", \"How does memory decay work?\")\nengine.remember(\"assistant\", \"Memory decay reduces the weight of older turns over time.\")\n\n# Second turn — memory now competes for budget; compression tightens automatically\npacket = engine.build(\"What happens to irrelevant turns?\")\nprint(packet.diagnostics())\n```\n\n---\n\n## Running the Demos\n\nSeven runnable demos covering every component:\n\n```bash\npython demo.py\n```\n\n| Demo | What It Shows                                      |\n|------|----------------------------------------------------|\n| 1    | Keyword vs TF-IDF retrieval on the same query      |\n| 2    | All three compression strategies side by side      |\n| 3    | Memory decay and deduplication                     |\n| 4    | Token budget slot enforcement                      |\n| 5    | Full engine under tight token pressure             |\n| 6    | Prompt engineering vs context engineering contrast |\n| 7    | Hybrid retrieval + re-ranking + auto-importance    |\n\n---\n\n## Configuration Reference\n\n```python\nContextEngine(\n    documents=[],                  # 
Initial document list (add more with .add_document())\n    total_token_budget=2048,       # Total token budget across all slots\n    system_prompt=\"...\",           # Fixed overhead reserved first\n    retrieval_top_k=5,             # Documents to keep after re-ranking\n    retrieval_mode=\"hybrid\",       # \"keyword\" | \"tfidf\" | \"hybrid\"\n    compression_strategy=\"extractive\",\n    memory_short_term=4,           # Turns always included regardless of decay\n    memory_decay_rate=0.001,       # Exponential decay rate (per second)\n    hybrid_alpha=0.65,             # 0.0 = pure TF-IDF, 1.0 = pure embeddings\n)\n```\n\n**Tuning `hybrid_alpha`:**\n\n| Query type                        | Suggested alpha |\n|-----------------------------------|-----------------|\n| Exact term lookup                 | 0.3 – 0.4       |\n| General \u002F mixed                   | 0.6 – 0.7       |\n| Conceptual \u002F paraphrase-heavy     | 0.8 – 0.9       |\n\n---\n\n## Project Structure\n\n```\ncontext-engine\u002F\n├── __init__.py               # Public API surface\n├── retriever.py              # Retriever + EmbeddingEngine + Document \u002F ScoredDocument\n├── memory.py                 # Memory + Turn (decay, dedup, auto-importance)\n├── compressor.py             # Compressor + TokenBudget + CompressionResult\n├── context_engineering.py    # ContextEngine + ContextPacket (orchestrator)\n└── demo.py                   # Seven runnable demos\n```\n\n---\n\n## Performance (CPU only, 5-doc knowledge base)\n\n| Operation              | Latency  |\n|------------------------|----------|\n| Keyword retrieval      | ~0.8 ms  |\n| TF-IDF retrieval       | ~2.1 ms  |\n| Hybrid retrieval       | ~85 ms   |\n| Re-ranking (5 docs)    | ~0.3 ms  |\n| Extractive compression | ~4.2 ms  |\n| Full `engine.build()`  | ~92 ms   |\n\nHybrid retrieval dominates latency. 
For sub-50ms requirements, use `tfidf` or `keyword` mode.\nEmbedding results are cached after the first call — subsequent queries on the same document\nset drop to ~2ms for the embedding step.\n\n---\n\n## When to Use This\n\n**Worth it when you have:**\n- Multi-turn conversations where context accumulates across turns\n- A large knowledge base where retrieval noise degrades quality\n- A tight token budget and quality requirements that outweigh ~92ms overhead\n\n**Skip it when you have:**\n- Single-turn queries against a small fixed dataset\n- Hard latency requirements under 50ms\n- Fully deterministic domains where keyword retrieval is sufficient\n\n---\n\n## Known Limitations\n\n- Token estimation uses 1 token ≈ 4 characters (English prose). Misfires for code and\n  non-Latin scripts. Swap in `tiktoken` in `compressor.py` for exact counts.\n- The extractive compressor scores sentences by query-token recall overlap, not semantic\n  similarity. Sentences that paraphrase the query without sharing tokens score zero.\n- `Memory` is in-process only — no persistence across sessions.\n- `hybrid_alpha=0.65` is empirically tuned on a small query set. 
Tune it for your domain.\n\n---\n## Related\n\n**Same series — production layers for LLM systems:**\n\n- [RAG Is Blind to Time — I Built a Temporal Layer to Fix It in Production](https:\u002F\u002Ftowardsdatascience.com\u002Frag-is-blind-to-time-i-built-a-temporal-layer-to-fix-it-in-production\u002F)\n  — temporal awareness layer for RAG systems that treats time as a first-class\n  retrieval signal.\n\n- [LLM Evals Are Based on Vibes — I Built the Missing Layer That Decides What Ships](https:\u002F\u002Ftowardsdatascience.com\u002Fllm-evals-are-based-on-vibes-i-built-the-missing-layer-that-decides-what-ships\u002F)\n  — evaluation layer that replaces gut-feel shipping decisions with measurable\n  output quality gates.\n\n- [PyTorch NaNs Are Silent Killers — I Built a 3ms Hook to Catch Them at the Exact Layer](https:\u002F\u002Ftowardsdatascience.com\u002Fpytorch-nans-are-silent-killers-i-built-a-3ms-hook-to-catch-them-at-the-exact-layer\u002F)\n  — lightweight hook that catches NaN propagation at the exact layer where it\n  originates, with under 3ms of overhead.\n\n## License\n\nMIT\n","context-engine is a pure-Python context management layer for large language model systems that combines retrieval, re-ranking, memory decay, and token-budget control. Its core features are document retrieval in keyword, TF-IDF, or hybrid mode, with results re-ranked by domain relevance; a memory component with exponential decay, automatic importance scoring, and deduplication; and query-aware compression that trims input text to fit real token limits. It suits conversational AI applications, such as customer-service bots and intelligent assistants, that need to draw on large amounts of external knowledge within a limited token budget.",2,"2026-06-11 02:45:17","CREATED_QUERY"]