[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80106":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":15,"starSnapshotCount":15,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},80106,"control-layer","Emmimal\u002Fcontrol-layer","Emmimal","A production-grade control layer that sits between your application logic and any LLM — input validation, schema enforcement, circuit breaking, targeted retry, and audit logging in one composable pipeline.","",null,"Python",59,9,56,0,1,3,"MIT License",false,"main",true,[23,24,25,26,27,28,29,30,31,32,33],"anthropic","circuit-breaker","generative-ai","input-validation","llm","llm-guardrails","llm-ops","production-ai","prompt-engineering","python","structured-output","2026-06-12 02:03:58","# control-layer\nA production-grade control layer that sits between your application logic and any LLM — input validation, schema enforcement, circuit breaking, targeted retry, and audit logging in one composable pipeline.\n\n\n![Python Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.12-blue)\n![Tests](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftests-69%20passed-brightgreen)\n![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green)\n![Dependencies](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdeps-tiktoken%20%7C%20tenacity%20%7C%20pydantic%20%7C%20structlog-lightgrey)\n\nMost LLM integrations stop at: write a prompt, call the model, use the response. 
This\nlibrary handles what prompt engineering cannot — enforcing what the model actually returns,\nblocking what should never reach it, and recovering cleanly when things break.\n\nRead the full write-up on Towards Data Science →\n**[Prompt Engineering Failed in Production — I Built the Control Layer That Actually Works](https:\u002F\u002Ftowardsdatascience.com\u002Fprompt-engineering-isnt-enough-i-built-a-control-layer-that-works-in-production\u002F)**\n\n---\n\n## What It Does\n\n```\nUser Input\n    |\n[1] InputGuard          -- injection detection (20 patterns), length check, sanitization\n    |\n[2] CircuitBreaker      -- stops hammering a failing LLM backend\n    |\n[3] TokenBudget         -- tiktoken-accurate slot allocation, priority order\n[4] PromptBuilder       -- assembles prompt within budget, injects constraints\n    |\n[5] LLMCaller           -- enforces hard timeout on every call\n    |\n[6] ResponseValidator   -- JSON schema, length bounds, forbidden phrases, quality score\n    | [failed?]\n[7] RetryEngine         -- targeted prompt mutation per failure mode, jittered backoff\n    | [exhausted?]\n[8] FallbackRouter      -- cached response, template, or escalation chain\n    |\n    AuditLogger         -- every attempt written to JSONL, thread-safe, persistent\n    |\nControlPacket           -- response, attempts, latency, score, audit_id\n```\n\n| Component | Job |\n|---|---|\n| InputGuard | Blocks injection attempts and oversized input before any LLM call |\n| CircuitBreaker | Opens after N consecutive failures; rejects calls instantly during recovery |\n| TokenBudget | tiktoken-accurate slot-based allocator; prevents silent overflow |\n| PromptBuilder | Assembles prompt in priority order with hard constraints injected structurally |\n| LLMCaller | Wraps any callable LLM with thread-based timeout enforcement |\n| ResponseValidator | Validates JSON structure, required keys, length, forbidden phrases |\n| RetryEngine | Maps each failure mode to a 
targeted mutation hint; jittered exponential backoff |\n| FallbackRouter | Registered fallback chain; first non-empty response wins |\n| AuditLogger | Thread-safe JSONL audit log; P50\u002FP90\u002FP99 latency stats; failure distribution |\n\n---\n\n## Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FEmmimal\u002Fcontrol-layer.git\ncd control-layer\npip install tiktoken tenacity pydantic structlog   # required\npip install pytest                                  # optional — for running tests\n```\n\nNo ML dependencies. No GPU required. All functionality runs on the Python standard library\nplus the four packages above.\n\n---\n\n## Quick Start\n\n```python\nfrom control_layer import ControlLayer, ControlLayerConfig, ResponseSchema\n\n# Define your output contract\nschema = ResponseSchema(\n    must_be_json=True,\n    required_keys=[\"summary\", \"confidence\"],\n    max_length=400,\n    forbidden_phrases=[\"I cannot\", \"As an AI\"],\n)\n\n# Configure the layer\nconfig = ControlLayerConfig(\n    total_tokens=800,\n    max_attempts=3,\n    timeout_seconds=30.0,\n    cb_failure_threshold=5,\n    cb_recovery_seconds=30.0,\n)\n\n# Swap in any LLM callable — OpenAI, Anthropic, local model, mock\ndef your_llm_call(prompt: str) -> str:\n    ...\n\nlayer = ControlLayer(\n    llm_fn=your_llm_call,\n    system_prompt=\"You are a structured research assistant.\",\n    schema=schema,\n    config=config,\n)\n\n# Register fallbacks — called in order once retries are exhausted\nlayer.register_fallback(\n    \"cache\",\n    lambda q: '{\"summary\": \"Cached response.\", \"confidence\": 0.5}',\n)\n\n# Run\npacket = layer.run(\n    user_input=\"How does token budget allocation work?\",\n    constraints=[\n        \"Return only valid JSON.\",\n        \"Include 'summary' and 'confidence' keys.\",\n        \"No markdown fencing.\",\n    ],\n    context=retrieved_documents,   # optional RAG context; supply your own documents or omit\n)\n\nprint(packet.response)            # final 
response\nprint(packet.validation.passed)   # True \u002F False\nprint(packet.attempts)            # 1, 2, or 3\nprint(packet.total_latency_ms)    # end-to-end latency\nprint(packet.audit_id)            # ties all log lines to this request\n```\n\n---\n\n## Running the Demos\n\nFive runnable demos covering every failure mode and recovery path. No API key required.\nThe `MockLLM` simulates realistic failure behavior at a configurable rate.\n\n```bash\npython demo.py\n```\n\n| Demo | What It Shows |\n|---|---|\n| 1 | Input guard blocking 7 of 8 inputs — injection, empty, oversized |\n| 2 | Schema enforcement with retry — 75% first-attempt failure rate, mutation hints |\n| 3 | Constraint violation recovery — length and forbidden phrase, 3 attempts |\n| 4 | Fallback router — exhausted retries route to cached response |\n| 5 | Benchmark — naive 0% pass rate vs control layer 100%, latency breakdown |\n\nRunning Demo 5 also generates `control_layer_benchmark.png` — a 6-panel benchmark figure\nshowing pass rate, failure mode distribution, retry distribution, latency percentiles,\ntoken budget allocation, and quality score histogram.\n\n---\n\n## Running the Tests\n\n```bash\npytest tests\u002F -v\n```\n\n```\nTestInputGuard               14 tests   PASSED\nTestTokenBudget               5 tests   PASSED\nTestPromptBuilder             6 tests   PASSED\nTestResponseValidator        10 tests   PASSED\nTestCircuitBreaker            5 tests   PASSED\nTestRetryEngine               6 tests   PASSED\nTestFallbackRouter            4 tests   PASSED\nTestLLMCaller                 2 tests   PASSED\nTestAuditLogger               5 tests   PASSED\nTestControlLayerIntegration   8 tests   PASSED\nTestPydanticConfig            4 tests   PASSED\n\n69 passed in 1.19s\n```\n\nEvery component is tested in isolation. 
Integration tests cover the full orchestration\npath: first-attempt success, retry on schema violation, fallback after exhausted retries,\ncircuit breaker rejection after consecutive timeouts, and Pydantic config validation errors.\n\n---\n\n## Configuration Reference\n\n```python\nControlLayerConfig(\n    # Token budget\n    total_tokens=800,              # Total token budget for prompt assembly\n    model_name=\"cl100k_base\",      # tiktoken encoding name\n\n    # Input validation\n    max_input_chars=2000,          # Hard limit on user input length\n\n    # LLM call\n    timeout_seconds=30.0,          # Hard timeout per LLM call\n\n    # Retry\n    max_attempts=3,                # Maximum retry attempts per request\n    base_delay_ms=50.0,            # Base exponential backoff delay\n    max_delay_ms=2000.0,           # Maximum backoff delay\n    jitter_ms=25.0,                # Random jitter added to each delay\n\n    # Circuit breaker\n    cb_failure_threshold=5,        # Consecutive failures before opening\n    cb_recovery_seconds=30.0,      # Seconds before attempting recovery\n\n    # Audit\n    audit_log_path=\"audit.jsonl\",  # JSONL audit log path\n)\n```\n\n```python\nResponseSchema(\n    must_be_json=False,            # Require valid JSON response\n    required_keys=[],              # Keys that must appear in JSON output\n    max_length=None,               # Maximum response length in characters\n    min_length=None,               # Minimum response length in characters\n    forbidden_phrases=[],          # Phrases that must not appear in response\n    must_contain=[],               # Phrases that must appear (used for quality score)\n)\n```\n\n---\n\n## Swapping the LLM\n\nThe `llm_fn` parameter accepts any callable that takes a `str` and returns a `str`.\n\n```python\n# OpenAI\nimport openai\nclient = openai.OpenAI()\n\ndef openai_call(prompt: str) -> str:\n    response = client.chat.completions.create(\n        model=\"gpt-4o\",\n        
messages=[{\"role\": \"user\", \"content\": prompt}],\n    )\n    return response.choices[0].message.content\n\nlayer = ControlLayer(llm_fn=openai_call, ...)\n\n# Anthropic\nimport anthropic\nclient = anthropic.Anthropic()\n\ndef claude_call(prompt: str) -> str:\n    response = client.messages.create(\n        model=\"claude-sonnet-4-5\",\n        max_tokens=1024,\n        messages=[{\"role\": \"user\", \"content\": prompt}],\n    )\n    return response.content[0].text\n\nlayer = ControlLayer(llm_fn=claude_call, ...)\n\n# Any local model\nlayer = ControlLayer(llm_fn=lambda prompt: your_local_model.generate(prompt), ...)\n```\n\n---\n\n## Project Structure\n\n```\ncontrol-layer\u002F\n├── control_layer.py          # All eight components + ControlLayer orchestrator\n├── demo.py                   # Five runnable demos + benchmark charts\n├── tests\u002F\n│   └── test_control_layer.py # 69 tests across all components\n├── audit.jsonl               # Generated on first run (append-only audit log)\n├── control_layer_benchmark.png  # Generated by demo.py\n└── README.md\n```\n\n---\n\n## Benchmark\n\nMeasured on Python 3.12.6, Windows 11, CPU only, no GPU.\nTen structured output queries, 55% first-attempt failure rate.\n\n| Metric | Naive | Control Layer |\n|---|---|---|\n| Pass rate | 0% | 100% |\n| Min latency (ms) | 37.3 | 46.2 |\n| Median latency (ms) | 43.3 | 143.5 |\n| Mean latency (ms) | 42.9 | 139.8 |\n| P90 latency (ms) | 45.6 | 168.0 |\n| Max latency (ms) | 48.4 | 281.9 |\n| Resolved on attempt 1 | N\u002FA | 2 |\n| Resolved on attempt 2 | N\u002FA | 7 |\n| Resolved on attempt 3+ | N\u002FA | 1 |\n\nComponent overhead (excluding LLM call):\n\n| Operation | Latency | Notes |\n|---|---|---|\n| InputGuard validation | ~0.2ms | 20 regex patterns |\n| tiktoken count (100 tokens) | ~0.8ms | Encoding lookup |\n| PromptBuilder.build() | ~1.1ms | Budget allocation + assembly |\n| ResponseValidator.validate() | ~0.3ms | JSON parse + rule checks |\n| 
CircuitBreaker.is_open() | ~0.05ms | Lock acquire + state check |\n| AuditLogger.log() | ~0.4ms | Lock + file append |\n| Total non-LLM overhead | ~2.9ms | Per request |\n\nThe LLM call dominates end-to-end latency. The control layer adds under 3ms of overhead\nper request, which is within the variance of a single network round-trip.\n\n---\n\n## When to Use This\n\nWorth it when you have:\n\n- LLM responses that drive downstream code — JSON parsed programmatically, data written\n  to a database, outputs shown to users without human review\n- User input passed to an LLM without a validation layer in between\n- Structured output requirements the model violates intermittently\n- Production systems where an LLM outage would block threads or hang requests\n\nSkip it when you have:\n\n- Single-turn, low-stakes use cases where a bad response is displayed and discarded\n- Hard latency requirements under 50ms — retry delays alone can exceed this\n- A chatbot where the user sees the raw model output and can judge it themselves\n\n---\n\n## Known Limitations\n\n**Injection patterns are not exhaustive.** Twenty patterns cover the OWASP LLM Top 10\nattack taxonomy. Adversarial prompts crafted to avoid known patterns will pass. Combine\nwith embedding-based anomaly detection for high-risk deployments.\n\n**Circuit breaker state is in-process only.** A restart resets the circuit to CLOSED\nregardless of backend status. For multi-instance deployments, share circuit state via\nRedis or a similar low-latency store.\n\n**No streaming support.** The `LLMCaller` collects the full response before validation.\nStreaming APIs would need either partial-validation heuristics or explicit buffering of the\nstream; neither is implemented.\n\n**Quality score uses phrase matching, not semantic similarity.** `must_contain` checks\nexact string presence. A response that paraphrases a required concept without using the\nexact phrase scores zero. 
Swap in an embedding-based scorer for higher precision.\n\n**AuditLogger grows unbounded.** Every call appends to the JSONL file. In production,\nship it to object storage on a rolling basis and rotate locally.\n\n---\n\n## Related\n\n**Same series — production layers for LLM systems:**\n\n- [RAG Is Blind to Time — I Built a Temporal Layer to Fix It in Production](https:\u002F\u002Ftowardsdatascience.com\u002Frag-is-blind-to-time-i-built-a-temporal-layer-to-fix-it-in-production\u002F)\n  — temporal awareness layer for RAG systems that treats time as a first-class\n  retrieval signal.\n\n- [LLM Evals Are Based on Vibes — I Built the Missing Layer That Decides What Ships](https:\u002F\u002Ftowardsdatascience.com\u002Fllm-evals-are-based-on-vibes-i-built-the-missing-layer-that-decides-what-ships\u002F)\n  — evaluation layer that replaces gut-feel shipping decisions with measurable\n  output quality gates.\n\n- [PyTorch NaNs Are Silent Killers — I Built a 3ms Hook to Catch Them at the Exact Layer](https:\u002F\u002Ftowardsdatascience.com\u002Fpytorch-nans-are-silent-killers-i-built-a-3ms-hook-to-catch-them-at-the-exact-layer\u002F)\n  — lightweight hook that catches NaN propagation at the exact layer it\n  originates, with under 3ms of overhead.\n\n- [context-engine](https:\u002F\u002Fgithub.com\u002FEmmimal\u002Fcontext-engine) — retrieval,\n  re-ranking, memory decay, and token budget control for RAG systems. The\n  control layer handles what the model returns. The context engine handles\n  what it receives. They compose.\n\n---\n\n## License\n\nMIT\n","Emmimal\u002Fcontrol-layer is a production-grade control layer that sits between application logic and any large language model (LLM), providing input validation, schema enforcement, circuit breaking, targeted retry, and audit logging. Written in Python, it includes core components for input guarding, circuit breaking, token budget management, prompt building, response validation, retry, and fallback routing, keeping interactions with LLMs secure and stable. It is particularly suited to scenarios that require strict control over generative AI applications, such as integrating complex language model services into enterprise software, where it helps avoid potential security risks and improves system robustness. It is open source under the MIT license and currently has 56 stars.",2,"2026-06-11 03:59:16","CREATED_QUERY"]