[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80660":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},80660,"safegate","pardcomper\u002Fsafegate","pardcomper","Lightweight runtime safety guard for multimodal LLM I\u002FO",null,"Python",137,8,6,0,87,51.56,"Other",false,"main",[],"2026-06-11 04:07:14","# SafeGate — a small runtime guard for MLLM I\u002FO\n\nA drop-in safety wrapper for multimodal LLM endpoints. Sits between your application and the model, and decides whether each (text, image, audio) input is safe to forward — and whether each output is safe to return.\n\nIt is **not** an alignment technique, a fine-tune, or a perfect filter. It is a few lines of glue plus three small classifiers that catch the most common failure modes before they reach production.\n\n## Why?\n\nIn practice, deploying an MLLM to real users surfaces three concrete problems:\n\n1. **Adversarial prompts hidden inside images** (OCR jailbreaks, visual prompt injection).\n2. **Modality-laundered requests** — putting the harmful part in audio because the safety pipeline only checks text.\n3. 
**Models that confidently produce harmful answers** when the input *looks* harmless but the policy demands a refusal.\n\nSafeGate addresses all three at the I\u002FO boundary, where they're cheapest to catch.\n\n## Install & run\n\n```bash\npip install safegate\n```\n\nWrap any HuggingFace-style MLLM:\n\n```python\nfrom safegate import Gate\n\ngate = Gate(\n    text_classifier=\"default\",      # small distilbert-based safety classifier\n    image_ocr_check=True,           # extract OCR text from images and re-classify\n    audio_transcribe=True,          # whisper-tiny -> classify transcript\n    response_post_check=True,\n)\n\ndecision = gate.check_input(text=\"...\", image=img, audio=audio)\nif decision.action == \"block\":\n    return decision.refusal_message\n# else forward to your MLLM\nresponse = your_mllm.generate(...)\n\n# Check the output too\nout_decision = gate.check_output(response, original_input=text)\nif out_decision.action == \"rewrite\":\n    response = out_decision.rewritten\n```\n\n## Configuration\n\n`Gate` accepts a `policy` argument (defaults to `default`) that controls thresholds:\n\n```yaml\n# default.yaml\ninput:\n  text_threshold: 0.65       # block if classifier P(unsafe) >= this\n  ocr_threshold: 0.55        # ocr-extracted text uses a stricter threshold\n  audio_threshold: 0.55      # ditto for transcribed audio\n  on_uncertain: \"warn\"       # warn | block | allow\n\noutput:\n  response_threshold: 0.6\n  on_borderline: \"self_critique\"   # self_critique | rewrite | allow\n  refusal_template: \"I can't help with that request.\"\n\nlogging:\n  redact_pii: true           # mask emails\u002Fphones\u002FIDs in logs\n  level: info\n```\n\n## Examples\n\n**Example 1 — Stateless single-call guard:**\n\n```python\nfrom safegate import quick_check\n\nd = quick_check(\"How do I make a bomb?\")\nprint(d.action)            # 'block'\nprint(d.reason)            # 'text classifier: 0.91 unsafe'\n```\n\n**Example 2 — OCR-aware 
check:**\n\n```python\nfrom safegate import Gate\nfrom PIL import Image\n\ngate = Gate()\nimg = Image.open(\"suspicious_screenshot.png\")\nd = gate.check_input(text=\"Please describe.\", image=img)\n# If the OCR pass finds harmful text inside the image, this blocks even\n# though the surrounding text was innocent.\n```\n\n**Example 3 — Plug into a FastAPI endpoint:**\n\n```python\nfrom fastapi import FastAPI\nfrom safegate import Gate\n\ngate = Gate()\napp = FastAPI()\n\n# ChatRequest and call_mllm are assumed to be defined by your application.\n@app.post(\"\u002Fchat\")\nasync def chat(req: ChatRequest):\n    d = gate.check_input(text=req.text, image=req.image)\n    if d.action == \"block\":\n        return {\"response\": d.refusal_message, \"blocked\": True}\n    raw = call_mllm(req)\n    d2 = gate.check_output(raw, original_input=req.text)\n    return {\"response\": d2.text, \"blocked\": False}\n```\n\n## Caveats\n\n- The default classifiers are **small**. They make mistakes in both directions. You should evaluate on your traffic before trusting them.\n- SafeGate **is not** a substitute for upstream model safety training. It is a last-mile filter, nothing more.\n- The OCR pass uses a CPU-friendly OCR (Tesseract or rapidocr). It will miss text in stylized fonts.\n\n## License\n\nMIT.\n","SafeGate is a lightweight runtime safety guard designed for the inputs and outputs of multimodal large language models (LLMs). Its core function is to use a few lines of code and three small classifiers to intercept the most common failure modes before they reach production, including malicious prompts hidden in images, modality-laundered requests, and responses that are harmful even though the input looks harmless. SafeGate supports safety checks on text, images, and audio, and decides according to a preset policy whether to let data through or to rewrite the output. It is well suited to developers or teams who want to harden an existing LLM application while keeping deployment simple.",2,"2026-06-01 03:51:51","CREATED_QUERY"]