[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11508":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":16,"starSnapshotCount":16,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},11508,"OrcaRouter-Lite","Continuum-AI-Corp\u002FOrcaRouter-Lite","Continuum-AI-Corp","Self-hosted LLM router with a managed safety net. OpenAI-compatible. BYOK. Single-workspace. Streaming. For more advanced routing choose hosted OrcaRouter","https:\u002F\u002Fwww.orcarouter.ai",null,"Python",544,46,9,5,0,674,9.02,"MIT License",false,"main",[],"2026-06-12 02:02:32","# OrcaRouter Lite\n\n**Self-hosted LLM router with a managed safety net.**\nOpenAI-compatible. BYOK. Single-workspace. Streaming. 
`model=\"auto\"`.\n\n![OrcaRouter Lite Logo](https:\u002F\u002Fgithub.com\u002FContinuum-AI-Corp\u002FOrcaRouter-Lite\u002Fblob\u002Fmain\u002Fdesign\u002FOrcaRouter%20Lite.png?raw=true)\n\n[![tests](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftests-127_passing-brightgreen)](#testing)\n[![models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmodels-100%2B-blue)](#model-catalog)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue)](#license)\n\n## Languages\n\n- [English](.\u002FREADME.md)\n- [日本語](.\u002FREADME.ja.md)\n- [中文](.\u002FREADME.zh.md)\n- [한국어](.\u002FREADME.ko.md)\n- [Deutsch](.\u002FREADME.de.md)\n- [Français](.\u002FREADME.fr.md)\n- [Español](.\u002FREADME.es.md)\n- [Italiano](.\u002FREADME.it.md)\n- [Русский](.\u002FREADME.ru.md)\n- [Português](.\u002FREADME.pt.md)\n- [Tiếng Việt](.\u002FREADME.vi.md)\n- [हिन्दी](.\u002FREADME.hi.md)\n\nOrcaRouter Lite is the open-source single-workspace edition of [OrcaRouter](https:\u002F\u002Fwww.orcarouter.ai). Run it on your laptop, ship it in your product, or use hosted `api.orcarouter.ai` directly for the long tail of models you don't want to manage keys for.\n\n> **Why us?** LiteLLM is a library; OpenRouter is closed-source hosted; Ollama is local-only. We're the **self-hosted server with a managed fallback** — a sentence none of those can say.\n\n## 60-second quickstart\n\nTwo ways to use OrcaRouter:\n\n### Path A — Self-hosted (BYOK)\n\nRun Lite on your own machine; bring your own provider keys.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FContinuum-AI-Corp\u002FOrcaRouter-Lite.git\ncd OrcaRouter-Lite\ncp .env.example .env\n# add at least one: OPENAI_API_KEY=sk-...  (or ORCAROUTER_API_KEY=...)\n\ndocker compose up\n# logs: ✓ orcarouter-lite ready. API key: sk-orca-abc123...\n```\n\nBase URL: `http:\u002F\u002Flocalhost:8000\u002Fv1`. Use the `sk-orca-*` key printed at startup.\n\n### Path B — Hosted (account required)\n\nNo clone, no docker. 
Register, get a key, point any OpenAI SDK at hosted.\n\n```bash\n# 1. Register at https:\u002F\u002Fwww.orcarouter.ai and copy your sk-orca-* key\n# 2. Use https:\u002F\u002Fapi.orcarouter.ai\u002Fv1 as the base URL\n```\n\n**Account required.** Hosted handles routing, billing, and the long tail of providers — billed per-token on your OrcaRouter account. See [docs.orcarouter.ai\u002Fintroduction](https:\u002F\u002Fdocs.orcarouter.ai\u002Fintroduction).\n\n### Then call it from any OpenAI SDK\n\nExamples below use Path A's localhost base URL — swap for `https:\u002F\u002Fapi.orcarouter.ai\u002Fv1` if you're on Path B.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Python\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(\n    base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv1\",\n    api_key=\"sk-orca-abc123...\",\n)\nr = client.chat.completions.create(\n    model=\"auto\",  # or \"gpt-4o-mini\", \"claude-3-5-sonnet-latest\", ...\n    messages=[{\"role\": \"user\", \"content\": \"Hello!\"}],\n)\nprint(r.choices[0].message.content)\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Node.js\u003C\u002Fb>\u003C\u002Fsummary>\n\n```js\nimport OpenAI from \"openai\";\n\nconst client = new OpenAI({\n  baseURL: \"http:\u002F\u002Flocalhost:8000\u002Fv1\",\n  apiKey: \"sk-orca-abc123...\",\n});\n\nconst r = await client.chat.completions.create({\n  model: \"auto\",\n  messages: [{ role: \"user\", content: \"Hello!\" }],\n});\nconsole.log(r.choices[0].message.content);\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>curl\u003C\u002Fb>\u003C\u002Fsummary>\n\n```bash\ncurl http:\u002F\u002Flocalhost:8000\u002Fv1\u002Fchat\u002Fcompletions \\\n  -H \"Authorization: Bearer sk-orca-abc123...\" \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"model\":\"auto\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello!\"}]}'\n```\n\u003C\u002Fdetails>\n\nOpen `http:\u002F\u002Flocalhost:8000\u002F` for 
the dashboard — providers, routing, analytics, keys (Path A only).\n\n## Why?\n\n| | OrcaRouter Lite | LiteLLM library | OpenRouter | Ollama |\n|---|---|---|---|---|\n| Self-hosted server | ✓ | as a library | ✗ | ✓ |\n| OpenAI-compatible | ✓ | ✓ | ✓ | ✓ |\n| Multi-provider (OpenAI\u002FAnthropic\u002FGoogle\u002F…) | ✓ | ✓ | ✓ | ✗ |\n| Built-in dashboard | ✓ | ✗ | ✓ | ✗ |\n| `model=\"auto\"` (cheapest capable) | ✓ | ✗ | ✗ | n\u002Fa |\n| Streaming | ✓ | ✓ | ✓ | ✓ |\n| BYOK | ✓ | ✓ | ✗ | n\u002Fa |\n| Hosted-as-fallback | ✓ | ✗ | n\u002Fa | ✗ |\n| No Postgres \u002F no Redis required | ✓ | n\u002Fa | n\u002Fa | ✓ |\n\n## `model=\"auto\"` — the headline feature\n\nSend `model=\"auto\"` and OrcaRouter picks the **cheapest** model in your configured providers that meets the request's capability requirements (tools, vision, JSON mode). No manual routing rules; no rate-limit gymnastics; no `if x: ...` cost optimization in your code.\n\n```python\nclient.chat.completions.create(\n    model=\"auto\",\n    messages=[{\"role\": \"user\", \"content\": [\n        {\"type\": \"text\", \"text\": \"What's in this image?\"},\n        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:...\"}},\n    ]}],\n)\n# → routes to the cheapest VISION-capable model your keys cover\n```\n\nThe resolved model is exposed back to callers via the `x-orca-resolved-model` response header so you can log\u002Fdisplay what was actually used.\n\n## Hosted as upstream (Lite + hosted)\n\nAlready running Lite? 
Set `ORCAROUTER_API_KEY` to your `sk-orca-*` from [www.orcarouter.ai](https:\u002F\u002Fwww.orcarouter.ai), and hosted becomes one more provider in the routing chain — covering models your local keys don't:\n\n```bash\n# .env\nORCAROUTER_API_KEY=sk-orca-hosted-abc...\n```\n\nUse cases:\n- **Try-before-you-buy** — no local provider keys needed\n- **Local logging** — hosted handles routing, Lite stores RequestLog rows for the dashboard\n- **Failover** — local providers fail, hosted is the safety net\n\n## Streaming\n\nOpenAI-compatible SSE format with the standard `data: ... \\n\\n` framing and a terminal `[DONE]` sentinel — drop-in for any SDK that already streams from OpenAI.\n\n```python\nfor chunk in client.chat.completions.create(\n    model=\"auto\",\n    messages=[{\"role\": \"user\", \"content\": \"Tell me a story\"}],\n    stream=True,\n):\n    print(chunk.choices[0].delta.content or \"\", end=\"\", flush=True)\n```\n\n## Model catalog\n\n100+ chat models are loaded at startup from [LiteLLM's community-maintained pricing database](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002Fblob\u002Fmain\u002Fmodel_prices_and_context_window.json) — no model list to maintain manually. Each entry exposes:\n\n- `id` (e.g. `gpt-4o`, `claude-3-5-sonnet-latest`)\n- `provider` (mapped to your configured keys)\n- Capability flags: `supports_tools`, `supports_vision`, `supports_json_mode`\n- Per-token input\u002Foutput cost (drives the savings widget + `model=\"auto\"`)\n\n`GET \u002Fv1\u002Fmodels` returns the OpenAI-format catalogue.\n\n## Deploy somewhere else\n\n| Platform | One-click |\n|---|---|\n| Railway | [![Deploy on Railway](https:\u002F\u002Frailway.app\u002Fbutton.svg)](https:\u002F\u002Frailway.app\u002Fnew\u002Ftemplate) |\n| Fly.io | `fly launch --dockerfile Dockerfile` |\n| Render | Connect repo, root dir = `.` |\n| Bare Docker | `docker run -p 8000:8000 -e OPENAI_API_KEY=... 
ghcr.io\u002F...` (image coming soon) |\n\n## What's in the box\n\n- `POST \u002Fv1\u002Fchat\u002Fcompletions` — proxy + streaming + `model=\"auto\"` + cross-provider prompt cache\n- `GET  \u002Fv1\u002Fmodels` — discoverable model catalog (100+ models from `litellm.model_cost`)\n- `GET\u002FPUT\u002FDELETE \u002Fv1\u002Fproviders\u002F{provider}` — set \u002F list \u002F revoke encrypted provider keys\n- `GET\u002FPUT \u002Fv1\u002Frouting` — change strategy (`balanced` \u002F `cheapest` \u002F `fastest` \u002F `quality`)\n- `GET  \u002Fv1\u002Fanalytics\u002F{recent,spend,latency,savings,unreachable}` — local analytics, no telemetry leaves the box\n- `GET  \u002Fv1\u002Fhosted` — hosted-fallback status (drives the dashboard's \"Get $5 free credit\" card)\n- `GET\u002FPOST\u002FDELETE \u002Fv1\u002Fkeys\u002F...` — list \u002F rotate \u002F revoke API keys\n- Single-page dashboard at `\u002F`\n- SQLite by default; Postgres opt-in via `DATABASE_URL`; Redis optional\n\n### Cross-provider prompt cache\n\nDeterministic requests (`temperature=0` or pinned `seed`) are served from cache on repeat — works across **every** provider, not just Anthropic. Backend is Redis when `REDIS_URL` is set, in-process LRU otherwise. Cache hits return instantly with `x-orca-cache: HIT` and cost $0.\n\n```bash\n$ curl ... -d '{\"model\":\"auto\",\"messages\":[...], \"temperature\": 0}' -i\nHTTP\u002F1.1 200 OK\nx-orca-cache: MISS\nx-orca-resolved-model: gpt-4o-mini\n\n$ curl ...  # same payload again\nHTTP\u002F1.1 200 OK\nx-orca-cache: HIT          ← served from cache, no upstream call\n```\n\n### Savings widget\n\n`GET \u002Fv1\u002Fanalytics\u002Fsavings?baseline=gpt-4o&days=7` reports what your traffic would have cost on always-GPT-4 vs what it actually cost. 
The dashboard shows it as a tile.\n\n### Integrations\n\nDrop-in configs for [Continue.dev](.\u002Fintegrations\u002Fcontinue.json), [Aider](.\u002Fintegrations\u002Faider.md), [Cursor](.\u002Fintegrations\u002Fcursor.md), [LangChain](.\u002Fintegrations\u002Flangchain_orcarouter.py), [LlamaIndex](.\u002Fintegrations\u002Fllamaindex_orcarouter.py), [Vercel AI SDK](.\u002Fintegrations\u002Fvercel_ai.ts), and any tool that speaks the OpenAI Chat Completions protocol. See [`integrations\u002F`](.\u002Fintegrations\u002F).\n\n## What's deliberately not\n\nThis is the **single-workspace** edition. By design, no:\n- multi-tenancy, RBAC, SSO\n- billing, wallets, points, partner program\n- admin console, audit logs, trust & safety\n- multi-pod deployment \u002F Kubernetes\n- email \u002F Slack \u002F webhooks for alerts\n\nFor those, see the hosted product or the (forthcoming) Teams edition.\n\n## Testing\n\nBuilt test-first. Every behaviour shipped here had a failing test first.\n\n```bash\npip install -e \".[dev]\"\nPYTHONPATH=. pytest -v\n# 127 passed\n```\n\n| Slice | Tests | What |\n|---|---|---|\n| 1. Config | 5 | env loading, defaults, `env_provider_keys()` |\n| 2. Seed | 3 | bootstrap workspace + API key + RoutingConfig, idempotent |\n| 3. Auth middleware | 4 | bearer-token validation, 401 on missing\u002Finvalid |\n| 4. App factory | 3 | \u002Fhealth, error envelope, \u002Fv1\u002F* gating |\n| 5. Provider keys CRUD | 5 | encrypted at rest, plaintext never round-trips |\n| 6. Router cache | 13 | env+DB+hosted deployment assembly with precedence |\n| 7. Chat completion | 5 | OpenAI format, RequestLog, validation |\n| 8. Analytics | 4 | recent \u002F spend \u002F latency p50\u002Fp99 |\n| 9. \u002Fv1\u002F{models,keys,routing} | 8 | list\u002Fcreate\u002Frevoke + strategy update |\n| 10. Streaming | 4 | SSE format, `[DONE]` sentinel, log writeback |\n| 11. Catalog | 7 | 100+ models, capability flags, pricing |\n| 12. 
`model=\"auto\"` | 21 | capability detection, cheapest-meeting-needs (unit + integration) |\n| 13. Cost savings | 9 | savings vs always-GPT-4 baseline + hosted-auto comparison |\n| 14. Prompt cache | 15 | cross-provider exact-match cache + chat integration |\n| 15. Benchmark | 4 | summarize() + render_markdown() aggregation |\n| 16. Hosted status | 7 | `\u002Fv1\u002Fhosted` config-source + signup-URL surface |\n| 17. Hosted-auto savings | 3 | `_hosted_auto_savings` edge cases on synthetic catalogs |\n| 18. Unreachable models | 7 | \"models you can't reach\" tile clears when hosted is on |\n| **Total** | **127** | |\n\n## Architecture\n\n```\napp\u002F\n├── main.py             FastAPI factory + lifespan + SPA mount\n├── config.py           Settings (~15 fields)\n├── deps.py             DI helpers\n├── seed.py             First-run bootstrap\n├── auto_routing.py     model=\"auto\" capability + cost scoring\n├── router_cache.py     Single-workspace router\n├── prompt_cache.py     Cross-provider exact-match cache (Redis or in-memory LRU)\n├── schemas.py          OpenAI-compatible request schema\n├── middleware\u002Fauth.py  sk-orca-* validation\n└── routes\u002F\n    ├── chat.py         \u002Fv1\u002Fchat\u002Fcompletions  (blocking + streaming)\n    ├── models.py       \u002Fv1\u002Fmodels\n    ├── providers.py    BYOK CRUD\n    ├── routing.py      strategy config\n    ├── analytics.py    recent \u002F spend \u002F latency \u002F savings \u002F unreachable\n    ├── keys.py         list \u002F rotate \u002F revoke API keys\n    ├── hosted.py       \u002Fv1\u002Fhosted — hosted-fallback status for the dashboard\n    └── health.py\n\npackages\u002F\n├── litellm_adapter\u002F    Router wrapper + 100+ model catalog\n├── auth\u002F               hashing + AES-256-GCM\n└── db\u002F                 models + engine + session\n```\n\n## Roadmap\n\n- [x] OpenAI-compatible chat completions\n- [x] Streaming (SSE)\n- [x] `model=\"auto\"` cheapest-capable routing\n- [x] 
Hosted-as-upstream\n- [x] Encrypted BYOK at rest\n- [x] Local analytics dashboard\n- [x] CI (GitHub Actions)\n- [x] Cross-provider prompt caching\n- [x] Continue.dev \u002F Aider \u002F LangChain \u002F Cursor \u002F Vercel AI SDK integrations\n- [x] Public benchmark + savings claim\n- [ ] Embeddings + image-gen proxy\n\nSee [DEMO.md](.\u002FDEMO.md) for the failover demo.\n\n## License\n\nMIT. See [LICENSE](.\u002FLICENSE).\n","OrcaRouter Lite is a self-hosted large language model router with a managed safety net, and it is OpenAI-compatible. Its core features include bring-your-own-key (BYOK) support, single-workspace management, and streaming, and it can automatically select the best model (`model=\"auto\"`). The project is written in Python under the MIT license and suits scenarios that need multi-model access integrated locally or in a product without directly managing each model's API keys. For more advanced routing needs, users can choose the official hosted service.",2,"2026-06-11 03:32:02","CREATED_QUERY"]