[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-797":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},797,"freellmapi","tashfeenahmed\u002Ffreellmapi","tashfeenahmed","OpenAI-compatible proxy that stacks the free tiers of 16 LLM providers (~1.7B tokens\u002Fmonth) behind one \u002Fv1 endpoint — plus any custom OpenAI-compatible endpoint. Smart routing, automatic failover, encrypted keys. Personal experimentation only.","https:\u002F\u002Ftashfeenahmed.github.io\u002Ffreellmapi\u002F",null,"TypeScript",9346,1504,41,17,0,1327,2591,8462,3981,40.53,"MIT License",false,"main",true,[],"2026-06-12 02:00:18","\u003Cdiv align=\"center\">\n\n# FreeLLMAPI\n\n**One OpenAI-compatible endpoint. Eleven free LLM providers. ~1B+ tokens per month.**\n\nAggregate the free tiers from Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, and Z.ai (Zhipu) behind a single `\u002Fv1\u002Fchat\u002Fcompletions` endpoint. Keys are stored encrypted. 
A router picks the best available model for each request, falls over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.\n\n[![CI](https:\u002F\u002Fgithub.com\u002Ftashfeenahmed\u002Ffreellmapi\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftashfeenahmed\u002Ffreellmapi\u002Factions\u002Fworkflows\u002Fci.yml)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green.svg)](.\u002FLICENSE)\n[![PRs Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg)](#contributing)\n\n![Fallback chain with per-provider token budget](repo-assets\u002Ffallback-chain.png)\n\n\u003C\u002Fdiv>\n\n---\n\n## Contents\n\n- [Why this exists](#why-this-exists)\n- [Supported providers](#supported-providers)\n- [Features](#features)\n- [Not yet supported](#not-yet-supported)\n- [Quick start](#quick-start)\n- [Using the API](#using-the-api)\n- [Screenshots](#screenshots)\n- [How it works](#how-it-works)\n- [Limitations](#limitations)\n- [Contributing](#contributing)\n- [Terms of Service review](#terms-of-service-review)\n- [Disclaimer](#disclaimer)\n\n## Why this exists\n\nEvery serious AI lab now offers a free tier — a few million tokens a month, a few thousand requests a day. On its own each tier is a toy. Stacked together, they add up to roughly **1.3 billion tokens per month** of working inference capacity, across dozens of models from small-and-fast to reasonably capable.\n\nThe problem is that stacking them by hand is painful: fourteen different SDKs, fourteen different rate limits, fourteen places a request can fail. FreeLLMAPI collapses that into one OpenAI-compatible endpoint. 
Point any OpenAI client library at your local server, and it routes transparently across whichever providers you've added keys for.\n\n## Supported providers\n\n\u003Ctable>\n\u003Ctr>\n\u003Ctd align=\"center\" width=\"180\">\u003Ca href=\"https:\u002F\u002Fai.google.dev\">\u003Cb>Google\u003C\u002Fb>\u003Cbr\u002F>Gemini 2.5 Flash · 3.x previews\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\" width=\"180\">\u003Ca href=\"https:\u002F\u002Fgroq.com\">\u003Cb>Groq\u003C\u002Fb>\u003Cbr\u002F>Llama 3.3, Llama 4, GPT-OSS, Qwen3\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\" width=\"180\">\u003Ca href=\"https:\u002F\u002Fcerebras.ai\">\u003Cb>Cerebras\u003C\u002Fb>\u003Cbr\u002F>Qwen3 235B\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\" width=\"180\">\u003Ca href=\"https:\u002F\u002Fcloud.sambanova.ai\">\u003Cb>SambaNova\u003C\u002Fb>\u003Cbr\u002F>DeepSeek V3.x · Llama 4 · Gemma 3\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fmistral.ai\">\u003Cb>Mistral\u003C\u002Fb>\u003Cbr\u002F>Large 3 · Medium 3.5 · Codestral · Devstral\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fopenrouter.ai\">\u003Cb>OpenRouter\u003C\u002Fb>\u003Cbr\u002F>19 free-tier models\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmarketplace\u002Fmodels\">\u003Cb>GitHub Models\u003C\u002Fb>\u003Cbr\u002F>GPT-4.1 · GPT-4o\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fdevelopers.cloudflare.com\u002Fworkers-ai\">\u003Cb>Cloudflare\u003C\u002Fb>\u003Cbr\u002F>Kimi K2 · GLM-4.7 · GPT-OSS · Granite 4\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fcohere.com\">\u003Cb>Cohere\u003C\u002Fb>\u003Cbr\u002F>Command R+ · Command-A (trial)\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca 
href=\"https:\u002F\u002Fdocs.z.ai\">\u003Cb>Z.ai (Zhipu)\u003C\u002Fb>\u003Cbr\u002F>GLM-4.5 · GLM-4.7 Flash\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fbuild.nvidia.com\">\u003Cb>NVIDIA\u003C\u002Fb>\u003Cbr\u002F>NIM (disabled by default)\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ci>Adding another? See \u003Ca href=\"#contributing\">Contributing\u003C\u002Fa>.\u003C\u002Fi>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## Features\n\n- **OpenAI-compatible** — `POST \u002Fv1\u002Fchat\u002Fcompletions` and `GET \u002Fv1\u002Fmodels` work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change `base_url`.\n- **Streaming and non-streaming** — Server-Sent Events for `stream: true`, JSON response otherwise. Every provider adapter implements both.\n- **Tool calling** — OpenAI-style `tools` \u002F `tool_choice` requests are passed through, and assistant `tool_calls` + `tool` role follow-up messages round-trip across providers.\n- **Automatic fallover** — If the chosen provider returns a 429, 5xx, or times out, the router skips it, puts the key on a short cooldown, and retries on the next model in your fallback chain (up to 20 attempts).\n- **Per-key rate tracking** — RPM, RPD, TPM, and TPD counters per `(platform, model, key)` so the router always picks a key that's under its caps.\n- **Sticky sessions** — Multi-turn conversations keep talking to the same model for 30 minutes to avoid the hallucination spike that comes from mid-conversation model switches.\n- **Encrypted key storage** — API keys are encrypted with AES-256-GCM before hitting SQLite; decryption happens in-memory just before a request.\n- **Unified API key** — Clients authenticate to your proxy with a single `freellmapi-…` bearer token. 
You never expose upstream provider keys to your apps.\n- **Health checks** — Periodic probes mark keys as `healthy`, `rate_limited`, `invalid`, or `error` so the router skips dead ones automatically.\n- **Admin dashboard** — React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and run prompts in a playground. Dark mode included.\n- **Analytics** — Per-request logging with latency, token counts, success rate, and per-provider breakdowns.\n- **Deploys to a Raspberry Pi** — Runs happily on a Pi 4 under PM2 behind nginx. ~40 MB RSS at idle.\n\n## Not yet supported\n\nThe scope is deliberately narrow. If a feature isn't on this list and isn't below, assume it isn't there yet.\n\n- **Embeddings** (`\u002Fv1\u002Fembeddings`)\n- **Image generation** (`\u002Fv1\u002Fimages\u002F*`)\n- **Audio \u002F speech** (`\u002Fv1\u002Faudio\u002F*`)\n- **Vision \u002F multimodal inputs** — message content is text-only\n- **Legacy completions** (`\u002Fv1\u002Fcompletions`) — only the chat endpoint is implemented\n- **Moderation** (`\u002Fv1\u002Fmoderations`)\n- **`n > 1`** (multiple completions per request)\n- **Per-user billing \u002F multi-tenant auth** — single-user by design\n\nPRs that add any of these are very welcome. See [Contributing](#contributing).\n\n## Quick start\n\n**Prerequisites:** Node.js 20+, npm.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftashfeenahmed\u002Ffreellmapi.git\ncd freellmapi\nnpm install\n\n# Generate an encryption key for at-rest key storage\ncp .env.example .env\necho \"ENCRYPTION_KEY=$(node -e \"console.log(require('crypto').randomBytes(32).toString('hex'))\")\" >> .env\n\n# Start server + dashboard together\nnpm run dev\n```\n\nOpen http:\u002F\u002Flocalhost:5173 (the Vite dev UI), add your provider keys on the **Keys** page, reorder the **Fallback Chain** to taste, and grab your unified API key from the **Keys** page header. 
That unified key is what you point your OpenAI SDK at.\n\nFor a production build:\n\n```bash\nnpm run build\nnode server\u002Fdist\u002Findex.js     # server + dashboard both served on :3001\n```\n\n## Using the API\n\nAny OpenAI-compatible client works. Examples:\n\n**Python**\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(\n    base_url=\"http:\u002F\u002Flocalhost:3001\u002Fv1\",\n    api_key=\"freellmapi-your-unified-key\",\n)\n\nresp = client.chat.completions.create(\n    model=\"auto\",  # let the router pick; or specify e.g. \"gemini-2.5-flash\"\n    messages=[{\"role\": \"user\", \"content\": \"Summarise the fall of Rome in one sentence.\"}],\n)\nprint(resp.choices[0].message.content)\nprint(\"Routed via:\", resp.headers.get(\"x-routed-via\"))\n```\n\n**curl**\n\n```bash\ncurl http:\u002F\u002Flocalhost:3001\u002Fv1\u002Fchat\u002Fcompletions \\\n  -H \"Authorization: Bearer freellmapi-your-unified-key\" \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\n    \"model\": \"auto\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"hi\"}]\n  }'\n```\n\n**Streaming**\n\n```python\nstream = client.chat.completions.create(\n    model=\"auto\",\n    messages=[{\"role\": \"user\", \"content\": \"Stream me a haiku about SQLite.\"}],\n    stream=True,\n)\nfor chunk in stream:\n    print(chunk.choices[0].delta.content or \"\", end=\"\", flush=True)\n```\n\n**Tool calling**\n\nPass OpenAI-style `tools` and `tool_choice`; the assistant response round-trips back through the proxy exactly like the OpenAI API. 
Multi-step flows (assistant `tool_calls` → `tool` role follow-up → final answer) work across every provider the router can reach.\n\n```python\ntools = [{\n    \"type\": \"function\",\n    \"function\": {\n        \"name\": \"get_weather\",\n        \"description\": \"Get current weather for a city.\",\n        \"parameters\": {\n            \"type\": \"object\",\n            \"properties\": {\"city\": {\"type\": \"string\"}},\n            \"required\": [\"city\"],\n        },\n    },\n}]\n\n# 1. Model asks for a tool call\nfirst = client.chat.completions.create(\n    model=\"auto\",\n    messages=[{\"role\": \"user\", \"content\": \"What's the weather in Karachi?\"}],\n    tools=tools,\n    tool_choice=\"required\",\n)\ncall = first.choices[0].message.tool_calls[0]\n\n# 2. You execute the tool, feed the result back\nfinal = client.chat.completions.create(\n    model=\"auto\",\n    messages=[\n        {\"role\": \"user\", \"content\": \"What's the weather in Karachi?\"},\n        first.choices[0].message,\n        {\"role\": \"tool\", \"tool_call_id\": call.id, \"content\": '{\"temp_c\": 32, \"cond\": \"sunny\"}'},\n    ],\n    tools=tools,\n)\nprint(final.choices[0].message.content)\n```\n\nWorks with `stream=True` as well — you'll get `delta.tool_calls` chunks followed by a `finish_reason: \"tool_calls\"` close. Under the hood, OpenAI-compatible providers (Groq, Cerebras, SambaNova, Mistral, OpenRouter, GitHub Models, HuggingFace, Cloudflare, Cohere compat) get the request passed through; Gemini requests get translated into Google's `functionDeclarations` \u002F `functionResponse` shape and the response is translated back.\n\nEvery response carries an `X-Routed-Via: \u003Cplatform>\u002F\u003Cmodel>` header so you can see which provider actually served each call. If a request fell over between providers, you'll also see `X-Fallback-Attempts: N`.\n\n## Screenshots\n\n### Keys\n\nManage provider credentials and grab the unified API key your apps connect with. 
Each key shows a status dot and when it was last health-checked.\n\n![Keys page](repo-assets\u002Fkeys.png)\n\n### Playground\n\nSend a chat completion through the router and see which provider served it, with the model ID and latency printed right on the message.\n\n![Playground page](repo-assets\u002Fplayground.png)\n\n### Analytics\n\nRequest volume, success rate, tokens in and out, average latency, and per-provider breakdowns over 24h \u002F 7d \u002F 30d windows.\n\n![Analytics page](repo-assets\u002Fanalytics.png)\n\n## How it works\n\n```\n┌──────────────────┐   Bearer freellmapi-…   ┌─────────────────────────┐\n│  OpenAI SDK \u002F    │ ──────────────────────▶ │  Express proxy (:3001)  │\n│  curl \u002F any      │ ◀────────────────────── │  \u002Fv1\u002Fchat\u002Fcompletions   │\n│  OpenAI client   │      streamed tokens    └────────────┬────────────┘\n└──────────────────┘                                      │\n                                                          ▼\n                             ┌────────────────────────────────────────────────┐\n                             │  Router                                        │\n                             │   1. Pick highest-priority model that          │\n                             │      (a) has a healthy key and                 │\n                             │      (b) is under all its rate limits.         │\n                             │   2. Decrypt key, call provider SDK.           │\n                             │   3. On 429\u002F5xx → cooldown + retry next model. 
│\n                             └────────────────────────────────────────────────┘\n                                          │\n   ┌──────────────┬────────────┬──────────┴─────────┬─────────────┬──────────┐\n   ▼              ▼            ▼                    ▼             ▼          ▼\n Google         Groq        Cerebras           OpenRouter        HF       …10 more\n```\n\n- **Router** (`server\u002Fsrc\u002Fservices\u002Frouter.ts`) — picks a model per request.\n- **Rate-limit ledger** (`server\u002Fsrc\u002Fservices\u002Fratelimit.ts`) — in-memory RPM\u002FRPD\u002FTPM\u002FTPD counters backed by SQLite, with cooldowns on 429s.\n- **Provider adapters** (`server\u002Fsrc\u002Fproviders\u002F*.ts`) — one file per provider, implementing the `Provider` base class: `chatCompletion()` and `streamChatCompletion()`.\n- **Health service** (`server\u002Fsrc\u002Fservices\u002Fhealth.ts`) — periodic probe keeps key status fresh.\n- **Dashboard** (`client\u002F`) — React + Vite + shadcn\u002Fui admin surface.\n- **Storage** — SQLite (`better-sqlite3`) with AES-256-GCM envelope encryption for keys.\n\n## Limitations\n\nStacking free tiers has real trade-offs. Be honest with yourself about them:\n\n- **No frontier models.** The free-tier catalog tops out around Llama 3.3 70B, GLM-4.5, Qwen 3 Coder, and Gemini 2.5 Pro. You will not get GPT-5 or Claude Opus class reasoning through this. For hard problems, pay for a real API.\n- **Intelligence degrades as the day progresses.** Your top-ranked models (usually Gemini 2.5 Pro, GPT-4o via GitHub Models) have the lowest daily caps. Once they hit their limits, the router falls down your priority chain to smaller\u002Fweaker models. Expect the effective intelligence of the endpoint to drop in the late hours of each day — then reset at UTC midnight.\n- **Latency is highly variable.** Cerebras and Groq are extremely fast; others are not. 
You get whichever one is available.\n- **Free tiers can change without notice.** Providers regularly tighten, loosen, or remove free tiers. When that happens you'll see 429s or auth errors until you update the catalog. Re-seed scripts live in `server\u002Fsrc\u002Fscripts\u002F`.\n- **No SLA, by definition.** If you need reliability, use a paid provider with a contract.\n- **Local-first.** There's no multi-tenant auth. Run this for yourself; don't expose it to the internet.\n\n## Contributing\n\nContributors very welcome! Good first PRs:\n\n- **Add a provider** — copy `server\u002Fsrc\u002Fproviders\u002Fopenai-compat.ts` as a template, wire it into `server\u002Fsrc\u002Fproviders\u002Findex.ts`, seed its models in `server\u002Fsrc\u002Fdb\u002Findex.ts`, add a test in `server\u002Fsrc\u002F__tests__\u002Fproviders\u002F`.\n- **Add an endpoint** — embeddings, images, moderations. The provider base class can grow new methods; adapters declare which they support.\n- **Improve the router** — cost-aware routing (cheapest-healthy-fastest tradeoffs), better latency-weighted priority, regional pinning.\n- **Dashboard polish** — charts on the Analytics page, key rotation UX, batch import of keys from `.env`.\n- **Docs** — more examples, client library snippets for Go\u002FRust\u002Fetc., a deployment recipe for Docker or Fly.\n\n**Development loop:**\n\n```bash\nnpm install\nnpm run dev      # server on :3001, dashboard on :5173, both with HMR\nnpm test         # vitest — 75 tests across providers, routes, router, ratelimit\n```\n\nPRs should include a test, keep the existing test suite green, and match the `.editorconfig` \u002F tsconfig defaults already in the repo. 
Issues and discussions are open.\n\n### Contributors\n\nThanks to everyone who's helped improve FreeLLMAPI:\n\n- [@moaaz12-web](https:\u002F\u002Fgithub.com\u002Fmoaaz12-web) — tool-calling support across providers (#3)\n- [@lukasulc](https:\u002F\u002Fgithub.com\u002Flukasulc) — better-sqlite3 bump to fix npm install on Node 24+ (#12)\n- [@VinhPhamAI](https:\u002F\u002Fgithub.com\u002FVinhPhamAI) — root `.env` PORT now propagates to server + Vite dev proxy + UI base URL (#27)\n\n## Terms of Service review\n\nA self-hosted, single-user, personal-use setup was re-reviewed against each provider's ToS (May 2026). Summary:\n\n| Provider | Verdict | Notes |\n|---|---|---|\n| Google Gemini | ⚠️ Caution | March 2026 ToS narrows scope to *\"professional or business purposes, not for consumer use\"* — a self-hosted developer proxy is still defensible, but the clause is new. |\n| Groq | ✅ Likely OK | GroqCloud Services Agreement permits Customer Application integration. |\n| Cerebras | ✅ Likely OK | Permitted; explicitly forbids selling\u002Ftransferring API keys. |\n| Mistral | ✅ Likely OK | APIs allowed for personal\u002Finternal business use. |\n| OpenRouter | ✅ Likely OK | April 2026 ToS sharpens the no-resale \u002F no-competing-service clause; private single-user proxy still fine. |\n| SambaNova | ⚠️ Ambiguous | EULA §1.5(c) blocks resale and \"service bureau\" use; single-user with no third-party access is fine. |\n| Cloudflare Workers AI | ⚠️ Ambiguous | No anti-proxy clause; covered by general Self-Serve Subscription Agreement. |\n| NVIDIA NIM | ⚠️ Caution | Trial ToS §1.2 \u002F §1.4: *\"evaluation only, not production.\"* Disabled in default catalog. 
|\n| GitHub Models | ⚠️ Caution | Free tier explicitly scoped to *\"experimentation\"* and *\"prototyping.\"* |\n| Cohere | ❌ Avoid | Terms §14 still forbids *\"personal, family or household purposes.\"* |\n| Zhipu (open.bigmodel.cn) | ✅ Likely OK | Personal\u002Fnon-commercial research carve-out still in the platform docs. |\n| Z.ai (api.z.ai) | ⚠️ Caution | New row — Singapore entity (distinct from Zhipu CN). §III.3(l) anti-traffic-redirect clause could plausibly be read against a proxy; no explicit personal-use carve-out. |\n| Ollama Cloud | ✅ Likely OK | New row — Free plan permits cloud-model access (1 concurrent, 5-hour session caps). No anti-proxy \u002F anti-resale clauses found. *(Integration tracked in #14.)* |\n\nRules of thumb that keep most providers happy: **one account per provider**, **no reselling**, **no sharing your endpoint with other humans**, **don't hammer a free tier as a paid production backend**. This is informational, not legal advice — read each provider's ToS and make your own call.\n\nRemoved since the April 2026 review: Hugging Face, Moonshot, and MiniMax direct integrations were dropped from the catalog (HF — tool-call format issues; Moonshot — moved to paid only; MiniMax — superseded by the OpenRouter `minimax\u002Fminimax-m2.5:free` route).\n\n## Disclaimer\n\n**This project is for personal experimentation and learning, not production.** Free tiers exist so developers can prototype against them; they aren't a stable, supported inference substrate and shouldn't be treated as one. If you build something real on top of FreeLLMAPI, swap in a paid API before you ship. 
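Your relationship with each upstream provider is governed by the terms you accepted when you created your account; those terms still apply when the traffic is proxied through this project, and you're responsible for complying with them.\n\n## License\n\n[MIT](.\u002FLICENSE)\n","FreeLLMAPI is an OpenAI-compatible proxy service that aggregates free-tier keys from about 14 AI providers and provides automatic failover. The project is written in TypeScript and supports a range of mainstream AI models, such as Google's Gemini and Groq's Llama series, served through a single `\u002Fv1\u002Fchat\u002Fcompletions` endpoint. Its core features include encrypted key storage, smart routing that picks the best available model for each request, automatic switching to the next provider when one hits its rate limit, and per-key usage tracking to stay within free-tier quotas. It is suited to exploring and developing natural language processing tasks in personal experimentation settings.",2,"2026-06-11 02:39:24","CREATED_QUERY"]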