[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70953":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":44,"readmeContent":45,"aiSummary":46,"trendingCount":16,"starSnapshotCount":16,"syncStatus":47,"lastSyncTime":48,"discoverSource":49},70953,"OpenLLM","bentoml\u002FOpenLLM","bentoml","Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.","https:\u002F\u002Fbentoml.com",null,"Python",12352,815,82,6,0,5,10,34,15,87.14,"Apache License 2.0",false,"main",true,[7,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43],"fine-tuning","llama","llama2","llama3-1","llama3-2","llama3-2-vision","llm","llm-inference","llm-ops","llm-serving","llmops","mistral","mlops","model-inference","open-source-llm","openllm","vicuna","2026-06-12 04:00:58","\u003Cdiv align=\"center\">\n\n\u003Ch1>🦾 OpenLLM: Self-Hosting LLMs Made Easy\u003C\u002Fh1>\n\n[![License: 
Apache-2.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202-green.svg)](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fblob\u002Fmain\u002FLICENSE)\n[![Releases](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fopenllm.svg?logo=pypi&label=PyPI&logoColor=gold)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fopenllm)\n[![CI](https:\u002F\u002Fresults.pre-commit.ci\u002Fbadge\u002Fgithub\u002Fbentoml\u002FOpenLLM\u002Fmain.svg)](https:\u002F\u002Fresults.pre-commit.ci\u002Flatest\u002Fgithub\u002Fbentoml\u002FOpenLLM\u002Fmain)\n[![X](https:\u002F\u002Fbadgen.net\u002Fbadge\u002Ficon\u002F@bentomlai\u002F000000?icon=twitter&label=Follow)](https:\u002F\u002Ftwitter.com\u002Fbentomlai)\n[![Community](https:\u002F\u002Fbadgen.net\u002Fbadge\u002Ficon\u002FCommunity\u002F562f5d?icon=slack&label=Join)](https:\u002F\u002Fl.bentoml.com\u002Fjoin-slack)\n\n\u003C\u002Fdiv>\n\nOpenLLM allows developers to run **any open-source LLMs** (Llama 3.3, Qwen2.5, Phi3 and [more](#supported-models)) or **custom models** as **OpenAI-compatible APIs** with a single command. It features a [built-in chat UI](#chat-ui), state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Docker, Kubernetes, and [BentoCloud](#deploy-to-bentocloud).\n\nUnderstand the [design philosophy of OpenLLM](https:\u002F\u002Fwww.bentoml.com\u002Fblog\u002Ffrom-ollama-to-openllm-running-llms-in-the-cloud).\n\n## Get Started\n\nRun the following commands to install OpenLLM and explore it interactively.\n\n```bash\npip install openllm  # or pip3 install openllm\nopenllm hello\n```\n\n![hello](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5af19f23-1b34-4c45-b1e0-a6798b4586d1)\n\n## Supported models\n\nOpenLLM supports a wide range of state-of-the-art open-source LLMs. 
You can also add a [model repository to run custom models](#set-up-a-custom-repository) with OpenLLM.\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth>Model\u003C\u002Fth>\n    \u003Cth>Parameters\u003C\u002Fth>\n    \u003Cth>Required GPU\u003C\u002Fth>\n    \u003Cth>Start a Server\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>deepseek\u003C\u002Ftd>\n    \u003Ctd>r1-671b\u003C\u002Ftd>\n    \u003Ctd>80Gx16\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve deepseek:r1-671b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>gemma2\u003C\u002Ftd>\n    \u003Ctd>2b\u003C\u002Ftd>\n    \u003Ctd>12G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve gemma2:2b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>gemma3\u003C\u002Ftd>\n    \u003Ctd>3b\u003C\u002Ftd>\n    \u003Ctd>12G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve gemma3:3b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>jamba1.5\u003C\u002Ftd>\n    \u003Ctd>mini-ff0a\u003C\u002Ftd>\n    \u003Ctd>80Gx2\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve jamba1.5:mini-ff0a\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>llama3.1\u003C\u002Ftd>\n    \u003Ctd>8b\u003C\u002Ftd>\n    \u003Ctd>24G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve llama3.1:8b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>llama3.2\u003C\u002Ftd>\n    \u003Ctd>1b\u003C\u002Ftd>\n    \u003Ctd>24G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve llama3.2:1b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>llama3.3\u003C\u002Ftd>\n    \u003Ctd>70b\u003C\u002Ftd>\n    \u003Ctd>80Gx2\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve llama3.3:70b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>llama4\u003C\u002Ftd>\n    \u003Ctd>17b16e\u003C\u002Ftd>\n    \u003Ctd>80Gx8\u003C\u002Ftd>\n    
\u003Ctd>\u003Ccode>openllm serve llama4:17b16e\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>mistral\u003C\u002Ftd>\n    \u003Ctd>8b-2410\u003C\u002Ftd>\n    \u003Ctd>24G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve mistral:8b-2410\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>mistral-large\u003C\u002Ftd>\n    \u003Ctd>123b-2407\u003C\u002Ftd>\n    \u003Ctd>80Gx4\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve mistral-large:123b-2407\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>phi4\u003C\u002Ftd>\n    \u003Ctd>14b\u003C\u002Ftd>\n    \u003Ctd>80G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve phi4:14b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>pixtral\u003C\u002Ftd>\n    \u003Ctd>12b-2409\u003C\u002Ftd>\n    \u003Ctd>80G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve pixtral:12b-2409\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>qwen2.5\u003C\u002Ftd>\n    \u003Ctd>7b\u003C\u002Ftd>\n    \u003Ctd>24G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve qwen2.5:7b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>qwen2.5-coder\u003C\u002Ftd>\n    \u003Ctd>3b\u003C\u002Ftd>\n    \u003Ctd>24G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve qwen2.5-coder:3b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>qwq\u003C\u002Ftd>\n    \u003Ctd>32b\u003C\u002Ftd>\n    \u003Ctd>80G\u003C\u002Ftd>\n    \u003Ctd>\u003Ccode>openllm serve qwq:32b\u003C\u002Fcode>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nFor the full model list, see the [OpenLLM models repository](https:\u002F\u002Fgithub.com\u002Fbentoml\u002Fopenllm-models).\n\n## Start an LLM server\n\nTo start an LLM server locally, use the `openllm serve` command and specify the model version.\n\n> [!NOTE]\n> OpenLLM does not store model weights. 
A Hugging Face token (HF_TOKEN) is required for gated models.\n>\n> 1. Create your Hugging Face token [here](https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens).\n> 2. Request access to the gated model, such as [meta-llama\u002FLlama-3.2-1B-Instruct](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-3.2-1B-Instruct).\n> 3. Set your token as an environment variable by running:\n>    ```bash\n>    export HF_TOKEN=\u003Cyour token>\n>    ```\n\n```bash\nopenllm serve llama3.2:1b\n```\n\nThe server will be accessible at [http:\u002F\u002Flocalhost:3000](http:\u002F\u002Flocalhost:3000\u002F), providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:\n\n- **The API host address**: By default, the LLM is hosted at [http:\u002F\u002Flocalhost:3000](http:\u002F\u002Flocalhost:3000\u002F).\n- **The model name:** The name can be different depending on the tool you use.\n- **The API key**: The API key used for client authentication. 
This is optional.\n\nHere are some examples:\n\n\u003Cdetails>\n\n\u003Csummary>OpenAI Python client\u003C\u002Fsummary>\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url='http:\u002F\u002Flocalhost:3000\u002Fv1', api_key='na')\n\n# Use the following call to list the available models\n# model_list = client.models.list()\n# print(model_list)\n\nchat_completion = client.chat.completions.create(\n    model=\"meta-llama\u002FLlama-3.2-1B-Instruct\",\n    messages=[\n        {\n            \"role\": \"user\",\n            \"content\": \"Explain superconductors like I'm five years old\"\n        }\n    ],\n    stream=True,\n)\nfor chunk in chat_completion:\n    print(chunk.choices[0].delta.content or \"\", end=\"\")\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n\u003Csummary>LlamaIndex\u003C\u002Fsummary>\n\n```python\nfrom llama_index.llms.openai import OpenAI\n\nllm = OpenAI(api_base=\"http:\u002F\u002Flocalhost:3000\u002Fv1\", model=\"meta-llama\u002FLlama-3.2-1B-Instruct\", api_key=\"dummy\")\n...\n```\n\n\u003C\u002Fdetails>\n\n## Chat UI\n\nOpenLLM provides a chat UI at the `\u002Fchat` endpoint for the launched LLM server at http:\u002F\u002Flocalhost:3000\u002Fchat.\n\n\u003Cimg width=\"800\" alt=\"openllm_ui\" src=\"https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fassets\u002F5886138\u002F8b426b2b-67da-4545-8b09-2dc96ff8a707\">\n\n## Chat with a model in the CLI\n\nTo start a chat conversation in the CLI, use the `openllm run` command and specify the model version.\n\n```bash\nopenllm run llama3:8b\n```\n\n## Model repository\n\nA model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at [this GitHub repository](https:\u002F\u002Fgithub.com\u002Fbentoml\u002Fopenllm-models). 
To see all available models from the default and any added repository, use:\n\n```bash\nopenllm model list\n```\n\nTo ensure your local list of models is synchronized with the latest updates from all connected repositories, run:\n\n```bash\nopenllm repo update\n```\n\nTo review a model’s information, run:\n\n```bash\nopenllm model get llama3.2:1b\n```\n\n### Add a model to the default model repository\n\nYou can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this [example pull request](https:\u002F\u002Fgithub.com\u002Fbentoml\u002Fopenllm-models\u002Fpull\u002F1).\n\n### Set up a custom repository\n\nYou can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a `bentos` directory to store custom LLMs. You need to [build your Bentos with BentoML](https:\u002F\u002Fdocs.bentoml.com\u002Fen\u002Flatest\u002Fguides\u002Fbuild-options.html) and submit them to your model repository.\n\nFirst, prepare your custom models in a `bentos` directory following the guidelines provided by [BentoML to build Bentos](https:\u002F\u002Fdocs.bentoml.com\u002Fen\u002Flatest\u002Fguides\u002Fbuild-options.html). Check out the [default model repository](https:\u002F\u002Fgithub.com\u002Fbentoml\u002Fopenllm-repo) for an example and read the [Developer Guide](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fblob\u002Fmain\u002FDEVELOPMENT.md) for details.\n\nThen, register your custom model repository with OpenLLM:\n\n```bash\nopenllm repo add \u003Crepo-name> \u003Crepo-url>\n```\n\n**Note**: Currently, OpenLLM only supports adding public repositories.\n\n## Deploy to BentoCloud\n\nOpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. 
BentoCloud provides fully-managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.\n\n[Sign up for BentoCloud](https:\u002F\u002Fwww.bentoml.com\u002F) for free and [log in](https:\u002F\u002Fdocs.bentoml.com\u002Fen\u002Flatest\u002Fbentocloud\u002Fhow-tos\u002Fmanage-access-token.html). Then, run `openllm deploy` to deploy a model to BentoCloud:\n\n```bash\nopenllm deploy llama3.2:1b --env HF_TOKEN\n```\n\n> [!NOTE]\n> If you are deploying a gated model, make sure to set HF_TOKEN as an environment variable.\n\nOnce the deployment is complete, you can run model inference on the BentoCloud console:\n\n\u003Cimg width=\"800\" alt=\"bentocloud_ui\" src=\"https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fassets\u002F65327072\u002F4f7819d9-73ea-488a-a66c-f724e5d063e6\">\n\n## Community\n\nOpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 [Join our Slack community!](https:\u002F\u002Fl.bentoml.com\u002Fjoin-slack)\n\n## Contributing\n\nAs an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation. Here are some of the ways to contribute:\n\n- Report a bug by [creating a GitHub issue](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fissues\u002Fnew\u002Fchoose).\n- [Submit a pull request](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fcompare) or help review other developers’ [pull requests](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fpulls).\n- Add an LLM to the OpenLLM default model repository so that other users can run your model. 
See the [pull request template](https:\u002F\u002Fgithub.com\u002Fbentoml\u002Fopenllm-models\u002Fpull\u002F1).\n- Check out the [Developer Guide](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\u002Fblob\u002Fmain\u002FDEVELOPMENT.md) to learn more.\n\n## Acknowledgements\n\nThis project uses the following open-source projects:\n\n- [bentoml\u002Fbentoml](https:\u002F\u002Fgithub.com\u002Fbentoml\u002Fbentoml) for production-level model serving\n- [vllm-project\u002Fvllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) for a production-level LLM backend\n- [blrchen\u002Fchatgpt-lite](https:\u002F\u002Fgithub.com\u002Fblrchen\u002Fchatgpt-lite) for a fancy Web Chat UI\n- [astral-sh\u002Fuv](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv) for blazing-fast installation of model requirements\n\nWe are grateful to the developers and contributors of these projects for their hard work and dedication.\n","OpenLLM is a tool for running open-source large language models (such as DeepSeek and Llama) in the cloud and exposing them through OpenAI-compatible APIs. Its core features include launching any open-source or custom LLM with a single command and serving it as an OpenAI-compatible API, with support for multiple advanced inference backends. It also ships with a built-in chat UI and simplifies enterprise-grade cloud deployment with Docker, Kubernetes, and BentoCloud. It suits scenarios that call for quickly setting up and deploying LLM services, and is especially suitable for developers working with fine-tuned and optimized models.",2,"2026-06-11 03:35:08","high_star"]