[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9742":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":45,"readmeContent":46,"aiSummary":47,"trendingCount":16,"starSnapshotCount":16,"syncStatus":48,"lastSyncTime":49,"discoverSource":50},9742,"inference","xorbitsai\u002Finference","xorbitsai","Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.","https:\u002F\u002Finference.readthedocs.io",null,"Python",9349,837,60,25,0,16,57,1,39.77,"Apache License 2.0",false,"main",true,[26,27,28,29,30,31,32,5,33,34,35,36,37,38,39,40,41,42,43,44],"artificial-intelligence","chatglm","deployment","flan-t5","gemma","ggml","glm4","llama","llama3","llamacpp","llm","machine-learning","mistral","openai-api","pytorch","qwen","vllm","whisper","wizardlm","2026-06-12 02:02:11","\u003Cdiv align=\"center\">\n\u003Cimg src=\".\u002Fassets\u002Fxorbits-logo.png\" width=\"180px\" alt=\"xorbits\" \u002F>\n\n# Xorbits Inference: Model Serving Made Easy 🤖\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fxinference.io\u002Fen\">Xinference Enterprise\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Finstallation.html#installation\">Self-hosting\u003C\u002Fa> ·\n  \u003Ca 
href=\"https:\u002F\u002Finference.readthedocs.io\u002F\">Documentation\u003C\u002Fa>\n\u003C\u002Fp>\n\n[![PyPI Latest Release](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fxinference.svg?style=for-the-badge)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fxinference\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fxinference.svg?style=for-the-badge)](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fblob\u002Fmain\u002FLICENSE)\n[![Build Status](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fxorbitsai\u002Finference\u002Fpython.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https:\u002F\u002Factions-badge.atrox.dev\u002Fxorbitsai\u002Finference\u002Fgoto?ref=main)\n[![Docker Pulls](https:\u002F\u002Fimg.shields.io\u002Fdocker\u002Fpulls\u002Fxprobe\u002Fxinference?style=for-the-badge&logo=docker)](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fxprobe\u002Fxinference)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fjoin_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https:\u002F\u002Fdiscord.gg\u002FXw9tszSkr5)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fxorbitsio?logo=x&style=for-the-badge)](https:\u002F\u002Ftwitter.com\u002Fxorbitsio)\n\n\u003Cp align=\"center\">\n  \u003Ca href=\".\u002FREADME.md\">\u003Cimg alt=\"README in English\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEnglish-454545?style=for-the-badge\">\u003C\u002Fa>\n  \u003Ca href=\".\u002FREADME_zh_CN.md\">\u003Cimg alt=\"简体中文版自述文件\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F中文介绍-d9d9d9?style=for-the-badge\">\u003C\u002Fa>\n  \u003Ca href=\".\u002FREADME_ja_JP.md\">\u003Cimg alt=\"日本語のREADME\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F日本語-d9d9d9?style=for-the-badge\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\u003Cbr \u002F>\n\n\nXorbits Inference(Xinference) is a 
powerful and versatile library designed to serve language, \nspeech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy \nand serve your own or state-of-the-art built-in models using just a single command. Whether you are a \nresearcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full \npotential of cutting-edge AI models.\n\n\u003Cdiv align=\"center\">\n\u003Ci>\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FXw9tszSkr5\">👉 Join our Discord community!\u003C\u002Fa>\u003C\u002Fi>\n\u003C\u002Fdiv>\n\n## 🔥 Hot Topics\n### Framework Enhancements\n- Agent-native Serving: Xinference integrates with [Xagent](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Fxagent) to enable dynamic planning, tool use, and autonomous multi-step reasoning, moving beyond static pipelines.\n- Auto batch: Multiple concurrent requests are automatically batched, significantly improving throughput: [#4197](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4197)\n- [Xllamacpp](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Fxllamacpp): A new llama.cpp Python binding, maintained by the Xinference team, that supports continuous batching and is more production-ready: [#2997](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F2997)\n- Distributed inference: Running models across workers: [#2877](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F2877)\n- vLLM enhancement: Shared KV cache across multiple replicas: [#2732](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F2732)\n### New Models\n- Built-in support for [MiniMax-M2.7](https:\u002F\u002Fwww.minimax.io\u002Fmodels\u002Ftext\u002Fm27): [#4843](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4843)\n- Built-in support for [GLM-5.1](https:\u002F\u002Fz.ai\u002Fblog\u002Fglm-5.1): [#4832](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4832)\n- 
Built-in support for [Qwen3.6](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3.6): [#4831](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4831)\n- Built-in support for [Gemma-4](https:\u002F\u002Fdeepmind.google\u002Fmodels\u002Fgemma\u002Fgemma-4\u002F): [#4768](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4768)\n- Built-in support for [Qwen3-TTS](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-TTS): [#4781](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4781)\n- Built-in support for [Qwen-3.5](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3.5): [#4639](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4639)\n- Built-in support for [GLM-5](https:\u002F\u002Fgithub.com\u002Fzai-org\u002FGLM-5): [#4638](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4638)\n- Built-in support for [MiniMax-M2.5](https:\u002F\u002Fgithub.com\u002FMiniMax-AI\u002FMiniMax-M2.5): [#4630](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fpull\u002F4630)\n### Integrations\n- [Xagent](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Fxagent): an enterprise agent platform for building and running AI agents with planning, memory, and tool use — not limited to rigid workflows.\n- [Dify](https:\u002F\u002Fdocs.dify.ai\u002Fadvanced\u002Fmodel-configuration\u002Fxinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.\n- [FastGPT](https:\u002F\u002Fgithub.com\u002Flabring\u002FFastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.\n- [RAGFlow](https:\u002F\u002Fgithub.com\u002Finfiniflow\u002Fragflow): is an open-source RAG engine based on deep document 
understanding.\n- [MaxKB](https:\u002F\u002Fgithub.com\u002F1Panel-dev\u002FMaxKB): MaxKB (Max Knowledge Brain) is a powerful and easy-to-use AI assistant that integrates Retrieval-Augmented Generation (RAG) pipelines, supports robust workflows, and provides advanced MCP tool-use capabilities.\n\n\n## Key Features\n🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech \nrecognition, and multimodal models. You can set up and deploy your models\nfor experimentation and production with a single command.\n\n⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single \ncommand. Xinference provides access to state-of-the-art open-source models!\n\n🖥 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with\n[ggml](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fggml). Xorbits Inference intelligently utilizes heterogeneous\nhardware, including GPUs and CPUs, to accelerate your model inference tasks.\n\n⚙️ **Flexible API and Interfaces**: Xinference offers multiple interfaces for interacting\nwith your models, supporting an OpenAI-compatible RESTful API (including the Function Calling API), RPC, CLI, \nand WebUI for seamless model management and interaction.\n\n🌐 **Distributed Deployment**: Xinference excels in distributed deployment scenarios, \nallowing the seamless distribution of model inference across multiple devices or machines.\n\n🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates\nwith popular third-party libraries including [LangChain](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Fproviders\u002Fxinference), [LlamaIndex](https:\u002F\u002Fgpt-index.readthedocs.io\u002Fen\u002Fstable\u002Fexamples\u002Fllm\u002FXinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https:\u002F\u002Fdocs.dify.ai\u002Fadvanced\u002Fmodel-configuration\u002Fxinference), and 
[Chatbox](https:\u002F\u002Fchatboxai.app\u002F).\n\n## Why Xinference\n| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |\n|------------------------------------------------|------------|----------|---------|--------|\n| OpenAI-Compatible RESTful API                  | ✅ | ✅ | ✅ | ✅ |\n| vLLM Integrations                              | ✅ | ✅ | ✅ | ✅ |\n| More Inference Engines (GGML, TensorRT)        | ✅ | ❌ | ✅ | ✅ |\n| More Platforms (CPU, Metal)                    | ✅ | ✅ | ❌ | ❌ |\n| Multi-node Cluster Deployment                  | ✅ | ❌ | ❌ | ✅ |\n| Image Models (Text-to-Image)                   | ✅ | ✅ | ❌ | ❌ |\n| Text Embedding Models                          | ✅ | ❌ | ❌ | ❌ |\n| Multimodal Models                              | ✅ | ❌ | ❌ | ❌ |\n| Audio Models                                   | ✅ | ❌ | ❌ | ❌ |\n| More OpenAI Functionalities (Function Calling) | ✅ | ❌ | ❌ | ❌ |\n\n## Using Xinference\n\n- **Self-hosting Xinference Community Edition\u003C\u002Fbr>**\nQuickly get Xinference running in your environment with this [starter guide](#getting-started).\nUse our [documentation](https:\u002F\u002Finference.readthedocs.io\u002F) for further reference and more in-depth instructions.\n\n- **Xinference for enterprise \u002F organizations\u003C\u002Fbr>**\nWe provide additional enterprise-centric features. [Send us an email](mailto:business@xprobe.io?subject=[GitHub]Business%20License%20Inquiry) to discuss enterprise needs. 
\u003C\u002Fbr>\n\n## Staying Ahead\n\nStar Xinference on GitHub and be instantly notified of new releases.\n\n![star-us](assets\u002Fstay_ahead.gif)\n\n## Getting Started\n\n* [Docs](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Findex.html)\n* [Built-in Models](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fmodels\u002Fbuiltin\u002Findex.html)\n* [Custom Models](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fmodels\u002Fcustom.html)\n* [Deployment Docs](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fusing_xinference.html)\n* [Examples and Tutorials](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fexamples\u002Findex.html)\n\n### Jupyter Notebook\n\nThe quickest way to try Xinference is our [Jupyter Notebook on Google Colab](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fxorbitsai\u002Finference\u002Fblob\u002Fmain\u002Fexamples\u002FXinference_Quick_Start.ipynb).\n\n### Docker\n\nNVIDIA GPU users can start the Xinference server using the [Xinference Docker image](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fusing_docker_image.html). 
Before running the command below, ensure that both [Docker](https:\u002F\u002Fdocs.docker.com\u002Fget-docker\u002F) and [CUDA](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads) are set up on your system.\n\n```bash\ndocker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=\u002Fdata -v \u003C\u002Fon\u002Fyour\u002Fhost>:\u002Fdata --gpus all xprobe\u002Fxinference:latest xinference-local -H 0.0.0.0\n```\n\n### K8s via Helm\n\nEnsure that you have GPU support in your Kubernetes cluster, then install as follows.\n\n```bash\n# add repo\nhelm repo add xinference https:\u002F\u002Fxorbitsai.github.io\u002Fxinference-helm-charts\n\n# update indexes and query xinference versions\nhelm repo update xinference\nhelm search repo xinference\u002Fxinference --devel --versions\n\n# install xinference\nhelm install xinference xinference\u002Fxinference -n xinference --version 0.0.1-v\u003Cxinference_release_version>\n```\n\nFor more customized installation methods on K8s, please refer to the [documentation](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fusing_kubernetes.html).\n\n### Quick Start\n\nInstall Xinference with pip as follows. (For more options, see the [Installation page](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Finstallation.html).)\n\n```bash\npip install \"xinference[all]\"\n```\n\nTo start a local instance of Xinference, run the following command:\n\n```bash\nxinference-local\n```\n\nOnce Xinference is running, there are multiple ways you can try it: via the web UI, via cURL,\nvia the command line, or via Xinference’s Python client. 
Check out our [docs](https:\u002F\u002Finference.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fusing_xinference.html#run-xinference-locally) for a guide to each.\n\n![web UI](assets\u002Fscreenshot.png)\n\n## Getting involved\n\n| Platform                                                                                        | Purpose                                     |\n|-------------------------------------------------------------------------------------------------|---------------------------------------------|\n| [GitHub Issues](https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fissues)                                  | Reporting bugs and filing feature requests. |\n| [Discord](https:\u002F\u002Fdiscord.gg\u002FXw9tszSkr5)                                                        | Collaborating with other Xinference users.  |\n| [Twitter](https:\u002F\u002Ftwitter.com\u002Fxorbitsio)                                                        | Staying up-to-date on new features.         |\n\n## Citation\n\nIf you find this work helpful, please cite it as:\n\n```bibtex\n@inproceedings{lu2024xinference,\n    title = \"Xinference: Making Large Model Serving Easy\",\n    author = \"Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo\",\n    booktitle = \"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations\",\n    month = nov,\n    year = \"2024\",\n    address = \"Miami, Florida, USA\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2024.emnlp-demo.30\",\n    pages = \"291--300\",\n}\n```\n\n## Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fxorbitsai\u002Finference\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=xorbitsai\u002Finference\" \u002F>\n\u003C\u002Fa>\n\n## Star History\n\n[![Star History 
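Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=xorbitsai\u002Finference&type=Date)](https:\u002F\u002Fstar-history.com\u002F#xorbitsai\u002Finference&Date)","Xorbits Inference is a powerful, versatile library for deploying and running language, speech recognition, and multimodal models. Its core features include deploying and using built-in or custom state-of-the-art AI models with a single command, plus automatic batching to improve throughput. It also integrates with Xagent for dynamic planning and tool use, and supports distributed inference. Xorbits Inference suits any scenario that requires model inference in the cloud, on on-premises servers, or on a personal computer, and benefits researchers, developers, and data scientists alike.",2,"2026-06-11 03:24:31","top_topic"]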