[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72137":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":16,"starSnapshotCount":16,"syncStatus":38,"lastSyncTime":39,"discoverSource":40},72137,"vllm-omni","vllm-project\u002Fvllm-omni","vllm-project","A framework for efficient model inference with omni-modality models","https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fvllm-omni",null,"Python",5071,1094,51,490,0,102,219,426,306,115.12,"Apache License 2.0",false,"main",[26,27,28,29,30,31,32,33,34],"audio-generation","diffusion","image-generation","inference","model-serving","multimodal","pytorch","transformer","video-generation","2026-06-12 04:01:03","\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fvllm-omni\u002Frefs\u002Fheads\u002Fmain\u002Fdocs\u002Fsource\u002Flogos\u002Fvllm-omni-logo.png\">\n    \u003Cimg alt=\"vllm-omni\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fvllm-omni\u002Frefs\u002Fheads\u002Fmain\u002Fdocs\u002Fsource\u002Flogos\u002Fvllm-omni-logo.png\" width=55%>\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\u003Ch3 align=\"center\">\nEasy, fast, and cheap omni-modality model serving for everyone\n\u003C\u002Fh3>\n\n\u003Cp align=\"center\">\n| \u003Ca 
href=\"https:\u002F\u002Fvllm-omni.readthedocs.io\u002Fen\u002Flatest\u002F\">\u003Cb>Documentation\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdeepwiki.com\u002Fvllm-project\u002Fvllm-omni\">\u003Cb>DeepWiki\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdiscuss.vllm.ai\">\u003Cb>User Forum\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fslack.vllm.ai\">\u003Cb>Developer Slack\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"docs\u002Fassets\u002FWeChat.jpg\">\u003Cb>WeChat\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.02204\">\u003Cb>Paper\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F111-L8zF7A1j_YI_cR8JsblofdScdRr2f\u002Fedit?usp=sharing&ouid=110473603432222024453&rtpof=true&sd=true\">\u003Cb>Slides\u003C\u002Fb>\u003C\u002Fa> |\n\u003C\u002Fp>\n\n\n---\n\n*Latest News* 🔥\n- [2026\u002F05] We released [0.20.0](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Freleases\u002Ftag\u002Fv0.20.0) - refreshes the serving\u002Fruntime stack for large-scale omni workloads, and improves diffusion model performance, quantization, and hardware readiness across CUDA, ROCm, MUSA, NPU, and XPU backends.\n- [2026\u002F03] We released [0.18.0](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Freleases\u002Ftag\u002Fv0.18.0) - strengthens the core runtime through a large entrypoint refactor and scheduler\u002Fruntime cleanups, expands unified quantization and diffusion execution, broadens multimodal model coverage, and improves production readiness across audio, omni, image, video, RL, and multi-platform deployments.\n- [2026\u002F03] Check out our first public [project deepdive](https:\u002F\u002Fyoutu.be\u002FsgwNfsNnR9I) at the vLLM Hong Kong Meetup!\n- [2026\u002F03] **[vllm-omni-skills](https:\u002F\u002Fgithub.com\u002Fhsliuustc0106\u002Fvllm-omni-skills)** is a community-driven 
collection of AI assistant skills that help developers work with vLLM-Omni more effectively. These skills can be used with popular agentic AI coding assistants like **Cursor IDE**, **Claude**, **Codex**, and more.\n- [2026\u002F02] We released [0.16.0](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Freleases\u002Ftag\u002Fv0.16.0) - A major alignment + capability release that rebases onto **upstream vLLM v0.16.0** and significantly expands performance, distributed execution, and production readiness across **Qwen3-Omni \u002F Qwen3-TTS**, **Bagel**, **MiMo-Audio**, **GLM-Image**, and the **Diffusion (DiT) image\u002Fvideo stack**, while also improving platform coverage (CUDA \u002F ROCm \u002F NPU \u002F XPU), CI quality, and documentation.\n- [2026\u002F02] We released [0.14.0](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Freleases\u002Ftag\u002Fv0.14.0) - The first **stable release** of vLLM-Omni. It expands Omni’s diffusion \u002F image-video generation and audio \u002F TTS stack, improves distributed execution and memory efficiency, and broadens platform\u002Fbackend coverage (GPU\u002FROCm\u002FNPU\u002FXPU). It also brings meaningful upgrades to serving APIs, profiling & benchmarking, and overall stability. Please check our latest [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.02204) for architecture design and performance results.\n- [2025\u002F11] The vLLM community officially released [vllm-project\u002Fvllm-omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni) to support omni-modality model serving.\n\n---\n\n## About\n\n[vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) was originally designed to support large language models for text-based autoregressive generation tasks. 
vLLM-Omni is a framework that extends vLLM to omni-modality model inference and serving:\n\n- **Omni-modality**: text, image, video, and audio data processing\n- **Non-autoregressive architectures**: extends the AR support of vLLM to Diffusion Transformers (DiT) and other parallel generation models\n- **Heterogeneous outputs**: from traditional text generation to multimodal outputs\n\n\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Cimg alt=\"vllm-omni\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fvllm-omni\u002Frefs\u002Fheads\u002Fmain\u002Fdocs\u002Fsource\u002Farchitecture\u002Fomni-modality-model-architecture.png\" width=55%>\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\nvLLM-Omni is fast with:\n\n- State-of-the-art AR support by leveraging efficient KV cache management from vLLM\n- Overlapped, pipelined stage execution for high throughput\n- Full disaggregation based on OmniConnector and dynamic resource allocation across stages\n\nvLLM-Omni is flexible and easy to use with:\n\n- Heterogeneous pipeline abstraction to manage complex model workflows\n- Seamless integration with popular Hugging Face models\n- Tensor, pipeline, data, and expert parallelism support for distributed inference\n- Streaming outputs\n- OpenAI-compatible API server\n\nvLLM-Omni seamlessly supports most popular open-source models on Hugging Face, including:\n\n- Omni-modality models (e.g. 
Qwen-Image)\n\n## Getting Started\n\nVisit our [documentation](https:\u002F\u002Fvllm-omni.readthedocs.io\u002Fen\u002Flatest\u002F) to learn more.\n\n- [Installation](https:\u002F\u002Fvllm-omni.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Finstallation\u002F)\n- [Quickstart](https:\u002F\u002Fvllm-omni.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fquickstart\u002F)\n- [List of Supported Models](https:\u002F\u002Fvllm-omni.readthedocs.io\u002Fen\u002Flatest\u002Fmodels\u002Fsupported_models\u002F)\n\n## Contributing\n\nWe welcome and value any contributions and collaborations.\nPlease check out [Contributing to vLLM-Omni](https:\u002F\u002Fvllm-omni.readthedocs.io\u002Fen\u002Flatest\u002Fcontributing\u002F) for how to get involved.\n\n## Citation\n\nIf you use vLLM-Omni for your research, please cite our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.02204):\n\n```bibtex\n@article{yin2026vllmomni,\n  title={vLLM-Omni: Fully Disaggregated Serving for Any-to-Any Multimodal Models},\n  author={Peiqi Yin and Jiangyun Zhu and Han Gao and Chenguang Zheng and Yongxiang Huang and Taichang Zhou and Ruirui Yang and Weizhi Liu and Weiqing Chen and Canlin Guo and Didan Deng and Zifeng Mo and Cong Wang and James Cheng and Roger Wang and Hongsheng Liu},\n  journal={arXiv preprint arXiv:2602.02204},\n  year={2026}\n}\n```\n\n## Join the Community\nFeel free to ask questions, provide feedback, and discuss with fellow users of vLLM-Omni in the `#sig-omni` Slack channel at [slack.vllm.ai](https:\u002F\u002Fslack.vllm.ai) or on the vLLM user forum at [discuss.vllm.ai](https:\u002F\u002Fdiscuss.vllm.ai).\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=vllm-project\u002Fvllm-omni&type=date&legend=top-left)](https:\u002F\u002Fwww.star-history.com\u002F#vllm-project\u002Fvllm-omni&type=date&legend=top-left)\n\n## License\n\nApache License 2.0, as found in the [LICENSE](.\u002FLICENSE) file.\n","vllm-omni is a framework for efficient inference with omni-modality models. It supports multiple modalities, including audio, image, and video generation, and achieves high-performance inference using PyTorch and Transformer technology. The project offers strong quantization and distributed-execution capabilities and is optimized for performance on a range of hardware platforms such as CUDA, ROCm, NPU, and XPU. vllm-omni suits applications that need to process multimodal data, such as multimedia content generation and AI assistant skill development, and can significantly improve the speed and cost-effectiveness of model serving.",2,"2026-06-11 03:40:32","high_star"]