[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71992":3},{"id":4,"name":5,"fullName":6,"owner":5,"repo":5,"description":7,"homepage":8,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":36,"readmeContent":37,"aiSummary":38,"trendingCount":15,"starSnapshotCount":15,"syncStatus":39,"lastSyncTime":40,"discoverSource":41},71992,"LMCache","LMCache\u002FLMCache","LMCache: Supercharge Your LLM with the Fastest KV Cache Layer","https:\u002F\u002Flmcache.ai\u002F",null,"Python",8480,1270,44,119,0,50,92,231,150,40.31,"Apache License 2.0",false,"dev",true,[26,27,28,29,30,31,32,33,34,35],"amd","cuda","fast","inference","kv-cache","llm","pytorch","rocm","speed","vllm","2026-06-12 02:02:57","\u003Cdiv align=\"center\">\n  \u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FLMCache\u002FLMCache\u002Fdev\u002Fasset\u002Flogo.png\" width=\"720\" alt=\"lmcache logo\">\n  \u003C\u002Fp>\n  \n  [![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-live-brightgreen)](https:\u002F\u002Fdocs.lmcache.ai\u002F)\n  [![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Flmcache)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flmcache\u002F)\n  [![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Flmcache)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flmcache\u002F)\n  [![Unit Tests](https:\u002F\u002Fbadge.buildkite.com\u002Fce25f1819a274b7966273bfa54f0e02f092c3de0d7563c5c9d.svg)](https:\u002F\u002Fbuildkite.com\u002Flmcache\u002Flmcache-unittests)\n  [![Code 
Quality](https:\u002F\u002Fgithub.com\u002Flmcache\u002Flmcache\u002Factions\u002Fworkflows\u002Fcode_quality_checks.yml\u002Fbadge.svg?branch=dev&label=tests)](https:\u002F\u002Fgithub.com\u002FLMCache\u002FLMCache\u002Factions\u002Fworkflows\u002Fcode_quality_checks.yml)\n  [![Integration Tests](https:\u002F\u002Fbadge.buildkite.com\u002F108ddd4ab482a2480999dec8c62a640a3315ed4e6c4e86798e.svg)](https:\u002F\u002Fbuildkite.com\u002Flmcache\u002Flmcache-vllm-integration-tests)\n\n   \u003Cbr \u002F>\n\n  [![OpenSSF Best Practices](https:\u002F\u002Fwww.bestpractices.dev\u002Fprojects\u002F10841\u002Fbadge)](https:\u002F\u002Fwww.bestpractices.dev\u002Fprojects\u002F10841)\n  [![OpenSSF Scorecard](https:\u002F\u002Fapi.scorecard.dev\u002Fprojects\u002Fgithub.com\u002FLMCache\u002FLMCache\u002Fbadge)](https:\u002F\u002Fscorecard.dev\u002Fviewer\u002F?uri=github.com\u002FLMCache\u002FLMCache)\n  [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FLMCache\u002FLMCache\u002F)\n  [![GitHub commit activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fw\u002FLMCache\u002FLMCache)](https:\u002F\u002Fgithub.com\u002FLMCache\u002FLMCache\u002Fgraphs\u002Fcommit-activity)\n  [![PyPI - Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Flmcache)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flmcache\u002F)\n  [![YouTube Channel Views](https:\u002F\u002Fimg.shields.io\u002Fyoutube\u002Fchannel\u002Fviews\u002FUC58zMz55n70rtf1Ak2PULJA)](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC58zMz55n70rtf1Ak2PULJA)\n\n\u003C\u002Fdiv>\n\n\n--------------------------------------------------------------------------------\n\n| [**Blog**](https:\u002F\u002Fblog.lmcache.ai\u002F)\n| [**Documentation**](https:\u002F\u002Fdocs.lmcache.ai\u002F)\n| [**Join Slack**](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Flmcacheworkspace\u002Fshared_invite\u002Fzt-3g8e6xzz8-KzS_HI8bPERGFK5PTB~MYg)\n| [**Interest 
Form**](https:\u002F\u002Fforms.gle\u002FMHwLiYDU6kcW3dLj7)\n| [**Roadmap**](https:\u002F\u002Fgithub.com\u002FLMCache\u002FLMCache\u002Fissues\u002F1253)\n\n## Summary\n\nLMCache is an **LLM** serving engine extension that **reduces TTFT** and **increases throughput**, especially in long-context scenarios. By storing the KV caches of reusable text across the datacenter (including GPU, CPU, disk, and even S3) and applying a wide range of acceleration techniques (zero CPU copy, NIXL, GDS, and more), LMCache reuses the KV caches of **_any_** reused text (not necessarily a prefix) in **_any_** serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.  \n\nBy combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.\n\n![performance](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F86137f17-f216-41a0-96a7-e537764f7a4c)\n\nLMCache is used, integrated, or referenced across a growing ecosystem of LLM serving platforms, infrastructure providers, and open-source projects:\n\n- Initiated and officially supported by: [Tensormesh](https:\u002F\u002Fwww.tensormesh.ai\u002F)\n- Adopted by inference providers: GMI Cloud ([blog post](https:\u002F\u002Fwww.gmicloud.ai\u002Fblog\u002Fgmi-cloud-achieves-4x-llm-performance-boost-with-tensormesh)), Google Cloud ([blog post](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Ftopics\u002Fdevelopers-practitioners\u002Fboosting-llm-performance-with-tiered-kv-cache-on-google-kubernetes-engine)), CoreWeave ([blog post](https:\u002F\u002Fwww.coreweave.com\u002Fnews\u002Fcoreweave-unveils-ai-object-storage-redefining-how-ai-workloads-access-and-scale-data)) and more\n- Integrated with data and storage infrastructure providers: Redis ([blog post](https:\u002F\u002Fredis.io\u002Fblog\u002Fget-faster-llm-inference-and-cheaper-responses-with-lmcache-and-redis\u002F)), Weka ([blog 
post](https:\u002F\u002Fwww.weka.io\u002Fblog\u002Fai-ml\u002Fopen-sourcing-gds-integration-from-augmented-memory-grid-see-results-for-yourself\u002F)), PliOps ([blog post](https:\u002F\u002Fwww.manilatimes.net\u002F2025\u002F03\u002F12\u002Ftmt-newswire\u002Fglobenewswire\u002Fpliops-announces-collaboration-with-vllm-production-stack-to-enhance-llm-inference-performance\u002F2072000)) and more\n- Used by open-source projects and platforms: [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvllm-project\u002Fvllm?style=social)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n, [SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsgl-project\u002Fsglang?style=social)](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)\n, [vLLM Production Stack](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fproduction-stack) [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvllm-project\u002Fproduction-stack?style=social)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fproduction-stack), [llm-d](https:\u002F\u002Fgithub.com\u002Fllm-d\u002Fllm-d\u002F) [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fllm-d\u002Fllm-d?style=social)](https:\u002F\u002Fgithub.com\u002Fllm-d\u002Fllm-d), [NVIDIA dynamo](https:\u002F\u002Fgithub.com\u002Fai-dynamo\u002Fdynamo) [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fai-dynamo\u002Fdynamo)](https:\u002F\u002Fgithub.com\u002Fai-dynamo\u002Fdynamo), [KServe](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve) [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkserve\u002Fkserve?style=social)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve) and more.\n\nFor more details, please check our [Ray Summit 
talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TwLd15HE6AM) and [technical report](https:\u002F\u002Flmcache.ai\u002Ftech_report.pdf).\n\n\n## Features\n\n- [x] 🔥 Integration with vLLM v1 with the following features:\n  * High performance CPU KVCache offloading\n  * Disaggregated prefill\n  * P2P KVCache sharing\n- [x] Integration with SGLang for KV cache offloading\n- [x] Storage support as follows:\n  * CPU\n  * Disk\n  * [NIXL](https:\u002F\u002Fgithub.com\u002Fai-dynamo\u002Fnixl)\n- [x] Installation support through pip and latest vLLM\n\n## Installation\n\nTo use LMCache, simply install `lmcache` from your package manager, e.g. pip:\n\n```bash\npip install lmcache\n```\n\nThis works on Linux platforms with NVIDIA GPUs.\n\nMore [detailed installation instructions](https:\u002F\u002Fdocs.lmcache.ai\u002Fgetting_started\u002Finstallation) are available in the docs, particularly if you are not using the latest stable version of vLLM or are using another serving engine with different dependencies. The documentation also explains how to resolve \"undefined symbol\" errors and torch version mismatches. 
\n\n## Getting started\n\nThe best way to get started is to check out the [Quickstart Examples](https:\u002F\u002Fdocs.lmcache.ai\u002Fgetting_started\u002Fquickstart\u002F) in the docs.\n\n## Documentation\n\nCheck out the LMCache [documentation](https:\u002F\u002Fdocs.lmcache.ai\u002F), which is available online.\n\nWe also post regularly on the [LMCache blog](https:\u002F\u002Fblog.lmcache.ai\u002F).\n\n## Examples\n\nGo hands-on with our [examples](https:\u002F\u002Fgithub.com\u002FLMCache\u002FLMCache\u002Ftree\u002Fdev\u002Fexamples),\nwhich demonstrate how to address different use cases with LMCache.\n\n## Interested in Connecting?\n\nFill out the [interest form](https:\u002F\u002Fforms.gle\u002FmQfQDUXbKfp2St1z7), [sign up for our newsletter](https:\u002F\u002Fmailchi.mp\u002Ftensormesh\u002Flmcache-sign-up-newsletter), [join the LMCache Slack](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Flmcacheworkspace\u002Fshared_invite\u002Fzt-3g8e6xzz8-KzS_HI8bPERGFK5PTB~MYg), or [drop us an email](mailto:contact@lmcache.ai), and our team will reach out to you!\n\n## Community meeting\n\nThe LMCache community meeting ([Zoom link](https:\u002F\u002Fuchicago.zoom.us\u002Fj\u002F6603596916?pwd=Z1E5MDRWUSt2am5XbEt4dTFkNGx6QT09)) is hosted bi-weekly. 
All are welcome to join!\n\nMeetings are held bi-weekly on: Tuesdays at 9:00 AM PT – [Add to Google Calendar](https:\u002F\u002Fcalendar.google.com\u002Fcalendar\u002Fu\u002F0\u002Fr?cid=Y19mNGY2ZmMwZjUxMWYyYTZmZmE1ZTVlMGI2Yzk2NmFmZjNhM2Y4ODZiZmU5OTU5MDJlMmE3ZmUyOGZmZThlOWY5QGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20)\n\nWe keep notes from each meeting on this [document](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F1_Fl3vLtERFa3vTH00cezri78NihNBtSClK-_1tSrcow) for summaries of standups, discussion, and action items.\n\nRecordings of meetings are available on the [YouTube LMCache channel](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC58zMz55n70rtf1Ak2PULJA).\n\n## Contributing\n\nWe welcome and value all contributions and collaborations.  Please check out [Contributing Guide](CONTRIBUTING.md) on how to contribute.\n\nWe continually update [[Onboarding] Welcoming contributors with good first issues!](https:\u002F\u002Fgithub.com\u002FLMCache\u002FLMCache\u002Fissues\u002F627)\n\n## Citation\n\nIf you use LMCache for your research, please cite our papers:\n\n```\n@inproceedings{liu2024cachegen,\n  title={Cachegen: Kv cache compression and streaming for fast large language model serving},\n  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},\n  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},\n  pages={38--56},\n  year={2024}\n}\n\n@article{cheng2024large,\n  title={Do Large Language Models Need a Content Delivery Network?},\n  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},\n  journal={arXiv preprint arXiv:2409.13761},\n  year={2024}\n}\n\n@inproceedings{10.1145\u002F3689031.3696098,\n  author = {Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},\n  title = {CacheBlend: Fast Large 
Language Model Serving for RAG with Cached Knowledge Fusion},\n  year = {2025},\n  url = {https:\u002F\u002Fdoi.org\u002F10.1145\u002F3689031.3696098},\n  doi = {10.1145\u002F3689031.3696098},\n  booktitle = {Proceedings of the Twentieth European Conference on Computer Systems},\n  pages = {94–109},\n}\n\n@article{cheng2025lmcache,\n  title={LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference},\n  author={Cheng, Yihua and Liu, Yuhan and Yao, Jiayi and An, Yuwei and Chen, Xiaokun and Feng, Shaoting and Huang, Yuyang and Shen, Samuel and Du, Kuntai and Jiang, Junchen},\n  journal={arXiv preprint arXiv:2510.09665},\n  year={2025}\n}\n```\n\n## Socials\n\n[LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Flmcache-lab\u002F?viewAsMember=true) | [Twitter](https:\u002F\u002Fx.com\u002Flmcache) | [YouTube](https:\u002F\u002Fwww.youtube.com\u002F@LMCacheTeam)\n\n## License\n\nThe LMCache codebase is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.\n","LMCache is a KV cache layer extension that accelerates large language model (LLM) inference. It stores the KV caches of reusable text across the datacenter (on GPU, CPU, disk, and even S3) and uses acceleration techniques such as zero CPU copy, NIXL, and GDS to significantly reduce time to first token (TTFT) and increase throughput, especially in long-context scenarios. LMCache can reuse the KV caches of any repeated text across serving instances, saving valuable GPU compute and reducing user-perceived latency. The project is well suited to latency- and cost-sensitive applications such as multi-round QA and retrieval-augmented generation (RAG).",2,"2026-06-11 03:39:50","high_star"]