[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9689":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},9689,"nano-vllm","GeeeekExplorer\u002Fnano-vllm","GeeeekExplorer","Nano vLLM","",null,"Python",13993,2200,84,27,0,19,138,622,98,45,"MIT License",false,"main",[26,27,28,29,30,31],"deep-learning","inference","llm","nlp","pytorch","transformer","2026-06-12 02:02:11","\u003Cp align=\"center\">\n\u003Cimg width=\"300\" src=\"assets\u002Flogo.png\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F15323\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Ftrendshift.io\u002Fapi\u002Fbadge\u002Frepositories\u002F15323\" alt=\"GeeeekExplorer%2Fnano-vllm | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n# Nano-vLLM\n\nA lightweight vLLM implementation built from scratch.\n\n## Key Features\n\n* 🚀 **Fast offline inference** - Comparable inference speeds to vLLM\n* 📖 **Readable codebase** - Clean implementation in ~ 1,200 lines of Python code\n* ⚡ **Optimization Suite** - Prefix caching, Tensor Parallelism, Torch compilation, CUDA graph, etc.\n\n## Installation\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002FGeeeekExplorer\u002Fnano-vllm.git\n```\n\n## Model Download\n\nTo download the model weights 
manually, use the following command:\n```bash\nhuggingface-cli download --resume-download Qwen\u002FQwen3-0.6B \\\n  --local-dir ~\u002Fhuggingface\u002FQwen3-0.6B\u002F \\\n  --local-dir-use-symlinks False\n```\n\n## Quick Start\n\nSee `example.py` for usage. The API mirrors vLLM's interface with minor differences in the `LLM.generate` method:\n```python\nfrom nanovllm import LLM, SamplingParams\nllm = LLM(\"\u002FYOUR\u002FMODEL\u002FPATH\", enforce_eager=True, tensor_parallel_size=1)\nsampling_params = SamplingParams(temperature=0.6, max_tokens=256)\nprompts = [\"Hello, Nano-vLLM.\"]\noutputs = llm.generate(prompts, sampling_params)\noutputs[0][\"text\"]\n```\n\n## Benchmark\n\nSee `bench.py` for the benchmark script.\n\n**Test Configuration:**\n- Hardware: RTX 4070 Laptop (8GB)\n- Model: Qwen3-0.6B\n- Total Requests: 256 sequences\n- Input Length: Randomly sampled between 100–1024 tokens\n- Output Length: Randomly sampled between 100–1024 tokens\n\n**Performance Results:**\n| Inference Engine | Output Tokens | Time (s) | Throughput (tokens\u002Fs) |\n|----------------|-------------|----------|-----------------------|\n| vLLM           | 133,966     | 98.37    | 1361.84               |\n| Nano-vLLM      | 133,966     | 93.41    | 1434.13               |\n\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=GeeeekExplorer\u002Fnano-vllm&type=Date)](https:\u002F\u002Fwww.star-history.com\u002F#GeeeekExplorer\u002Fnano-vllm&Date)","Nano-vLLM is a lightweight vLLM implementation built from scratch. It provides fast offline inference with performance comparable to vLLM, and its codebase is concise and readable at only about 1,200 lines of Python. The project integrates several optimizations, including prefix caching, tensor parallelism, Torch compilation, and CUDA graphs, to further improve runtime efficiency. It suits scenarios that need efficient inference for natural language processing models while keeping the system lightweight and easy to maintain, such as developing text-generation applications or research projects.",2,"2026-06-11 03:24:14","top_topic"]