[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80701":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":12,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},80701,"nano-vllm-omni","Rising0321\u002Fnano-vllm-omni","Rising0321","A lightweight `vLLM-Omni`-style diffusion implementation built around `Wan2.2-TI2V-5B-Diffusers` inspired from nano-vllm",null,"Python",50,5,45,1,0,3,4,9,2.33,false,"main",true,[],"2026-06-12 02:04:05","\u003Cp align=\"center\">\n\u003Cimg width=\"300\" src=\"assets\u002Flogo.png\">\n\u003C\u002Fp>\n\n# Nano-vLLM-Omni\n\nA lightweight `vLLM-Omni`-style diffusion implementation built around `Wan2.2-TI2V-5B-Diffusers`.\n\n## Key Features\n\n* 🚀 **Real engine boundaries** - explicit `request -> scheduler -> runner -> pipeline`\n* 📖 **Readable codebase** - core implementation in ~`1,079` lines of Python for studying diffusion serving\n* ⚡ **Step execution** - preserves the `prepare_encode -> denoise_step -> step_scheduler -> post_decode` contract from `vllm-omni`\n* 🧠 **Minimal reuse path** - CPU prompt-embedding cache as the diffusion analogue of prefix\u002FKV reuse\n* 💾 **Practical memory path** - explicit module-level CPU offload for a 24GB 3090\n\n## What It Keeps\n\nThis project keeps the core `vllm-omni diffusion` shape:\n\n- explicit scheduler-owned request lifecycle\n- per-request mutable runner state\n- step-wise denoising instead of one giant `pipe(...)` call\n- a dedicated 
pipeline adapter instead of hiding all logic inside Diffusers\n\nIt does **not** try to reimplement distributed executors, cache backends, tensor parallel diffusion, or multi-model orchestration.\n\n## Installation\n\nThis project was validated with Python `3.10`, a CUDA-capable NVIDIA GPU, and `ffmpeg`.\n\n1. Create an environment.\n\n```bash\nconda create -n nano-vllm-omni python=3.10 -y\nconda activate nano-vllm-omni\n```\n\n2. Install a CUDA-enabled PyTorch build that matches your system.\n\nExample for CUDA `12.1`:\n\n```bash\npython -m pip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n```\n\nIf your machine uses a different CUDA version, use the selector on the official PyTorch site instead of this example.\n\n3. Install system `ffmpeg`.\n\nUbuntu:\n\n```bash\nsudo apt-get update\nsudo apt-get install -y ffmpeg\n```\n\n4. Install this project and the Hugging Face CLI.\n\nThis project currently depends on the `Wan` pipeline from the `diffusers` main branch, so `pip install -e .` will fetch `diffusers` from GitHub automatically.\n\n```bash\npython -m pip install -e .\npython -m pip install huggingface_hub\n```\n\n## Model Download\n\nThis repo expects the Wan model under `.\u002Fmodels\u002FWan2.2-TI2V-5B-Diffusers`.\n\nCreate the directory and download the official Diffusers weights:\n\n```bash\nmkdir -p models\nhuggingface-cli download --resume-download Wan-AI\u002FWan2.2-TI2V-5B-Diffusers \\\n  --local-dir .\u002Fmodels\u002FWan2.2-TI2V-5B-Diffusers \\\n  --local-dir-use-symlinks False\n```\n\nThe repository already includes a demo image at `.\u002Fassets\u002Fi2v_input.JPG`, so no extra asset download is required for the default example.\n\n## Sample Inputs\n\nThe repo includes the official Wan TI2V sample input image under `.\u002Fassets`:\n\n- `i2v_input.JPG`: the official Wan cat-on-surfboard example\n\nSource and license details are listed in 
[assets\u002FREADME.md](assets\u002FREADME.md).\n\n## Quick Start\n\nAfter the model is downloaded, this command should run end-to-end:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python example_wan22_i2v.py \\\n  --model .\u002Fmodels\u002FWan2.2-TI2V-5B-Diffusers \\\n  --image .\u002Fassets\u002Fi2v_input.JPG \\\n  --preset quality \\\n  --output .\u002Foutput\u002Fexample_wan22_i2v_quality.mp4\n```\n\nOr via the CLI entrypoint:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 nano-vllm-omni \\\n  --model .\u002Fmodels\u002FWan2.2-TI2V-5B-Diffusers \\\n  --image .\u002Fassets\u002Fi2v_input.JPG \\\n  --preset quality \\\n  --output .\u002Foutput\u002Fexample_wan22_i2v_quality.mp4\n```\n\nThe current defaults in `nanovllm_omni\u002Fconfig.py` point to these same repo-relative paths, so after you place the model under `.\u002Fmodels\u002FWan2.2-TI2V-5B-Diffusers`, the explicit `--model` and `--image` flags become optional.\n\nThe CLI also accepts `--negative-prompt`. 
The default negative prompt already suppresses `ghosting`, `double image`, `duplicate subject`, and `motion trails`.\n\n## Preset\n\n- `quality`: target 480P-class area, `17` frames, `12` steps, `flow_shift=3.0`\n\n## Performance Comparison\n\nSee `bench.py` for the benchmark used below.\n\n**Test Configuration:**\n- Hardware: RTX 3090 24GB\n- Model: `Wan2.2-TI2V-5B-Diffusers`\n- Input: `.\u002Fassets\u002Fi2v_input.JPG`\n- Resolution: `576x768`\n- Frames: `17`\n- Sampler: `Euler`\n- Inference Steps: `12`\n- Metric: post-load `generate` time only, including text embedding and video generation\n- Warmup: `1` run, then `5` timed runs\n\n**Performance Results:**\n\n| Inference Engine | Mean Generate Time (s) | Min (s) | Max (s) | Notes |\n|------------------|------------------------|---------|---------|-------|\n| `vllm-omni`      | `28.2445`              | `28.1874` | `28.3386` | official `0.18.0` |\n| `nano-vllm-omni` | `25.8918`              | `25.0985` | `26.7400` | current implementation |\n\nOn this single-GPU Wan2.2 I2V benchmark, `nano-vllm-omni` is about `9.1%` faster than the official `vllm-omni` path while keeping the codebase small and readable.\n\n## Notes\n\n- `ffmpeg` is required to export frames to `mp4`.\n- Pure full-GPU decode OOMed on this 24GB card at higher resolutions, so CPU offload stays enabled by default.\n- The current implementation is optimized for clarity first: single process, single GPU, one scheduled diffusion step at a time.\n\n\n## Layout\n\n- `config.py`: engine\u002Fruntime configuration\n- `sampling_params.py`: runtime sampling arguments and the validated `quality` preset\n- `request.py`: user-facing request object\n- `cache.py`: CPU-side prompt embedding cache\n- `sched\u002Finterface.py`: scheduler contract and request state\n- `sched\u002Fbase_scheduler.py`: waiting\u002Frunning\u002Ffinished queue bookkeeping\n- `sched\u002Fstep_scheduler.py`: step-wise diffusion scheduler\n- `worker\u002Futils.py`: per-request runner state 
and runner output\n- `models\u002Finterface.py`: minimal step-execution pipeline protocol\n- `models\u002Fwan22\u002Fpipeline.py`: Wan2.2 TI2V\u002FI2V step-execution pipeline adapter\n- `engine\u002Fmodel_runner.py`: step-wise request execution and state cache\n- `engine\u002Fomni_engine.py`: top-level engine loop\n- `llm.py`: user-facing API\n- `utils.py`: resize\u002Fexport helpers\n\n## Acknowledgements\n\n- [nano-vllm](https:\u002F\u002Fgithub.com\u002FGeeeekExplorer\u002Fnano-vllm) for showing how to turn a high-performance serving stack into a compact, readable teaching implementation.\n- [vllm-omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni) for the diffusion-serving architecture and execution model that this project studies and simplifies.\n","Nano-vLLM-Omni is a lightweight diffusion model implementation built on Wan2.2-TI2V-5B-Diffusers. Its core features include an explicit scheduler-owned request lifecycle, a concise and readable codebase (about 1,079 lines of Python), step-wise execution, and a minimal CPU-side cache reuse path, with the aim of optimizing memory usage. Technically, it runs on a 24GB 3090 GPU through explicit module-level CPU offload. It suits researchers and developers who want to study and deploy text- and image-to-video generation models efficiently, especially those who want to understand the inner workings of diffusion models. The project does not cover advanced features such as distributed executors, cache backends, or tensor-parallel diffusion, and focuses on providing a clear, easily extensible foundation.",2,"2026-06-11 04:01:41","CREATED_QUERY"]