[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70981":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},70981,"StreamDiffusion","cumulo-autumn\u002FStreamDiffusion","cumulo-autumn","StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation","",null,"Python",10757,833,91,101,0,6,12,44,18,43.76,"Apache License 2.0",false,"main",[],"2026-06-12 02:02:46","# StreamDiffusion\n\n[English](.\u002FREADME.md) | [日本語](.\u002FREADME-ja.md) | [한국어](.\u002FREADME-ko.md)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fdemo_07.gif\" width=90%>\n  \u003Cimg src=\".\u002Fassets\u002Fdemo_09.gif\" width=90%>\n\u003C\u002Fp>\n\n# StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation\n\n**Authors:** [Akio Kodaira*](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fakio-kodaira-1a7b98252\u002F), [Chenfeng Xu*](https:\u002F\u002Fwww.chenfengx.com\u002F), Toshiki Hazama*, [Takanori Yoshimoto](https:\u002F\u002Ftwitter.com\u002F__ramu0e__), [Kohei Ohno](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fkohei--ohno\u002F), [Shogo Mitsuhori](https:\u002F\u002Fme.ddpn.world\u002F), [Soichi Sugano](https:\u002F\u002Ftwitter.com\u002Ftoni_nimono), [Hanying Cho](https:\u002F\u002Ftwitter.com\u002Fhanyingcl), [Zhijian Liu](https:\u002F\u002Fzhijianliu.com\u002F), [Masayoshi 
Tomizuka](https:\u002F\u002Fme.berkeley.edu\u002Fpeople\u002Fmasayoshi-tomizuka\u002F), [Kurt Keutzer](https:\u002F\u002Fscholar.google.com\u002Fcitations?hl=en&user=ID9QePIAAAAJ)\n\nStreamDiffusion is an innovative diffusion pipeline designed for real-time interactive generation. It introduces significant performance enhancements to current diffusion-based image generation techniques.\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2312.12491-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.12491)\n[![Hugging Face Papers](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-papers-yellow)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2312.12491)\n\nWe sincerely thank [Taku Fujimoto](https:\u002F\u002Ftwitter.com\u002FAttaQjp) and [Radamés Ajna](https:\u002F\u002Ftwitter.com\u002Fradamar) and Hugging Face team for their invaluable feedback, courteous support, and insightful discussions.\n\n## Key Features\n\n1. **Stream Batch**\n\n   - Streamlined data processing through efficient batch operations.\n\n2. **Residual Classifier-Free Guidance** - [Learn More](#residual-cfg-rcfg)\n\n   - Improved guidance mechanism that minimizes computational redundancy.\n\n3. **Stochastic Similarity Filter** - [Learn More](#stochastic-similarity-filter)\n\n   - Improves GPU utilization efficiency through advanced filtering techniques.\n\n4. **IO Queues**\n\n   - Efficiently manages input and output operations for smoother execution.\n\n5. **Pre-Computation for KV-Caches**\n\n   - Optimizes caching strategies for accelerated processing.\n\n6. 
**Model Acceleration Tools**\n   - Utilizes various tools for model optimization and performance boost.\n\nThe table below reports throughput when images are generated with the StreamDiffusion pipeline in an environment with **GPU: RTX 4090**, **CPU: Core i9-13900K**, and **OS: Ubuntu 22.04.3 LTS**.\n\n|            Model            | Denoising Step | fps on Txt2Img | fps on Img2Img |\n| :-------------------------: | :------------: | :------------: | :------------: |\n|          SD-Turbo           |       1        |     106.16     |     93.897     |\n| LCM-LoRA \u003Cbr>+\u003Cbr> KohakuV2 |       4        |     38.023     |     37.133     |\n\nFeel free to explore each feature by following the provided links to learn more about StreamDiffusion's capabilities. If you find it helpful, please consider citing our work:\n\n```bibtex\n@article{kodaira2023streamdiffusion,\n      title={StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation},\n      author={Akio Kodaira and Chenfeng Xu and Toshiki Hazama and Takanori Yoshimoto and Kohei Ohno and Shogo Mitsuhori and Soichi Sugano and Hanying Cho and Zhijian Liu and Kurt Keutzer},\n      year={2023},\n      eprint={2312.12491},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n## Installation\n\n### Step0: Clone this repository\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fcumulo-autumn\u002FStreamDiffusion.git\n```\n\n### Step1: Make Environment\n\nYou can install StreamDiffusion via pip, conda, or Docker (explained below).\n\n```bash\nconda create -n streamdiffusion python=3.10\nconda activate streamdiffusion\n```\n\nOR\n\n```cmd\npython -m venv .venv\n# Windows\n.\\.venv\\Scripts\\activate\n# Linux\nsource .venv\u002Fbin\u002Factivate\n```\n\n### Step2: Install PyTorch\n\nSelect the appropriate version for your system.\n\nCUDA 11.8\n\n```bash\npip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n\nCUDA 
12.1\n\n```bash\npip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n```\n\nDetails: https:\u002F\u002Fpytorch.org\u002F\n\n### Step3: Install StreamDiffusion\n\n#### For User\n\nInstall StreamDiffusion\n\n```bash\n# For the latest version (recommended)\npip install git+https:\u002F\u002Fgithub.com\u002Fcumulo-autumn\u002FStreamDiffusion.git@main#egg=streamdiffusion[tensorrt]\n\n\n# or\n\n\n# For the stable version\npip install streamdiffusion[tensorrt]\n```\n\nInstall TensorRT extension\n\n```bash\npython -m streamdiffusion.tools.install-tensorrt\n```\n\n(Windows only) You may also need to install pywin32 if you installed the stable version (`pip install streamdiffusion[tensorrt]`).\n\n```bash\npip install --force-reinstall pywin32\n```\n\n#### For Developer\n\n```bash\npython setup.py develop easy_install streamdiffusion[tensorrt]\npython -m streamdiffusion.tools.install-tensorrt\n```\n\n### Docker Installation (TensorRT Ready)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fcumulo-autumn\u002FStreamDiffusion.git\ncd StreamDiffusion\ndocker build -t stream-diffusion:latest -f Dockerfile .\ndocker run --gpus all -it -v $(pwd):\u002Fhome\u002Fubuntu\u002Fstreamdiffusion stream-diffusion:latest\n```\n\n## Quick Start\n\nYou can try StreamDiffusion in the [`examples`](.\u002Fexamples) directory.\n\n| ![Image 3](.\u002Fassets\u002Fdemo_02.gif) | ![Image 4](.\u002Fassets\u002Fdemo_03.gif) |\n| :----------------------------: | :----------------------------: |\n| ![Image 5](.\u002Fassets\u002Fdemo_04.gif) | ![Image 6](.\u002Fassets\u002Fdemo_05.gif) |\n\n## Real-Time Txt2Img Demo\n\nThere is an interactive txt2img demo in the [`demo\u002Frealtime-txt2img`](.\u002Fdemo\u002Frealtime-txt2img) directory!\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fdemo_01.gif\" width=100%>\n\u003C\u002Fp>\n\n## Real-Time Img2Img Demo\n\nThere is a real-time img2img demo with a live webcam feed or screen capture 
in a web browser in the [`demo\u002Frealtime-img2img`](.\u002Fdemo\u002Frealtime-img2img) directory!\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fimg2img1.gif\" width=100%>\n\u003C\u002Fp>\n\n## Usage Example\n\nWe provide a simple example of how to use StreamDiffusion. For more detailed examples, please refer to the [`examples`](.\u002Fexamples) directory.\n\n### Image-to-Image\n\n```python\nimport torch\nfrom diffusers import AutoencoderTiny, StableDiffusionPipeline\nfrom diffusers.utils import load_image\n\nfrom streamdiffusion import StreamDiffusion\nfrom streamdiffusion.image_utils import postprocess_image\n\n# You can load any model using diffusers' StableDiffusionPipeline\npipe = StableDiffusionPipeline.from_pretrained(\"KBlueLeaf\u002Fkohaku-v2.1\").to(\n    device=torch.device(\"cuda\"),\n    dtype=torch.float16,\n)\n\n# Wrap the pipeline in StreamDiffusion\nstream = StreamDiffusion(\n    pipe,\n    t_index_list=[32, 45],\n    torch_dtype=torch.float16,\n)\n\n# If the loaded model is not an LCM, merge the LCM-LoRA\nstream.load_lcm_lora()\nstream.fuse_lora()\n# Use Tiny VAE for further acceleration\nstream.vae = AutoencoderTiny.from_pretrained(\"madebyollin\u002Ftaesd\").to(device=pipe.device, dtype=pipe.dtype)\n# Enable acceleration\npipe.enable_xformers_memory_efficient_attention()\n\n\nprompt = \"1girl with dog hair, thick frame glasses\"\n# Prepare the stream\nstream.prepare(prompt)\n\n# Prepare image\ninit_image = load_image(\"assets\u002Fimg2img_example.png\").resize((512, 512))\n\n# Warm up for at least len(t_index_list) x frame_buffer_size iterations\nfor _ in range(2):\n    stream(init_image)\n\n# Run the stream until the user types 'stop'\nwhile True:\n    x_output = stream(init_image)\n    postprocess_image(x_output, output_type=\"pil\")[0].show()\n    input_response = input(\"Press Enter to continue or type 'stop' to exit: \")\n    if input_response == \"stop\":\n        break\n```\n\n### Text-to-Image\n\n```python\nimport torch\nfrom diffusers import AutoencoderTiny, 
StableDiffusionPipeline\n\nfrom streamdiffusion import StreamDiffusion\nfrom streamdiffusion.image_utils import postprocess_image\n\n# You can load any model using diffusers' StableDiffusionPipeline\npipe = StableDiffusionPipeline.from_pretrained(\"KBlueLeaf\u002Fkohaku-v2.1\").to(\n    device=torch.device(\"cuda\"),\n    dtype=torch.float16,\n)\n\n# Wrap the pipeline in StreamDiffusion\n# Text-to-image requires more denoising steps (a longer t_index_list)\n# We recommend cfg_type=\"none\" for text-to-image\nstream = StreamDiffusion(\n    pipe,\n    t_index_list=[0, 16, 32, 45],\n    torch_dtype=torch.float16,\n    cfg_type=\"none\",\n)\n\n# If the loaded model is not an LCM, merge the LCM-LoRA\nstream.load_lcm_lora()\nstream.fuse_lora()\n# Use Tiny VAE for further acceleration\nstream.vae = AutoencoderTiny.from_pretrained(\"madebyollin\u002Ftaesd\").to(device=pipe.device, dtype=pipe.dtype)\n# Enable acceleration\npipe.enable_xformers_memory_efficient_attention()\n\n\nprompt = \"1girl with dog hair, thick frame glasses\"\n# Prepare the stream\nstream.prepare(prompt)\n\n# Warm up for at least len(t_index_list) x frame_buffer_size iterations\nfor _ in range(4):\n    stream()\n\n# Run the stream until the user types 'stop'\nwhile True:\n    x_output = stream.txt2img()\n    postprocess_image(x_output, output_type=\"pil\")[0].show()\n    input_response = input(\"Press Enter to continue or type 'stop' to exit: \")\n    if input_response == \"stop\":\n        break\n```\n\nYou can make it faster by using SD-Turbo.\n\n### Faster generation\n\nIn the example above, replace the following code:\n\n```python\npipe.enable_xformers_memory_efficient_attention()\n```\n\nwith:\n\n```python\nfrom streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt\n\nstream = accelerate_with_tensorrt(\n    stream, \"engines\", max_batch_size=2,\n)\n```\n\nThis requires the TensorRT extension and extra time to build the engine, but inference is faster than in the example above.\n\n## Optional Features\n\n### Stochastic Similarity 
Filter\n\n![demo](assets\u002Fdemo_06.gif)\n\nThe Stochastic Similarity Filter reduces processing on video input by skipping conversion when a frame differs little from the previous one, which lightens the GPU load, as shown by the red frame in the GIF above. The usage is as follows:\n\n```python\nstream = StreamDiffusion(\n    pipe,\n    [32, 45],\n    torch_dtype=torch.float16,\n)\nstream.enable_similar_image_filter(\n    similar_image_filter_threshold,\n    similar_image_filter_max_skip_frame,\n)\n```\n\nThe function takes the following arguments:\n\n#### `similar_image_filter_threshold`\n\n- The threshold for similarity between the previous frame and the current frame at which processing is paused.\n\n#### `similar_image_filter_max_skip_frame`\n\n- The maximum interval a pause may last before conversion resumes.\n\n### Residual CFG (RCFG)\n\n![rcfg](assets\u002Fcfg_conparision.png)\n\nRCFG approximates CFG at a computational cost comparable to running without CFG. It is selected through the `cfg_type` argument of `StreamDiffusion`. There are two types of RCFG: RCFG Self-Negative, which takes no negative prompt, and RCFG Onetime-Negative, which lets you specify a negative prompt. 
In terms of computational complexity, if the cost without CFG is N and the cost with regular CFG is 2N, RCFG Self-Negative can be computed in N steps, while RCFG Onetime-Negative can be computed in N+1 steps.\n\nThe usage is as follows:\n\n```python\n# Choose one of the following values for cfg_type:\n#   \"none\"       - without CFG\n#   \"full\"       - regular CFG\n#   \"self\"       - RCFG Self-Negative\n#   \"initialize\" - RCFG Onetime-Negative\ncfg_type = \"initialize\"\nstream = StreamDiffusion(\n    pipe,\n    [32, 45],\n    torch_dtype=torch.float16,\n    cfg_type=cfg_type,\n)\nstream.prepare(\n    prompt=\"1girl, purple hair\",\n    guidance_scale=guidance_scale,\n    delta=delta,\n)\n```\n\nThe `delta` parameter moderates the strength of RCFG.\n\n## Development Team\n\n[Aki](https:\u002F\u002Ftwitter.com\u002Fcumulo_autumn),\n[Ararat](https:\u002F\u002Ftwitter.com\u002FAttaQjp),\n[Chenfeng Xu](https:\u002F\u002Ftwitter.com\u002FChenfeng_X),\n[ddPn08](https:\u002F\u002Ftwitter.com\u002FddPn08),\n[kizamimi](https:\u002F\u002Ftwitter.com\u002FArtengMimi),\n[ramune](https:\u002F\u002Ftwitter.com\u002F__ramu0e__),\n[teftef](https:\u002F\u002Ftwitter.com\u002Fhanyingcl),\n[Tonimono](https:\u002F\u002Ftwitter.com\u002Ftoni_nimono),\n[Verb](https:\u002F\u002Ftwitter.com\u002FIMG_5955),\n\n(\\*alphabetical order)\n\u003C\u002Fbr>\n\n## Acknowledgements\n\nThe video and image demos in this GitHub repository were generated using [LCM-LoRA](https:\u002F\u002Fhuggingface.co\u002Flatent-consistency\u002Flcm-lora-sdv1-5) + [KohakuV2](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F136268\u002Fkohaku-v2) and [SD-Turbo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.17042).\n\nSpecial thanks to the [LCM-LoRA authors](https:\u002F\u002Flatent-consistency-models.github.io\u002F) for providing LCM-LoRA, to Kohaku BlueLeaf ([@KBlueleaf](https:\u002F\u002Ftwitter.com\u002FKBlueleaf)) for providing the KohakuV2 model, and to [Stability AI](https:\u002F\u002Fja.stability.ai\u002F) for 
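[SD-Turbo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.17042).\n\nKohakuV2 models can be downloaded from [Civitai](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F136268\u002Fkohaku-v2) and [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FKBlueLeaf\u002Fkohaku-v2.1).\n\nSD-Turbo is also available on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fsd-turbo).\n\n## Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcumulo-autumn\u002FStreamDiffusion\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=cumulo-autumn\u002FStreamDiffusion\" \u002F>\n\u003C\u002Fa>\n","StreamDiffusion is an innovative diffusion pipeline designed for real-time interactive generation. By introducing core features such as Stream Batch, Residual Classifier-Free Guidance, and the Stochastic Similarity Filter, the project significantly improves the performance of existing diffusion-based image generation techniques. It is particularly suited to scenarios that require efficient GPU utilization and fast response, such as real-time image editing or content creation platforms. StreamDiffusion is developed in Python and has been optimized and tested on a variety of hardware configurations, ensuring broad applicability and high performance.",2,"2026-06-11 03:35:18","high_star"]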