[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74207":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":45,"readmeContent":46,"aiSummary":47,"trendingCount":16,"starSnapshotCount":16,"syncStatus":48,"lastSyncTime":49,"discoverSource":50},74207,"Helios","PKU-YuanGroup\u002FHelios","PKU-YuanGroup","Helios: Real Real-Time Long Video Generation Model","https:\u002F\u002Fpku-yuangroup.github.io\u002FHelios-Page",null,"Python",1900,150,9,32,0,28,112,27,96.54,"Apache License 2.0",false,"main",[25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],"acceleration","diffusion","diffusion-model","diffusion-models","efficient-tuning","high-quality","image-to-video","image2video","interactive","long-context","long-video-generation","real-time","text-to-video","text2video","video-generation","video-generator","video-to-video","video2video","world-model","world-models","2026-06-12 04:01:13","\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FSHYuanBest\u002Fshyuanbest_media\u002Fmain\u002FHelios\u002Flogo_white.png\" width=\"300px\">\n\u003C\u002Fdiv>\n\n\u003Ch1 align=\"center\">Helios: Real Real-Time Long Video Generation Model\u003C\u002Fh1>\n\n\u003Ch5 align=\"center\">⭐ A 14B Real-Time Long Video Generation Model Can Be Cheaper and Faster, Yet Stronger, Than 1.3B Ones ⭐\u003C\u002Fh5>\n\n\u003Ch5 
align=\"center\">\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2603.04379-b31b1b.svg?logo=arxiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.04379)\n[![hf_paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Paper%20In%20HF-red.svg)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2603.04379)\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-2ea44f)](https:\u002F\u002Fpku-yuangroup.github.io\u002FHelios-Page)\n[![hf_space](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Gradio-00b4d8.svg)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FBestWishYsh\u002FHelios-14B-RealTime)\n[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-blue)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FBestWishYsh\u002Fhelios)\n[![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖-ModelScope-purple)](https:\u002F\u002Fmodelscope.cn\u002Fcollections\u002FBestWishYSH\u002FHelios)\n[![GitHub](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-black?logo=github)](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios)\n[![GitCode](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitCodes-blue?logo=gitcode)](https:\u002F\u002Fgitcode.com\u002Fweixin_47617277\u002FHelios)\n\n[![Ascend](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FInference-Ascend--NPU-red)](https:\u002F\u002Fwww.hiascend.com\u002F)\n[![Diffusers](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FInference-Diffusers-blueviolet)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\u002Fpull\u002F13208)\n[![SGLang Diffusion](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBackend-SGLang--Diffusion-yellow)](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F19782)\n[![vLLM-Omni](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBackend-vLLM--Omni-orange)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Fpull\u002F1604)\n\n\n\n\u003C\u002Fh5>\n\n\u003Cdiv 
align=\"center\">\nThis repository is the official implementation of Helios, a breakthrough video generation model that achieves minute-scale, high-quality video synthesis at \u003Cstrong>19.5 FPS on a single H100 GPU\u003C\u002Fstrong> (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques.\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n## ✨ Highlights\n\n\n1. **Without commonly used anti-drifting strategies** (e.g., self-forcing, error-banks, keyframe sampling, or inverted sampling), Helios generates minute-scale videos with high quality and strong coherence.\n\n2. **Without standard acceleration techniques** (e.g., KV-cache, causal masking, sparse\u002Flinear attention, TinyVAE, progressive noise schedules, hidden-state caching, or quantization), Helios achieves 19.5 FPS in end-to-end inference on a single H100 GPU.\n\n3. **We introduce optimizations that improve both training and inference throughput while reducing memory consumption,** enabling image-diffusion-scale batch sizes during training and fitting up to four 14B models within 80 GB of GPU memory.\n\n\n\n## 🎬 Video Demos\n\n[![Demo Video of Helios](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1d10da4a-aba9-4ac1-ab02-cd0dfce8d35b)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vd_AgHtOUFQ)\nAlternatively, click \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios-Page\u002Fblob\u002Fmain\u002Fvideos\u002Fhelios_features.mp4\">here\u003C\u002Fa> to view the video. Some of our best prompts are collected [here](.\u002Fexample\u002Fprompt.txt).\n\n\n## 📣 Latest News!!\n\n* `[2026.03.26]` 🔥 Added a summary of FAQs, tips, and tutorials: https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios\u002Fissues\u002F47.\n* `[2026.03.24]` 👋 A community-made, unofficial YouTube tutorial for Helios is available [here](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=AvFniggt6qg). 
It covers installation on a **consumer-grade PC** and **4K video generation**.\n* `[2026.03.20]` 🚀 Helios now supports [Ahead-of-Time Compilation (AOTI)](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Fzerogpu-aoti) on Spaces, with special thanks to the HuggingFace Team! Please refer to [this Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FBestWishYsh\u002FHelios-14B-RealTime-AOTI) for a usage example.\n* `[2026.03.20]` 🔧 Based on [issue #38](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios\u002Fissues\u002F38), we've identified several ways to further improve Helios's performance, such as fixing the i2v train-inference inconsistency and fully enabling Easy Anti-Drifting. Please refer to the [commits](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios\u002Fcommits\u002Fmain\u002F) and [correct.yaml](.\u002Fscripts\u002Ftraining\u002Fconfigs\u002Fcorrect.yaml) for details.\n* `[2026.03.12]` ⚡️ Please note that real-time generation performance depends not only on the GPU but also on the CPU, memory, CUDA driver version, etc. As [tested by a user](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios\u002Fissues\u002F3#issuecomment-4034710182) on better hardware with a single H100, Helios can reach up to **20.89 FPS**!\n* `[2026.03.08]` 🚀 Helios now fully supports [Group Offloading](#-group-offloading-to-save-vram) and [Context Parallelism](#-context-parallelism-on-multiple-gpus)! These features reduce VRAM usage to **only ~6GB** and enable inference across multiple GPUs with *Ulysses Attention*, *Ring Attention*, *Unified Attention*, and *Ulysses Anything Attention*.\n* `[2026.03.06]` 👋 [Cache-DiT](https:\u002F\u002Fgithub.com\u002Fvipshop\u002Fcache-dit\u002Fpull\u002F834) now supports Helios, offering full cache acceleration and parallelism support! 
Special thanks to the Cache-DiT Team for their amazing work.\n* `[2026.03.06]` 🔧 We fixed parallel inference for Helios and provided an example [here](#-context-parallelism-on-multiple-gpus).\n* `[2026.03.06]` 🚀 We officially released the [Gradio Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FBestWishYsh\u002FHelios-14B-RealTime). Feel free to try it!\n* `[2026.03.05]` 🔥 We are excited to announce the release of the Helios [technical report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.04379) on arXiv. We welcome discussions and feedback!\n* `[2026.03.04]` 👋 Day-0 support for [Ascend-NPU](https:\u002F\u002Fwww.hiascend.com), with sincere gratitude to the Ascend Team for their support.\n* `[2026.03.04]` 👋 Day-0 support for [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\u002Fpull\u002F13208), with special thanks to the HuggingFace Team for their support.\n* `[2026.03.04]` 👋 Day-0 support for [SGLang-Diffusion](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F19782), with huge thanks to the SGLang Team for their support.\n* `[2026.03.04]` 👋 Day-0 support for [vLLM-Omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Fpull\u002F1604), with heartfelt gratitude to the vLLM Team for their support.\n* `[2026.03.04]` 🔥 We've released the training\u002Finference code and weights of **Helios-Base**, **Helios-Mid**, and **Helios-Distilled**.\n\n\n## 🔥 Friendly Links\n\nIf your work has improved **Helios** and you would like more people to see it, please let us know.\n\n* [Ascend-NPU](https:\u002F\u002Fwww.hiascend.com\u002F): Developed by Huawei, this hardware is designed for efficient AI model training and inference, boosting performance in tasks like computer vision, natural language processing, and autonomous driving.\n* [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\u002Fpull\u002F13208): A popular library designed for working with diffusion models and other generative 
models in deep learning. It supports easy integration and manipulation of a wide range of generative models.\n* [SGLang-Diffusion](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F19782): An inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.\n* [vLLM-Omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni\u002Fpull\u002F1604): A fully disaggregated serving system for any-to-any models. vLLM-Omni breaks complex architectures into a stage-based graph, using a decoupled backend to maximize resource efficiency and throughput.\n* [Cache-DiT](https:\u002F\u002Fgithub.com\u002Fvipshop\u002Fcache-dit\u002Fpull\u002F834): A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs. It is built on top of the Diffusers library and now supports nearly ALL DiTs from Diffusers.\n\n## ⚙️ Requirements and Installation\n\n### Video Tutorial\n\nIf you prefer a step-by-step walkthrough, check out this **community-made** [YouTube Tutorial](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=AvFniggt6qg). It covers local installation, 4K video generation, and how to run Helios on a **consumer-grade PC**, along with other practical usage tips.\n\n### Prepare Environment\n\n```bash\n# 0. Clone the repo\ngit clone --depth=1 https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios.git\ncd Helios\n\n# 1. Create conda environment\nconda create -n helios python=3.11.2\nconda activate helios\n\n# 2. 
Install PyTorch (adjust for your CUDA version)\n# CUDA 12.6\npip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu126\n# CUDA 12.8\npip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n# CUDA 13.0\npip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu130\n\n# 3. Install dependencies\nbash install.sh\n```\n\n### Model Download\n\n| Models           | Download Link                                                                                                                                            | Supports                                      | Notes                                                                                       |\n|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------|\n| Helios-Base      | 🤗 [Huggingface](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHelios-Base) 🤖 [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FBestWishYSH\u002FHelios-Base)                | T2V ✅ I2V ✅ V2V ✅ Interactive ✅           | Best Quality, with v-prediction, standard CFG and custom HeliosScheduler.                   |\n| Helios-Mid       | 🤗 [Huggingface](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHelios-Mid) 🤖 [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FBestWishYSH\u002FHelios-Mid)                  | T2V ✅ I2V ✅ V2V ✅ Interactive ✅           | Intermediate Ckpt, with v-prediction, CFG-Zero* and custom HeliosScheduler.                 
|\n| Helios-Distilled | 🤗 [Huggingface](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHelios-Distilled) 🤖 [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FBestWishYSH\u002FHelios-Distilled)      | T2V ✅ I2V ✅ V2V ✅ Interactive ✅           | Best Efficiency, with x0-prediction and custom HeliosDMDScheduler.                          |\n\n\n\n> 💡Note: \n> * All three models share the same architecture, but Helios-Mid and Helios-Distilled use a more aggressive multi-scale sampling pipeline to achieve better efficiency.\n> * Helios-Mid is an intermediate checkpoint produced while distilling Helios-Base into Helios-Distilled, so its quality may fall short of expectations.\n> * Because training is based on Text-to-Video, Image-to-Video and Video-to-Video may perform slightly worse than Text-to-Video. If the first few chunks look static, you can enable `is_skip_first_chunk`; you can also try tuning `image_noise_sigma_min`, `image_noise_sigma_max`, `video_noise_sigma_min`, and `video_noise_sigma_max`.\n\n\nDownload models using huggingface-cli:\n``` sh\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download BestWishYSH\u002FHelios-Base --local-dir BestWishYSH\u002FHelios-Base\nhuggingface-cli download BestWishYSH\u002FHelios-Mid --local-dir BestWishYSH\u002FHelios-Mid\nhuggingface-cli download BestWishYSH\u002FHelios-Distilled --local-dir BestWishYSH\u002FHelios-Distilled\n```\n\nDownload models using modelscope-cli:\n``` sh\npip install modelscope\nmodelscope download BestWishYSH\u002FHelios-Base --local_dir BestWishYSH\u002FHelios-Base\nmodelscope download BestWishYSH\u002FHelios-Mid --local_dir BestWishYSH\u002FHelios-Mid\nmodelscope download BestWishYSH\u002FHelios-Distilled --local_dir BestWishYSH\u002FHelios-Distilled\n```\n\n## 🚀 Inference\n\n\nHelios uses an autoregressive approach that generates **33 frames per chunk**. For optimal performance, `num_frames` should be set to a multiple of `33`. 
If a non-multiple value is provided, it will be automatically rounded up to the nearest multiple of 33.\n\n**Example frame counts for different video lengths:**\n\n| `num_frames` | Adjusted Frames | Duration at 24 FPS | Duration at 16 FPS |\n|------------|-----------------|--------|--------|\n| 1449       | 1452 (33×44)    | ~60s (1min) | ~90s (1min 30s) |\n| 720        | 726 (33×22)     | ~30s | ~45s |\n| 240        | 264 (33×8)      | ~11s | ~16s |\n| 129        | 132 (33×4)      | ~5.5s | ~8s |\n| 81         | 99  (33×3)      | ~4s | ~6s |\n\n### Run the model\n\nWe provide inference scripts for all models, covering text-to-video, image-to-video, and video-to-video, in this [directory](.\u002Fscripts\u002Finference).\n\n```bash\ncd scripts\u002Finference\n\n# For Helios-Base\nbash helios-base_t2v.sh\nbash helios-base_i2v.sh\nbash helios-base_v2v.sh\n\n# For Helios-Mid\nbash helios-mid_t2v.sh\nbash helios-mid_i2v.sh\nbash helios-mid_v2v.sh\n\n# For Helios-Distilled\nbash helios-distilled_t2v.sh\nbash helios-distilled_i2v.sh\nbash helios-distilled_v2v.sh\n\n# For Interactive (we are already inside scripts\u002Finference)\n# ⚠️ This feature is still under development; results may not always meet expectations\ncd experiment_interactive\n```\n\n### Sanity Check\n\nBefore trying your own inputs, we highly recommend running the sanity check below to catch any hardware or software issues.\n\n| Task    | **Helios-Base** | **Helios-Mid** | **Helios-Distilled** |\n| ------- | ------- | 
------- | ------- |\n| **T2V** | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F14e10753-0366-4790-ad8f-7b66d821ed11\" controls width=\"240\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc1778691-a80b-428c-8094-88bb1dd1d52b\" controls width=\"240\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4ca28c79-9dfa-49de-9c3a-f4c7b6c766cd\" controls width=\"240\">\u003C\u002Fvideo> |\n| **V2V** | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F420cb572-85c2-42d8-98d7-37b0bc24c844\" controls width=\"240\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7d703fa6-dc1a-4138-a897-e58cfd9236d6\" controls width=\"240\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F45329c55-1a25-459c-bbf0-4e584ec5b23d\" controls width=\"240\">\u003C\u002Fvideo> |\n\n\n### ✨ Group Offloading to Save VRAM\n\nHelios supports group offloading to significantly reduce VRAM consumption, allowing it to run on GPUs with limited memory. 
For more details on the underlying mechanics, please refer to the [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fmain\u002Fen\u002Foptimization\u002Fmemory#group-offloading).\n\nThe Helios model below requires `~6GB of VRAM`.\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```bash\n  CUDA_VISIBLE_DEVICES=0 python infer_helios.py \\\n      --base_model_path \"BestWishYsh\u002FHelios-Distilled\" \\\n      --transformer_path \"BestWishYsh\u002FHelios-Distilled\" \\\n      --sample_type \"t2v\" \\\n      --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. A close-up shot with dynamic movement.\" \\\n      --num_frames 240 \\\n      --guidance_scale 1.0 \\\n      --is_enable_stage2 \\\n      --pyramid_num_inference_steps_list 2 2 2 \\\n      --is_amplify_first_chunk \\\n      --output_folder \".\u002Foutput_helios\u002Fhelios-distilled\" \\\n      --enable_low_vram_mode \\\n      --group_offloading_type \"leaf_level\"\n  ```\n  \n\u003C\u002Fdetails>\n\n### ✨ Context Parallelism on Multiple GPUs\nHelios supports various Context Parallelism mechanisms, including `Ulysses Attention`, `Ring Attention`, `Unified Attention`, and `Ulysses Anything Attention`. 
For more details, please refer to the [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fmain\u002Fen\u002Ftraining\u002Fdistributed_inference#context-parallelism).\n\nFor example, let's take Helios-Base with 4 GPUs.\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```bash\n  # Remember to pass --enable_parallelism.\n  # --cp_backend accepts \"ring\", \"ulysses\", \"unified\", or \"ulysses_anything\".\n  CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 infer_helios.py \\n      --enable_parallelism \\n      --cp_backend \"ulysses\" \\n      --base_model_path \"BestWishYsh\u002FHelios-Base\" \\n      --transformer_path \"BestWishYsh\u002FHelios-Base\" \\n      --sample_type \"t2v\" \\n      --num_frames 99 \\n      --fps 24 \\n      --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. 
A close-up shot with dynamic movement.\" \\\n      --guidance_scale 5.0 \\\n      --output_folder \".\u002Foutput_helios\u002Fhelios-base\"\n  ```\n  \n\u003C\u002Fdetails>\n\n### ✨ Diffusers Pipeline\n\nInstall diffusers from source:\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers.git\n```\n\nFor example, let's take Helios-Distilled (**Standard Pipeline**).\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```bash\n  import torch\n  from diffusers import AutoModel, HeliosPyramidPipeline\n  from diffusers.utils import export_to_video, load_video, load_image\n\n  vae = AutoModel.from_pretrained(\"BestWishYsh\u002FHelios-Distilled\", subfolder=\"vae\", torch_dtype=torch.float32)\n\n  pipeline = HeliosPyramidPipeline.from_pretrained(\n      \"BestWishYsh\u002FHelios-Distilled\",\n      vae=vae,\n      torch_dtype=torch.bfloat16\n  )\n  pipeline.to(\"cuda\")\n\n  negative_prompt = \"\"\"\n  Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality,\n  low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured,\n  misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards\n  \"\"\"\n\n  # --- T2V ---\n  prompt = \"\"\"\n  A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue \n  and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with \n  a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, \n  allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades \n  of red, orange, and green. 
The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and \n  the vivid colors of its surroundings. A close-up shot with dynamic movement.\n  \"\"\"\n\n  output = pipeline(\n      prompt=prompt,\n      negative_prompt=negative_prompt,\n      num_frames=240,\n      pyramid_num_inference_steps_list=[2, 2, 2],\n      guidance_scale=1.0,\n      is_amplify_first_chunk=True,\n      generator=torch.Generator(\"cuda\").manual_seed(42),\n  ).frames[0]\n  export_to_video(output, \"helios_distilled_t2v_output.mp4\", fps=24)\n\n  # --- I2V ---\n  i2v_prompt = \"\"\"\n  A towering emerald wave surges forward, its crest curling with raw power and energy. Sunlight glints off the translucent water, \n  illuminating the intricate textures and deep green hues within the wave’s body. A thick spray erupts from the breaking crest, \n  casting a misty veil that dances above the churning surface. As the perspective widens, the immense scale of the wave becomes \n  apparent, revealing the restless expanse of the ocean stretching beyond. The scene captures the ocean’s untamed beauty and \n  relentless force, with every droplet and ripple shimmering in the light. 
The dynamic motion and vivid colors evoke both awe and \n  respect for nature’s might.\n  \"\"\"\n  image_path = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fhelios\u002Fwave.jpg\"\n\n  output = pipeline(\n      prompt=i2v_prompt,\n      negative_prompt=negative_prompt,\n      # PIL's resize takes (width, height), so this yields a 640x384 input frame\n      image=load_image(image_path).resize((640, 384)),\n      num_frames=240,\n      pyramid_num_inference_steps_list=[2, 2, 2],\n      guidance_scale=1.0,\n      is_amplify_first_chunk=True,\n      generator=torch.Generator(\"cuda\").manual_seed(42),\n  ).frames[0]\n  export_to_video(output, \"helios_distilled_i2v_output.mp4\", fps=24)\n\n  # --- V2V ---\n  v2v_prompt = \"\"\"\n  A bright yellow Lamborghini Huracán Tecnica speeds along a curving mountain road, surrounded by lush green trees \n  under a partly cloudy sky. The car's sleek design and vibrant color stand out against the natural backdrop, \n  emphasizing its dynamic movement. The road curves gently, with a guardrail visible on one side, adding depth to \n  the scene. The motion blur captures the sense of speed and energy, creating a thrilling and exhilarating atmosphere. 
\n  A front-facing shot from a slightly elevated angle, highlighting the car's aggressive stance and the surrounding greenery.\n  \"\"\"\n  video_path = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fhelios\u002Fcar.mp4\"\n\n  output = pipeline(\n      prompt=v2v_prompt,\n      negative_prompt=negative_prompt,\n      video=load_video(video_path),\n      num_frames=240,\n      pyramid_num_inference_steps_list=[2, 2, 2],\n      guidance_scale=1.0,\n      is_amplify_first_chunk=True,\n      generator=torch.Generator(\"cuda\").manual_seed(42),\n  ).frames[0]\n  export_to_video(output, \"helios_distilled_v2v_output.mp4\", fps=24)\n  ```\n\n\u003C\u002Fdetails>\n\nFor example, let's take Helios-Distilled (**Modular Pipeline**).\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```python\n  import torch\n  from diffusers import ModularPipeline, ClassifierFreeGuidance\n  from diffusers.utils import export_to_video, load_image, load_video\n\n  mod_pipe = ModularPipeline.from_pretrained(\"BestWishYsh\u002FHelios-Distilled\")\n  mod_pipe.load_components(torch_dtype=torch.bfloat16)\n  mod_pipe.to(\"cuda\")\n\n  # Set the guider manually for now; the guider config should eventually be uploaded to each model repo\n  # so that every checkpoint can configure its own guidance\n  guider = ClassifierFreeGuidance(guidance_scale=1.0)\n  mod_pipe.update_components(guider=guider)\n\n  # --- T2V ---\n  print(\"=== T2V ===\")\n  prompt = (\n      \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. \"\n      \"The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving \"\n      \"fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and \"\n      \"sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. 
The reef \"\n      \"itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures \"\n      \"the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. \"\n      \"A close-up shot with dynamic movement.\"\n  )\n\n  output = mod_pipe(\n      prompt=prompt,\n      height=384,\n      width=640,\n      num_frames=240,\n      pyramid_num_inference_steps_list=[2, 2, 2],\n      is_amplify_first_chunk=True,\n      generator=torch.Generator(\"cuda\").manual_seed(42),\n      output=\"videos\",\n  )\n\n  export_to_video(output[0], \"helios_distilled_modular_t2v_output.mp4\", fps=24)\n  print(f\"T2V max memory: {torch.cuda.max_memory_allocated() \u002F 1024**3:.3f} GB\")\n  torch.cuda.empty_cache()\n  torch.cuda.reset_peak_memory_stats()\n\n  # --- I2V ---\n  print(\"=== I2V ===\")\n  image = load_image(\n      \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fhelios\u002Fwave.jpg\"\n  )\n  i2v_prompt = (\n      \"A towering emerald wave surges forward, its crest curling with raw power and energy. 
\"\n      \"Sunlight glints off the translucent water, illuminating the intricate textures and deep green hues within the wave's body.\"\n  )\n\n  output = mod_pipe(\n      prompt=i2v_prompt,\n      image=image,\n      height=384,\n      width=640,\n      num_frames=240,\n      pyramid_num_inference_steps_list=[2, 2, 2],\n      is_amplify_first_chunk=True,\n      generator=torch.Generator(\"cuda\").manual_seed(42),\n      output=\"videos\",\n  )\n\n  export_to_video(output[0], \"helios_distilled_modular_i2v_output.mp4\", fps=24)\n  print(f\"I2V max memory: {torch.cuda.max_memory_allocated() \u002F 1024**3:.3f} GB\")\n  torch.cuda.empty_cache()\n  torch.cuda.reset_peak_memory_stats()\n\n  # --- V2V ---\n  print(\"=== V2V ===\")\n  video = load_video(\n      \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fhelios\u002Fcar.mp4\"\n  )\n  v2v_prompt = (\n      \"A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train. 
\"\n      \"The camera captures various elements such as lush green fields, towering trees, quaint countryside houses, \"\n      \"and distant mountain ranges passing by quickly.\"\n  )\n\n  output = mod_pipe(\n      prompt=v2v_prompt,\n      video=video,\n      height=384,\n      width=640,\n      num_frames=240,\n      pyramid_num_inference_steps_list=[2, 2, 2],\n      is_amplify_first_chunk=True,\n      generator=torch.Generator(\"cuda\").manual_seed(42),\n      output=\"videos\",\n  )\n\n  export_to_video(output[0], \"helios_distilled_modular_v2v_output.mp4\", fps=24)\n  print(f\"V2V max memory: {torch.cuda.max_memory_allocated() \u002F 1024**3:.3f} GB\")\n  ```\n\n\u003C\u002Fdetails>\n\n### ✨ vLLM-Omni Pipeline\n\nInstall vllm-omni from source:\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni.git\n```\n\nFor example, let's take Text-to-Video.\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```bash\n  # The example scripts live in the vllm-omni repository, so clone it first\n  git clone https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni.git\n  cd vllm-omni\n\n  # Helios-Base\n  python3 examples\u002Foffline_inference\u002Fhelios\u002Fend2end.py \\n    --sample-type t2v \\n    --model .\u002FHelios-Base \\n    --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. 
A close-up shot with dynamic movement.\" \\\n    --num-frames 99 \\\n    --seed 42 \\\n    --output helios_t2v_base.mp4\n\n  # Helios-Mid\n  python examples\u002Foffline_inference\u002Fhelios\u002Fend2end.py \\\n    --model .\u002FHelios-Mid --sample-type t2v \\\n    --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. A close-up shot with dynamic movement.\" \\\n    --guidance-scale 5.0 --is-enable-stage2 \\\n    --pyramid-num-inference-steps-list 20 20 20 \\\n    --num-frames 99 \\\n    --use-cfg-zero-star --use-zero-init --zero-steps 1 \\\n    --output helios_t2v_mid.mp4\n\n  # Helios-Distilled\n  python examples\u002Foffline_inference\u002Fhelios\u002Fend2end.py \\\n    --model .\u002FHelios-Distilled --sample-type t2v \\\n    --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. 
The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. A close-up shot with dynamic movement.\" \\\n    --num-frames 240 --guidance-scale 1.0 --is-enable-stage2 \\\n    --pyramid-num-inference-steps-list 2 2 2 \\\n    --is-amplify-first-chunk --output helios_t2v_distilled.mp4\n  ```\n\u003C\u002Fdetails>\n\n### ✨ SGLang-Diffusion Pipeline\n\nInstall sglang-diffusion from source:\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang.git\n```\n\nFor example, let's take Helios-Base. **(Native Support)**\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```bash\n  sglang generate \\\n    --model-path BestWishYsh\u002FHelios-Base \\\n    --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. 
A close-up shot with dynamic movement.\" \\\n    --negative-prompt \"Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards\" \\\n    --height 384 \\\n    --width 640 \\\n    --num-frames 99 \\\n    --num-inference-steps 50 \\\n    --guidance-scale 5.0\n  ```\n\u003C\u002Fdetails>\n\nFor example, let's take Helios-Base. **(Diffusers Backend)**\n\n\u003Cdetails>\n  \u003Csummary>Click to expand the code\u003C\u002Fsummary>\n\n  ```bash\n  sglang generate \\\n    --model-path BestWishYsh\u002FHelios-Base \\\n    --prompt \"A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. 
A close-up shot with dynamic movement.\" \\\n    --negative-prompt \"Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards\" \\\n    --height 384 \\\n    --width 640 \\\n    --num-frames 99 \\\n    --num-inference-steps 50 \\\n    --guidance-scale 5.0 \\\n    --backend diffusers\n  ```\n\u003C\u002Fdetails>\n\n## 🗝️ Training\n\nWe use a three-stage progressive training pipeline; all settings can be found [here](.\u002Fscripts\u002Ftraining\u002Fconfigs\u002F). Stage-1 (Base) performs architectural adaptation: we apply Unified History Injection, Easy Anti-Drifting, and Multi-Term Memory Patchification to convert the bidirectional pretrained model into an autoregressive generator. Stage-2 (Mid) targets token compression by introducing the Pyramid Unified Predictor Corrector, which aggressively reduces the number of noisy tokens and thus the overall computation. Stage-3 (Distilled) applies Adversarial Hierarchical Distillation, reducing the number of sampling steps from 50 to 3 and eliminating the need for classifier-free guidance (CFG). Throughout training, we apply dynamic shifting to all timestep-dependent operations so that the noise schedule matches the latent size.\n\n### Data Preparation\n\nPlease refer to [this guide](.\u002Ftools\u002Foffload_data\u002FREADME.md) for instructions on obtaining the training data required by Helios. 
We also provide a small toy training dataset [here](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHeliosBench-Weights\u002Ftree\u002Fmain\u002Fdemo_data).\n\n### Run the model\n\n```bash\n# Use DDP\nbash scripts\u002Ftraining\u002Ftrain_ddp.sh\n\n# or\n\n# Use DeepSpeed\nbash scripts\u002Ftraining\u002Ftrain_deepspeed.sh\n```\n\nTraining configuration can be adjusted in `scripts\u002Ftraining\u002Fconfigs`. You can use `scripts\u002Ftraining\u002Fcompare_yaml.py` to check configurations for completeness or to compare them across stages.\n\n### Model Merging\n\nAfter training, you can use this [script](.\u002Ftools\u002Fmerge_lora_for_helios.py) to merge all the checkpoints and obtain the final safetensors file, similar to [this](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHelios-Distilled\u002Ftree\u002Fmain\u002Ftransformer).\n\n\n## 📊 HeliosBench\n\nHeliosBench is a specialized benchmark for real-time long-video generation. Please refer to [this guide](.\u002Feval\u002FREADME.md) for how to evaluate your own model.\n\n\n## 👍 Acknowledgement\n\nThis project would not be possible without the following open-source repositories: [Open-Sora Plan](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FOpen-Sora-Plan), [Ascend](https:\u002F\u002Fwww.hiascend.com), [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers), [vLLM-Omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni), [SGLang Diffusion](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang), [Wan](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1), [FramePack](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FFramePack), [PyramidFlow](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FPyramid-Flow), [DMD](https:\u002F\u002Fgithub.com\u002Ftianweiy\u002FDMD2).\n\n\n## 🔒 License\n\nThis project is released under the Apache 2.0 license as found in the [LICENSE](LICENSE) file.\n\n## ✏️ Citation\n\nIf you find our paper and code useful in your research, please consider giving a 
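star ⭐ and citation 📝:\n\n```BibTeX\n@article{helios,\n  title={Helios: Real Real-Time Long Video Generation Model},\n  author={Yuan, Shenghai and Yin, Yuanyang and Li, Zongjian and Huang, Xinwei and Yang, Xiao and Yuan, Li},\n  journal={arXiv preprint arXiv:2603.04379},\n  year={2026}\n}\n```\n\n## 🤝 Contact\n\nFor questions and feedback, please contact us at: shyuan-cs@hotmail.com\n\n\n","Helios is a real-time long video generation model that can generate high-quality, minute-long videos at 19.5 FPS on a single H100 GPU. Its core capability is efficient video synthesis without relying on conventional anti-drifting strategies or standard acceleration techniques; it raises throughput and reduces memory consumption by optimizing both the training and inference processes. Helios suits applications that need fast generation of coherent, high-quality video content, such as creative design, virtual reality experience production, and educational material production. The project is written in Python and released under the Apache License 2.0.",2,"2026-06-11 03:49:30","high_star"]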