[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82393":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},82393,"LongLive","NVlabs\u002FLongLive","NVlabs","LongLive 2.0: Infra - Long Video Gen","https:\u002F\u002Fnvlabs.github.io\u002FLongLive",null,"Python",2271,203,22,3,0,40,116,128,120,28.93,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32],"infra","long","nvfp4","parallel","real-time","video-generation","2026-06-12 02:04:25","\u003Cp align=\"center\" style=\"border-radius: 10px\">\n  \u003Cimg src=\"assets\u002Flonglive2\u002Flogo.png\" width=\"100%\" alt=\"LongLive2.0 logo\"\u002F>\n\u003C\u002Fp>\n\n# 🎬 LongLive 2.0: An NVFP4 Parallel Infrastructure for Long Video 
Generation\n\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArXiv-Paper-brown)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.18739)\n[![Code](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Code-blue)](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive)\n[![Video](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-Video-red)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7oQALy32fiU)\n[![Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-BF16-yellow)](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-2.0-5B)\n[![Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-NVFP4-orange)](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-2.0-5B-NVFP4-S4)\n[![Demo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Page-brightgreen)](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002F)\n[![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FFull-Documentation-green)](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F)\n\n\u003Cdiv align=\"center\">\n\n\u003C!-- TODO: replace this text block with the final project-page video\u002Fdemo embed. 
-->\n\n[![Watch the video](assets\u002Flonglive2\u002Ffirst-video-frame.png)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7oQALy32fiU)\n\n\u003C\u002Fdiv>\n\n## 💡 TLDR: Infra with NVFP4 and parallelism for both training and inference\n\n\u003Cp align=\"center\" style=\"border-radius: 10px\">\n  \u003Cimg src=\"assets\u002Flonglive2\u002Fteaser.jpg\" width=\"100%\" alt=\"LongLive2.0 teaser\"\u002F>\n\u003C\u002Fp>\n\n## News\n- 🔥 [2026.05.30] LongLive2.0 now supports I2V AR teacher-forcing training and I2V DMD distillation for Wan2.2-TI2V-5B.\n- ⚡ [2026.05.25] We optimized the NVFP4 inference path with fused Triton RoPE\u002FadaLN kernels, reduced KV-cache synchronization overhead, in-place quantized KV-cache updates, faster FP4 KV dequantization, pinned VAE transfers, and safer LoRA-before-quantization setup, improving overall throughput by **18.6%**.\n- 🔥 [2026.05.13] We released **LongLive 2.0**, an infrastructure with NVFP4, parallelism, and multi-shot support for AR training, DMD distillation, and inference (⚡45.7 FPS). The original LongLive 1.0 is now in the [v1.0](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive\u002Ftree\u002Fv1.0) branch.\n- 🔥 [2026.04.12] LongLive supports KV-cache compression with [TriAttention](https:\u002F\u002Fgithub.com\u002FWeianMao\u002Ftriattention), with 50% KV reduction and no quality drop. Check it out [here](https:\u002F\u002Fgithub.com\u002FWeianMao\u002Ftriattention\u002Ftree\u002Fmain\u002Flonglive).\n- 🎉 [2026.1.27] LongLive was accepted to **ICLR 2026**.\n- 🔥 [2026.1.11] LongLive supports converting its original RoPE into KV-cache relative RoPE, which enables infinitely long video generation!\n- 🔥 [2025.11.3] We implemented LongLive on the linear-attention model [SANA-Video](https:\u002F\u002Fnvlabs.github.io\u002FSana\u002FVideo\u002F)! 
Now SANA-Video can generate 60s interactive videos in real-time.\n- 🔥 [2025.9.29] We release [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22622), this GitHub repo [LongLive](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive) with all training and inference code, the model weight [LongLive-1.3B](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-1.3B), and demo page [Website](https:\u002F\u002Fnvlabs.github.io\u002FLongLive).\n\n## Introduction\n\n**LongLive 1.0**: Real-time Interactive Long Video Generation. [You can find it here](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive\u002Ftree\u002Fv1.0) in our V1.0 branch.\n\n**LongLive 2.0**: an NVFP4 Parallel Infrastructure for Long Video Generation\n- For training, it supports\n  - [x] Balanced sequence parallel for T2V\u002FI2V AR training (teacher-forcing).\n  - [x] T2V\u002FI2V AR training on multi-shot (or single-shot) videos.\n  - [x] NVFP4 (or BF16) for both AR training and few-step distillation.\n- For inference, it supports\n  - [x] NVFP4 inference (W4A4) and NVFP4 KV Cache.\n  - [x] Multi-shot attention sink.\n  - [x] Sequence parallel inference.\n  - [x] Async decoding.\n\n\n\u003Cp align=\"left\" style=\"border-radius: 10px\">\n  \u003Cimg src=\"assets\u002Flonglive2\u002Ffig_framework_overview.png\" width=\"80%\" alt=\"LongLive2.0 framework overview\"\u002F>\n\u003C\u002Fp>\n\n\n**LongLive 1.0**: Real-time Interactive Long Video Generation. It accepts sequential user prompts and generates corresponding videos in real time, enabling user-guided long video generation. The key insights are attention sink, KV-recache, and streaming long tuning. 
\n\n\n\u003Cp align=\"left\" style=\"border-radius: 10px\">\n  \u003Cimg src=\"assets\u002Flonglive2\u002FLongLive1_teaser.png\" width=\"80%\" alt=\"LongLive1.0 framework overview\"\u002F>\n\u003C\u002Fp>\n\n\n## Getting Started\n- [Full Documentation](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F)\n- [Installation](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F#installation)\n- [NVFP4 Setup](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F#nvfp4-installation)\n- [Training Modes](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F#training)\n- [Inference](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F#inference)\n- [Data Organization](https:\u002F\u002Fnvlabs.github.io\u002FLongLive\u002FLongLive2\u002Fdocs\u002F#training-data)\n\n### Quick Start\n\n#### BF16\n\n```python\nimport torch\nfrom omegaconf import OmegaConf\n\nfrom pipeline import CausalDiffusionInferencePipeline\nfrom utils.config import normalize_config\nfrom utils.inference_utils import (\n    load_generator_checkpoint,\n    place_vae_for_streaming,\n    prepare_single_prompt_inputs,\n    save_video,\n)\n\nprompt = \"A compact silver robot walks through a clean robotics lab.\"\nmerged_checkpoint_path = \"LongLive-2.0-5B\u002Fmodel_bf16.pt\"\n\nconfig = normalize_config(OmegaConf.load(\"configs\u002Finference.yaml\"))\ndevice = torch.device(\"cuda\")\n\ntorch.set_grad_enabled(False)\npipe = CausalDiffusionInferencePipeline(config, device=device)\nload_generator_checkpoint(pipe.generator, merged_checkpoint_path)\npipe = pipe.to(device=device, dtype=torch.bfloat16)\nplace_vae_for_streaming(pipe, config)  # honor streaming_vae + vae_device when set\npipe.generator.model.eval().requires_grad_(False)\n\nnoise, prompts = prepare_single_prompt_inputs(config, prompt, device)\nvideo = pipe.inference(noise=noise, text_prompts=prompts)\nsave_video(video[0], 
\"videos\u002Fquickstart\u002Fsample.mp4\", fps=24)\n```\n\n`place_vae_for_streaming` is a no-op unless `inference.streaming_vae` is true and `inference.vae_device` is set, so toggling streaming-pipeline decode in your yaml is enough — the script does not need to change.\n\n#### NVFP4\n\nPoint `checkpoints.generator_ckpt` in `configs\u002Fnvfp4\u002Finference_nvfp4.yaml` at the downloaded checkpoint and set `model_quant_use_transformer_engine` according to the backend you are using:\n\n- TransformerEngine checkpoint (`model_te.pt`): `model_quant_use_transformer_engine: true`\n- FourOverSix checkpoint (`model_4o6.pt`): `model_quant_use_transformer_engine: false`\n\n`setup_nvfp4_pipeline` handles checkpoint loading, NVFP4 module wrapping, weight materialization, dtype\u002Fdevice placement, and the streaming-pipeline VAE relocation for both backends — the bf16 `pipe.to(...)` shortcut is unsafe here because it would cast the quantized buffers.\n\n```python\nimport torch\nfrom omegaconf import OmegaConf\n\nfrom pipeline import CausalDiffusionInferencePipeline\nfrom utils.config import normalize_config\nfrom utils.inference_utils import prepare_single_prompt_inputs, save_video, setup_nvfp4_pipeline\n\nprompt = \"A compact silver robot walks through a clean robotics lab.\"\n\nconfig = normalize_config(OmegaConf.load(\"configs\u002Fnvfp4\u002Finference_nvfp4.yaml\"))\ndevice = torch.device(\"cuda\")\n\ntorch.set_grad_enabled(False)\npipe = CausalDiffusionInferencePipeline(config, device=device)\nsetup_nvfp4_pipeline(pipe, config, device)\npipe.generator.model.eval().requires_grad_(False)\n\nnoise, prompts = prepare_single_prompt_inputs(config, prompt, device)\nvideo = pipe.inference(noise=noise, text_prompts=prompts)\nsave_video(video[0], \"videos\u002Fquickstart\u002Fsample_nvfp4.mp4\", fps=24)\n```\n\n## Training Modes\n\nLongLive2.0 supports both T2V and I2V training. 
Each modality follows the same two-stage recipe: AR teacher-forcing training first, then DMD distillation from the AR checkpoint.\n\n### T2V Training\n\n```bash\ntorchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \\\n  --config_path configs\u002Ftrain_ar.yaml \\\n  --logdir logs\u002Ftrain_ar \\\n  --wandb-save-dir wandb \\\n  --disable-wandb\n\ntorchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \\\n  --config_path configs\u002Ftrain_dmd.yaml \\\n  --logdir logs\u002Ftrain_dmd \\\n  --wandb-save-dir wandb \\\n  --disable-wandb\n```\n\n### I2V Training\n\n```bash\ntorchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \\\n  --config_path configs\u002Ftrain_i2v_ar.yaml \\\n  --logdir logs\u002Ftrain_i2v_ar \\\n  --wandb-save-dir wandb \\\n  --disable-wandb\n\ntorchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \\\n  --config_path configs\u002Ftrain_i2v_dmd.yaml \\\n  --logdir logs\u002Ftrain_i2v_dmd \\\n  --wandb-save-dir wandb \\\n  --disable-wandb\n```\n\nFor I2V configs, set `algorithm.i2v: true` and `algorithm.independent_first_frame: true`. `data.image_or_video_shape[1]` is the full latent sequence length, for example `96`, not `96 + 1`: the clean image latent replaces the first latent during denoising and that first latent is masked out of the training loss. 
For I2V DMD, set `checkpoints.generator_ckpt` to the I2V AR checkpoint used to initialize the student.\n\n## Models\n\n| Model | FPS ↑ | Params | VBench ↑ | Multi-shot |\n| --- | ---: | ---: | ---: | :---: |\n| [LongLive-1.3B](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-1.3B) | 20.7 | 1.3B | 84.87 |  |\n| [LongLive-2.0-5B](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-2.0-5B) | 24.8 | 5B | 85.06 | ✅ |\n| [LongLive-2.0-5B-NVFP4-4Step](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-2.0-5B-NVFP4-S4) | 29.7 | 5B | 84.51 | ✅ |\n| [LongLive-2.0-5B-NVFP4-2Step](https:\u002F\u002Fhuggingface.co\u002FEfficient-Large-Model\u002FLongLive-2.0-5B-NVFP4-S2) | 45.7 | 5B | 83.14 | ✅ |\n\n## License\nThis repository is released under the Apache 2.0 license. See [LICENSE](LICENSE) for details.\n\n## Citation\nPlease consider citing our work if you find it useful:\n\n```bibtex\n@article{longlive_2.0,\n  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},\n  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},\n  journal={arXiv preprint arXiv},\n  year={2026}\n}\n```\n\n```bibtex\n@inproceedings{longlive,\n    title={Longlive: Real-time interactive long video generation}, \n    author={Yang, Shuai and Huang, Wei and Chu, Ruihang and Xiao, Yicheng and Zhao, Yuyang and Wang, Xianbang and Li, Muyang and Xie, Enze and Chen, Yingcong and Lu, Yao and others},\n    booktitle={ICLR},\n    year={2026},\n}\n```\n\n## Acknowledgement\n- [Self-Forcing](https:\u002F\u002Fgithub.com\u002Fguandeh17\u002FSelf-Forcing): the AR training codebase and formulation we build upon.\n- [Wan2.2](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.2): the base video diffusion model 
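components used in this release.\n","LongLive 2.0 is an infrastructure for long video generation built on NVFP4 and parallelism. By introducing NVFP4 and parallel processing, the project runs efficiently in both training and inference, and performs especially well when generating long videos. It supports several advanced techniques, such as I2V AR teacher-forcing training, I2V DMD distillation, and KV-cache compression, which significantly improve the quality and speed of video generation. LongLive 2.0 suits applications that need real-time or near-real-time generation of high-quality long videos, such as online education, virtual reality content creation, and streaming services.",2,"2026-06-11 04:08:28","high_star"]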