[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72305":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72305,"Self-Forcing","guandeh17\u002FSelf-Forcing","guandeh17","Official codebase for \"Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion\" (NeurIPS 2025 Spotlight)","",null,"Python",3389,270,23,59,0,4,12,60,29.3,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:01","\u003Cp align=\"center\">\n\u003Ch1 align=\"center\">Self Forcing\u003C\u002Fh1>\n\u003Ch3 align=\"center\">Bridging the Train-Test Gap in Autoregressive Video Diffusion\u003C\u002Fh3>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.xunhuang.me\u002F\">Xun Huang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fzhengqili.github.io\u002F\">Zhengqi Li\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fguandehe.github.io\u002F\">Guande He\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fmingyuanzhou.github.io\u002F\">Mingyuan Zhou\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fresearch.adobe.com\u002Fperson\u002Feli-shechtman\u002F\">Eli Shechtman\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\u003Cbr>\n    \u003Csup>1\u003C\u002Fsup>Adobe 
Research \u003Csup>2\u003C\u002Fsup>UT Austin\n  \u003C\u002Fp>\n  \u003Ch3 align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08009\">Paper\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fself-forcing.github.io\">Website\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fgdhe17\u002FSelf-Forcing\u002Ftree\u002Fmain\">Models (HuggingFace)\u003C\u002Fa>\u003C\u002Fh3>\n\u003C\u002Fp>\n\n---\n\nSelf Forcing trains autoregressive video diffusion models by **simulating the inference process during training**, performing autoregressive rollout with KV caching. It resolves the train-test distribution mismatch and enables **real-time, streaming video generation on a single RTX 4090** while matching the quality of state-of-the-art diffusion models.\n\n---\n\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7548c2db-fe03-4ba8-8dd3-52d2c6160739\n\n\n## Requirements\nWe tested this repo on the following setup:\n* Nvidia GPU with at least 24 GB memory (RTX 4090, A100, and H100 are tested).\n* Linux operating system.\n* 64 GB RAM.\n\nOther hardware setups may also work but have not been tested.\n\n## Installation\nCreate a conda environment and install dependencies:\n```\nconda create -n self_forcing python=3.10 -y\nconda activate self_forcing\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\npython setup.py develop\n```\n\n## Quick Start\n### Download checkpoints\n```\nhuggingface-cli download Wan-AI\u002FWan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models\u002FWan2.1-T2V-1.3B\nhuggingface-cli download gdhe17\u002FSelf-Forcing checkpoints\u002Fself_forcing_dmd.pt --local-dir .\n```\n\n### GUI demo\n```\npython demo.py\n```\nNote:\n* **Our model works better with long, detailed prompts** since it's trained with such prompts. 
We will integrate prompt extension into the codebase (similar to [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1\u002Ftree\u002Fmain?tab=readme-ov-file#2-using-prompt-extention)) in the future. For now, we recommend using a third-party LLM (such as GPT-4o) to extend your prompt before providing it to the model.\n* You may want to adjust FPS so it plays smoothly on your device.\n* The speed can be improved by enabling `torch.compile`, [TAEHV-VAE](https:\u002F\u002Fgithub.com\u002Fmadebyollin\u002Ftaehv\u002F), or using FP8 Linear layers, although the latter two options may sacrifice quality. It is recommended to use `torch.compile` if possible and enable TAEHV-VAE if further speedup is needed.\n\n### CLI Inference\nExample inference script using the chunk-wise autoregressive checkpoint trained with DMD:\n```\npython inference.py \\\n    --config_path configs\u002Fself_forcing_dmd.yaml \\\n    --output_folder videos\u002Fself_forcing_dmd \\\n    --checkpoint_path checkpoints\u002Fself_forcing_dmd.pt \\\n    --data_path prompts\u002FMovieGenVideoBench_extended.txt \\\n    --use_ema\n```\nOther config files and corresponding checkpoints can be found in the [configs](configs) folder and our [huggingface repo](https:\u002F\u002Fhuggingface.co\u002Fgdhe17\u002FSelf-Forcing\u002Ftree\u002Fmain\u002Fcheckpoints).\n\n## Training\n### Download text prompts and ODE initialized checkpoint\n```\nhuggingface-cli download gdhe17\u002FSelf-Forcing checkpoints\u002Fode_init.pt --local-dir .\nhuggingface-cli download gdhe17\u002FSelf-Forcing vidprom_filtered_extended.txt --local-dir prompts\n```\nNote: Our training algorithm (except for the GAN version) is data-free (**no video data is needed**). 
For now, we directly provide the ODE initialization checkpoint and will add more instructions on how to perform ODE initialization in the future (which is identical to the process described in the [CausVid](https:\u002F\u002Fgithub.com\u002Ftianweiy\u002FCausVid) repo).\n\n### Self Forcing Training with DMD\n```\ntorchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n  --rdzv_backend=c10d \\\n  --rdzv_endpoint $MASTER_ADDR \\\n  train.py \\\n  --config_path configs\u002Fself_forcing_dmd.yaml \\\n  --logdir logs\u002Fself_forcing_dmd \\\n  --disable-wandb\n```\nOur training run uses 600 iterations and completes in under 2 hours using 64 H100 GPUs. By implementing gradient accumulation, it should be possible to reproduce the results in less than 16 hours using 8 H100 GPUs.\n\n## Acknowledgements\nThis codebase is built on top of the open-source implementation of [CausVid](https:\u002F\u002Fgithub.com\u002Ftianweiy\u002FCausVid) by [Tianwei Yin](https:\u002F\u002Ftianweiy.github.io\u002F) and the [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1) repo.\n\n## Citation\nIf you find this codebase useful for your research, please kindly cite our paper:\n```\n@article{huang2025selfforcing,\n  title={Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion},\n  author={Huang, Xun and Li, Zhengqi and He, Guande and Zhou, Mingyuan and Shechtman, Eli},\n  journal={arXiv preprint arXiv:2506.08009},\n  year={2025}\n}\n```\n","Self Forcing is a training framework for autoregressive video diffusion models that reduces the distribution mismatch between training and testing by simulating the inference process during training. Its core functionality includes autoregressive rollout with KV caching, which enables real-time, streaming video generation on a single RTX 4090 GPU while maintaining quality comparable to state-of-the-art diffusion models. The project is written in Python and suits applications that require high-quality video synthesis, such as video editing software and content creation tools. It also supports long, detailed prompts for generating more coherent video clips, making it suitable for creative industries and technical research.",2,"2026-06-11 03:41:18","high_star"]