[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83451":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":13,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":10,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},83451,"EverAnimate","vita-epfl\u002FEverAnimate","vita-epfl","[ArXiv 26] EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration","https:\u002F\u002Feveranimate.github.io\u002Fhomepage\u002F",null,"Python",60,1,8,0,9,6,0.9,"MIT License",false,"main",true,[24,25,26],"human-animation","long-video-generation","video-generation","2026-06-12 02:04:34","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" alt=\"EverAnimate logo\" width=\"540\">\n\n  \u003Ch1>Minute-Scale Human Animation via Latent Flow Restoration\u003C\u002Fh1>\n\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fwymancv.github.io\u002Fwuyang.github.io\u002F\">\u003Cstrong>Wuyang Li\u003C\u002Fstrong>\u003C\u002Fa> &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=rpT0Q6AAAAAJ&hl=en\">\u003Cstrong>Yang Gao\u003C\u002Fstrong>\u003C\u002Fa> &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fpeople.epfl.ch\u002Fmariam.hassan?lang=en\">\u003Cstrong>Mariam Hassan\u003C\u002Fstrong>\u003C\u002Fa> &nbsp;\n    \u003Ca href=\"https:\u002F\u002Falan-lanfeng.github.io\u002F\">\u003Cstrong>Lan Feng\u003C\u002Fstrong>\u003C\u002Fa>\u003Cbr>\n    \u003Ca 
href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=sHKkAToAAAAJ&hl=zh-CN\">\u003Cstrong>Wentao Pan\u003C\u002Fstrong>\u003C\u002Fa> &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Y2Oth4MAAAAJ&hl=zh-TW\">\u003Cstrong>Po-Chien Luan\u003C\u002Fstrong>\u003C\u002Fa> &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fpeople.epfl.ch\u002Falexandre.alahi\u002F?lang=en\">\u003Cstrong>Alexandre Alahi\u003C\u002Fstrong>\u003C\u002Fa>\u003Cbr>\n    \u003Ca href=\"https:\u002F\u002Fwww.epfl.ch\u002Flabs\u002Fvita\u002F\">\u003Cem>VITA@EPFL\u003C\u002Fem>\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Feveranimate.github.io\u002Fhomepage\u002F\">\n      \u003Cimg alt=\"Project page\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-111827?style=flat-square&logo=githubpages&logoColor=white\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15042\">\n      \u003Cimg alt=\"Paper\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.15042-b31b1b?style=flat-square&logo=arxiv&logoColor=white\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fepfl-vita\u002Feveranimate\">\n      \u003Cimg alt=\"Model checkpoints\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Checkpoints-ffcc4d?style=flat-square\">\n    \u003C\u002Fa>\n    \u003Ca href=\"LICENSE\">\n      \u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-2563eb?style=flat-square\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp>\n    \u003Cimg alt=\"Python\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10-3776ab?style=flat-square&logo=python&logoColor=white\">\n    \u003Cimg alt=\"PyTorch\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-2.5-ee4c2c?style=flat-square&logo=pytorch&logoColor=white\">\n    \u003Cimg alt=\"CUDA\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCUDA-12-76b900?style=flat-square&logo=nvidia&logoColor=white\">\n    \u003Cimg alt=\"Resolution\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLoRA-480p-7c3aed?style=flat-square\">\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n\nContact: \u003Ca href=\"https:\u002F\u002Fwymancv.github.io\u002Fwuyang.github.io\u002F\">\u003Cstrong>Wuyang Li\u003C\u002Fstrong>\u003C\u002Fa>   \nEmail: wymanbest@outlook.com\n\n## ✨ Highlights\n\n- **GPU-friendly training.** Rank-32 LoRA post-training on Wan2.2-Animate achieves strong results in only a few thousand iterations on 4 GPUs.\n- **Long-horizon animation.** EverAnimate supports minute-scale human animation while keeping identity and motion consistent.\n- **Fully open source.** Code, training\u002Finference scripts, LoRA checkpoints, demo data, and ablation videos are released for reproducible research.\n\n**Note.** EverAnimate builds on the long-video generation framework of SVI 2.0 Pro. The version described in our paper uses LoRA rank 128; here we reproduce and release a lighter, more user-friendly rank-32 LoRA focused on long-horizon human animation. It comes with ready-to-run training and inference scripts and runs on 80GB GPUs without DeepSpeed ZeRO-2 or ZeRO-3.\n\nIf you find EverAnimate useful for your research or applications, we would greatly appreciate a ⭐.\n\n> [!IMPORTANT]\n> **ComfyUI workflow notice:** The current community-deployed ComfyUI workflow has known issues and may cause severe background flickering between chunks. 
We are preparing an official version and will release it publicly once it is ready.\n>\n> Known issues:\n> - The padding at [`wan_video_animate_adapter.py#L625`](https:\u002F\u002Fgithub.com\u002Fvita-epfl\u002FEverAnimate\u002Fblob\u002Fdb7703c4dc05f66ac79a4425a19e0ec3c3ce2e94\u002Fdiffsynth\u002Fmodels\u002Fwan_video_animate_adapter.py#L625) should correspond to four anchors, while the default Wan-Animate setting uses one anchor.\n> - Our method is theoretically not compatible with background masks.\n\n## 📰 News\n\n- **27 May 2026:** Code released.\n- **14 May 2026:** Paper released.\n\n## 🛠️ Environment Setup\n\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fvita-epfl\u002FEverAnimate.git\ncd EverAnimate\n\nconda create -n everanimate python=3.10 -y\nconda activate everanimate\n\npip install --upgrade pip setuptools wheel packaging ninja\npip install torch torchvision --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip install -e .\npip install flash-attn --no-build-isolation\n```\n\n## 📦 Download Models\n\nDownload all required files with one command:\n\n```bash\nbash scripts\u002Fdownload_models.sh\n```\n\nThis downloads:\n\n- Wan2.2-Animate diffusion, T5 encoder, VAE, and CLIP model files\n- The `google\u002Fumt5-xxl` tokenizer required by the DiffSynth Wan pipeline\n- The Wav2Vec processor files used by the default training pipeline\n- EverAnimate 480p LoRA checkpoints and the 720p beta checkpoint under `ckpts\u002Feveranimate-v1-lora32`\n- Demo assets from [`data`](https:\u002F\u002Fhuggingface.co\u002Fepfl-vita\u002Feveranimate\u002Ftree\u002Fmain\u002Fdata), including the minimal training sample, inference demo, and Stage-1\u002FStage-2 ablation videos\n\nAfter downloading, the default scripts use the local `ckpts\u002F` folder for both base models and EverAnimate LoRA checkpoints. 
For offline runs, set:\n\n```bash\nexport DIFFSYNTH_MODEL_BASE_PATH=$PWD\u002Fckpts\nexport DIFFSYNTH_SKIP_DOWNLOAD=True\n```\n\nExpected layout:\n\n```text\nckpts\u002F\n|-- Wan-AI\u002FWan2.2-Animate-14B\u002F\n|   |-- diffusion_pytorch_model*.safetensors\n|   |-- models_t5_umt5-xxl-enc-bf16.pth\n|   |-- Wan2.1_VAE.pth\n|   `-- models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\n|-- Wan-AI\u002FWan2.1-T2V-1.3B\u002F\n|   `-- google\u002Fumt5-xxl\u002F      # Tokenizer used by DiffSynth\n|-- Wan-AI\u002FWan2.2-S2V-14B\u002F\n|   `-- wav2vec2-large-xlsr-53-english\u002F\n|-- everanimate-v1-lora32\u002F\n|   |-- stage1_480p.safetensors\n|   |-- stage2_480p.safetensors\n|   `-- stage3_720p_beta.safetensors  # Beta, tested only at small scale\ndata\u002F\n|-- train\u002F       # Minimal training sample\n|-- test\u002F        # Inference demo\n`-- ablation\u002F    # Stage-1 and Stage-2 ablation videos\n```\n\nThe ablation videos are the two-stage outputs: `data\u002Fablation\u002Fstage1.mp4` is the Stage-1 result, and `data\u002Fablation\u002Fstage2.mp4` is the Stage-2 result.\n\nEverAnimate follows the official DiffSynth-Studio model-loading convention:\n\n- [DiffSynth model inference and loading](https:\u002F\u002Fdiffsynth-studio-doc.readthedocs.io\u002Fzh-cn\u002Flatest\u002FPipeline_Usage\u002FModel_Inference.html)\n- [DiffSynth Wan model details](https:\u002F\u002Fdiffsynth-studio-doc.readthedocs.io\u002Fen\u002Flatest\u002FModel_Details\u002FWan.html)\n\n## 🎬 Inference\n\nRun the bundled 480p test demo:\n\n```bash\nbash test.sh\n```\n\nDuring inference, EverAnimate automatically saves the latents of the latest chunk so that long videos can be resumed from the saved state. 
We use 4 overlap frames between consecutive chunks, and the last latent of the previous chunk (without decoding) guides the generation of the current chunk.\n\nRun a longer demo with 20 chunks:\n\n```bash\nNUM_CLIPS=20 OUTPUT_PATH=outputs\u002Ftest\u002Fdemo_000001_20chunks.mp4 bash test.sh\n```\n\nRun the 720p beta checkpoint:\n\n> The 720p checkpoint is a beta version and has only been tested at small scale.\n\n```bash\nLORA_PATH=ckpts\u002Feveranimate-v1-lora32\u002Fstage3_720p_beta.safetensors \\\nWIDTH=1280 \\\nHEIGHT=720 \\\nOUTPUT_PATH=outputs\u002Ftest\u002Fdemo_000001_720p_beta.mp4 \\\nbash test.sh\n```\n\nUse custom inputs:\n\n```bash\nINPUT_IMAGE=path\u002Fto\u002Fimage.png \\\nPOSE_VIDEO=path\u002Fto\u002Fpose.mp4 \\\nFACE_VIDEO=path\u002Fto\u002Fface.mp4 \\\nOUTPUT_PATH=outputs\u002Fcustom.mp4 \\\nbash test.sh\n```\n\n## 🚀 Training\n\nThe repository includes a minimal toy training sample under `data\u002Ftrain\u002F`. Training videos should be longer than 160 frames.\n\nStage 1 performs video extension with the last latent and memory, without anti-drifting. (`data\u002Fablation\u002Fstage1.mp4`)\n\n```bash\nbash train_stage1.sh\n```\n\nStage 2 conducts restorative flow matching. (`data\u002Fablation\u002Fstage2.mp4`)\n\nTo apply an SVI\u002FHelios-style anti-drifting strategy, pass `--enable_image_enhancement` and `--image_enhancement_prob 0.95` to explicitly augment the motion latents. While this approach can further improve stability, it may introduce cross-chunk flickering.\n\n```bash\nbash train_stage2.sh\n```\n\nUse custom training data:\n\n```bash\nDATASET_BASE_PATH=path\u002Fto\u002Fdata \\\nDATASET_METADATA_PATH=path\u002Fto\u002Fmetadata.csv \\\nOUTPUT_PATH=experiments\u002Fstage1_custom \\\nbash train_stage1.sh\n```\n\nFor Stage 2, the default Stage-1 LoRA is `ckpts\u002Feveranimate-v1-lora32\u002Fstage1_480p.safetensors`. 
To use another checkpoint:\n\n```bash\nLORA_CHECKPOINT=path\u002Fto\u002Fstage1.safetensors \\\nOUTPUT_PATH=experiments\u002Fstage2_custom \\\nbash train_stage2.sh\n```\n\n720p beta fine-tuning:\n\n> `train_stage3.sh` uses 1280x720-scale training through `MAX_PIXELS=921600`. This path is still beta and has only been tested at small scale. Training without DeepSpeed requires more than 80GB of GPU memory.\n\n```bash\nbash train_stage3.sh\n```\n\nContinue fine-tuning from the 720p beta checkpoint:\n\n```bash\nLORA_CHECKPOINT=ckpts\u002Feveranimate-v1-lora32\u002Fstage3_720p_beta.safetensors \\\nOUTPUT_PATH=experiments\u002Fstage3_720p_continue \\\nbash train_stage3.sh\n```\n\n## 📁 Repository Layout\n\n```text\nEverAnimate\u002F\n|-- diffsynth\u002F              # Core model, pipeline, diffusion, and utility code\n|-- scripts\u002F                # Download, training, and inference utilities\n|-- data\u002Ftrain\u002F             # Minimal toy training sample\n|-- data\u002Ftest\u002F              # Minimal inference demo sample\n|-- ckpts\u002FWan-AI\u002F           # Wan base model files used by DiffSynth\n|-- ckpts\u002Feveranimate-v1-lora32\u002F # EverAnimate 480p LoRA and 720p beta checkpoints\n|-- train_stage1.sh\n|-- train_stage2.sh\n|-- train_stage3.sh\n`-- test.sh\n```\n\n## ❓ FAQ\n\n**Q: What input resolution is supported?**\n\nDue to compute constraints, the stable public release focuses on 480p LoRA checkpoints. We also provide a 720p beta checkpoint, but it has only been tested at small scale. A more thoroughly fine-tuned and evaluated 720p model is planned for a future update, and the current 480p model can also support 720p inference to some extent.\n\n**Q: How are anchor frames used?**\n\nEverAnimate uses four anchor frames as guidance. For the first chunk, we directly copy the provided reference as the anchor. Starting from the second chunk, we use the first frame plus three randomly selected frames as anchors. 
Therefore, video decoding starts from the fifth latent, and `WanAnimateAdapter` in `diffsynth\u002Fmodels\u002Fwan_video_animate_adapter.py` pads the anchor positions accordingly. Users can also provide anchor frames explicitly for their own workflows.\n\n**Q: What minor future improvements are planned?**\n\nIn some samples, we observe a visible transition between the first and second chunks. We plan to improve this boundary behavior in future releases.\n\n\n\n## 📝 Abstract\n\n**EverAnimate** is an efficient post-training method for long-horizon animated video generation that preserves visual quality and character identity. Long-form animation remains challenging because highly dynamic human motion must be synthesized against relatively static environments, making chunk-based generation prone to accumulated drift: low-level quality drift, such as progressive degradation of static backgrounds, and high-level semantic drift, such as inconsistent character identity and view-dependent attributes. EverAnimate restores drifted flow trajectories by anchoring generation to a persistent latent context memory. It consists of two complementary mechanisms: **Persistent Latent Propagation**, which maintains context memory across chunks to propagate identity and motion in latent space while mitigating temporal forgetting, and **Restorative Flow Matching**, which introduces an implicit restoration objective during sampling through velocity adjustment to improve within-chunk fidelity. With only lightweight LoRA tuning, EverAnimate outperforms state-of-the-art long-animation methods in both short- and long-horizon settings: at 10 seconds, it improves PSNR\u002FSSIM by 8%\u002F7% and reduces LPIPS\u002FFID by 22%\u002F11%; at 90 seconds, the gains increase to 15%\u002F15% and 32%\u002F27%, respectively.\n\n## 🧩 Method Overview\n\n| Component | Purpose |\n| --- | --- |\n| Persistent Latent Propagation | Propagates identity and motion through latent memory across chunks. 
|\n| Restorative Flow Matching | Corrects drifted latent trajectories with a bounded restorative velocity target. |\n| Lightweight LoRA Adaptation | Enables efficient post-training on top of a video animation backbone. |\n\n## 🙏 Acknowledgements\n\nThis work builds on the following projects:\n\n- [DiffSynth-Studio](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FDiffSynth-Studio)\n- [Wan-Animate: Unified Character Animation and Replacement with Holistic Replication](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.14055)\n- [Stable Video Infinity: Infinite-Length Video Generation with Error Recycling](https:\u002F\u002Fstable-video-infinity.github.io\u002Fhomepage\u002F)\n\nThis work has also been inspired by SVI 2.0 Pro and LongCat Video Avatar.\n\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\" alt=\"EverAnimate teaser\" width=\"100%\">\n\u003C\u002Fdiv>\n\n\n## 📚 Citation\n\n```bibtex\n@misc{li2026everanimate,\n  title         = {EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration},\n  author        = {Wuyang Li and Yang Gao and Mariam Hassan and Lan Feng and Wentao Pan and Po-Chien Luan and Alexandre Alahi},\n  year          = {2026},\n  eprint        = {2605.15042},\n  archivePrefix = {arXiv},\n  primaryClass  = {cs.CV},\n  url           = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15042}\n}\n\n@misc{li2025stablevideoinfinity,\n  title         = {Stable Video Infinity: Infinite-Length Video Generation with Error Recycling},\n  author        = {Wuyang Li and Wentao Pan and Po-Chien Luan and Yang Gao and Alexandre Alahi},\n  year          = {2025},\n  eprint        = {2510.09212},\n  archivePrefix = {arXiv},\n  primaryClass  = {cs.CV},\n  url           = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.09212}\n}\n```\n",2,"2026-06-11 04:11:11","CREATED_QUERY"]