[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72593":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},72593,"LiveAvatar","Alibaba-Quark\u002FLiveAvatar","Alibaba-Quark","Implementation of \"Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length\"",null,"Python",2140,242,63,39,0,9,27,66,97.26,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:06","\u003Cdiv align=\"center\">\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Flogo.png\" width=\"200px\" alt=\"Live Avatar Teaser\">\n\u003C\u002Fp>\n\n\u003Ch1>🎬 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length\u003C\u002Fh1>\n\u003C!-- \u003Ch3>The code will be open source in \u003Cstrong>\u003Cspan style=\"color: #87CEEB;\">early December\u003C\u002Fspan>\u003C\u002Fstrong>.\u003C\u002Fh3> -->\n\n\n\n\u003Cp>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FYubo-Shankui\" style=\"color: inherit;\">Yubo Huang\u003C\u002Fa>\u003Csup>1,2\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Hailong Guo\u003C\u002Fa>\u003Csup>2,3\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Fangtai Wu\u003C\u002Fa>\u003Csup>2,4\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Weiqiang Wang\u003C\u002Fa>\u003Csup>5\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Shifeng 
Zhang\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Shijie Huang\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Qijun Gan\u003C\u002Fa>\u003Csup>4\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Lin Liu\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup> ·\n\u003Ca href=\"#\" style=\"color: inherit;\">Sirui Zhao\u003C\u002Fa>\u003Csup>1,*\u003C\u002Fsup> ·\n\u003Ca href=\"http:\u002F\u002Fstaff.ustc.edu.cn\u002F~cheneh\u002F\" style=\"color: inherit;\">Enhong Chen\u003C\u002Fa>\u003Csup>1,*\u003C\u002Fsup> ·\n\u003Ca href=\"https:\u002F\u002Fopenreview.net\u002Fprofile?id=%7EJiaming_Liu7\" style=\"color: inherit;\">Jiaming Liu\u003C\u002Fa>\u003Csup>2,‡\u003C\u002Fsup> ·\n\u003Ca href=\"https:\u002F\u002Fsites.google.com\u002Fview\u002Fstevenhoi\u002F\" style=\"color: inherit;\">Steven Hoi\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\n\u003C\u002Fp>\n\n\u003Cp style=\"font-size: 0.9em;\">\n\u003Csup>1\u003C\u002Fsup> University of Science and Technology of China &nbsp;&nbsp;\n\u003Csup>2\u003C\u002Fsup> Alibaba Group &nbsp;&nbsp;\n\u003Csup>3\u003C\u002Fsup> Beijing University of Posts and Telecommunications &nbsp;&nbsp;\n\u003Csup>4\u003C\u002Fsup> Zhejiang University\n\u003Csup>5\u003C\u002Fsup> Monash University\n  \n\u003C\u002Fp>\n\n\u003Cp style=\"font-size: 0.9em;\">\n\u003Csup>*\u003C\u002Fsup> Corresponding authors. 
&nbsp;&nbsp; \u003Csup>‡\u003C\u002Fsup> Project leader.\n\u003C\u002Fp>\n\n\u003C!-- Badges -->\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04677\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2512.04677-b31b1b.svg?style=for-the-badge\" alt=\"arXiv\">\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2512.04677\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗%20Daily%20Paper-ff9d00?style=for-the-badge\" alt=\"Daily Paper\">\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FQuark-Vision\u002FLive-Avatar\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white\" alt=\"HuggingFace\">\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAlibaba-Quark\u002FLiveAvatar\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGithub-Code-black?style=for-the-badge&logo=github\" alt=\"Github\">\u003C\u002Fa> \u003Ca href=\"https:\u002F\u002Fliveavatar.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white\" alt=\"Project Page\">\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n> **TL;DR:** **Live Avatar** is an algorithm–system co-designed framework that enables real-time, streaming, infinite-length interactive avatar video generation. 
Powered by a **14B-parameter** diffusion model, it achieves **45 FPS** on multi-card **H800** GPUs with **4-step** sampling and supports **Block-wise Autoregressive** processing for **10,000+** second streaming videos.\n\n\u003Cdiv align=\"center\">\n\n[![Watch the video](assets\u002Fdemo.png)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=srbsGlLNpAc)\n\n\u003Cstrong>👀 More Demos:\u003C\u002Fstrong> \u003Cbr>\n🤖 Human-AI Conversation &nbsp;|&nbsp; ♾️ Infinite Video &nbsp;|&nbsp; 🎭 Diverse Characters &nbsp;|&nbsp; 🎬 Animated Tech Explanation \u003Cbr>\n\u003Ca href=\"https:\u002F\u002Fliveavatar.github.io\u002F\">\n  \u003Cstrong>👉 Click Here to Visit Project Page! 🌐\u003C\u002Fstrong>\n\u003C\u002Fa>\n\u003Cbr>\n\n\u003C\u002Fdiv>\n\n---\n## ✨ Highlights\n\n> - ⚡ **Real-time Streaming Interaction** - Achieve **45** FPS real-time streaming with low latency\n> - ♾️ **Infinite-length Autoregressive Generation** - Support **10,000+** second continuous video generation\n> - 🎨 **Generalization Performance** - Strong generalization across cartoon characters, singing, and diverse scenarios\n\n\n---\n## 📰 News\n- **[2026.1.20]** 🚀 Major performance breakthrough (**v1.1**)! **FP8 quantization** enables inference on **48GB GPUs**, while advanced **compilation** and **cuDNN** attention boost speed to **~2.5x** peak and **3x** average FPS. This achieves a stable **45+ FPS** on multi-H800 setups; share your results on different GPUs! Inference fixes also bring noticeable **quality improvements**, significantly surpassing the teacher model on qualitative metrics.\n\u003C!-- - **[2026.1.9]** 🚀 Major performance update! Inference speed boosted to Peak 1.5x and Average 2x, achieving stable 30+ FPS on multi-H800 setups.  -->\n- **[2025.12.16]** 🎉 LiveAvatar has reached **1,000+** stars on GitHub! Thank you to the community for the incredible support! 
⭐\n- **[2025.12.12]** 🚀 We released **single-GPU** inference [Code](infinite_inference_single_gpu.sh). There is no need for 5×H800 (a house-priced server); a single GPU with 80GB of VRAM is enough.\n- **[2025.12.08]** 🚀 We released **real-time** inference [Code](infinite_inference_multi_gpu.sh) and the model [Weight](https:\u002F\u002Fhuggingface.co\u002FQuark-Vision\u002FLive-Avatar).\n- **[2025.12.08]** 🎉 LiveAvatar won the Hugging Face [#1 Paper of the day](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002Fdate\u002F2025-12-05)!\n- **[2025.12.04]** 🏃‍♂️ We committed to open-sourcing the code in **early December**.\n- **[2025.12.04]** 🔥 We released the [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04677) and the [demo page](https:\u002F\u002Fliveavatar.github.io\u002F).\n\n---\n\n## 📑 Todo List\n\n### 🌟 **Early December** (core code release)\n\n- ✅ Release the paper\n- ✅ Release the demo website\n- ✅ Release checkpoints on Hugging Face\n- ✅ Release Gradio Web UI\n- ✅ Experimental real-time streaming inference on at least H800 GPUs\n  - ✅ Distribution-matching distillation to 4 steps\n  - ✅ Timestep-forcing pipeline parallelism\n\n### ⚙️ **Later updates**\n\n- ✅ Inference code supporting single GPU (offline generation)\n- ✅ Multi-character support\n- ✅ Inference Acceleration Stage1 (RoPE optimization, compilation, LoRA merge)\n- ✅ Streaming-VAE integration\n- ✅ Inference Acceleration Stage2 (further compilation, fp8, cudnn attn)\n- ⬜ UI integration for easy streaming interaction\n- ⬜ TTS integration\n- ⬜ Training code\n- ⬜ LiveAvatar v1.2\n\n## 🛠️ Installation\n\nPlease follow the steps below to set up the environment.\n\n### 1. Create Environment\n```bash\nconda create -n liveavatar python=3.10 -y\nconda activate liveavatar\n```\n\n### 2. Install CUDA Dependencies (optional)\n```bash\nconda install nvidia\u002Flabel\u002Fcuda-12.4.1::cuda -y\nconda install -c nvidia\u002Flabel\u002Fcuda-12.4.1 cudatoolkit -y\n```\n\n### 3. 
Install PyTorch & Flash Attention\n```bash\npip install torch==2.8.0 torchvision==0.23.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n\n# If you are using NVIDIA Hopper architecture (H800\u002FH200, etc.), FlashAttention 3 is recommended for a significant speedup:\npip install flash_attn_3 --find-links https:\u002F\u002Fwindreamer.github.io\u002Fflash-attention3-wheels\u002Fcu128_torch280 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n\n# Otherwise, use FlashAttention 2:\npip install flash-attn==2.8.3 --no-build-isolation\n```\n\n### 4. Install Python Requirements\n```bash\npip install -r requirements.txt\n```\n### 5. Install FFmpeg\n```bash\napt-get update && apt-get install -y ffmpeg\n```\n\n---\n\n## 📥 Download Models\n\nPlease download the pretrained checkpoints from the links below and place them in the `.\u002Fckpt\u002F` directory.\n\n| Model Component | Description | Link |\n| :--- | :--- | :---: |\n| `WanS2V-14B` | Base model | 🤗 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002FWan2.2-S2V-14B) |\n| `liveAvatar` | Our LoRA model | 🤗 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FQuark-Vision\u002FLive-Avatar) |\n\n```bash\n# If you are in mainland China, run this first: export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download Wan-AI\u002FWan2.2-S2V-14B --local-dir .\u002Fckpt\u002FWan2.2-S2V-14B\nhuggingface-cli download Quark-Vision\u002FLive-Avatar --local-dir .\u002Fckpt\u002FLiveAvatar\n```\n\nAfter downloading, your directory structure should look like this:\n\n```\nckpt\u002F\n├── Wan2.2-S2V-14B\u002F          # Base model\n│   ├── config.json\n│   ├── diffusion_pytorch_model-*.safetensors\n│   └── ...\n└── LiveAvatar\u002F              # Our LoRA model\n    ├── liveavatar.safetensors\n    └── ...\n```\n\n\n\n## 🚀 Inference\n### Real-time Inference with TPP\n> 💡 Currently, this command can run on GPUs with at 
least 80GB VRAM.\n```bash\n# CLI Inference\nbash infinite_inference_multi_gpu.sh\n# Gradio Web UI\nbash gradio_multi_gpu.sh\n```\n> 💡 The model can generate videos from audio input combined with a reference image and an optional text prompt.\n\n> 💡 The `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.\n\n> 💡 The `--num_clip` parameter controls the number of video clips generated, which is useful for a quick preview with a shorter generation time.\n\n> 💡 Currently, our TPP pipeline requires **five** GPUs for inference. We are planning to develop a 3-step version that can be deployed on a 4-GPU cluster.\nFurthermore, we are planning to integrate the [LightX2V](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V) VAE component. This integration will eliminate the dependency on additional single-GPU VAE parallelism and support 4-step inference within a 4-GPU setup.\n\n> 💡 Compilation **(`ENABLE_COMPILE`)**: Enabling compilation will cause a long wait time during the first inference as the model compiles, but subsequent runs will see significant performance improvements. This is highly valuable for long-video streaming scenarios. However, if you just want to quickly run a few test cases, we recommend disabling it by setting `export ENABLE_COMPILE=false` in your inference script.\n\n> 💡 FP8 Quantization **(`ENABLE_FP8`)**: FP8 offers **notable VRAM savings**, enabling inference on **48GB GPUs**, and also provides modest performance gains. Note that this may cause slight quality degradation. 
You can enable it by setting `export ENABLE_FP8=true` in your inference script.\n\nPlease visit our [project page](https:\u002F\u002Fliveavatar.github.io\u002F) to see more examples and learn about the scenarios suitable for this model.\n### Single-GPU Inference\n> 💡 This command can run on a single GPU with at least 80GB VRAM.\n```bash\n# CLI Inference\nbash infinite_inference_single_gpu.sh\n# Gradio Web UI\nbash gradio_single_gpu.sh\n```\n\n> 💡 If you encounter OOM errors after multiple runs in the Gradio Web UI, please try lowering the resolution (the `size` parameter) as a temporary fix. We are actively developing enhanced single GPU memory optimization; track our progress in the \"Later updates\" section.\n\n> 💡 To avoid performance degradation caused by frequent CPU offloading, we set the `enable_online_decode` parameter to `false` by default in the single-GPU scripts. This may slightly reduce quality when generating extremely long videos; in such cases, consider adding `--enable_online_decode` to your inference command.\n## 📝 Citation\n\nIf you find this project useful for your research, please consider citing our paper:\n\n```bibtex\n@misc{huang2025liveavatarstreamingrealtime,\n      title={Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length}, \n      author={Yubo Huang and Hailong Guo and Fangtai Wu and Shifeng Zhang and Shijie Huang and Qijun Gan and Lin Liu and Sirui Zhao and Enhong Chen and Jiaming Liu and Steven Hoi},\n      year={2025},\n      eprint={2512.04677},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04677}, \n}\n```\n## ⭐ Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=Alibaba-Quark\u002FLiveAvatar&type=date&legend=top-left)](https:\u002F\u002Fwww.star-history.com\u002F#Alibaba-Quark\u002FLiveAvatar&type=date&legend=top-left)\n\n## 📜 License Agreement\n* The majority of this project is released 
under the Apache 2.0 license as found in the [LICENSE](LICENSE).\n* The Wan model (our base model) is also released under the Apache 2.0 license as found in the [LICENSE](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.2\u002Fblob\u002Fmain\u002FLICENSE.txt).\n* The project is a research preview. Please contact us if you find any potential violations. (jmliu1217@gmail.com)\n\n### 💬 WeChat Group\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fwechat_group.png\" alt=\"WeChat group\" width=\"360\" \u002F>\n\u003C\u002Fp>\n\n\n\n## 🙏 Acknowledgements\n\nWe would like to express our gratitude to the following projects:\n\n*   [CausVid](https:\u002F\u002Fgithub.com\u002Ftianweiy\u002FCausVid)\n*   [LongLive](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive)\n*   [WanS2V](https:\u002F\u002Fhumanaigc.github.io\u002Fwan-s2v-webpage\u002F)\n","Live Avatar is a framework for real-time, audio-driven avatar generation of unlimited length. The project uses a 14-billion-parameter diffusion model to generate high-quality avatar video at 45 frames per second on multiple H800 GPUs, and supports streaming video longer than 10,000 seconds. Its core technical features include an efficient 4-step sampling method and block-wise autoregressive processing, which keep long-running video generation stable and smooth. It suits applications that need real-time interactive avatars, such as online education, virtual streamers, and remote meetings.",2,"2026-06-11 03:42:42","high_star"]