[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72585":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},72585,"HunyuanVideo-Avatar","Tencent-Hunyuan\u002FHunyuanVideo-Avatar","Tencent-Hunyuan",null,"Python",2116,342,25,71,0,5,10,22,15,29.61,"Other",false,"main",[],"2026-06-12 02:03:05","\u003C!-- ## **HunyuanVideo-Avatar** -->\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fmaterial\u002Flogo.png\"  height=100>\n\u003C\u002Fp>\n\n# **HunyuanVideo-Avatar** 🌅\n \n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FTencent\u002FHunyuanVideo-Avatar\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=HunyuanVideo-Avatar%20Code&message=Github&color=blue\">\u003C\u002Fa> &ensp;\n  \u003Ca href=\"https:\u002F\u002FHunyuanVideo-Avatar.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Project%20Page&message=Web&color=green\">\u003C\u002Fa> &ensp;\n  \u003Ca href=\"https:\u002F\u002Fhunyuan.tencent.com\u002FmodelSquare\u002Fhome\u002Fplay?modelId=126\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Playground&message=Web&color=green\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20156\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArXiv-2505.20156-red\">\u003C\u002Fa> &ensp;\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo-Avatar\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=HunyuanVideo-Avatar&message=HunyuanVideo-Avatar&color=yellow\">\u003C\u002Fa> &ensp;\n\u003C\u002Fdiv>\n\n![image](assets\u002Fmaterial\u002Fteaser.png)\n\n> [**HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20156) \u003Cbe>\n\n## 🔥🔥🔥 News!!\n* Jun 06, 2025: 🔥 HunyuanVideo-Avatar supports **Single GPU** with only **10GB VRAM**, with **TeaCache** included, **HUGE THANKS** to [Wan2GP](https:\u002F\u002Fgithub.com\u002Fdeepbeepmeep\u002FWan2GP)\n* May 28, 2025: 🔥 HunyuanVideo-Avatar is available in Cloud-Native-Build (CNB) [HunyuanVideo-Avatar](https:\u002F\u002Fcnb.cool\u002Ftencent\u002Fhunyuan\u002FHunyuanVideo-Avatar).\n* May  28, 2025: 👋 We release the inference code and model weights of HunyuanVideo-Avatar. 
[Download](weights\u002FREADME.md).\n\n\n## 📑 Open-source Plan\n\n- HunyuanVideo-Avatar\n  - [x] Inference \n  - [x] Checkpoints\n  - [ ] ComfyUI\n\n## Contents\n- [**HunyuanVideo-Avatar** 🌅](#HunyuanVideo-Avatar-)\n  - [🔥🔥🔥 News!!](#-news)\n  - [📑 Open-source Plan](#-open-source-plan)\n  - [Contents](#contents)\n  - [**Abstract**](#abstract)\n  - [**HunyuanVideo-Avatar Overall Architecture**](#HunyuanVideo-Avatar-overall-architecture)\n  - [🎉 **HunyuanVideo-Avatar Key Features**](#-HunyuanVideo-Avatar-key-features)\n    - [**High-Dynamic and Emotion-Controllable Video Generation**](#high-dynamic-and-emotion-controllable-video-generation)\n    - [**Various Applications**](#various-applications)\n  - [📈 Comparisons](#-comparisons)\n  - [📜 Requirements](#-requirements)\n  - [🛠️ Dependencies and Installation](#️-dependencies-and-installation)\n    - [Installation Guide for Linux](#installation-guide-for-linux)\n  - [🧱 Download Pretrained Models](#-download-pretrained-models)\n  - [🚀 Parallel Inference on Multiple GPUs](#-parallel-inference-on-multiple-gpus)\n  - [🔑 Single-gpu Inference](#-single-gpu-inference)\n    - [Run with very low VRAM](#run-with-very-low-vram)\n    - [Run with 10GB VRAM GPU (TeaCache supported)](#run-with-10gb-vram-gpu-teacache-supported)\n  - [Run a Gradio Server](#run-a-gradio-server)\n  - [🔗 BibTeX](#-bibtex)\n  - [Acknowledgements](#acknowledgements)\n---\n\n## **Abstract**\n\nRecent years have witnessed significant progress in audio-driven human animation. However, critical challenges remain in (i) generating highly dynamic videos while preserving character consistency, (ii) achieving precise emotion alignment between characters and audio, and (iii) enabling multi-character audio-driven animation. To address these challenges, we propose HunyuanVideo-Avatar, a multimodal diffusion transformer (MM-DiT)-based model capable of simultaneously generating dynamic, emotion-controllable, and multi-character dialogue videos. 
Concretely, HunyuanVideo-Avatar introduces three key innovations: (i) A character image injection module is designed to replace the conventional addition-based character conditioning scheme, eliminating the inherent condition mismatch between training and inference. This ensures the dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios. These innovations empower HunyuanVideo-Avatar to surpass state-of-the-art methods on benchmark datasets and a newly proposed wild dataset, generating realistic avatars in dynamic, immersive scenarios. The source code and model weights will be released publicly.\n\n## **HunyuanVideo-Avatar Overall Architecture**\n\n![image](assets\u002Fmaterial\u002Fmethod.png)\n\nWe propose **HunyuanVideo-Avatar**, a multi-modal diffusion transformer(MM-DiT)-based model capable of generating **dynamic**, **emotion-controllable**, and **multi-character dialogue** videos.\n\n## 🎉 **HunyuanVideo-Avatar Key Features**\n\n![image](assets\u002Fmaterial\u002Fdemo.png)\n\n### **High-Dynamic and Emotion-Controllable Video Generation**\n\nHunyuanVideo-Avatar supports animating any input **avatar images** to **high-dynamic** and **emotion-controllable** videos with simple **audio conditions**. Specifically, it takes as input **multi-style** avatar images at **arbitrary scales and resolutions**. The system supports multi-style avatars encompassing photorealistic, cartoon, 3D-rendered, and anthropomorphic characters. Multi-scale generation spanning portrait, upper-body and full-body. 
It generates videos with high-dynamic foreground and background, achieving superior realism and naturalness. In addition, the system supports controlling facial emotions of the characters conditioned on input audio. \n\n### **Various Applications**\n\nHunyuanVideo-Avatar supports various downstream tasks and applications. For instance, the system generates talking avatar videos, which can be applied to e-commerce, online streaming, social media video production, etc. In addition, its multi-character animation feature extends its use to applications such as video content creation and editing. \n\n## 📜 Requirements\n\n* An NVIDIA GPU with CUDA support is required. \n  * The model is tested on a machine with 8 GPUs.\n  * **Minimum**: The minimum GPU memory required is 24GB for 704px768px129f, but generation is very slow.\n  * **Recommended**: We recommend using a GPU with 96GB of memory for better generation quality.\n  * **Tips**: If an out-of-memory (OOM) error occurs on a GPU with 80GB of memory, try reducing the image resolution. \n* Tested operating system: Linux\n\n\n## 🛠️ Dependencies and Installation\n\nBegin by cloning the repository:\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHunyuanVideo-Avatar.git\ncd HunyuanVideo-Avatar\n```\n\n### Installation Guide for Linux\n\nWe recommend CUDA versions 12.4 or 11.8 for the manual installation.\n\nConda's installation instructions are available [here](https:\u002F\u002Fdocs.anaconda.com\u002Ffree\u002Fminiconda\u002Findex.html).\n\n```shell\n# 1. Create conda environment\nconda create -n HunyuanVideo-Avatar python==3.10.9\n\n# 2. Activate the environment\nconda activate HunyuanVideo-Avatar\n\n# 3. Install PyTorch and other dependencies using conda\n# For CUDA 11.8\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia\n# For CUDA 12.4\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia\n\n# 4. 
Install pip dependencies\npython -m pip install -r requirements.txt\n# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)\npython -m pip install ninja\npython -m pip install git+https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention.git@v2.6.3\n```\n\nIf you run into a floating point exception (core dump) on specific GPU types, you may try the following solutions:\n\n```shell\n# Option 1: Make sure you have installed CUDA 12.4, cuBLAS>=12.4.5.8, and cuDNN>=9.00 (or simply use our CUDA 12 docker image).\npip install nvidia-cublas-cu12==12.4.5.8\nexport LD_LIBRARY_PATH=\u002Fopt\u002Fconda\u002Flib\u002Fpython3.8\u002Fsite-packages\u002Fnvidia\u002Fcublas\u002Flib\u002F\n\n# Option 2: Explicitly use the CUDA 11.8 build of PyTorch and reinstall all the other packages\npip uninstall -r requirements.txt  # uninstall all packages\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\npip install -r requirements.txt\npip install ninja\npip install git+https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention.git@v2.6.3\n```\n\nAlternatively, you can use the HunyuanVideo Docker image. 
Use the following commands to pull and run the Docker image.\n\n```shell\n# For CUDA 12.4 (updated to avoid floating point exception)\ndocker pull hunyuanvideo\u002Fhunyuanvideo:cuda_12\ndocker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo\u002Fhunyuanvideo:cuda_12\npip install gradio==3.39.0 diffusers==0.33.0 transformers==4.41.2\n\n# For CUDA 11.8\ndocker pull hunyuanvideo\u002Fhunyuanvideo:cuda_11\ndocker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo\u002Fhunyuanvideo:cuda_11\npip install gradio==3.39.0 diffusers==0.33.0 transformers==4.41.2\n```\n\n\n## 🧱 Download Pretrained Models\n\nDetails on downloading the pretrained models are available [here](weights\u002FREADME.md).\n\n## 🚀 Parallel Inference on Multiple GPUs\n\nFor example, to generate a video with 8 GPUs, you can use the following command:\n\n```bash\ncd HunyuanVideo-Avatar\n\nJOBS_DIR=$(dirname $(dirname \"$0\"))\nexport PYTHONPATH=.\u002F\nexport MODEL_BASE=\".\u002Fweights\"\n# Output directory (assumed name; change as needed)\nOUTPUT_BASEPATH=.\u002Fresults\ncheckpoint_path=${MODEL_BASE}\u002Fckpts\u002Fhunyuan-video-t2v-720p\u002Ftransformers\u002Fmp_rank_00_model_states.pt\n\ntorchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp\u002Fsample_batch.py \\\n    --input 'assets\u002Ftest.csv' \\\n    --ckpt ${checkpoint_path} \\\n    --sample-n-frames 129 \\\n    --seed 128 \\\n    --image-size 704 \\\n    --cfg-scale 7.5 \\\n    --infer-steps 50 \\\n    --use-deepcache 1 \\\n    --flow-shift-eval-video 5.0 \\\n    --save-path ${OUTPUT_BASEPATH} \n```\n\n## 🔑 Single-gpu Inference\n\nFor example, to generate a video with 1 GPU, you can use the following command:\n\n```bash\ncd HunyuanVideo-Avatar\n\nJOBS_DIR=$(dirname $(dirname \"$0\"))\nexport PYTHONPATH=.\u002F\n\nexport 
MODEL_BASE=.\u002Fweights\nOUTPUT_BASEPATH=.\u002Fresults-single\ncheckpoint_path=${MODEL_BASE}\u002Fckpts\u002Fhunyuan-video-t2v-720p\u002Ftransformers\u002Fmp_rank_00_model_states_fp8.pt\n\nexport DISABLE_SP=1 \nCUDA_VISIBLE_DEVICES=0 python3 hymm_sp\u002Fsample_gpu_poor.py \\\n    --input 'assets\u002Ftest.csv' \\\n    --ckpt ${checkpoint_path} \\\n    --sample-n-frames 129 \\\n    --seed 128 \\\n    --image-size 704 \\\n    --cfg-scale 7.5 \\\n    --infer-steps 50 \\\n    --use-deepcache 1 \\\n    --flow-shift-eval-video 5.0 \\\n    --save-path ${OUTPUT_BASEPATH} \\\n    --use-fp8 \\\n    --infer-min\n```\n\n### Run with very low VRAM\n\n```bash\ncd HunyuanVideo-Avatar\n\nJOBS_DIR=$(dirname $(dirname \"$0\"))\nexport PYTHONPATH=.\u002F\n\nexport MODEL_BASE=.\u002Fweights\nOUTPUT_BASEPATH=.\u002Fresults-poor\n\ncheckpoint_path=${MODEL_BASE}\u002Fckpts\u002Fhunyuan-video-t2v-720p\u002Ftransformers\u002Fmp_rank_00_model_states_fp8.pt\n\nexport CPU_OFFLOAD=1\nCUDA_VISIBLE_DEVICES=0 python3 hymm_sp\u002Fsample_gpu_poor.py \\\n    --input 'assets\u002Ftest.csv' \\\n    --ckpt ${checkpoint_path} \\\n    --sample-n-frames 129 \\\n    --seed 128 \\\n    --image-size 704 \\\n    --cfg-scale 7.5 \\\n    --infer-steps 50 \\\n    --use-deepcache 1 \\\n    --flow-shift-eval-video 5.0 \\\n    --save-path ${OUTPUT_BASEPATH} \\\n    --use-fp8 \\\n    --cpu-offload \\\n    --infer-min\n```\n\n### Run with 10GB VRAM GPU (TeaCache supported)\n\nThanks to [Wan2GP](https:\u002F\u002Fgithub.com\u002Fdeepbeepmeep\u002FWan2GP), HunyuanVideo-Avatar now supports single GPU mode with even lower VRAM (10GB) without quality degradation. 
Check out this [great repo](https:\u002F\u002Fgithub.com\u002Fdeepbeepmeep\u002FWan2GP\u002Ftree\u002Fmain\u002Fhyvideo).\n\n\n## Run a Gradio Server\n```bash\ncd HunyuanVideo-Avatar\n\nbash .\u002Fscripts\u002Frun_gradio.sh\n\n```\n\n## 🔗 BibTeX\n\nIf you find [HunyuanVideo-Avatar](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20156) useful for your research and applications, please cite using this BibTeX:\n\n```BibTeX\n@misc{hu2025HunyuanVideo-Avatar,\n      title={HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters}, \n      author={Yi Chen and Sen Liang and Zixiang Zhou and Ziyao Huang and Yifeng Ma and Junshu Tang and Qin Lin and Yuan Zhou and Qinglin Lu},\n      year={2025},\n      eprint={2505.20156},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20156}, \n}\n```\n\n## Acknowledgements\n\nWe would like to thank the contributors to the [HunyuanVideo](https:\u002F\u002Fgithub.com\u002FTencent\u002FHunyuanVideo), [SD3](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-3-medium), [FLUX](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux), [Llama](https:\u002F\u002Fgithub.com\u002Fmeta-llama\u002Fllama), [LLaVA](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA), [Xtuner](https:\u002F\u002Fgithub.com\u002FInternLM\u002Fxtuner), [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) and [HuggingFace](https:\u002F\u002Fhuggingface.co) repositories, for their open research and exploration. \n","HunyuanVideo-Avatar is a high-fidelity, audio-driven human animation generation project. Using deep learning, it generates animated videos of multiple characters from input audio, and it can run on a single GPU with only 10GB of VRAM, which substantially lowers the hardware barrier. Its core techniques include an efficient TeaCache mechanism and an optimized model architecture, which keep performance high in low-resource environments. HunyuanVideo-Avatar suits scenarios that require quickly creating high-quality human animation, such as virtual streamers, game character animation, and online education, and gives content creators a capable tool.",2,"2026-06-11 03:42:42","high_star"]