[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72115":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},72115,"AniPortrait","Zejun-Yang\u002FAniPortrait","Zejun-Yang","AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation",null,"Python",5021,619,60,76,0,5,15,77.38,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:03","# AniPortrait\n\n**AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations**\n\nAuthor: Huawei Wei, Zejun Yang, Zhisheng Wang\n\nOrganization: Tencent Games Zhiji, Tencent\n\n![zhiji_logo](asset\u002Fzhiji_logo.png)\n\nHere we propose AniPortrait, a novel framework for generating high-quality animation driven by \naudio and a reference portrait image. 
You can also provide a video to achieve face reenactment.\n\n\u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.17694'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002FZJYang\u002FAniPortrait\u002Ftree\u002Fmain'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-orange'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZJYang\u002FAniPortrait_official'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo-green'>\u003C\u002Fa>\n\n## Pipeline\n\n![pipeline](asset\u002Fpipeline.png)\n\n## Updates \u002F TODO List\n\n- ✅ [2024\u002F03\u002F27] Our paper is now available on arXiv.\n\n- ✅ [2024\u002F03\u002F27] Update the code to generate pose_temp.npy for head pose control.\n\n- ✅ [2024\u002F04\u002F02] Add a new pose retargeting strategy for vid2vid. Substantial pose differences between ref_image and the source video are now supported.\n\n- ✅ [2024\u002F04\u002F03] We released our Gradio [demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZJYang\u002FAniPortrait_official) on HuggingFace Spaces (thanks to the HF team for their free GPU support)!\n\n- ✅ [2024\u002F04\u002F07] Add a frame interpolation module to accelerate inference. You can now add -acc to inference commands for faster video generation.\n\n- ✅ [2024\u002F04\u002F21] We have released the audio2pose model and [pre-trained weights](https:\u002F\u002Fhuggingface.co\u002FZJYang\u002FAniPortrait\u002Ftree\u002Fmain) for audio2video. 
Please update the code and download the weight file to try it.\n\n## Various Generated Videos\n\n### Self driven\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F82c0f0b0-9c7c-4aad-bf0e-27e6098ffbe1\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F51a502d9-1ce2-48d2-afbe-767a0b9b9166\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### Face reenactment\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002Fd4e0add6-20a2-4f4b-808c-530a6f4d3331\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F849fce22-0db1-4257-a75f-a5dc655e6b9e\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\nVideo Source: [鹿火CAVY from bilibili](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1H4421F7dE\u002F?spm_id_from=333.337.search-card.all.click)\n\n### Audio driven\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F63171e5a-e4c1-4383-8f20-9764524928d0\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=50% style=\"border: none\">\n  
      \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F6fd74024-ba19-4f6b-b37a-10df5cf2c934\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F9e516cc5-bf09-4d45-b5e3-820030764982\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002FZejun-Yang\u002FAniPortrait\u002Fassets\u002F21038147\u002F7c68148b-8022-453f-be9a-c69590038197\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## Installation\n\n### Build environment\n\nWe recommend Python >=3.10 and CUDA 11.7. Then build the environment as follows:\n\n```shell\npip install -r requirements.txt\n```\n\n### Download weights\n\nAll the weights should be placed under the `.\u002Fpretrained_weights` directory. You can download the weights manually as follows:\n\n1. Download our trained [weights](https:\u002F\u002Fhuggingface.co\u002FZJYang\u002FAniPortrait\u002Ftree\u002Fmain), which include the following parts: `denoising_unet.pth`, `reference_unet.pth`, `pose_guider.pth`, `motion_module.pth`, `audio2mesh.pt`, `audio2pose.pt` and `film_net_fp16.pt`. You can also download them from [wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002Fzjyang8510\u002FAniPortrait).\n\n2. 
Download the pretrained weights of the base models and other components: \n    - [StableDiffusion V1.5](https:\u002F\u002Fhuggingface.co\u002Frunwayml\u002Fstable-diffusion-v1-5)\n    - [sd-vae-ft-mse](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fsd-vae-ft-mse)\n    - [image_encoder](https:\u002F\u002Fhuggingface.co\u002Flambdalabs\u002Fsd-image-variations-diffusers\u002Ftree\u002Fmain\u002Fimage_encoder)\n    - [wav2vec2-base-960h](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fwav2vec2-base-960h)\n\nFinally, these weights should be organized as follows:\n\n```text\n.\u002Fpretrained_weights\u002F\n|-- image_encoder\n|   |-- config.json\n|   `-- pytorch_model.bin\n|-- sd-vae-ft-mse\n|   |-- config.json\n|   |-- diffusion_pytorch_model.bin\n|   `-- diffusion_pytorch_model.safetensors\n|-- stable-diffusion-v1-5\n|   |-- feature_extractor\n|   |   `-- preprocessor_config.json\n|   |-- model_index.json\n|   |-- unet\n|   |   |-- config.json\n|   |   `-- diffusion_pytorch_model.bin\n|   `-- v1-inference.yaml\n|-- wav2vec2-base-960h\n|   |-- config.json\n|   |-- feature_extractor_config.json\n|   |-- preprocessor_config.json\n|   |-- pytorch_model.bin\n|   |-- README.md\n|   |-- special_tokens_map.json\n|   |-- tokenizer_config.json\n|   `-- vocab.json\n|-- audio2mesh.pt\n|-- audio2pose.pt\n|-- denoising_unet.pth\n|-- film_net_fp16.pt\n|-- motion_module.pth\n|-- pose_guider.pth\n`-- reference_unet.pth\n```\n\nNote: If you have already downloaded some of the pretrained models, such as `StableDiffusion V1.5`, you can specify their paths in the config file (e.g. `.\u002Fconfigs\u002Fprompts\u002Fanimation.yaml`).\n\n\n## Gradio Web UI\n\nYou can try out our web demo with the following command. 
We also provide an online demo \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZJYang\u002FAniPortrait_official'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo-green'>\u003C\u002Fa> on Hugging Face Spaces.\n\n\n```shell\npython -m scripts.app\n```\n\n## Inference\n\nNote that you can set `-L` in the command to the desired number of frames to generate, for example, `-L 300`.\n\n**Acceleration method**: If generating a video takes a long time, you can download [film_net_fp16.pt](https:\u002F\u002Fhuggingface.co\u002FZJYang\u002FAniPortrait\u002Ftree\u002Fmain) and put it under the `.\u002Fpretrained_weights` directory. Then add `-acc` to the command.\n\nHere are the CLI commands for running the inference scripts:\n\n### Self driven\n\n```shell\npython -m scripts.pose2vid --config .\u002Fconfigs\u002Fprompts\u002Fanimation.yaml -W 512 -H 512 -acc\n```\n\nYou can refer to the format of animation.yaml to add your own reference images or pose videos. 
To convert a raw video into a pose video (keypoint sequence), run the following command:\n\n```shell\npython -m scripts.vid2pose --video_path pose_video_path.mp4\n```\n\n### Face reenactment\n\n```shell\npython -m scripts.vid2vid --config .\u002Fconfigs\u002Fprompts\u002Fanimation_facereenac.yaml -W 512 -H 512 -acc\n```\n\nAdd source face videos and reference images in animation_facereenac.yaml.\n\n### Audio driven\n\n```shell\npython -m scripts.audio2vid --config .\u002Fconfigs\u002Fprompts\u002Fanimation_audio.yaml -W 512 -H 512 -acc\n```\n\nAdd audio files and reference images in animation_audio.yaml.\n\nDeleting `pose_temp` from `.\u002Fconfigs\u002Fprompts\u002Fanimation_audio.yaml` enables the audio2pose model.\n\nYou can also use this command to generate a pose_temp.npy for head pose control:\n\n```shell\npython -m scripts.generate_ref_pose --ref_video .\u002Fconfigs\u002Finference\u002Fhead_pose_temp\u002Fpose_ref_video.mp4 --save_path .\u002Fconfigs\u002Finference\u002Fhead_pose_temp\u002Fpose.npy\n```\n\n## Training\n\n### Data preparation\nDownload [VFHQ](https:\u002F\u002Fliangbinxie.github.io\u002Fprojects\u002Fvfhq\u002F) and [CelebV-HQ](https:\u002F\u002Fgithub.com\u002FCelebV-HQ\u002FCelebV-HQ) \n\nExtract keypoints from the raw videos and write the training JSON file (here is an example of processing VFHQ): \n\n```shell\npython -m scripts.preprocess_dataset --input_dir VFHQ_PATH --output_dir SAVE_PATH --training_json JSON_PATH\n```\n\nUpdate these lines in the training config file: \n\n```yaml\ndata:\n  json_path: JSON_PATH\n```\n\n### Stage1\n\nRun command:\n\n```shell\naccelerate launch train_stage_1.py --config .\u002Fconfigs\u002Ftrain\u002Fstage1.yaml\n```\n\n### Stage2\n\nPut the pretrained motion module weights `mm_sd_v15_v2.ckpt` ([download link](https:\u002F\u002Fhuggingface.co\u002Fguoyww\u002Fanimatediff\u002Fblob\u002Fmain\u002Fmm_sd_v15_v2.ckpt)) under `.\u002Fpretrained_weights`. 
\n\nSpecify the stage1 training weights in the config file `stage2.yaml`, for example:\n\n```yaml\nstage1_ckpt_dir: '.\u002Fexp_output\u002Fstage1'\nstage1_ckpt_step: 30000 \n```\n\nRun command:\n\n```shell\naccelerate launch train_stage_2.py --config .\u002Fconfigs\u002Ftrain\u002Fstage2.yaml\n```\n\n## Acknowledgements\n\nWe first thank the authors of [EMO](https:\u002F\u002Fgithub.com\u002FHumanAIGC\u002FEMO); some of the images and audio clips in our demos are from EMO. Additionally, we would like to thank the contributors to the [Moore-AnimateAnyone](https:\u002F\u002Fgithub.com\u002FMooreThreads\u002FMoore-AnimateAnyone), [magic-animate](https:\u002F\u002Fgithub.com\u002Fmagic-research\u002Fmagic-animate), [animatediff](https:\u002F\u002Fgithub.com\u002Fguoyww\u002FAnimateDiff) and [Open-AnimateAnyone](https:\u002F\u002Fgithub.com\u002Fguoqincode\u002FOpen-AnimateAnyone) repositories, for their open research and exploration.\n\n## Citation\n\n```\n@misc{wei2024aniportrait,\n      title={AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations}, \n      author={Huawei Wei and Zejun Yang and Zhisheng Wang},\n      year={2024},\n      eprint={2403.17694},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```","AniPortrait is an audio-driven framework for synthesizing photorealistic portrait animation. It combines an audio input with a reference portrait image to generate high-quality facial animation, and it also accepts video input for face reenactment. Its core features include an audio-to-pose model, a frame interpolation module for acceleration, and a flexible pose retargeting strategy, which make animation smoother and more natural when poses differ substantially. It is developed in Python and is extensible and easy to use. It suits scenarios that need high-quality motion for virtual characters or digital humans, such as game development, film and television production, and virtual streamers.",2,"2026-06-11 03:40:27","high_star"]