[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72178":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},72178,"echomimic","antgroup\u002Fechomimic","antgroup","[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning","https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic\u002F",null,"Python",4250,461,48,111,0,6,8,22,18,29.99,"Apache License 2.0",false,"main",[26,27,28,29,30,31],"aaai2025","audio-driven-portrait-animations","audio-driven-talking-face","human-animation","talking-face-generation","talking-head","2026-06-12 02:02:59","\u003Ch1 align='center'>EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning\u003C\u002Fh1>\n\n\u003Cdiv align='center'>\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fyuange250' target='_blank'>Zhiyuan Chen\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002FJoeFannie' target='_blank'>Jiajiong Cao\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002FoctavianChen' target='_blank'>Zhiquan Chen\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Flymhust.github.io\u002F' target='_blank'>Yuming Li\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Chenguang_Ma3' 
target='_blank'>Chenguang Ma\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\n\u003C\u002Fdiv>\n\u003Cdiv align='center'>\n    \u003Csup>1\u003C\u002Fsup>Equal Contribution&emsp;\n    \u003Csup>2\u003C\u002Fsup>Corresponding Authors\n\u003C\u002Fdiv>\n\n\u003Cdiv align='center'>\nTerminal Technology Department, Alipay, Ant Group.\n\u003C\u002Fdiv>\n\u003Cbr>\n\u003Cdiv align='center'>\n    \u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimic'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Model-yellow'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FBadToBest\u002FEchoMimic'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Demo-yellow'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FBadToBest\u002FEchoMimic'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Model-purple'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002FBadToBest\u002FBadToBest'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Demo-purple'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.08136'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n## &#x1F680; EchoMimic Series\n* EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. [GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic)\n* EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. 
[GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2)\n* EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. [GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v3)\n\n## &#x1F4E3; Updates\n* [2024.12.10] 🔥 EchoMimic is accepted by AAAI 2025.\n* [2024.11.21] 🔥🔥🔥 We release our [EchoMimicV2](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2) code and models.\n* [2024.08.02] 🔥 EchoMimic is now available on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FBadToBest\u002FEchoMimic) with an A100 GPU. Thanks to Wenmeng Zhou@ModelScope.\n* [2024.07.25] 🔥🔥🔥 Accelerated models and pipeline for **Audio Driven** are released. Inference speed can be improved by **10x** (from ~7 min\u002F240 frames to ~50 s\u002F240 frames on a V100 GPU).\n* [2024.07.23] 🔥 EchoMimic Gradio demo on [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002FBadToBest\u002FBadToBest) is ready.\n* [2024.07.23] 🔥 EchoMimic Gradio demo on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffffiloni\u002FEchoMimic) is ready. Thanks to Sylvain Filoni@fffiloni.\n* [2024.07.17] 🔥🔥🔥 Accelerated models and pipeline for **Audio + Selected Landmarks** are released. Inference speed can be improved by **10x** (from ~7 min\u002F240 frames to ~50 s\u002F240 frames on a V100 GPU).\n* [2024.07.14] 🔥 [ComfyUI](https:\u002F\u002Fgithub.com\u002Fsmthemex\u002FComfyUI_EchoMimic) support is now available. Thanks to @smthemex for the contribution.\n* [2024.07.13] 🔥 Thanks to [NewGenAI](https:\u002F\u002Fwww.youtube.com\u002F@StableAIHub) for the [video installation tutorial](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8R0lTIY7tfI).\n* [2024.07.13] 🔥 We release our pose & audio driven code and models.\n* [2024.07.12] 🔥 WebUI and GradioUI versions are released. 
We thank @greengerong, @Robin021, and @O-O1024 for their contributions.\n* [2024.07.12] 🔥 Our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.08136) is publicly available on arXiv.\n* [2024.07.09] 🔥 We release our audio driven code and models.\n\n## &#x1F305; Gallery\n### Audio Driven (Sing)\n\n\u003Ctable class=\"center\">\n    \n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002Fd014d921-9f94-4640-97ad-035b00effbfe\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F877603a5-a4f9-4486-a19f-8888422daf78\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002Fe0cb5afb-40a6-4365-84f8-cb2834c4cfe7\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n### Audio Driven (English)\n\n\u003Ctable class=\"center\">\n    \n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F386982cd-3ff8-470d-a6d9-b621e112f8a5\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F5c60bb91-1776-434e-a720-8857a00b1501\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop 
src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F1f15adc5-0f33-4afa-b96a-2011886a4a06\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n### Audio Driven (Chinese)\n\n\u003Ctable class=\"center\">\n    \n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002Fa8092f9a-a5dc-4cd6-95be-1831afaccf00\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002Fc8b5c59f-0483-42ef-b3ee-4cffae6c7a52\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F532a3e60-2bac-4039-a06c-ff6bf06cb4a4\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n### Landmark Driven\n\n\u003Ctable class=\"center\">\n    \n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F1da6c46f-4532-4375-a0dc-0a4d6fd30a39\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002Fd4f4d5c1-e228-463a-b383-27fb90ed6172\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop 
src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F18bd2c93-319e-4d1c-8255-3f02ba717475\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n### Audio + Selected Landmark Driven\n\n\u003Ctable class=\"center\">\n    \n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F4a29d735-ec1b-474d-b843-3ff0bdf85f55\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002Fb994c8f5-8dae-4dd8-870f-962b50dc091f\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic\u002Fassets\u002F11451501\u002F955c1d51-07b2-494d-ab93-895b9c43b896\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n**（Some demo images above are sourced from image websites. 
If there is any infringement, we will immediately remove them and apologize.）**\n\n## ⚒️ Installation\n\n### Download the Code\n\n```bash\n  git clone https:\u002F\u002Fgithub.com\u002FBadToBest\u002FEchoMimic\n  cd EchoMimic\n```\n\n### Python Environment Setup\n\n- Tested system environments: CentOS 7.2 \u002F Ubuntu 22.04, CUDA >= 11.7\n- Tested GPUs: A100 (80G) \u002F RTX 4090D (24G) \u002F V100 (16G)\n- Tested Python versions: 3.8 \u002F 3.10 \u002F 3.11\n\nCreate a conda environment (recommended):\n\n```bash\n  conda create -n echomimic python=3.8\n  conda activate echomimic\n```\n\nInstall the packages with `pip`:\n```bash\n  pip install -r requirements.txt\n```\n\n### Download ffmpeg-static\nDownload and decompress [ffmpeg-static](https:\u002F\u002Fwww.johnvansickle.com\u002Fffmpeg\u002Fold-releases\u002Fffmpeg-4.4-amd64-static.tar.xz), then point `FFMPEG_PATH` at the extracted directory:\n```bash\nexport FFMPEG_PATH=\u002Fpath\u002Fto\u002Fffmpeg-4.4-amd64-static\n```\n\n### Download pretrained weights\n\n```shell\ngit lfs install\ngit clone https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimic pretrained_weights\n```\n\nThe **pretrained_weights** directory is organized as follows:\n\n```\n.\u002Fpretrained_weights\u002F\n├── denoising_unet.pth\n├── reference_unet.pth\n├── motion_module.pth\n├── face_locator.pth\n├── sd-vae-ft-mse\n│   └── ...\n├── sd-image-variations-diffusers\n│   └── ...\n└── audio_processor\n    └── whisper_tiny.pt\n```\n\nHere, **denoising_unet.pth** \u002F **reference_unet.pth** \u002F **motion_module.pth** \u002F **face_locator.pth** are the main checkpoints of **EchoMimic**. 
The other models in this hub can also be downloaded from their original hubs; we thank their authors for their brilliant work:\n- [sd-vae-ft-mse](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fsd-vae-ft-mse)\n- [sd-image-variations-diffusers](https:\u002F\u002Fhuggingface.co\u002Flambdalabs\u002Fsd-image-variations-diffusers)\n- [audio_processor (whisper)](https:\u002F\u002Fopenaipublic.azureedge.net\u002Fmain\u002Fwhisper\u002Fmodels\u002F65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9\u002Ftiny.pt)\n\n### Audio-Driven Algo Inference\nRun the Python inference scripts:\n\n```bash\n  python -u infer_audio2vid.py\n  python -u infer_audio2vid_pose.py\n```\n\n### Audio-Driven Algo Inference on Your Own Cases\n\nEdit the inference config file **.\u002Fconfigs\u002Fprompts\u002Fanimation.yaml** and add your own case:\n\n```yaml\ntest_cases:\n  \"path\u002Fto\u002Fyour\u002Fimage\":\n    - \"path\u002Fto\u002Fyour\u002Faudio\"\n```\n\nThen run the Python inference script:\n```bash\n  python -u infer_audio2vid.py\n```\n\n### Motion Alignment between Reference Image and Driving Video\n\n(First, download the checkpoints with the `_pose.pth` suffix from Hugging Face.)\n\nSet `driver_video` and `ref_image` in demo_motion_sync.py to your own paths, then run:\n```bash\n  python -u demo_motion_sync.py\n```\n\n### Audio & Pose-Driven Algo Inference\nEdit .\u002Fconfigs\u002Fprompts\u002Fanimation_pose.yaml, then run:\n```bash\n  python -u infer_audio2vid_pose.py\n```\n\n### Pose-Driven Algo Inference\nSet `draw_mouse=True` at line 135 of infer_audio2vid_pose.py. 
Edit .\u002Fconfigs\u002Fprompts\u002Fanimation_pose.yaml, then run:\n```bash\n  python -u infer_audio2vid_pose.py\n```\n\n### Run the Gradio UI\n\nThanks to the contribution from @Robin021:\n\n```bash\npython -u webgui.py --server_port=3000\n```\n\n## 📝 Release Plans\n\n|  Status  | Milestone                                                                | ETA |\n|:--------:|:-------------------------------------------------------------------------|:--:|\n|    ✅    | Inference source code of the Audio-Driven algo released on GitHub        | 9th July, 2024 |\n|    ✅    | Pretrained models trained on English and Mandarin Chinese released       | 9th July, 2024 |\n|    ✅    | Inference source code of the Pose-Driven algo released on GitHub         | 13th July, 2024 |\n|    ✅    | Pretrained models with better pose control released                      | 13th July, 2024 |\n|    ✅    | Accelerated models released                                              | 17th July, 2024 |\n|    🚀    | Pretrained models with better singing performance to be released         | TBD |\n|    🚀    | Large-scale, high-resolution Chinese talking-head dataset                | TBD |\n\n## ⚖️ Disclaimer\nThis project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behavior. 
It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.\n\n## 🙏🏻 Acknowledgements\n\nWe would like to thank the contributors to the [FollowYourEmoji](https:\u002F\u002Fgithub.com\u002Fmayuelala\u002FFollowYourEmoji), [AnimateDiff](https:\u002F\u002Fgithub.com\u002Fguoyww\u002FAnimateDiff), [Moore-AnimateAnyone](https:\u002F\u002Fgithub.com\u002FMooreThreads\u002FMoore-AnimateAnyone) and [MuseTalk](https:\u002F\u002Fgithub.com\u002FTMElyralab\u002FMuseTalk) repositories for their open research and exploration.\n\nWe are also grateful to [V-Express](https:\u002F\u002Fgithub.com\u002Ftencent-ailab\u002FV-Express) and [hallo](https:\u002F\u002Fgithub.com\u002Ffudan-generative-vision\u002Fhallo) for their outstanding work in the area of diffusion-based talking heads.\n\nIf we have missed any open-source projects or related articles, we will add the corresponding acknowledgement immediately.\n\n## 📒 Citation\n\nIf you find our work useful for your research, please consider citing the paper:\n\n```bibtex\n@misc{chen2024echomimic,\n  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},\n  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},\n  year={2024},\n  eprint={2407.08136},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV}\n}\n```\n\n## 🌟 Star History\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=antgroup\u002Fechomimic&type=Date)](https:\u002F\u002Fstar-history.com\u002F#antgroup\u002Fechomimic&Date)\n","EchoMimic is a project that generates lifelike audio-driven portrait animations through editable landmark conditioning. Its core features include generating natural facial animation from audio input and letting users customize landmarks to adjust facial expressions, enabling finer-grained expression control. The project is written in Python and uses deep learning techniques to improve the realism and smoothness of the generated animations. It is especially suitable for virtual character creation, video production, and applications that need high-quality speech-driven facial animation. EchoMimic also provides accelerated models and an optimized pipeline that significantly increase inference speed, so it can run efficiently even in resource-constrained environments.",2,"2026-06-11 03:40:43","high_star"]