[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72316":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":29,"discoverSource":30},72316,"Sonic","jixiaozhong\u002FSonic","jixiaozhong","Official implementation of \"Sonic: Shifting Focus to Global Audio Perception in Portrait Animation\"",null,"Python",3250,289,56,66,0,2,4,9,6,66.29,"Other",false,"main",true,[],"2026-06-12 04:01:04","# Sonic\nSonic: Shifting Focus to Global Audio Perception in Portrait Animation, CVPR 2025.\n\n\n\u003Ca href='https:\u002F\u002Fjixiaozhong.github.io\u002FSonic\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa>\n\u003Ca href=\"http:\u002F\u002Fdemo.sonic.jixiaozhong.online\u002F\" style=\"margin: 0 2px;\">\n    \u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Gradio-gold?style=flat&logo=Gradio&logoColor=red' alt='Demo'>\n  \u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fpapers\u002FJi_Sonic_Shifting_Focus_to_Global_Audio_Perception_in_Portrait_Animation_CVPR_2025_paper.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fxiaozhongji\u002FSonic\" style=\"margin: 0 2px;\">\n    \u003Cimg 
src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSpace-ZeroGPU-orange?style=flat&logo=Gradio&logoColor=red' alt='Demo'>\n    \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fjixiaozhong\u002FSonic\u002Frefs\u002Fheads\u002Fmain\u002FLICENSE\" style=\"margin: 0 2px;\">\n    \u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'>\n  \u003C\u002Fa>\n\n\u003Cp align=\"center\">\n    👋 Join our \u003Ca href=\"examples\u002Fimage\u002FQQ2.jpg\" target=\"_blank\">QQ Chat Group\u003C\u002Fa> \n\u003C\u002Fp>\n\u003Cp align=\"center\">\n\n\n## 🔥🔥🔥 NEWS\n**`2025\u002F05\u002F06`**: We have open-sourced [**DICE-Talk**](https:\u002F\u002Fgithub.com\u002Ftoto222\u002FDICE-Talk), a portrait-driven system with emotional expression. You are welcome to try it out!\n\n**`2025\u002F03\u002F14`**: Super stoked to share that Sonic has been accepted to CVPR 2025! See you in Nashville!!\n\n**`2025\u002F02\u002F08`**: Many thanks to the open-source community contributors for making the ComfyUI version of Sonic a reality. Your efforts are truly appreciated! [**ComfyUI version of Sonic**](https:\u002F\u002Fgithub.com\u002Fsmthemex\u002FComfyUI_Sonic)\n\n**`2025\u002F02\u002F06`**: Commercialization: Note that our license is **non-commercial**. 
If commercialization is required, please use Tencent Cloud Video Creation Large Model: [**Introduction**](https:\u002F\u002Fcloud.tencent.com\u002Fproduct\u002Fvclm) \u002F [**API documentation**](https:\u002F\u002Fcloud.tencent.com\u002Fdocument\u002Fapi\u002F1616\u002F109378)\n\n**`2025\u002F01\u002F17`**: Our [**Online huggingface Demo**](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fxiaozhongji\u002FSonic\u002F) is released.\n\n**`2025\u002F01\u002F17`**: Thank you to NewGenAI for promoting our Sonic and creating a Windows-based tutorial on [**YouTube**](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KiDDtcvQyS0).\n\n**`2024\u002F12\u002F16`**: Our [**Online Demo**](http:\u002F\u002Fdemo.sonic.jixiaozhong.online\u002F) is released.\n\n\n## 🎥 Demo\n| Input                | Output                | Input                | Output                |\n|----------------------|-----------------------|----------------------|-----------------------|\n|\u003Cimg src=\"examples\u002Fimage\u002Fanime1.png\" width=\"360\">|\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F636c3ff5-210e-44b8-b901-acf828071133\" width=\"360\"> \u003C\u002Fvideo>|\u003Cimg src=\"examples\u002Fimage\u002Ffemale_diaosu.png\" width=\"360\">|\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe8207300-2569-47d1-9ad4-4b4c9b0f0bd4\" width=\"360\"> \u003C\u002Fvideo>|\n|\u003Cimg src=\"examples\u002Fimage\u002Fhair.png\" width=\"360\">|\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fdcb755c1-de01-4afe-8b4f-0e0b2c2439c1\" width=\"360\"> \u003C\u002Fvideo>|\u003Cimg src=\"examples\u002Fimage\u002Fleonnado.jpg\" width=\"360\">|\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fb50e61bb-62d4-469d-b402-b37cda3fbd27\" width=\"360\"> \u003C\u002Fvideo>|\n\n\nFor more visual demos, please visit our 
[**Page**](https:\u002F\u002Fjixiaozhong.github.io\u002FSonic\u002F).\n\n## 🧩 Community Contributions\nIf you develop or use Sonic in your projects, please let us know.\n\n- ComfyUI version of Sonic: [**ComfyUI_Sonic**](https:\u002F\u002Fgithub.com\u002Fsmthemex\u002FComfyUI_Sonic)\n\n\n## 📑 Updates\n**`2025\u002F01\u002F14`**: Our inference code and weights are released. Stay tuned; we will continue to polish the model.\n\n\n## 📜 Requirements\n* An NVIDIA GPU with CUDA support is required. \n  * The model has been tested on a single 32 GB GPU.\n* Tested operating system: Linux\n\n## 🔑 Inference\n\n### Installation\n\n- Install PyTorch and the dependencies in `requirements.txt`:\n```shell\n  pip3 install -r requirements.txt\n```\n- All models are stored in `checkpoints` by default, with the following file structure:\n```shell\nSonic\n  ├──checkpoints\n  │  ├──Sonic\n  │  │  ├──audio2bucket.pth\n  │  │  ├──audio2token.pth\n  │  │  ├──unet.pth\n  │  ├──stable-video-diffusion-img2vid-xt\n  │  │  ├──...\n  │  ├──whisper-tiny\n  │  │  ├──...\n  │  ├──RIFE\n  │  │  ├──flownet.pkl\n  │  ├──yoloface_v5m.pt\n  ├──...\n```\nDownload the models with `huggingface-cli` as follows:\n```shell\n  python3 -m pip install \"huggingface_hub[cli]\"\n  huggingface-cli download LeonJoe13\u002FSonic --local-dir  checkpoints\n  huggingface-cli download stabilityai\u002Fstable-video-diffusion-img2vid-xt --local-dir  checkpoints\u002Fstable-video-diffusion-img2vid-xt\n  huggingface-cli download openai\u002Fwhisper-tiny --local-dir checkpoints\u002Fwhisper-tiny\n```\n\nor manually download the [pretrained model](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1oe8VTPUy0-MHHW2a_NJ1F8xL-0VN5G7W?usp=drive_link), [svd-xt](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-video-diffusion-img2vid-xt) and [whisper-tiny](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-tiny) to `checkpoints\u002F`.\n\n\n### Run demo\n```shell\n  python3 demo.py \\\n  '\u002Fpath\u002Fto\u002Finput_image' \\\n  '\u002Fpath\u002Fto\u002Finput_audio' 
\\\n  '\u002Fpath\u002Fto\u002Foutput_video'\n```\n\n\n\n \n## 🔗 Citation\n\nIf you find our work helpful for your research, please consider citing it.\n\n```bibtex\n@inproceedings{ji2025sonic,\n  title={Sonic: Shifting focus to global audio perception in portrait animation},\n  author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},\n  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},\n  pages={193--203},\n  year={2025}\n}\n\n@article{ji2024realtalk,\n  title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},\n  author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},\n  journal={arXiv preprint arXiv:2406.18284},\n  year={2024}\n}\n\n@article{tan2025dicetalk,\n  title={Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation}, \n  author={Tan, Weipeng and Lin, Chuming and Xu, Chengming and Xu, FeiFan and Hu, Xiaobin and Ji, Xiaozhong and Zhu, Junwei and Wang, Chengjie and Fu, Yanwei},\n  journal={arXiv preprint arXiv:2504.18087},\n  year={2025}\n}\n```\n\n## 📜 Related Works\n\nExplore our related research:\n-  **[Super-fast talk: real-time and less GPU computation]** [Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.18284)\n\n## 📈 Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=jixiaozhong\u002FSonic&type=Date)](https:\u002F\u002Fstar-history.com\u002F#jixiaozhong\u002FSonic&Date)\n","Sonic is a project focused on global audio perception in portrait animation. Its core function is to drive portrait animation by analyzing audio input, so that the animated character's expressions and lip movements synchronize naturally with the audio. The project is written in Python and built on deep learning, and it supports a variety of input sources, including but not limited to speech and music. Sonic 
also provides an online demo and a Gradio interface so that users can test and try it out. Because it uses a non-commercial license (CC BY-NC-SA 4.0), it is well suited to academic research, personal projects, and exploration and use in education. For researchers and developers who want to combine audio and visual content to create more vivid interactive experiences, Sonic is worth considering.","2026-06-11 03:41:20","high_star"]