[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70992":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},70992,"AudioGPT","AIGC-Audio\u002FAudioGPT","AIGC-Audio","AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head","https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAIGC-Audio\u002FAudioGPT",null,"Python",10178,856,131,44,0,1,43.8,"Other",false,"main",[23,24,25,26,27,28],"audio","gpt","music","sound","speech","talking-head","2026-06-12 02:02:46","# AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-\u003CCOLOR>.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.12995)\n[![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAIGC-Audio\u002FAudioGPT?style=social)](https:\u002F\u002Fgithub.com\u002FAIGC-Audio\u002FAudioGPT)\n![visitors](https:\u002F\u002Fvisitor-badge.glitch.me\u002Fbadge?page_id=AIGC-Audio.AudioGPT)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAIGC-Audio\u002FAudioGPT)\n\n\nWe provide our implementation and pretrained models as open source in this repository.\n\n\n## Get Started\n\nPlease refer to [run.md](run.md)\n\n\n## Capabilities\n\nHere we list the capability of AudioGPT 
at this time. More supported models and tasks are coming soon. For prompt examples, refer to [asset](assets\u002FREADME.md).\n\nCurrently, not every model has a repository.\n### Speech\n|            Task            |   Supported Foundation Models   | Status |\n|:--------------------------:|:-------------------------------:|:------:|\n|       Text-to-Speech       | [FastSpeech](https:\u002F\u002Fgithub.com\u002Fming024\u002FFastSpeech2), [SyntaSpeech](https:\u002F\u002Fgithub.com\u002Fyerfor\u002FSyntaSpeech), [VITS](https:\u002F\u002Fgithub.com\u002Fjaywalnut310\u002Fvits) |  Yes (WIP)   |\n|       Style Transfer       |         [GenerSpeech](https:\u002F\u002Fgithub.com\u002FRongjiehuang\u002FGenerSpeech)         |  Yes   |\n|     Speech Recognition     |           [whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper), [Conformer](https:\u002F\u002Fgithub.com\u002Fsooftware\u002Fconformer)           |  Yes   |\n|     Speech Enhancement     |          [ConvTasNet]()         |  Yes (WIP)   |\n|     Speech Separation      |          [TF-GridNet](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.12433.pdf)         |  Yes (WIP)   |\n|     Speech Translation     |          [Multi-decoder](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.12804.pdf)      |  WIP   |\n|      Mono-to-Binaural      |          [NeuralWarp](https:\u002F\u002Fgithub.com\u002Ffdarmon\u002FNeuralWarp)         |  Yes   |\n\n### Sing\n\n|           Task            |   Supported Foundation Models   | Status |\n|:-------------------------:|:-------------------------------:|:------:|\n|       Text-to-Sing        |         [DiffSinger](https:\u002F\u002Fgithub.com\u002FMoonInTheRiver\u002FDiffSinger), [VISinger](https:\u002F\u002Fgithub.com\u002Fjerryuhoo\u002FVISinger)          |  Yes (WIP)   |\n\n### Audio\n|          Task          | Supported Foundation Models | Status |\n|:----------------------:|:---------------------------:|:------:|\n|     Text-to-Audio      |      [Make-An-Audio]()      |  
Yes   |\n|    Audio Inpainting    |      [Make-An-Audio]()      |  Yes   |\n|     Image-to-Audio     |      [Make-An-Audio]()      |  Yes   |\n|    Sound Detection     |    [Audio-transformer](https:\u002F\u002Fgithub.com\u002FRetroCirce\u002FHTS-Audio-Transformer)    | Yes    |\n| Target Sound Detection |    [TSDNet](https:\u002F\u002Fgithub.com\u002Fgy65896\u002FTSDNet)    |  Yes   |\n|    Sound Extraction    |    [LASSNet](https:\u002F\u002Fgithub.com\u002Fliuxubo717\u002FLASS)    |  Yes   |\n\n\n### Talking Head\n\n|           Task            |   Supported Foundation Models   |   Status   |\n|:-------------------------:|:-------------------------------:|:----------:|\n|  Talking Head Synthesis   |          [GeneFace](https:\u002F\u002Fgithub.com\u002Fyerfor\u002FGeneFace)           | Yes (WIP)  |\n\n\n## Acknowledgement\nWe appreciate the following open-source projects:\n\n[ESPNet](https:\u002F\u002Fgithub.com\u002Fespnet\u002Fespnet) &#8194;\n[NATSpeech](https:\u002F\u002Fgithub.com\u002FNATSpeech\u002FNATSpeech) &#8194;\n[Visual ChatGPT](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvisual-chatgpt) &#8194;\n[Hugging Face](https:\u002F\u002Fgithub.com\u002Fhuggingface) &#8194;\n[LangChain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain) &#8194;\n[Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) &#8194;\n\n","AudioGPT is a project for understanding and generating speech, music, sound, and talking heads. It integrates multiple pretrained models and supports text-to-speech, style transfer, speech recognition, audio inpainting, and other functions, and it can handle tasks such as text-to-audio and image-to-audio. Technically, AudioGPT is developed in Python and uses speech synthesis models such as FastSpeech2 and VITS along with the Whisper speech recognition model. It suits applications that need high-quality audio generation or processing, such as virtual assistants, media production, and educational tools.",2,"2026-06-11 03:35:21","high_star"]