[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72304":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72304,"SoulX-Podcast","Soul-AILab\u002FSoulX-Podcast","Soul-AILab","SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.",null,"Python",3430,441,16,38,0,14,36,96,42,107.54,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:04","\u003Cdiv align=\"center\">\n    \u003Ch1>\n    SoulX-Podcast\n    \u003C\u002Fh1>\n    \u003Cp>\n    Official inference code for \u003Cbr>\n    \u003Cb>\u003Cem>SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity\u003C\u002Fem>\u003C\u002Fb>\n    \u003C\u002Fp>\n    \u003Cp>\n    \u003C!-- \u003Cimg src=\"assets\u002FXiaoHongShu_Logo.png\" alt=\"Institution 4\" style=\"width: 102px; height: 48px;\"> -->\n    \u003Cimg src=\"assets\u002FSoulX-Podcast-log.jpg\" alt=\"SoulX-Podcast_Logo\" style=\"width: 200px; height: 68px;\">\n    \u003C\u002Fp>\n    \u003Cp>\n    \u003C\u002Fp>\n    \u003Ca href=\"https:\u002F\u002Fsoul-ailab.github.io\u002Fsoulx-podcast\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Page-lightgrey\" alt=\"version\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FSoul-AILab\u002Fsoulx-podcast\">\u003Cimg 
src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-blue' alt=\"HF-model\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.23541\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReport-Github?label=Technical&color=red' alt=\"technical report\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002Fspaces\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo-blue' alt=\"HF-demo\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FSoul-AILab\u002FSoulX-Podcast\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-blue.svg\" alt=\"Apache-2.0\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n   \u003Ch1>SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity\u003C\u002Fh1>\n\u003C\u002Fp>\n\n## Overview\nSoulX-Podcast is designed for podcast-style multi-turn, multi-speaker dialogic speech generation, while also achieving superior performance in the conventional monologue TTS task.\n\nTo meet the higher naturalness demands of multi-turn spoken dialogue, SoulX-Podcast integrates a range of paralinguistic controls and supports both Mandarin and English, as well as several Chinese dialects, including Sichuanese, Henanese, and Cantonese, enabling more personalized podcast-style speech generation.\n\n\n## Key Features 🔥\n\n- **Long-form, multi-turn, multi-speaker dialogic speech generation**: SoulX-Podcast excels in generating high-quality, natural-sounding dialogic speech for multi-turn, multi-speaker scenarios.\n\n- **Cross-dialectal, zero-shot voice cloning**: SoulX-Podcast supports zero-shot voice cloning across different Chinese dialects, enabling the generation of high-quality, personalized speech in any of the supported dialects.\n\n- **Paralinguistic controls**: SoulX-Podcast supports a variety of 
paralinguistic events, such as ***laughter*** and ***sighs***, to enhance the realism of synthesized speech.\n- **Paralinguistic tags**: `\u003C|laughter|>`, `\u003C|sigh|>`, `\u003C|breathing|>`, `\u003C|coughing|>`, `\u003C|throat_clearing|>`.\n\n\u003Ctable align=\"center\">\n  \u003Ctr>\n    \u003Ctd align=\"center\">\u003Cbr>\u003Cimg src=\"assets\u002Fperformance_radar.png\" width=\"80%\" \u002F>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n## Demo Examples\n\n**Zero-Shot Podcast Generation**\n\u003Cdiv align=\"center\">\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa9d3da2a-aaff-49d0-a3c7-2bd3c0b6d5eb>\n\n\u003C\u002Fdiv>\n\n\n**Cross-Dialectal Zero-Shot Podcast Generation**\n\n🎙️ All prompt audio samples used in the following generations are in Mandarin.\n\n\u003Cdiv align=\"center\">\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F982d799b-9f91-40a3-ab64-9e165166f788>\n\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fd0a59d7b-27c9-4b47-8242-f7630814c1e9>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa53ff35c-1e2b-42d9-9ef4-279164574646>\n\n\u003C\u002Fdiv>\n\nFor more examples, see the [demo page](https:\u002F\u002Fsoul-ailab.github.io\u002Fsoulx-podcast\u002F).\n\n\n## 🚀 News\n- **[2025-11-03]** Added vLLM support with Docker.\n- **[2025-10-31]** Deployed an online demo on [Hugging Face Spaces](https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002Fspaces).\n\n- **[2025-10-30]** Added example scripts for monologue TTS and a WebUI for easy inference.\n\n- **[2025-10-29]** We are excited to announce that the latest SoulX-Podcast checkpoint is now available on Hugging Face! 
You can access it directly from [SoulX-Podcast-hugging-face](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FSoul-AILab\u002Fsoulx-podcast).\n\n- **[2025-10-28]** Our technical report is now available on arXiv: [SoulX-Podcast](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.23541).\n\n## Install\n\n### Clone and Install\nThe following instructions are for Linux.\n- Clone the repo:\n```\ngit clone git@github.com:Soul-AILab\u002FSoulX-Podcast.git\ncd SoulX-Podcast\n```\n- Install Conda: see https:\u002F\u002Fdocs.conda.io\u002Fen\u002Flatest\u002Fminiconda.html\n- Create a Conda environment and install the dependencies:\n```\nconda create -n soulxpodcast -y python=3.11\nconda activate soulxpodcast\npip install -r requirements.txt\n# In mainland China, you can install from a mirror instead:\npip install -r requirements.txt -i https:\u002F\u002Fmirrors.aliyun.com\u002Fpypi\u002Fsimple\u002F --trusted-host=mirrors.aliyun.com\n```\n- [Optional] vLLM acceleration (a modified version of vLLM 0.10.1):\n```\ncd runtime\u002Fvllm\ndocker build -t soulxpodcast:v1.0 .\n# Mount the host directory LOCAL_RESOURCE_PATH at CONTAINER_RESOURCE_PATH inside the container so that files can be shared between the host and the container. 
To access the web application, also add -p LOCAL_PORT:CONTAINER_PORT.\n# example: docker run -it --runtime=nvidia --name soulxpodcast -v \u002Fmnt\u002Fdata:\u002Fmnt\u002Fdata -p 7860:7860 soulxpodcast:v1.0\ndocker run -it --runtime=nvidia --name soulxpodcast -v LOCAL_RESOURCE_PATH:CONTAINER_RESOURCE_PATH soulxpodcast:v1.0\n```\n\n### Model Download\n\n```sh\npip install -U huggingface_hub\n\n# base model\nhuggingface-cli download --resume-download Soul-AILab\u002FSoulX-Podcast-1.7B --local-dir pretrained_models\u002FSoulX-Podcast-1.7B\n\n# dialectal model\nhuggingface-cli download --resume-download Soul-AILab\u002FSoulX-Podcast-1.7B-dialect --local-dir pretrained_models\u002FSoulX-Podcast-1.7B-dialect\n```\n\n\nDownload via Python:\n```python\nfrom huggingface_hub import snapshot_download\n\n# base model\nsnapshot_download(\"Soul-AILab\u002FSoulX-Podcast-1.7B\", local_dir=\"pretrained_models\u002FSoulX-Podcast-1.7B\")\n\n# dialectal model\nsnapshot_download(\"Soul-AILab\u002FSoulX-Podcast-1.7B-dialect\", local_dir=\"pretrained_models\u002FSoulX-Podcast-1.7B-dialect\")\n```\n\nDownload via git clone:\n```sh\nmkdir -p pretrained_models\n\n# Make sure you have git-lfs installed (https:\u002F\u002Fgit-lfs.com)\ngit lfs install\n\n# base model\ngit clone https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002FSoulX-Podcast-1.7B pretrained_models\u002FSoulX-Podcast-1.7B\n\n# dialectal model\ngit clone https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002FSoulX-Podcast-1.7B-dialect pretrained_models\u002FSoulX-Podcast-1.7B-dialect\n```\n\n\n### Basic Usage\n\nRun the demo with the following command:\n```sh\n# dialectal inference\nbash example\u002Finfer_dialogue.sh\n```\n\n### WebUI\n\nLaunch the WebUI with one of the following commands:\n```sh\n# Base model:\npython3 webui.py --model_path pretrained_models\u002FSoulX-Podcast-1.7B\n\n# For dialectal podcast generation, use the dialectal model:\npython3 webui.py --model_path 
pretrained_models\u002FSoulX-Podcast-1.7B-dialect\n```\n\n\n## TODOs\n- [x] Add example scripts for monologue TTS.\n- [x] Publish the [technical report](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.23541).\n- [x] Develop a WebUI for easy inference.\n- [x] Deploy an online demo on [Hugging Face Spaces](https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002Fspaces).\n- [x] Dockerize the project with vLLM support.\n- [ ] Add support for streaming inference.\n\n## Citation\n\n```bibtex\n@misc{SoulXPodcast,\n  title         = {SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity},\n  author        = {Hanke Xie and Haopeng Lin and Wenxiao Cao and Dake Guo and Wenjie Tian and Jun Wu and Hanlin Wen and Ruixuan Shang and Hongmei Liu and Zhiqi Jiang and Yuepeng Jiang and Wenxi Chen and Ruiqi Yan and Jiale Qian and Yichao Yan and Shunshun Yin and Ming Tao and Xie Chen and Lei Xie and Xinsheng Wang},\n  year          = {2025},\n  eprint        = {2510.23541},\n  archivePrefix = {arXiv},\n  url           = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.23541}\n}\n```\n\n## License\n\nThis project is released under the Apache 2.0 license. Researchers and developers are free to use the code and model weights of SoulX-Podcast. 
See [LICENSE](LICENSE) for details.\n\n\n## Acknowledgements\n- This repo benefits from [FlashCosyVoice](https:\u002F\u002Fgithub.com\u002Fxingchensong\u002FFlashCosyVoice\u002Ftree\u002Fmain).\n\n\n## Usage Disclaimer\nThis project provides a speech synthesis model for podcast generation with zero-shot voice cloning, intended for academic research, education, and legitimate applications such as personalized speech synthesis, assistive technologies, and linguistic research.\n\nPlease note:\n\n- Do not use this model for unauthorized voice cloning, impersonation, fraud, scams, deepfakes, or any other illegal activity.\n- Ensure compliance with local laws and regulations when using this model, and uphold ethical standards.\n- The developers assume no liability for any misuse of this model.\n\nWe advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.\n\n## Contact Us\nIf you would like to leave a message about our work, feel free to email hkxie@mail.nwpu.edu.cn, linhaopeng@soulapp.cn, lxie@nwpu.edu.cn, or wangxinsheng@soulapp.cn.\n\nYou’re welcome to join our WeChat group for technical discussions and updates.\n\u003Cp align=\"center\">\n  \u003C!-- \u003Cem>Due to group limits, if you can't scan the QR code, please add my WeChat for group access  -->\n      \u003C!-- : \u003Cstrong>Tiamo James\u003C\u002Fstrong>\u003C\u002Fem> -->\n  \u003Cbr>\n  \u003Cspan style=\"display: inline-block; margin-right: 10px;\">\n    \u003Cimg src=\"assets\u002Fwechat-5.jpg\" width=\"300\" alt=\"WeChat Group QR Code\"\u002F>\n  \u003C\u002Fspan>\n  \u003C!-- \u003Cspan style=\"display: inline-block;\">\n    \u003Cimg src=\"assets\u002Fwechat_tiamo.jpg\" width=\"300\" alt=\"WeChat QR Code\"\u002F>\n  \u003C\u002Fspan> -->\n\u003C\u002Fp>\n\n## Star History\n\n[![Star History 
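Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=Soul-AILab\u002FSoulX-Podcast&type=date&legend=top-left)](https:\u002F\u002Fwww.star-history.com\u002F#Soul-AILab\u002FSoulX-Podcast&type=date&legend=top-left)\n","SoulX-Podcast is an inference codebase developed by the Soul AI team for generating high-fidelity podcasts from text. The project supports long-form, multi-turn, multi-speaker dialogic speech synthesis and offers cross-dialectal zero-shot voice cloning, handling Mandarin, English, and several Chinese dialects such as Sichuanese, Henanese, and Cantonese. SoulX-Podcast also integrates a range of paralinguistic controls, such as laughter and sighs, to enhance the realism of synthesized speech. The project suits applications that require high-quality natural dialogue or personalized speech generation, such as online education, virtual assistants, and entertainment media.",2,"2026-06-11 03:41:18","high_star"]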