[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71032":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":39,"readmeContent":40,"aiSummary":41,"trendingCount":16,"starSnapshotCount":16,"syncStatus":18,"lastSyncTime":42,"discoverSource":43},71032,"EmotiVoice","netease-youdao\u002FEmotiVoice","netease-youdao","EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine","",null,"Python",8470,754,71,131,0,1,2,3,67.83,"Apache License 2.0",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38],"ai","deep-learning","emotion","emotivoice","multi-speaker","prompt","python","pytorch","speech","speech-synthesis","style","text-to-speech","tts","2026-06-12 04:00:58","\u003Cdiv align=\"center\">\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F4833\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Ftrendshift.io\u002Fapi\u002Fbadge\u002Frepositories\u002F4833\" alt=\"netease-youdao%2FEmotiVoice | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003Cfont size=4> README: EN | \u003Ca href=\".\u002FREADME.zh.md\">中文\u003C\u002Fa>  \u003C\u002Ffont>\n    \u003Ch1>EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine\u003C\u002Fh1>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n    \u003Ca href=\".\u002FREADME.zh.md\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FREADME-中文版本-red\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;&nbsp;\n    \u003Ca href=\".\u002FLICENSE\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache--2.0-yellow\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;&nbsp;\n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002FYDopensource\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ffollow-%40YDOpenSource-1DA1F2?logo=twitter&style={style}\">\u003C\u002Fa>\n    &nbsp;&nbsp;&nbsp;&nbsp;\n\u003C\u002Fdiv>\n\u003Cbr>\n\n**EmotiVoice** is a powerful and modern open-source text-to-speech engine that is available to you at no cost. EmotiVoice speaks both English and Chinese, and with over 2000 different voices (refer to the [List of Voices](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fwiki\u002F😊-voice-wiki-page) for details). The most prominent feature is **emotional synthesis**, allowing you to create speech with a wide range of emotions, including happy, excited, sad, angry and others.\n\nAn easy-to-use web interface is provided. There is also a scripting interface for batch generation of results. \n\nHere are a few samples that EmotiVoice generates:\n\n\n- [Chinese audio sample](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fassets\u002F3909232\u002F6426d7c1-d620-4bfc-ba03-cd7fc046a4fb)\n  \n- [English audio sample](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fassets\u002F3909232\u002F8f272eba-49db-493b-b479-2d9e5a419e26)\n  \n- [Fun Chinese English audio sample](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fassets\u002F3909232\u002Fa0709012-c3ef-4182-bb0e-b7a2ba386f1c)\n\n## Demo\n\nA demo is hosted on Replicate, [EmotiVoice](https:\u002F\u002Freplicate.com\u002Fbramhooimeijer\u002Femotivoice).\n\n## Hot News\n\n- [x] Tuning voice speed is now supported in 'OpenAI-compatible-TTS API', thanks to [@john9405](https:\u002F\u002Fgithub.com\u002Fjohn9405). 
[#90](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fpull\u002F90) [#67](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fissues\u002F67) [#77](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fissues\u002F77)\n\n- [x] [The EmotiVoice app for Mac](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Freleases\u002Fdownload\u002Fv0.3\u002Femotivoice-1.0.0-arm64.dmg) was released on December 28th, 2023. Just download and taste EmotiVoice's offerings!\n\n- [x] [The EmotiVoice HTTP API](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fwiki\u002FHTTP-API) was released on December 6th, 2023. Easier to start, faster to use, and with **over 13,000 free calls**. Additionally, users can explore more captivating voices provided by [Zhiyun](https:\u002F\u002Fai.youdao.com\u002F).\n- [x] [Voice Cloning with your personal data](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fwiki\u002FVoice-Cloning-with-your-personal-data) has been released on December 13th, 2023, along with [DataBaker Recipe](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Ftree\u002Fmain\u002Fdata\u002FDataBaker) and [LJSpeech Recipe](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Ftree\u002Fmain\u002Fdata\u002FLJspeech). \n\n## Features under development\n\n- [ ] Support for more languages, such as Japanese and Korean. [#19](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fissues\u002F19) [#22](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fissues\u002F22)\n\nEmotiVoice prioritizes community input and user requests. We welcome your feedback!\n\n## Quickstart\n\n### EmotiVoice Docker image\n\nThe easiest way to try EmotiVoice is by running the docker image. You need a machine with a NVidia GPU. 
If you have not done so, set up NVidia container toolkit by following the instructions for [Linux](https:\u002F\u002Fwww.server-world.info\u002Fen\u002Fnote?os=Ubuntu_22.04&p=nvidia&f=2) or [Windows WSL2](https:\u002F\u002Fgithub.com\u002Fnyp-sit\u002Fit3103\u002Fblob\u002Fmain\u002Fnvidia-docker-wsl2.md). Then EmotiVoice can be run with,\n\n```sh\ndocker run -dp 127.0.0.1:8501:8501 syq163\u002Femoti-voice:latest\n```\nThe Docker image was updated on January 4th, 2024. If you have an older version, please update it by running the following commands:\n```sh\ndocker pull syq163\u002Femoti-voice:latest\ndocker run -dp 127.0.0.1:8501:8501 -p 127.0.0.1:8000:8000 syq163\u002Femoti-voice:latest\n```\nNow open your browser and navigate to http:\u002F\u002Flocalhost:8501 to start using EmotiVoice's powerful TTS capabilities.\n\nStarting from this version, the 'OpenAI-compatible-TTS API' is now accessible via http:\u002F\u002Flocalhost:8000\u002F.\n\n### Full installation\n\n```sh\nconda create -n EmotiVoice python=3.8 -y\nconda activate EmotiVoice\npip install torch torchaudio\npip install numpy numba scipy transformers soundfile yacs g2p_en jieba pypinyin pypinyin_dict\npython -m nltk.downloader \"averaged_perceptron_tagger_eng\"\n```\n\n### Prepare model files\n\nWe recommend that users refer to the wiki page [How to download the pretrained model files](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fwiki\u002FPretrained-models) if they encounter any issues.\n\n```sh\ngit lfs install\ngit lfs clone https:\u002F\u002Fhuggingface.co\u002FWangZeJun\u002Fsimbert-base-chinese WangZeJun\u002Fsimbert-base-chinese\n```\nor, you can run:\n```sh\ngit clone https:\u002F\u002Fwww.modelscope.cn\u002Fsyq163\u002FWangZeJun.git\n```\n\n### Inference\n\n1. 
You can download the [pretrained models](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1y6Xwj_GG9ulsAonca_unSGbJ4lxbNymM?usp=sharing) by simply running the following command:\n```sh\ngit clone https:\u002F\u002Fwww.modelscope.cn\u002Fsyq163\u002Foutputs.git\n```\n2. The inference text format is `\u003Cspeaker>|\u003Cstyle_prompt\u002Femotion_prompt\u002Fcontent>|\u003Cphoneme>|\u003Ccontent>`. \n\n  - inference text example: `8051|Happy|\u003Csos\u002Feos> [IH0] [M] [AA1] [T] engsp4 [V] [OY1] [S] engsp4 [AH0] engsp1 [M] [AH1] [L] [T] [IY0] engsp4 [V] [OY1] [S] engsp1 [AE1] [N] [D] engsp1 [P] [R] [AA1] [M] [P] [T] engsp4 [K] [AH0] [N] [T] [R] [OW1] [L] [D] engsp1 [T] [IY1] engsp4 [T] [IY1] engsp4 [EH1] [S] engsp1 [EH1] [N] [JH] [AH0] [N] . \u003Csos\u002Feos>|Emoti-Voice - a Multi-Voice and Prompt-Controlled T-T-S Engine`.\n4. You can get phonemes by `python frontend.py data\u002Fmy_text.txt > data\u002Fmy_text_for_tts.txt`.\n\n5. Then run:\n```sh\nTEXT=data\u002Finference\u002Ftext\npython inference_am_vocoder_joint.py \\\n--logdir prompt_tts_open_source_joint \\\n--config_folder config\u002Fjoint \\\n--checkpoint g_00140000 \\\n--test_file $TEXT\n```\nthe synthesized speech is under `outputs\u002Fprompt_tts_open_source_joint\u002Ftest_audio`.\n\n1. Or if you just want to use the interactive TTS demo page, run:\n```sh\npip install streamlit\nstreamlit run demo_page.py\n```\n\n### OpenAI-compatible-TTS API\n\nThanks to @lewangdev for adding an OpenAI compatible API [#60](..\u002F..\u002Fissues\u002F60). 
To set it up, use the following command:\n\n```sh\npip install fastapi pydub uvicorn[standard] pyrubberband\nuvicorn openaiapi:app --reload\n```\n\n### Wiki page\n\nYou may find more information from our [wiki](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fwiki) page.\n\n## Training\n\n[Voice Cloning with your personal data](https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fwiki\u002FVoice-Cloning-with-your-personal-data) has been released on December 13th, 2023.\n\n\n## Roadmap & Future work\n\n- Our future plan can be found in the [ROADMAP](.\u002FROADMAP.md) file.\n- The current implementation focuses on emotion\u002Fstyle control by prompts. It uses only pitch, speed, energy, and emotion as style factors, and does not use gender. But it is not complicated to change it to style\u002Ftimbre control.\n- Suggestions are welcome. You can file issues or [@ydopensource](https:\u002F\u002Ftwitter.com\u002FYDopensource) on twitter.\n\n\n## WeChat group\nWelcome to scan the QR code below and join the WeChat group.\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fnetease-youdao\u002FEmotiVoice\u002Fassets\u002F49354974\u002Fcc3f4c8b-8369-4e50-89cc-e40d27a6bdeb\" alt=\"qr\" width=\"150\"\u002F>\n\n## Credits\n\n- [PromptTTS](https:\u002F\u002Fspeechresearch.github.io\u002Fprompttts\u002F). The PromptTTS paper is a key basis of this project.\n- [LibriTTS](https:\u002F\u002Fwww.openslr.org\u002F60\u002F). The LibriTTS dataset is used in training of EmotiVoice.\n- [HiFiTTS](https:\u002F\u002Fwww.openslr.org\u002F109\u002F). The HiFi TTS dataset is used in training of EmotiVoice.\n- [ESPnet](https:\u002F\u002Fgithub.com\u002Fespnet\u002Fespnet). 
\n- [WeTTS](https:\u002F\u002Fgithub.com\u002Fwenet-e2e\u002Fwetts)\n- [HiFi-GAN](https:\u002F\u002Fgithub.com\u002Fjik876\u002Fhifi-gan)\n- [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)\n- [tacotron](https:\u002F\u002Fgithub.com\u002Fkeithito\u002Ftacotron)\n- [KAN-TTS](https:\u002F\u002Fgithub.com\u002Falibaba-damo-academy\u002FKAN-TTS)\n- [StyleTTS](https:\u002F\u002Fgithub.com\u002Fyl4579\u002FStyleTTS)\n- [Simbert](https:\u002F\u002Fgithub.com\u002FZhuiyiTechnology\u002Fsimbert)\n- [cn2an](https:\u002F\u002Fgithub.com\u002FAilln\u002Fcn2an). EmotiVoice incorporates cn2an for number processing.\n\n## License\n\nEmotiVoice is provided under the Apache-2.0 License - see the [LICENSE](.\u002FLICENSE) file for details.\n\nThe interactive page is provided under the [User Agreement](.\u002FEmotiVoice_UserAgreement_易魔声用户协议.pdf) file.\n","EmotiVoice is a multi-voice, prompt-controlled text-to-speech engine that supports over 2000 different voices. Its core feature is emotional synthesis, which generates speech with a range of emotions (such as happy, excited, sad, and angry). The project is written in Python and implements its deep learning models with the PyTorch framework. EmotiVoice provides an easy-to-use web interface and a scripting interface for batch processing, making it well suited to applications that need high-quality, varied speech output, such as audiobook production, virtual assistant development, or game voice-over. It can also be called through an HTTP API, which offers users a certain number of free calls.","2026-06-11 03:35:34","high_star"]