[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72751":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72751,"tortoise-tts","neonbjb\u002Ftortoise-tts","neonbjb","A multi-voice TTS system trained with an emphasis on quality",null,"Jupyter Notebook",14857,2046,178,324,0,3,5,12,9,79.63,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:07","# TorToiSe\n\nTortoise is a text-to-speech program built with the following priorities:\n\n1. Strong multi-voice capabilities.\n2. Highly realistic prosody and intonation.\n   \nThis repo contains all the code needed to run Tortoise TTS in inference mode.\n\nManuscript: https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.07243\n## Hugging Face space\n\nA live demo is hosted on Hugging Face Spaces. If you'd like to avoid a queue, please duplicate the Space and add a GPU. Please note that CPU-only spaces do not work for this demo.\n\nhttps:\u002F\u002Fhuggingface.co\u002Fspaces\u002FManmay\u002Ftortoise-tts\n\n## Install via pip\n```bash\npip install tortoise-tts\n```\n\nIf you would like to install the latest development version, you can also install it directly from the git repository:\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts\n```\n\n## What's in a name?\n\nI'm naming my speech-related repos after Mojave desert flora and fauna. 
Tortoise is a bit tongue in cheek: this model\nis insanely slow. It leverages both an autoregressive decoder **and** a diffusion decoder, both known for their low\nsampling rates. On a K80, expect to generate a medium-sized sentence every 2 minutes.\n\nWell... not so slow anymore: we can now get a **0.25-0.3 RTF** on 4GB of VRAM, and with streaming we can get \u003C **500 ms** latency!\n\n## Demos\n\nSee [this page](http:\u002F\u002Fnonint.com\u002Fstatic\u002Ftortoise_v2_examples.html) for a large list of example outputs.\n\nA cool application of Tortoise + GPT-3 (not affiliated with this repository): https:\u002F\u002Ftwitter.com\u002Flexman_ai. Unfortunately, this project no longer seems to be active.\n\n## Usage guide\n\n### Local installation\n\nIf you want to use this on your own computer, you must have an NVIDIA GPU.\n\n> [!TIP]\n> On Windows, I **highly** recommend using the Conda installation method. I have been told that if you do not do this, you will spend a lot of time chasing dependency problems.\n\nFirst, install miniconda: https:\u002F\u002Fdocs.conda.io\u002Fen\u002Flatest\u002Fminiconda.html\n\nThen run the following commands, using anaconda prompt as the terminal (or any other terminal configured to work with conda).\n\nThis will:\n1. create a conda environment with the minimal dependencies specified\n1. activate the environment\n1. install pytorch with the command provided here: https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F\n1. clone tortoise-tts\n1. change the current directory to tortoise-tts\n1. 
run the tortoise Python setup install script\n\n```shell\nconda create --name tortoise python=3.9 numba inflect\nconda activate tortoise\nconda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia\nconda install transformers=4.29.2\ngit clone https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts.git\ncd tortoise-tts\npython setup.py install\n```\n\nOptionally, pytorch can be installed in the base environment, so that other conda environments can use it too. To do this, simply run the `conda install pytorch...` line before activating the tortoise environment.\n\n> [!NOTE]  \n> When you want to use tortoise-tts, you will always have to ensure the `tortoise` conda environment is activated.\n\nIf you are on Windows, you may also need to install pysoundfile: `conda install -c conda-forge pysoundfile`\n\n### Docker\n\nAn easy way to hit the ground running and a good jumping-off point, depending on your use case.\n\n```sh\ngit clone https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts.git\ncd tortoise-tts\n\ndocker build . -t tts\n\ndocker run --gpus all \\\n    -e TORTOISE_MODELS_DIR=\u002Fmodels \\\n    -v \u002Fmnt\u002Fuser\u002Fdata\u002Ftortoise_tts\u002Fmodels:\u002Fmodels \\\n    -v \u002Fmnt\u002Fuser\u002Fdata\u002Ftortoise_tts\u002Fresults:\u002Fresults \\\n    -v \u002Fmnt\u002Fuser\u002Fdata\u002F.cache\u002Fhuggingface:\u002Froot\u002F.cache\u002Fhuggingface \\\n    -v \u002Froot:\u002Fwork \\\n    -it tts\n```\nThis gives you an interactive terminal in an environment that's ready to do some tts. 
Now you can explore the different interfaces that tortoise exposes for tts.\n\nFor example:\n\n```sh\ncd app\nconda activate tortoise\ntime python tortoise\u002Fdo_tts.py \\\n    --output_path \u002Fresults \\\n    --preset ultra_fast \\\n    --voice geralt \\\n    --text \"Time flies like an arrow; fruit flies like a banana.\"\n```\n\n## Apple Silicon\n\nOn macOS 13+ with M1\u002FM2 chips, you need to install the nightly version of PyTorch. As stated on the official page, you can run:\n\n```shell\npip3 install --pre torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcpu\n```\n\nBe sure to do that after you activate the environment. If you don't use conda, the commands would look like this:\n\n```shell\npython3.10 -m venv .venv\nsource .venv\u002Fbin\u002Factivate\npip install numba inflect psutil\npip install --pre torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcpu\npip install transformers\ngit clone https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts.git\ncd tortoise-tts\npip install .\n```\n\nBe aware that DeepSpeed is disabled on Apple Silicon since it does not work. 
The flag `--use_deepspeed` is ignored.\nYou may need to prepend `PYTORCH_ENABLE_MPS_FALLBACK=1` to the commands below to make them work, since MPS does not support all the operations in PyTorch.\n\n\n### do_tts.py\n\nThis script allows you to speak a single phrase with one or more voices.\n```shell\npython tortoise\u002Fdo_tts.py --text \"I'm going to speak this\" --voice random --preset fast\n```\n### Socket streaming\n```shell\npython tortoise\u002Fsocket_server.py\n```\nThe server will listen on port 5000.\n\n\n### read_fast.py (faster inference)\n\nThis script is a faster-inference variant of `read.py` for reading large amounts of text.\n\n```shell\npython tortoise\u002Fread_fast.py --textfile \u003Cyour text to be read> --voice random\n```\n\n### read.py\n\nThis script provides tools for reading large amounts of text.\n\n```shell\npython tortoise\u002Fread.py --textfile \u003Cyour text to be read> --voice random\n```\n\nThis will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series\nof spoken clips as they are generated. Once all the clips are generated, it will combine them into a single file and\noutput that as well.\n\nSometimes Tortoise screws up an output. 
You can re-generate any bad clips by re-running `read.py` with the `--regenerate`\nargument.\n\n### API\n\nTortoise can be used programmatically, like so:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech()\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nTo use DeepSpeed:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(use_deepspeed=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nTo use the KV cache:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(kv_cache=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nTo run the model in float16:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(half=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nFor faster runs, use all three:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\n## Acknowledgements\n\nThis project has garnered more praise than I expected. 
I am standing on the shoulders of giants, though, and I want to\ncredit a few of the amazing folks in the community that have helped make this happen:\n\n- Hugging Face, who wrote the GPT model and the generate API used by Tortoise, and who hosts the model weights.\n- [Ramesh et al](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2102.12092.pdf) who authored the DALLE paper, which is the inspiration behind Tortoise.\n- [Nichol and Dhariwal](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2102.09672.pdf) who authored the (revision of the) code that drives the diffusion model.\n- [Jang et al](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.07889.pdf) who developed and open-sourced univnet, the vocoder this repo uses.\n- [Kim and Jung](https:\u002F\u002Fgithub.com\u002Fmindslab-ai\u002Funivnet) who implemented the univnet PyTorch model.\n- [lucidrains](https:\u002F\u002Fgithub.com\u002Flucidrains) who writes awesome open source pytorch models, many of which are used here.\n- [Patrick von Platen](https:\u002F\u002Fhuggingface.co\u002Fpatrickvonplaten) whose guides on setting up wav2vec were invaluable to building my dataset.\n\n## Notice\n\nTortoise was built entirely by the author (James Betker) using their own hardware. Their employer was not involved in any facet of Tortoise's development.\n\n## License\n\nTortoise TTS is licensed under the Apache 2.0 license.\n\nIf you use this repo or the ideas therein for your research, please cite it! A BibTeX entry can be found in the right pane on GitHub.\n","Tortoise TTS is a multi-voice text-to-speech system that focuses on speech quality and diversity. Its core features include strong multi-voice support and highly realistic prosody and intonation, achieving high-quality speech synthesis by combining an autoregressive decoder with a diffusion decoder. Although early versions ran slowly, the latest version reaches a real-time factor of 0.25-0.3 with 4GB of VRAM and supports streaming to bring latency below 500 ms. Tortoise TTS suits applications that need high-quality, natural-sounding AI speech, such as virtual assistants, audiobook production, or any case requiring highly realistic speech output. Installation and use require an NVIDIA GPU, and a Conda environment is recommended to simplify dependency management.",2,"2026-06-11 03:43:29","high_star"]