[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71915":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},71915,"F5-TTS","SWivid\u002FF5-TTS","SWivid","Official code for \"F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching\"","https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.06885",null,"Python",14712,2151,130,45,0,27,69,216,81,120,"MIT License",false,"main",[],"2026-06-12 04:01:02","# F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow 
Matching\n\n[![python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10-brightgreen)](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2410.06885-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.06885)\n[![demo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Demo-orange.svg)](https:\u002F\u002Fswivid.github.io\u002FF5-TTS\u002F)\n[![hfspace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HF%20Space-yellow)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmrfakename\u002FE2-F5-TTS)\n[![msspace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖-MS%20Space-blue)](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002FAI-ModelScope\u002FE2-F5-TTS)\n[![lab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🏫-X--LANCE-grey?labelColor=lightgrey)](https:\u002F\u002Fx-lance.sjtu.edu.cn\u002F)\n[![lab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🏫-SII-grey?labelColor=lightgrey)](https:\u002F\u002Fwww.sii.edu.cn\u002F)\n[![lab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🏫-PCL-grey?labelColor=lightgrey)](https:\u002F\u002Fwww.pcl.ac.cn)\n\u003C!-- \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F12d7749c-071a-427c-81bf-b87b91def670\" alt=\"Watermark\" style=\"width: 40px; height: auto\"> -->\n\n**F5-TTS**: Diffusion Transformer with ConvNeXt V2, faster to train and faster at inference.\n\n**E2 TTS**: Flat-UNet Transformer, the closest reproduction of the [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.18009).\n\n**Sway Sampling**: An inference-time flow step sampling strategy that greatly improves performance.\n\n### Thanks to all the contributors!\n\n## News\n- **2025\u002F03\u002F12**: 🔥 F5-TTS v1 base model with better training and inference performance. 
[Few demo](https:\u002F\u002Fswivid.github.io\u002FF5-TTS_updates).\n- **2024\u002F10\u002F08**: F5-TTS & E2 TTS base models on [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002FSWivid\u002FF5-TTS), [🤖 Model Scope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FSWivid\u002FF5-TTS_Emilia-ZH-EN), [🟣 Wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002FSJTU_X-LANCE\u002FF5-TTS_Emilia-ZH-EN).\n\n## Installation\n\n### Create a separate environment if needed\n\n```bash\n# Create a conda env with python_version>=3.10  (you could also use virtualenv)\nconda create -n f5-tts python=3.11\nconda activate f5-tts\n\n# Install FFmpeg if you haven't yet\nconda install ffmpeg\n```\n\n### Install PyTorch with matched device\n\n\u003Cdetails>\n\u003Csummary>NVIDIA GPU\u003C\u002Fsummary>\n\n> ```bash\n> # Install pytorch with your CUDA version, e.g.\n> pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n> \n> # And also possible previous versions, e.g.\n> pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n> # etc.\n> ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>AMD GPU\u003C\u002Fsummary>\n\n> ```bash\n> # Install pytorch with your ROCm version (Linux only), e.g.\n> pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Frocm6.2\n> ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Intel GPU\u003C\u002Fsummary>\n\n> ```bash\n> # Install pytorch with your XPU version, e.g.\n> # Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed\n> pip install torch torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Ftest\u002Fxpu\n> \n> # Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)\n> # IPEX does not require the Intel® Deep Learning 
Essentials or Intel® oneAPI Base Toolkit\n> # See: https:\u002F\u002Fpytorch-extension.intel.com\u002Finstallation?request=platform\n> ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Apple Silicon\u003C\u002Fsummary>\n\n> ```bash\n> # Install the stable pytorch, e.g.\n> pip install torch torchaudio\n> ```\n\n\u003C\u002Fdetails>\n\n### Then you can choose one from below:\n\n> ### 1. As a pip package (if just for inference)\n> \n> ```bash\n> pip install f5-tts\n> ```\n> \n> ### 2. Local editable (if also do training, finetuning)\n> \n> ```bash\n> git clone https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS.git\n> cd F5-TTS\n> # git submodule update --init --recursive  # (optional, if use bigvgan as vocoder)\n> pip install -e .\n> ```\n\n### Docker usage also available\n```bash\n# Build from Dockerfile\ndocker build -t f5tts:v1 .\n\n# Run from GitHub Container Registry\ndocker container run --rm -it --gpus=all --mount 'type=volume,source=f5-tts,target=\u002Froot\u002F.cache\u002Fhuggingface\u002Fhub\u002F' -p 7860:7860 ghcr.io\u002Fswivid\u002Ff5-tts:main\n\n# Quickstart if you want to just run the web interface (not CLI)\ndocker container run --rm -it --gpus=all --mount 'type=volume,source=f5-tts,target=\u002Froot\u002F.cache\u002Fhuggingface\u002Fhub\u002F' -p 7860:7860 ghcr.io\u002Fswivid\u002Ff5-tts:main f5-tts_infer-gradio --host 0.0.0.0\n```\n\n### Runtime\n\nDeployment solution with Triton and TensorRT-LLM.\n\n#### Benchmark Results\nDecoding on a single L20 GPU, using 26 different prompt_audio & target_text pairs, 16 NFE.\n\n| Model               | Concurrency    | Avg Latency | RTF    | Mode            |\n|---------------------|----------------|-------------|--------|-----------------|\n| F5-TTS Base (Vocos) | 2              | 253 ms      | 0.0394 | Client-Server   |\n| F5-TTS Base (Vocos) | 1 (Batch_size) | -           | 0.0402 | Offline TRT-LLM |\n| F5-TTS Base (Vocos) | 1 (Batch_size) | -           | 0.1467 | Offline Pytorch |\n\nSee 
[detailed instructions](src\u002Ff5_tts\u002Fruntime\u002Ftriton_trtllm\u002FREADME.md) for more information.\n\n\n## Inference\n\n- To achieve the desired performance, take a moment to read the [detailed guidance](src\u002Ff5_tts\u002Finfer).\n- If you run into a problem, searching the [issues](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS\u002Fissues?q=is%3Aissue) for relevant keywords is often helpful.\n\n### 1. Gradio App\n\nCurrently supported features:\n\n- Basic TTS with Chunk Inference\n- Multi-Style \u002F Multi-Speaker Generation\n- Voice Chat powered by Qwen2.5-3B-Instruct\n- [Custom inference with more language support](src\u002Ff5_tts\u002Finfer\u002FSHARED.md)\n\n```bash\n# Launch a Gradio app (web interface)\nf5-tts_infer-gradio\n\n# Specify the port\u002Fhost\nf5-tts_infer-gradio --port 7860 --host 0.0.0.0\n\n# Launch a share link\nf5-tts_infer-gradio --share\n```\n\n\u003Cdetails>\n\u003Csummary>NVIDIA device docker compose file example\u003C\u002Fsummary>\n\n```yaml\nservices:\n  f5-tts:\n    image: ghcr.io\u002Fswivid\u002Ff5-tts:main\n    ports:\n      - \"7860:7860\"\n    environment:\n      GRADIO_SERVER_PORT: 7860\n    entrypoint: [\"f5-tts_infer-gradio\", \"--port\", \"7860\", \"--host\", \"0.0.0.0\"]\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              count: 1\n              capabilities: [gpu]\n\nvolumes:\n  f5-tts:\n    driver: local\n```\n\n\u003C\u002Fdetails>\n\n### 2. CLI Inference\n\n```bash\n# Run with flags\n# Leaving --ref_text \"\" makes an ASR model transcribe the reference audio (extra GPU memory usage)\nf5-tts_infer-cli --model F5TTS_v1_Base \\\n--ref_audio \"provide_prompt_wav_path_here.wav\" \\\n--ref_text \"The content, subtitle or transcription of reference audio.\" \\\n--gen_text \"Some text you want TTS model generate for you.\"\n\n# Run with default setting. 
src\u002Ff5_tts\u002Finfer\u002Fexamples\u002Fbasic\u002Fbasic.toml\nf5-tts_infer-cli\n# Or with your own .toml file\nf5-tts_infer-cli -c custom.toml\n\n# Multi voice. See src\u002Ff5_tts\u002Finfer\u002FREADME.md\nf5-tts_infer-cli -c src\u002Ff5_tts\u002Finfer\u002Fexamples\u002Fmulti\u002Fstory.toml\n```\n\n\n## Training\n\n### 1. With Hugging Face Accelerate\n\nRefer to [training & finetuning guidance](src\u002Ff5_tts\u002Ftrain) for best practice.\n\n### 2. With Gradio App\n\n```bash\n# Quick start with Gradio web interface\nf5-tts_finetune-gradio\n```\n\nRead [training & finetuning guidance](src\u002Ff5_tts\u002Ftrain) for more instructions.\n\n\n## [Evaluation](src\u002Ff5_tts\u002Feval)\n\n\n## Development\n\nUse pre-commit to ensure code quality (will run linters and formatters automatically):\n\n```bash\npip install pre-commit\npre-commit install\n```\n\nWhen making a pull request, before each commit, run: \n\n```bash\npre-commit run --all-files\n```\n\nNote: Some model components have linting exceptions for E722 to accommodate tensor notation.\n\n\n## Acknowledgements\n\n- [E2-TTS](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.18009) brilliant work, simple and effective\n- [Emilia](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.05361), [WenetSpeech4TTS](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.05763), [LibriTTS](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02882), [LJSpeech](https:\u002F\u002Fkeithito.com\u002FLJ-Speech-Dataset\u002F) valuable datasets\n- [lucidrains](https:\u002F\u002Fgithub.com\u002Flucidrains) initial CFM structure with also [bfs18](https:\u002F\u002Fgithub.com\u002Fbfs18) for discussion\n- [SD3](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.03206) & [Hugging Face diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) DiT and MMDiT code structure\n- [torchdiffeq](https:\u002F\u002Fgithub.com\u002Frtqichen\u002Ftorchdiffeq) as ODE solver, 
[Vocos](https:\u002F\u002Fhuggingface.co\u002Fcharactr\u002Fvocos-mel-24khz) and [BigVGAN](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FBigVGAN) as vocoder\n- [FunASR](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FFunASR), [faster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper), [UniSpeech](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FUniSpeech), [SpeechMOS](https:\u002F\u002Fgithub.com\u002Ftarepan\u002FSpeechMOS) for evaluation tools\n- [ctc-forced-aligner](https:\u002F\u002Fgithub.com\u002FMahmoudAshraf97\u002Fctc-forced-aligner) for speech edit test\n- [mrfakename](https:\u002F\u002Fx.com\u002Frealmrfakename) huggingface space demo ~\n- [f5-tts-mlx](https:\u002F\u002Fgithub.com\u002Flucasnewman\u002Ff5-tts-mlx\u002Ftree\u002Fmain) Implementation with MLX framework by [Lucas Newman](https:\u002F\u002Fgithub.com\u002Flucasnewman)\n- [F5-TTS-ONNX](https:\u002F\u002Fgithub.com\u002FDakeQQ\u002FF5-TTS-ONNX) ONNX Runtime version by [DakeQQ](https:\u002F\u002Fgithub.com\u002FDakeQQ)\n- [Yuekai Zhang](https:\u002F\u002Fgithub.com\u002Fyuekaizhang) Triton and TensorRT-LLM support ~\n\n## Citation\nIf our work and codebase are useful to you, please cite:\n```\n@article{chen-etal-2024-f5tts,\n      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, \n      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},\n      journal={arXiv preprint arXiv:2410.06885},\n      year={2024},\n}\n```\n## License\n\nOur code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. 
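Sorry for any inconvenience this may cause.\n","F5-TTS is a text-to-speech system that uses flow matching to generate fluent, realistic speech. The project combines a Diffusion Transformer with ConvNeXt V2, which markedly improves training speed and inference efficiency; it also introduces the Sway Sampling strategy, which further improves performance at inference time by optimizing flow-step sampling. It suits applications that need high-quality speech synthesis, such as virtual assistants, audiobook production, and game voice-over. The project is written in Python, has received broad attention and support on GitHub, and is open source under the MIT license.",2,"2026-06-11 03:39:28","high_star"]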