[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10895":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":9,"totalLinesOfCode":9,"stars":12,"forks":13,"watchers":14,"openIssues":14,"contributorsCount":9,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":9,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},10895,"pocket-tts","kyutai-labs\u002Fpocket-tts","kyutai-labs","A TTS that fits in your CPU (and pocket)",null,"https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fpocket-tts","Python",4580,512,43,0,19,44,228,57,30.13,false,"main","2026-06-12 02:02:28","# Pocket TTS\n\n\u003Cimg width=\"1446\" height=\"622\" alt=\"pocket-tts-logo-v2-transparent\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F637b5ed6-831f-4023-9b4c-741be21ab238\" \u002F>\n\nA lightweight text-to-speech (TTS) application designed to run efficiently on CPUs.\nForget about the hassle of using GPUs and web APIs serving TTS models. With Kyutai's Pocket TTS, generating audio is just a pip install and a function call away.\n\nSupports Python 3.10, 3.11, 3.12, 3.13 and 3.14. Requires PyTorch 2.5+. 
Does not require the GPU version of PyTorch.\n\n[🔊 Demo](https:\u002F\u002Fkyutai.org\u002Fpocket-tts) | \n[🐱‍💻GitHub Repository](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fpocket-tts) | \n[🤗 Hugging Face Model Card](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Fpocket-tts) | \n[⚙️ Tech report](https:\u002F\u002Fkyutai.org\u002Fblog\u002F2026-01-13-pocket-tts) |\n[📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.06926) | \n[📚 Documentation](https:\u002F\u002Fkyutai-labs.github.io\u002Fpocket-tts\u002F)\n\n\n## Main takeaways\n* Runs on CPU\n* Small model size, 100M parameters\n* Audio streaming\n* Low latency, ~200ms to get the first audio chunk\n* Faster than real-time, ~6x real-time on the CPU of a MacBook Air M4\n* Uses only 2 CPU cores\n* Python API and CLI\n* Voice cloning\n* Multi-language support: English, French, German, Portuguese, Italian, Spanish\n* Can handle infinitely long text inputs\n* [Can run client-side in the browser](#in-browser-implementations)\n\nAdditional languages may be added in the future.\n\n## Trying it from the website, without installing anything\n\nNavigate to the [Kyutai website](https:\u002F\u002Fkyutai.org\u002Fpocket-tts) to try it out directly in your browser. You can input text, select different voices, and generate speech without any installation.\n\n## Trying it with the CLI\n\n### The `generate` command\nYou can use pocket-tts directly from the command line. 
We recommend using\n`uv` as it installs any dependencies on the fly in an isolated environment (uv installation instructions [here](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F#standalone-installer)).\nYou can also use `pip install pocket-tts` to install it manually.\n\nThe following command generates a wav file `.\u002Ftts_output.wav` saying the default text with the default voice, and displays some speed statistics.\n```bash\nuvx pocket-tts generate\n# or if you installed it manually with pip:\npocket-tts generate\n```\nModify the voice with `--voice` and the text with `--text`. We provide a small catalog of voices.\nChoose a pretrained language model with `--language` when running `generate`, `export-voice`, or `serve` (default: `english`). Non-English languages also have larger 24-layer variants that are higher quality but slower. You can select them with, for example, `--language italian_24l`.\nThe `--config` option accepts only a local YAML path for custom weights.\n\nYou can take a look at [this page](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices) which details the licenses\nfor each voice.\n\n* [alba](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Falba-mackenna\u002Fcasual.wav) (en)\n* [giovanni](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Fpocket-tts\u002Fblob\u002Fadd_lang_not_documented\u002Fcommon_voice_it_36520747-enhanced-v2.mp3) (it)\n* [lola](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Fpocket-tts\u002Fblob\u002Fadd_lang_not_documented\u002Fcommon_voice_es_19762977-enhanced-v2.mp3) (es)\n* [juergen](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Fpocket-tts\u002Fblob\u002Fadd_lang_not_documented\u002Fde-DE-juergen.mp3) (de)\n* [rafael](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Fpocket-tts\u002Fblob\u002Fadd_lang_not_documented\u002Fg-Vi8PgmSY0-enhanced-v2.wav) (pt)\n* 
[estelle](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Funmute-prod-website\u002Fdeveloppeuse-3.wav) (fr)\n* [anna](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp228_023_enhanced.wav) (en)\n* [azelma](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp303_023_enhanced.wav) (en)\n* [bill_boerst](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-zero\u002Fbill_boerst.wav) (en)\n* [caro_davy](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-zero\u002Fcaro_davy.wav) (en)\n* [charles](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp254_023_enhanced.wav) (en)\n* [cosette](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fexpresso\u002Fex04-ex02_confused_001_channel1_499s.wav) (en)\n* [eponine](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp262_023_enhanced.wav) (en)\n* [eve](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp361_023_enhanced.wav) (en)\n* [fantine](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp244_023_enhanced.wav) (en)\n* [george](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp315_023_enhanced.wav) (en)\n* [jane](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp339_023_enhanced.wav) (en)\n* [jean](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fears\u002Fp010\u002Ffreeform_speech_01_enhanced.wav) (en)\n* [javert](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-donations\u002FButter.wav) (en)\n* 
[marius](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-donations\u002FSelfie.wav) (en)\n* [mary](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp333_023_enhanced.wav) (en)\n* [michael](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp360_023_enhanced.wav) (en)\n* [paul](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp259_023_enhanced.wav) (en)\n* [peter_yearsley](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-zero\u002Fpeter_yearsley.wav) (en)\n* [stuart_bell](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-zero\u002Fstuart_bell.wav) (en)\n* [vera](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvctk\u002Fp229_023_enhanced.wav) (en)\n\nThe `--voice` argument can also take a plain wav file as input for voice cloning.\nYou can use your own or check out our [voice repository](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices).\nWe recommend [cleaning the sample](https:\u002F\u002Fpodcast.adobe.com\u002Fen\u002Fenhance) before using it with Pocket TTS, because the audio quality of the sample is also reproduced.\n\nFeel free to check out the [generate documentation](https:\u002F\u002Fkyutai-labs.github.io\u002Fpocket-tts\u002FCLI%20Commands\u002Fgenerate\u002F) for more details and examples.\nFor trying multiple voices and prompts quickly, prefer using the `serve` command.\n\n### The `serve` command\n\nYou can also run a local server to generate audio via HTTP requests.\n```bash\nuvx pocket-tts serve\n# or if you installed it manually with pip:\npocket-tts serve\n```\nNavigate to `http:\u002F\u002Flocalhost:8000` to try the web interface, it's faster than the command line as the model is kept in memory between requests.\n\nYou can check out the 
[serve documentation](https:\u002F\u002Fkyutai-labs.github.io\u002Fpocket-tts\u002FCLI%20Commands\u002Fserve\u002F) for more details and examples.\n\n### The `export-voice` command\n\nProcessing an audio file (e.g., a .wav or .mp3) for voice cloning is relatively slow, but loading a safetensors file -- a voice embedding converted from an audio file -- is very fast. You can use the `export-voice` command to do this conversion. See the [export-voice documentation](https:\u002F\u002Fkyutai-labs.github.io\u002Fpocket-tts\u002FCLI%20Commands\u002Fexport_voice\u002F) for more details and examples.\n\n\n## Using it as a Python library\n\nYou can try out the Python library on Colab [here](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fkyutai-labs\u002Fpocket-tts\u002Fblob\u002Fmain\u002Fdocs\u002Fpocket-tts-example.ipynb).\n\nInstall the package with\n```bash\npip install pocket-tts\n# or\nuv add pocket-tts\n```\n\nYou can use this package as a simple Python library to generate audio from text.\n```python\nfrom pocket_tts import TTSModel\nimport scipy.io.wavfile\n\ntts_model = TTSModel.load_model()\nvoice_state = tts_model.get_state_for_audio_prompt(\n    \"alba\"  # One of the pre-made voices, see above\n    # You can also use any voice file you have locally or from Hugging Face:\n    # \".\u002Fsome_audio.wav\"\n    # or \"hf:\u002F\u002Fkyutai\u002Ftts-voices\u002Fexpresso\u002Fex01-ex02_default_001_channel2_198s.wav\"\n)\naudio = tts_model.generate_audio(voice_state, \"Hello world, this is a test.\")\n# Audio is a 1D torch tensor containing PCM data.\nscipy.io.wavfile.write(\"output.wav\", tts_model.sample_rate, audio.numpy())\n```\n\nYou can have multiple voice states around if\nyou have multiple voices you want to use. 
`load_model()`\nand `get_state_for_audio_prompt()` are relatively slow operations,\nso we recommend keeping the model and voice states in memory if you can.\n\nFor faster voice loading, you can export voice states to safetensors files:\n```python\nfrom pocket_tts import TTSModel, export_model_state\n\nmodel = TTSModel.load_model()\n\n# Export a voice state for fast loading later\nmodel_state = model.get_state_for_audio_prompt(\"some_voice.wav\")\nexport_model_state(model_state, \".\u002Fsome_voice.safetensors\")\n\n# Later, load it quickly. This is fast because it just reads the kvcache\n# from disk and doesn't do any other computation.\nmodel_state_copy = model.get_state_for_audio_prompt(\".\u002Fsome_voice.safetensors\")\n\naudio = model.generate_audio(model_state_copy, \"Hello world!\")\n```\n\nYou can check out the [Python API documentation](https:\u002F\u002Fkyutai-labs.github.io\u002Fpocket-tts\u002FAPI%20Reference\u002Fpython-api\u002F) for more details and examples.\n\n## Unsupported features\n\nAt the moment, we do not support (but would love pull requests adding):\n\n- [Adding silence in the text input to generate pauses.](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fpocket-tts\u002Fissues\u002F6)\n- [Quantization to run the computation in int8.](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fpocket-tts\u002Fissues\u002F7)\n\nWe tried running this TTS model on the GPU but did not observe a speedup compared to CPU execution,\nnotably because we use a batch size of 1 and a very small model.\n\n## Development and local setup\n\nWe accept contributions! Feel free to open issues or pull requests on GitHub.\n\nYou can find development instructions in the [CONTRIBUTING.md](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fpocket-tts\u002Ftree\u002Fmain\u002FCONTRIBUTING.md) file. 
It also explains how to set up an editable install of the package for local development.\n\n## In-browser implementations\n\nPocket TTS is small enough to run directly in your browser in WebAssembly\u002FJavaScript.\nWe don't have official support for this yet, but you can try out one of these community implementations:\n- [wasm-pocket-tts](https:\u002F\u002Fgithub.com\u002FLaurentMazare\u002Fxn\u002Ftree\u002Fmain\u002Fwasm-pocket-tts) by @LaurentMazare: Rust port of Pocket TTS with XN. Demo [here](https:\u002F\u002Flaurentmazare.github.io\u002Fpocket-tts\u002F)\n- [pocket-tts-onnx-export](https:\u002F\u002Fgithub.com\u002FKevinAHM\u002Fpocket-tts-onnx-export) by @KevinAHM: Model exported to .onnx and run using [ONNX Runtime Web](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Ftutorials\u002Fweb\u002F). Demo [here](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FKevinAHM\u002Fpocket-tts-web)\n- [pocket-tts](https:\u002F\u002Fgithub.com\u002Fbabybirdprd\u002Fpocket-tts) by @babybirdprd: Candle version (Rust) with WebAssembly and PyO3 bindings, meaning it can run on the web too.\n- [jax-js](https:\u002F\u002Fgithub.com\u002Fekzhang\u002Fjax-js\u002Ftree\u002Fmain\u002Fwebsite\u002Fsrc\u002Froutes\u002Ftts) by @ekzhang: Using jax-js, an ML library for the web. 
Demo [here](https:\u002F\u002Fjax-js.com\u002Ftts)\n\n\n## Alternative implementations\n- [pocket-tts-mlx](https:\u002F\u002Fgithub.com\u002Fjishnuvenugopal\u002Fpocket-tts-mlx) by @jishnuvenugopal - MLX backend optimized for Apple Silicon\n- [pocket-tts-xn](https:\u002F\u002Fgithub.com\u002FLaurentMazare\u002Fxn\u002Ftree\u002Fmain\u002Fpocket-tts) by @LaurentMazare - A Rust port of Pocket TTS implemented with XN.\n- [pocket-tts-candle](https:\u002F\u002Fgithub.com\u002Fbabybirdprd\u002Fpocket-tts) by @babybirdprd - Candle version (Rust) with WebAssembly and PyO3 bindings.\n- [PocketTTS.cpp](https:\u002F\u002Fgithub.com\u002FVolgaGerm\u002FPocketTTS.cpp) by @VolgaGerm - Single-file C++ runtime using ONNX Runtime, with CLI, HTTP server, and FFI C API.\n- [sherpa-onnx](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Fsherpa-onnx) by @csukuangfj - Run PocketTTS on **Windows, macOS, Linux**, and embedded boards (Raspberry Pi, Jetson, RK3588, etc.) with bindings for 12 programming languages: **C++, C, Python, JavaScript, Java, C#, Kotlin, Swift, Go, Dart, Rust, Pascal**, plus [WebAssembly](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fk2-fsa\u002Fweb-assembly-en-tts-pocket).\n- [pocket-tts-csharp](https:\u002F\u002Fgithub.com\u002FTheAjaykrishnanR\u002Fpocket-tts-csharp) by @TheAjaykrishnanR - A C# port of Pocket TTS implemented using [TorchSharp](https:\u002F\u002Fgithub.com\u002Fdotnet\u002FTorchSharp) and [TorchSharp.PyBridge](https:\u002F\u002Fgithub.com\u002Fshaltielshmid\u002FTorchSharp.PyBridge) for ease of use as a library in .NET projects.\n\n## Projects using Pocket TTS\n\n- [pocket-reader](https:\u002F\u002Fgithub.com\u002Flukasmwerner\u002Fpocket-reader) by @lukasmwerner - Browser screen reader\n- [pocket-tts-wyoming](https:\u002F\u002Fgithub.com\u002Fikidd\u002Fpocket-tts-wyoming) by @ikidd - Docker container for pocket-tts using Wyoming protocol, ready for Home Assistant Voice use.\n- 
[Sonorus](https:\u002F\u002Fwww.nexusmods.com\u002Fhogwartslegacy\u002Fmods\u002F2409) by @KevinAHM - Talk to any named character in Hogwarts Legacy with their original voice.\n- [Mac pocket-tts](https:\u002F\u002Fgithub.com\u002Fslaughters85j\u002Fpocket-tts) by @slaughters85j - Mac Desktop App + macOS Quick Action\n- [pocket-tts-openai_streaming_server](https:\u002F\u002Fgithub.com\u002Fteddybear082\u002Fpocket-tts-openai_streaming_server) by @teddybear082 - OpenAI-compatible streaming server, dockerized and with an `.exe` release\n- [pocket-tts-unity](https:\u002F\u002Fgithub.com\u002Flookbe\u002Fpocket-tts-unity) by @lookbe - A Unity 6 integration for Pocket-TTS.\n- [ComfyUI-Pocket-TTS](https:\u002F\u002Fgithub.com\u002Fai-joe-git\u002FComfyUI-Pocket-TTS) by @ai-joe-git Lightweight CPU-based Text-to-Speech for ComfyUI\n- [pocket-tts-server](https:\u002F\u002Fgithub.com\u002Fai-joe-git\u002Fpocket-tts-server) by @ai-joe-git A lightweight, real-time voice cloning and chat server with OpenAI-compatible API. Clone any voice with just 20 seconds of audio and chat with AI using that voice instantly.\n- [discord-tts](https:\u002F\u002Fgithub.com\u002Falkmei\u002Fdiscord-tts) by @alkmei - Multivoice Discord text-to-speech bot that uses Pocket TTS.\n- [cursed-codex](https:\u002F\u002Fgithub.com\u002Fdooart\u002Fcursed-codex) by @dooart - AI coding agent with unhinged live football commentary\n- [pocket-tts-deno](https:\u002F\u002Fgithub.com\u002Fohmstone\u002Fpocket-tts-deno) Port of [pocket-tts-server](https:\u002F\u002Fgithub.com\u002Fai-joe-git\u002Fpocket-tts-server) as a wasm + onnx deno server with voice TTS API.\n- [FrontPocket](https:\u002F\u002Fgithub.com\u002Fmarkd89\u002FFrontPocket) by @markd89 - Front-end for Pocket-TTS to speak text from clipboard, file, CLI (hotkeys) & GUI toolbar. Change playback speed, voice, and move forward\u002Fbackward between sentences instantaneously. 
\n- [openclaw-pockettts](https:\u002F\u002Fgithub.com\u002Fdodgyrabbit\u002Fopenclaw-pockettts) by @dodgyrabbit - A Docker container with the Python implementation but exposed as an OpenAI TTS API for easy integration with OpenClaw.\n- [openclaw-pockettts.cpp](https:\u002F\u002Fgithub.com\u002Fdodgyrabbit\u002Fopenclaw-pockettts.cpp) by @dodgyrabbit - A Docker container with the PocketTTS.cpp version, packaged for easy integration with OpenClaw.\n- [tts-audiobook-tool](https:\u002F\u002Fgithub.com\u002Fzeropointnine\u002Ftts-audiobook-tool) by @zeropointnine - Multi-model audiobook generator with automatic error detection, 48 kHz upscaling, synced browser reader, stand-alone server-mode.\n\n\n## Prohibited use\n\nUse of our model must comply with all applicable laws and regulations and must not result in, involve, or facilitate any illegal, harmful, deceptive, fraudulent, or unauthorized activity. Prohibited uses include, without limitation, voice impersonation or cloning without explicit and lawful consent; misinformation, disinformation, or deception (including fake news, fraudulent calls, or presenting generated content as genuine recordings of real people or events); and the generation of unlawful, harmful, libelous, abusive, harassing, discriminatory, hateful, or privacy-invasive content. We disclaim all liability for any non-compliant use.\n\n\n## Authors\n\nManu Orsini*, Simon Rouard*, Gabriel De Marmiesse*, Václav Volhejn, Neil Zeghidour, Alexandre Défossez\n\n*equal contribution\n","Pocket TTS is a lightweight text-to-speech (TTS) application designed to run efficiently on CPUs. Its core features include multi-language support, voice cloning, and audio streaming. The model has only 100M parameters and, without relying on a GPU, achieves low latency (about 200 ms to generate the first audio chunk) and audio generation at roughly 6x real-time speed, using only 2 CPU cores. The software provides a Python API and a command-line interface, making it easy to integrate into a variety of projects. It suits scenarios that require rapid deployment of TTS services in resource-constrained environments, such as mobile app development, embedded systems, or any case where reducing dependence on cloud services is desired.",2,"2026-06-11 03:30:41","trending"]