[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74057":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":14,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},74057,"LuxTTS","ysharma3501\u002FLuxTTS","ysharma3501","A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.",null,"Python",4011,524,38,21,0,7,26,128,97.16,"Apache License 2.0",false,"master",true,[],"2026-06-12 04:01:13","# LuxTTS\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FYatharthS\u002FLuxTTS\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E\" alt=\"Hugging Face Model\">\n  \u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYatharthS\u002FLuxTTS\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Space-blue\" alt=\"Hugging Face Space\">\n  \u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1cDaxtbSDLRmu6tRV_781Of_GSjHSo1Cu?usp=sharing\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FColab-Notebook-F9AB00?logo=googlecolab&logoColor=white\" alt=\"Colab Notebook\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\nLuxTTS is a lightweight, ZipVoice-based text-to-speech model designed for high-quality voice cloning and realistic generation at speeds exceeding 150x 
realtime.\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa3b57152-8d97-43ce-bd99-26dc9a145c29\n\n\n### The main features are\n- Voice cloning: SOTA voice cloning on par with models 10x larger.\n- Clarity: Clear 48 kHz speech generation, unlike most TTS models, which are limited to 24 kHz.\n- Speed: Reaches 150x realtime on a single GPU and runs faster than realtime on CPUs as well.\n- Efficiency: Fits within 1 GB of VRAM, so it can fit on any local GPU.\n\n## Usage\nYou can try it locally, in Colab, or on Hugging Face Spaces.\n\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1cDaxtbSDLRmu6tRV_781Of_GSjHSo1Cu?usp=sharing)\n[![Open in Spaces](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Fopen-in-hf-spaces-sm.svg)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYatharthS\u002FLuxTTS)\n\n#### Simple installation:\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fysharma3501\u002FLuxTTS.git\ncd LuxTTS\npip install -r requirements.txt\n```\n\n#### Load model:\n```python\nfrom zipvoice.luxvoice import LuxTTS\n\n# load model on GPU\nlux_tts = LuxTTS('YatharthS\u002FLuxTTS', device='cuda')\n\n# load model on CPU\n# lux_tts = LuxTTS('YatharthS\u002FLuxTTS', device='cpu', threads=2)\n\n# load model on MPS for Macs\n# lux_tts = LuxTTS('YatharthS\u002FLuxTTS', device='mps')\n```\n\n#### Simple inference\n```python\nimport soundfile as sf\nfrom IPython.display import Audio, display\n\ntext = \"Hey, what's up? 
I'm feeling really great if you ask me honestly!\"\n\n## change this to your reference file path, can be wav\u002Fmp3\nprompt_audio = 'audio_file.wav'\n\n## encode audio (takes about 10s the first time because of librosa initialization)\nencoded_prompt = lux_tts.encode_prompt(prompt_audio, rms=0.01)\n\n## generate speech\nfinal_wav = lux_tts.generate_speech(text, encoded_prompt, num_steps=4)\n\n## save audio\nfinal_wav = final_wav.numpy().squeeze()\nsf.write('output.wav', final_wav, 48000)\n\n## display speech\nif display is not None:\n  display(Audio(final_wav, rate=48000))\n```\n\n#### Inference with sampling params:\n```python\nimport soundfile as sf\nfrom IPython.display import Audio, display\n\ntext = \"Hey, what's up? I'm feeling really great if you ask me honestly!\"\n\n## change this to your reference file path, can be wav\u002Fmp3\nprompt_audio = 'audio_file.wav'\n\nrms = 0.01 ## higher values sound louder (around 0.01 recommended)\nt_shift = 0.9 ## sampling param, higher can sound better but worsens WER\nnum_steps = 4 ## sampling param, higher sounds better but takes longer (3-4 is best for efficiency)\nspeed = 1.0 ## sampling param, controls speed of audio (lower = slower)\nreturn_smooth = False ## sampling param, may sound smoother but less clean\nref_duration = 5 ## lower values can speed up inference; set to 1000 if you hear artifacts\n\n## encode audio (takes about 10s the first time because of librosa initialization)\nencoded_prompt = lux_tts.encode_prompt(prompt_audio, duration=ref_duration, rms=rms)\n\n## generate speech\nfinal_wav = lux_tts.generate_speech(text, encoded_prompt, num_steps=num_steps, t_shift=t_shift, speed=speed, return_smooth=return_smooth)\n\n## save audio\nfinal_wav = final_wav.numpy().squeeze()\nsf.write('output.wav', final_wav, 48000)\n\n## display speech\nif display is not None:\n  display(Audio(final_wav, rate=48000))\n```\n## Tips\n- Use a reference audio file of at least 3 seconds for voice cloning.\n- You can use return_smooth = True if you hear metallic 
sounds.\n- Lower t_shift for fewer pronunciation errors at the cost of quality, and vice versa.\n\n## Community\n- [Lux-TTS-Gradio](https:\u002F\u002Fgithub.com\u002FNidAll\u002FLuxTTS-Gradio): A Gradio app for using LuxTTS.\n- [OptiSpeech](https:\u002F\u002Fgithub.com\u002Fycharfi09\u002FOptiClone): A clean UI app for using LuxTTS.\n- [LuxTTS-Comfyui](https:\u002F\u002Fgithub.com\u002FDragonDiffusionbyBoyo\u002FBoyoLuxTTS-Comfyui.git): Nodes for using LuxTTS in ComfyUI.\n\nThanks for all community contributions!\n\n## Info\n\nQ: How is this different from ZipVoice?\n\nA: LuxTTS uses the same architecture but is distilled to 4 steps with an improved sampling technique. It also uses a custom 48 kHz vocoder instead of the default 24 kHz version.\n\nQ: Can it be even faster?\n\nA: Yes. It currently uses float32; float16 should be significantly faster (almost 2x).\n\n## Roadmap\n\n- [x] Release model and code\n- [x] Hugging Face Spaces demo\n- [x] Release MPS support (thanks to @builtbybasit)\n- [ ] Release LuxTTS v1.5\n- [ ] Release code for float16 inference\n\n## Acknowledgments\n\n- [ZipVoice](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002FZipVoice) for their excellent code and model.\n- [Vocos](https:\u002F\u002Fgithub.com\u002Fgemelo-ai\u002Fvocos.git) for their great vocoder.\n  \n## Final Notes\n\nThe model and code are licensed under the Apache-2.0 license. See LICENSE for details.\n\nStars\u002FLikes would be appreciated, thank you.\n\nEmail: yatharthsharma350@gmail.com\n","LuxTTS is a high-quality, fast voice cloning model that can generate realistic speech at more than 150x realtime. Its core features include top-tier voice cloning, clear 48 kHz speech synthesis, and very high efficiency: it runs at high speed on a single GPU while needing only about 1 GB of VRAM. LuxTTS also supports several deployment options, such as a local environment, Google Colab, or Hugging Face Spaces, and suits a range of scenarios that need an efficient, high-quality text-to-speech solution, such as personal project development, educational tools, or professional audio production.",2,"2026-06-11 03:48:36","high_star"]