[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2509":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},2509,"dia","nari-labs\u002Fdia","nari-labs","A TTS model capable of generating ultra-realistic dialogue in one pass.","",null,"Python",19315,1686,162,68,0,4,12,28,18,86.48,"Apache License 2.0",false,"main",[26,27,28],"ai","open-weight","text-to-speech","2026-06-12 04:00:14","\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia\">\n\u003Cimg src=\".\u002Fdia\u002Fstatic\u002Fimages\u002Fbanner.png\">\n\u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Ftally.so\u002Fr\u002Fmeokbo\" target=\"_blank\">\u003Cimg alt=\"Static Badge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin-Waitlist-white?style=for-the-badge\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FbJq6vjRRKv\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join%20Chat-7289DA?logo=discord&style=for-the-badge\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia\u002Fblob\u002Fmain\u002FLICENSE\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg?style=for-the-badge\" alt=\"LICENSE\">\u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cp 
align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fnari-labs\u002FDia-1.6B-0626\">\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Fmodel-on-hf-lg-dark.svg\" alt=\"Model on HuggingFace\" height=42 >\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fnari-labs\u002FDia-1.6B\">\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Fopen-in-hf-spaces-lg-dark.svg\" alt=\"Space on HuggingFace\" height=38>\u003C\u002Fa>\n\u003C\u002Fp>\n\nDia is a 1.6B parameter text to speech model created by Nari Labs.\n\n**UPDATE 🤗(06\u002F27)**: Dia is now available through [Hugging Face Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)!\n\n**UPDATE 🚀(11\u002F19)**: Dia2 is released on Github and HuggingFace [link](https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia2)!\n\nDia **directly generates highly realistic dialogue from a transcript**. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.\n\nTo accelerate research, we are providing access to pretrained model checkpoints and inference code. The model weights are hosted on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fnari-labs\u002FDia-1.6B-0626). The model only supports English generation at the moment.\n\nWe also provide a [demo page](https:\u002F\u002Fyummy-fir-7a4.notion.site\u002Fdia) comparing our model to [ElevenLabs Studio](https:\u002F\u002Felevenlabs.io\u002Fstudio) and [Sesame CSM-1B](https:\u002F\u002Fgithub.com\u002FSesameAILabs\u002Fcsm).\n\n- We have a ZeroGPU Space running! Try it now [here](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fnari-labs\u002FDia-1.6B-0626). 
Thanks to the HF team for the support :)\n- Join our [discord server](https:\u002F\u002Fdiscord.gg\u002FbJq6vjRRKv) for community support and access to new features.\n- Play with a larger version of Dia: generate fun conversations, remix content, and share with friends. 🔮 Join the [waitlist](https:\u002F\u002Ftally.so\u002Fr\u002Fmeokbo) for early access.\n\n## Generation Guidelines\n\n- Keep input text length moderate \n    - Short input (corresponding to under 5s of audio) will sound unnatural\n    - Very long input (corresponding to over 20s of audio) will make the speech unnaturally fast.\n- Use non-verbal tags sparingly, from the list in the README. Overusing or using unlisted non-verbals may cause weird artifacts.\n- Always begin input text with `[S1]`, and always alternate between `[S1]` and `[S2]` (i.e. `[S1]`... `[S1]`... is not good)\n- When using audio prompts (voice cloning), follow these instructions carefully:\n    - Provide the transcript of the to-be cloned audio before the generation text.\n    - Transcript must use `[S1]`, `[S2]` speaker tags correctly (i.e. single speaker: `[S1]`..., two speakers: `[S1]`... `[S2]`...)\n    - Duration of the to-be cloned audio should be 5~10 seconds for the best results.\n        (Keep in mind: 1 second ≈ 86 tokens)\n- Put `[S1]` or `[S2]` (the second-to-last speaker's tag) at the end of the audio to improve audio quality at the end\n\n## Quickstart\n\n### Transformers Support\n\nWe now have a [Hugging Face Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) implementation of Dia! You should install `main` branch of `transformers` to use it. 
See [hf.py](hf.py) for more information.\n\n\u003Cdetails>\n\u003Csummary>View more details\u003C\u002Fsummary>\n\nInstall `main` branch of `transformers`\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers.git\n# or install with uv\nuv pip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers.git\n```\n\nRun `hf.py`. The file is as below.\n\n```python\nfrom transformers import AutoProcessor, DiaForConditionalGeneration\n\n\ntorch_device = \"cuda\"\nmodel_checkpoint = \"nari-labs\u002FDia-1.6B-0626\"\n\ntext = [\n    \"[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face.\"\n]\nprocessor = AutoProcessor.from_pretrained(model_checkpoint)\ninputs = processor(text=text, padding=True, return_tensors=\"pt\").to(torch_device)\n\nmodel = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)\noutputs = model.generate(\n    **inputs, max_new_tokens=3072, guidance_scale=3.0, temperature=1.8, top_p=0.90, top_k=45\n)\n\noutputs = processor.batch_decode(outputs)\nprocessor.save_audio(outputs, \"example.mp3\")\n```\n\n\u003C\u002Fdetails>\n\n### Run with this repo\n\n\u003Cdetails>\n\u003Csummary> Install via pip \u003C\u002Fsummary>\n\n```bash\n# Clone this repository\ngit clone https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia.git\ncd dia\n\n# Optionally\npython -m venv .venv && source .venv\u002Fbin\u002Factivate\n\n# Install dia\npip install -e .\n```\n\nOr you can install without cloning.\n\n```bash\n# Install directly from GitHub\npip install git+https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia.git\n```\n\nNow, run some examples.\n\n```bash\npython example\u002Fsimple.py\n```\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>Install via uv\u003C\u002Fsummary>\n\nYou need [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) to be installed.\n\n```bash\n# 
Clone this repository\ngit clone https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia.git\ncd dia\n```\n\nRun some examples directly.\n\n```bash\nuv run example\u002Fsimple.py\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Run Gradio UI\u003C\u002Fsummary>\n\n```bash\npython app.py\n\n# Or if you have uv installed\nuv run app.py\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Run with CLI\u003C\u002Fsummary>\n\n```bash\npython cli.py --help\n\n# Or if you have uv installed\nuv run cli.py --help\n```\n\n\u003C\u002Fdetails>\n\n> [!NOTE]\n> The model was not fine-tuned on a specific voice, so you will get a different voice every time you run it.\n> You can keep the speaker consistent by either adding an audio prompt or fixing the seed.\n\n> [!IMPORTANT]\n> If you are using a 5000 series GPU, you should use the torch 2.8 nightly build. See issue [#26](https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia\u002Fissues\u002F26) for more details.\n\n## Features\n\n- Generate dialogue via the `[S1]` and `[S2]` tags\n- Generate non-verbal sounds such as `(laughs)`, `(coughs)`, etc.\n  - The non-verbal tags below are recognized, but may produce unexpected output.\n  - `(laughs), (clears throat), (sighs), (gasps), (coughs), (singing), (sings), (mumbles), (beep), (groans), (sniffs), (claps), (screams), (inhales), (exhales), (applause), (burps), (humming), (sneezes), (chuckle), (whistles)`\n- Voice cloning. See [`example\u002Fvoice_clone.py`](example\u002Fvoice_clone.py) for more information.\n  - In the Hugging Face Space, you can upload the audio you want to clone and place its transcript before your script. Make sure the transcript follows the required format. The model will then output only the content of your script.\n\n\n## 💻 Hardware and Inference Speed\n\nDia has only been tested on GPUs (PyTorch 2.0+, CUDA 12.6). 
CPU support will be added soon.\nThe initial run will take longer because the Descript Audio Codec also needs to be downloaded.\n\nThese are the speeds we benchmarked on an RTX 4090.\n\n| precision | realtime factor w\u002F compile | realtime factor w\u002Fo compile | VRAM |\n|:-:|:-:|:-:|:-:|\n| `bfloat16` | x2.1 | x1.5 | ~4.4GB |\n| `float16` | x2.2 | x1.3 | ~4.4GB |\n| `float32` | x1 | x0.9 | ~7.9GB |\n\nWe will be adding a quantized version in the future.\n\nIf you don't have hardware available or if you want to play with bigger versions of our models, join the waitlist [here](https:\u002F\u002Ftally.so\u002Fr\u002Fmeokbo).\n\n## 🪪 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## ⚠️ Disclaimer\n\nThis project offers a high-fidelity speech generation model intended for research and educational use. The following uses are **strictly forbidden**:\n\n- **Identity Misuse**: Do not produce audio resembling real individuals without permission.\n- **Deceptive Content**: Do not use this model to generate misleading content (e.g. fake news).\n- **Illegal or Malicious Use**: Do not use this model for activities that are illegal or intended to cause harm.\n\nBy using this model, you agree to uphold relevant legal standards and ethical responsibilities. We **are not responsible** for any misuse and firmly oppose any unethical usage of this technology.\n\n## 🔭 TODO \u002F Future Work\n\n- Docker support for ARM architecture and macOS.\n- Optimize inference speed.\n- Add quantization for memory efficiency.\n\n## 🤝 Contributing\n\nWe are a tiny team of 1 full-time and 1 part-time research engineers. 
We welcome any contributions!\nJoin our [Discord Server](https:\u002F\u002Fdiscord.gg\u002FbJq6vjRRKv) for discussions.\n\n## 🤗 Acknowledgements\n\n- We thank the [Google TPU Research Cloud program](https:\u002F\u002Fsites.research.google\u002Ftrc\u002Fabout\u002F) for providing computation resources.\n- Our work was heavily inspired by [SoundStorm](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09636), [Parakeet](https:\u002F\u002Fjordandarefsky.com\u002Fblog\u002F2024\u002Fparakeet\u002F), and [Descript Audio Codec](https:\u002F\u002Fgithub.com\u002Fdescriptinc\u002Fdescript-audio-codec).\n- We thank Hugging Face for providing the ZeroGPU Grant.\n- \"Nari\" is a pure Korean word for lily.\n- We thank Jason Y. for providing help with data filtering.\n\n\n## ⭐ Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F#nari-labs\u002Fdia&Date\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=nari-labs\u002Fdia&type=Date&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=nari-labs\u002Fdia&type=Date\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=nari-labs\u002Fdia&type=Date\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n","Dia is a 1.6B-parameter text-to-speech model developed by Nari Labs that generates highly realistic dialogue in a single pass. Its core features include generating natural dialogue directly from a text script, controlling emotion and tone through audio conditioning, and producing nonverbal sounds such as laughter and coughing. The model is developed in Python, uses the Hugging Face Transformers framework, and currently supports English only. It suits applications that need high-quality speech synthesis, such as virtual assistants, audiobook production, or game character voice-overs. The developers provide pretrained model weights and inference code to accelerate research, and offer an online demo Space on Hugging Face.",2,"2026-06-11 02:50:12","top_language"]