[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72268":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72268,"mini-omni","gpt-omni\u002Fmini-omni","gpt-omni","open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities. ","https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.16725",null,"Python",3558,309,79,37,0,3,6,12,9,69.67,"MIT License",false,"main",[],"2026-06-12 04:01:04","\r\n# Mini-Omni\r\n\r\n\u003Cp align=\"center\">\u003Cstrong style=\"font-size: 18px;\">\r\nMini-Omni: Language Models Can Hear, Talk While Thinking in Streaming\r\n\u003C\u002Fstrong>\r\n\u003C\u002Fp>\r\n\r\n\u003Cp align=\"center\">\r\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fgpt-omni\u002Fmini-omni\">Hugging Face\u003C\u002Fa>   | 📖 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgpt-omni\u002Fmini-omni\">Github\u003C\u002Fa> \r\n|     📑 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.16725\">Technical report\u003C\u002Fa> |\r\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgpt-omni\u002FVoiceAssistant-400K\">Datasets\u003C\u002Fa>\r\n\u003C\u002Fp>\r\n\r\nMini-Omni is an open-source multimodal large language model that can **hear, talk while thinking**. 
Featuring real-time end-to-end speech input and **streaming audio output** conversational capabilities.\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Cimg src=\"data\u002Ffigures\u002Fframeworkv3.jpg\" width=\"100%\"\u002F>\r\n\u003C\u002Fp>\r\n\r\n\r\n## Updates\r\n\r\n- **2024.10:** We released [Mini-Omni2](https:\u002F\u002Fgithub.com\u002Fgpt-omni\u002Fmini-omni2) with vision and audio capabilities. \r\n- **2024.09:** Amazing online [interactive gradio demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fgradio\u002Fomni-mini) by 🤗 gradio team.\r\n- **2024.09:** **VoiceAssistant-400K** is uploaded to [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgpt-omni\u002FVoiceAssistant-400K).\r\n\r\n## Features\r\n\r\n✅ **Real-time speech-to-speech** conversational capabilities. No extra ASR or TTS models required.\r\n\r\n✅ **Talking while thinking**, with the ability to generate text and audio at the same time.\r\n\r\n✅ **Streaming audio output** capabilities.\r\n\r\n✅ With \"Audio-to-Text\" and \"Audio-to-Audio\" **batch inference** to further boost the performance.\r\n\r\n## Demo\r\n\r\nNOTE: need to unmute first.\r\n\r\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F03bdde05-9514-4748-b527-003bea57f118\r\n\r\n\r\n## Install\r\n\r\nCreate a new conda environment and install the required packages:\r\n\r\n```sh\r\nconda create -n omni python=3.10\r\nconda activate omni\r\n\r\ngit clone https:\u002F\u002Fgithub.com\u002Fgpt-omni\u002Fmini-omni.git\r\ncd mini-omni\r\npip install -r requirements.txt\r\n```\r\n\r\n## Quick start\r\n\r\n**Interactive demo**\r\n\r\n- start server\r\n\r\nNOTE: you need to start the server before running the streamlit or gradio demo with API_URL set to the server address.\r\n\r\n```sh\r\nsudo apt-get install ffmpeg\r\nconda activate omni\r\ncd mini-omni\r\npython3 server.py --ip '0.0.0.0' --port 60808\r\n```\r\n\r\n\r\n- run streamlit demo\r\n\r\nNOTE: you need to run streamlit **locally** with PyAudio 
installed. If you hit `ModuleNotFoundError: No module named 'utils.vad'`, run `export PYTHONPATH=.\u002F` first.\r\n\r\n```sh\r\npip install PyAudio==0.2.14\r\nAPI_URL=http:\u002F\u002F0.0.0.0:60808\u002Fchat streamlit run webui\u002Fomni_streamlit.py\r\n```\r\n\r\n- run gradio demo\r\n```sh\r\nAPI_URL=http:\u002F\u002F0.0.0.0:60808\u002Fchat python3 webui\u002Fomni_gradio.py\r\n```\r\n\r\nexample:\r\n\r\nNOTE: you need to unmute first. Gradio does not seem to play the audio stream instantly, so the latency feels a bit longer.\r\n\r\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F29187680-4c42-47ff-b352-f0ea333496d9\r\n\r\n\r\n**Local test**\r\n\r\n```sh\r\nconda activate omni\r\ncd mini-omni\r\n# test run the preset audio samples and questions\r\npython inference.py\r\n```\r\n\r\n## FAQ\r\n\r\n**1. Does the model support other languages?**\r\n\r\nNo, the model is trained only on English. However, because we use whisper as the audio encoder, the model can understand other languages that whisper supports (such as Chinese), but the output is only in English.\r\n\r\n**2. What is `post_adapter` in the code? Does the open-source version support tts-adapter?**\r\n\r\nThe `post_adapter` in model.py is the `tts-adapter`, but the open-source version does not support `tts-adapter`.\r\n\r\n**3. Error: `ModuleNotFoundError: No module named 'utils.xxxx'`**\r\n\r\nRun `export PYTHONPATH=.\u002F` first. You do not need to run `pip install utils`; alternatively, try `pip uninstall utils`.\r\n\r\n**4. 
Error: cannot run streamlit in a local browser with a remote streamlit server**, issue: https:\u002F\u002Fgithub.com\u002Fgpt-omni\u002Fmini-omni\u002Fissues\u002F37\r\n    \r\nYou need to start streamlit **locally** with PyAudio installed.\r\n\r\n\r\n## Acknowledgements \r\n\r\n- [Qwen2](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen2\u002F) as the LLM backbone.\r\n- [litGPT](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flitgpt\u002F) for training and inference.\r\n- [whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper\u002F) for audio encoding.\r\n- [snac](https:\u002F\u002Fgithub.com\u002Fhubertsiuzdak\u002Fsnac\u002F) for audio decoding.\r\n- [CosyVoice](https:\u002F\u002Fgithub.com\u002FFunAudioLLM\u002FCosyVoice) for generating synthetic speech.\r\n- [OpenOrca](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpen-Orca\u002FOpenOrca) and [MOSS](https:\u002F\u002Fgithub.com\u002FOpenMOSS\u002FMOSS\u002Ftree\u002Fmain) for alignment.\r\n\r\n## Star History\r\n\r\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=gpt-omni\u002Fmini-omni&type=Date)](https:\u002F\u002Fstar-history.com\u002F#gpt-omni\u002Fmini-omni&Date)\r\n","Mini-Omni is an open-source multimodal large language model that processes speech input and output in real time, so it can hear, talk, and think at the same time in conversation. Its core features include real-time end-to-end speech-to-speech interaction, simultaneous text and audio generation, and streaming audio output, with no additional automatic speech recognition or text-to-speech models required. The model suits applications that need instant voice interaction, such as smart assistants and customer service bots. It is written in Python and released under the MIT License; the project has earned 3546 stars and 311 forks, which indicates its popularity and technical value in the community.",2,"2026-06-11 03:41:07","high_star"]