[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72190":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":38,"readmeContent":39,"aiSummary":40,"trendingCount":16,"starSnapshotCount":16,"syncStatus":41,"lastSyncTime":42,"discoverSource":43},72190,"ClearerVoice-Studio","modelscope\u002FClearerVoice-Studio","modelscope","An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.","",null,"Python",4229,347,40,81,0,14,35,91,42,29.62,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37],"audio","bandwidth-extension","deep-learning","noise-suppression","pytorch","speaker-extraction","speech","speech-enhancement","speech-quality-evaluation","speech-separation","speech-super-resolution","2026-06-12 02:02:59","\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa4ccbc60-5248-4dca-8cec-09a6385c6d0f\" width=\"768\" height=\"192\">\n\u003C\u002Fdiv>\n\n\u003Cstrong>ClearerVoice-Studio\u003C\u002Fstrong> is an open-source, AI-powered speech processing toolkit designed for researchers, developers, and end-users. It provides capabilities of speech enhancement, speech separation, speech super-resolution, target speaker extraction, and more. 
The toolkit provides state-of-the-art pre-trained models, along with training and inference scripts, all accessible from this repository.\n \n#### 👉🏻[HuggingFace Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Falibabasglab\u002FClearVoice)👈🏻 | 👉🏻[ModelScope Demo](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002Fiic\u002FClearerVoice-Studio)👈🏻 | 👉🏻[SpeechScore Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Falibabasglab\u002FSpeechScore)👈🏻 | 👉🏻[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19398)👈🏻 \n\n---\n![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmodelscope\u002FClearerVoice-Studio) Please leave your ⭐ on our GitHub to support this community project!\n\nRemember to click the star ⭐ in the upper-right corner to support us. Your support is our biggest motivation for updating the models!\n\n## News :fire:\n- Upcoming: More tasks will be added to ClearVoice.\n- [2025.6] Added an interface for [ClearVoice](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Fclearvoice) that accepts a NumPy array as model input and returns the output as a NumPy array. This makes it easier to call the models from within a training or inference pipeline. Please check out [`demo_Numpy2Numpy.py`](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Fblob\u002Fmain\u002Fclearvoice\u002Fdemo_Numpy2Numpy.py).\n- [2025.5] Updated SpeechScore with more non-intrusive metrics: NISQA and DISTILL_MOS\n- [2025.4] Updated pip installation for [ClearVoice](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Fclearvoice). You can now simply run `pip install clearvoice` to use all the pre-trained models in ClearVoice; see the project description on [PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Fclearvoice\u002F).\n- [2025.4] Added a training script for speech super-resolution, supporting both retraining and fine-tuning of models. 
For details, refer to the documentation [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Ftrain\u002Fspeech_super_resolution).\n- [2025.4] Added data generation scripts for training\u002Ffine-tuning speech enhancement models. The scripts generate either noisy speech or noisy-reverberant speech. Please check [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Ftrain\u002Fdata_generation\u002Fspeech_enhancement).\n- [2025.1] The ClearVoice demo is ready to try on both [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Falibabasglab\u002FClearVoice) and [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002Fiic\u002FClearerVoice-Studio). Note that the HuggingFace demo has a limited GPU quota, while ModelScope offers a larger one.\n- [2025.1] ClearVoice now offers **speech super-resolution**, also known as bandwidth extension. This feature improves the perceptual quality of speech by converting low-resolution audio (with an effective sampling rate of at least 16,000 Hz) into high-resolution audio with a sampling rate of 48,000 Hz. A fully upscaled **LJSpeech-1.1-48kHz dataset** can be downloaded from [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Falibabasglab\u002FLJSpeech-1.1-48kHz) and [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fdatasets\u002Fiic\u002FLJSpeech-1.1-48kHz).\n- [2025.1] ClearVoice now supports more audio formats including **\"wav\", \"aac\", \"ac3\", \"aiff\", \"flac\", \"m4a\", \"mp3\", \"ogg\", \"opus\", \"wma\", \"webm\"**, etc. It also supports both mono and stereo channels with 16-bit or 32-bit precision. A recent version of [ffmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg) is required for the audio codecs.  \n- [2024.12] Uploaded pre-trained models to ModelScope. 
Users can now download the models from either [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002Fiic\u002FClearerVoice-Studio\u002Fsummary) or [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Falibabasglab)  \n- [2024.11] Our FRCRN speech denoiser has been used over **3.0 million** times on [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fiic\u002Fspeech_frcrn_ans_cirm_16k)\n- [2024.11] Our MossFormer speech separator has been used over **2.5 million** times on [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fiic\u002Fspeech_mossformer_separation_temporal_8k)\n- [2024.11] Release of this repository\n\n### 🌟 Why Choose ClearerVoice-Studio?\n\n- **Pre-Trained Models:** Includes cutting-edge pre-trained models, fine-tuned on extensive, high-quality datasets. No need to start from scratch!\n- **Ease of Use:** Designed for seamless integration with your projects, offering a simple yet flexible interface for inference and training.\n- **Comprehensive Features:** Combines advanced algorithms for multiple speech processing tasks in one platform.\n- **Community-Driven:** Built for researchers, developers, and enthusiasts to collaborate and innovate together.\n\n## Contents of this repository\nThis repository is organized into three main components: **[ClearVoice](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Fclearvoice)**, **[Train](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Ftrain)**, and **[SpeechScore](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Fspeechscore)**.\n\n### 1. 
**ClearVoice [[Readme](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Fblob\u002Fmain\u002Fclearvoice\u002FREADME.md)][[Documentation](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Fblob\u002Fmain\u002Fclearvoice\u002FREADME.md)]**  \nClearVoice offers a user-friendly solution for speech processing tasks such as speech denoising, separation, super-resolution, audio-visual target speaker extraction, and more. It is designed as a unified inference platform leveraging pre-trained models (e.g., [FRCRN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.07293), [MossFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.11824)), all trained on extensive datasets. If you're looking for a tool to improve speech quality, ClearVoice is the perfect choice. Simply click on [`ClearVoice`](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Fclearvoice) and follow our detailed instructions to get started.\n\n### 2. **Train**  \nFor advanced researchers and developers, we provide model fine-tuning and training scripts for all the tasks offered in ClearVoice and more:\n\n- **Task 1: [Speech enhancement](train\u002Fspeech_enhancement)** (16kHz & 48kHz)\n- **Task 2: [Speech separation](train\u002Fspeech_separation)** (8kHz & 16kHz)\n- **Task 3: [Speech super-resolution](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Ftrain\u002Fspeech_super_resolution)** (48kHz) \n- **Task 4: [Target speaker extraction](train\u002Ftarget_speaker_extraction)** \n  - **Sub-Task 1: Audio-only Speaker Extraction Conditioned on a Reference Speech** (8kHz)\n  - **Sub-Task 2: Audio-visual Speaker Extraction Conditioned on Face (Lip) Recording** (16kHz)\n  - **Sub-Task 3: Audio-visual Speaker Extraction Conditioned on Body Gestures** (16kHz)\n  - **Sub-Task 4: Neuro-steered Speaker Extraction Conditioned on EEG Signals** (16kHz)\n\nContributors are welcome to add more model 
architectures and tasks!\n\n### 3. **SpeechScore [[Readme](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Fblob\u002Fmain\u002Fspeechscore\u002FREADME.md)][[Documentation](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Fblob\u002Fmain\u002Fspeechscore\u002FREADME.md)]**  \n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FClearerVoice-Studio\u002Ftree\u002Fmain\u002Fspeechscore\">`SpeechScore`\u003C\u002Fa> is a speech quality assessment toolkit. We include it here to evaluate the performance of different models. SpeechScore includes many popular speech metrics:\n\n- Signal-to-Noise Ratio (SNR)\n- Perceptual Evaluation of Speech Quality (PESQ)\n- Short-Time Objective Intelligibility (STOI)\n- Deep Noise Suppression Mean Opinion Score (DNSMOS)\n- Scale-Invariant Signal-to-Distortion Ratio (SI-SDR)\n- and many more quality benchmarks  \n  \n## Contact\nIf you have any comments or questions about ClearerVoice-Studio, feel free to raise an issue in this repository or contact us directly at:\n- email: {shengkui.zhao, zexu.pan}@alibaba-inc.com\n\nAlternatively, you are welcome to join our DingTalk group to discuss algorithms and technology and to share feedback on your experience. Scan the QR code below to join our official chat group. 
\n\n\u003Cp align=\"center\">\n  \u003Ctable>\n    \u003Ctr>\n      \u003Ctd style=\"text-align:center;\">\n        \u003Ca href=\".\u002Fasset\u002FQR.jpg\">\u003Cimg alt=\"ClearVoice in DingTalk\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FClearVoice-DingTalk-d9d9d9\">\u003C\u002Fa>\n      \u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n       \u003Ctd style=\"text-align:center;\">\n      \u003Cimg alt=\"Light\" src=\".\u002Fasset\u002Fdingtalk.png\" width=\"68%\" \u002F>\n      \u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftable>\n\u003C\u002Fp>\n \n## Friend Links\nCheck out some awesome GitHub repositories from the Speech Lab of the Institute for Intelligent Computing, Alibaba Group.\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FFunAudioLLM\u002FInspireMusic\" target=\"_blank\">\n        \u003Cimg alt=\"Demo\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRepo | Space-InspireMusic?labelColor=&label=InspireMusic&color=green\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FFunASR\" target=\"_blank\">\n        \u003Cimg alt=\"Github\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRepo | Space-FunASR?labelColor=&label=FunASR&color=green\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FFunAudioLLM\" target=\"_blank\">\n        \u003Cimg alt=\"Demo\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRepo | Space-FunAudioLLM?labelColor=&label=FunAudioLLM&color=green\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002F3D-Speaker\" target=\"_blank\">\n        \u003Cimg alt=\"Demo\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRepo | Space-3DSpeaker?labelColor=&label=3D-Speaker&color=green\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n## Acknowledgements\nClearerVoice-Studio contains third-party components and code modified from several open-source repositories, including: 
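\u003Cbr>\n[SpeechBrain](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain), [ESPnet](https:\u002F\u002Fgithub.com\u002Fespnet), [TalkNet-ASD](https:\u002F\u002Fgithub.com\u002FTaoRuijie\u002FTalkNet-ASD)\n","ClearerVoice-Studio is an AI-based speech processing toolkit that provides speech enhancement, separation, super-resolution, and target speaker extraction. The project is written in Python, builds on deep learning and in particular the PyTorch framework, integrates a range of state-of-the-art pre-trained models, and ships with detailed training and inference scripts so that users can get started quickly. ClearerVoice-Studio suits scenarios that call for better audio quality, such as noise suppression in conferencing systems and voice separation in multi-speaker conversations, and it also gives researchers a rich set of experimental resources. In addition, the project supports installation via pip and an input and output interface based on NumPy arrays, which further improves flexibility and convenience.",2,"2026-06-11 03:40:46","high_star"]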