[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1463":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":36,"discoverSource":37},1463,"MockingBird","babysor\u002FMockingBird","babysor","🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time","",null,"Python",36904,5209,299,477,0,2,7,24,6,45,"Other",false,"main",true,[27,28,29,30,31,32],"ai","deep-learning","pytorch","speech","text-to-speech","tts","2026-06-12 02:00:28","> 🚧 While I no longer actively update this repo, you can find me continuously pushing this tech forward to good side and open-source. 
I'm also building an optimized and cloud hosted version: https:\u002F\u002Fnoiz.ai\u002F and [we're hiring](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F1029).\n>\n![mockingbird](https:\u002F\u002Fuser-images.githubusercontent.com\u002F12797292\u002F131216767-6eb251d6-14fc-4951-8324-2722f0cd4c63.jpg)\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F3869\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Ftrendshift.io\u002Fapi\u002Fbadge\u002Frepositories\u002F3869\" alt=\"babysor%2FMockingBird | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n[![MIT License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg?style=flat)](http:\u002F\u002Fchoosealicense.com\u002Flicenses\u002Fmit\u002F)\n\n> English | [中文](README-CN.md)| [中文Linux](README-LINUX-CN.md)\n\n## Features\n🌍 **Chinese** supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc.\n\n🤩 **PyTorch** worked for pytorch, tested in version of 1.9.0(latest in August 2021), with GPU Tesla T4 and GTX 2060\n\n🌍 **Windows + Linux** run in both Windows OS and linux OS (even in M1 MACOS)\n\n🤩 **Easy & Awesome** effect with only newly-trained synthesizer, by reusing the pretrained encoder\u002Fvocoder\n\n🌍 **Webserver Ready** to serve your result with remote calling\n\n### [DEMO VIDEO](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV17Q4y1B7mY\u002F)\n\n## Quick Start\n\n### 1. 
Install Requirements\n#### 1.1 General Setup\n> Follow the original repo to check that your environment is ready.\n**Python 3.7 or higher** is needed to run the toolbox.\n\n* Install [PyTorch](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F).\n> If you get `ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2 )`, the cause is probably an old Python version; try Python 3.9 and the install should succeed.\n* Install [ffmpeg](https:\u002F\u002Fffmpeg.org\u002Fdownload.html#get-packages).\n* Run `pip install -r requirements.txt` to install the remaining necessary packages.\n> The recommended environment is `Repo Tag 0.0.1` `Pytorch1.9.0 with Torchvision0.10.0 and cudatoolkit10.2` `requirements.txt` `webrtcvad-wheels`, because `requirements.txt` was exported a few months ago and does not work with newer versions.\n* Install webrtcvad with `pip install webrtcvad-wheels` (if needed).\n\nor\n- install dependencies with `conda` or `mamba`\n\n  ```conda env create -n env_name -f env.yml```\n\n  ```mamba env create -n env_name -f env.yml```\n\n  Either command creates a virtual environment with the necessary dependencies installed. Switch to the new environment with `conda activate env_name` and enjoy it.\n  > env.yml only includes the necessary dependencies to run the project, temporarily without monotonic-align. 
You can check the official website to install the GPU version of PyTorch.\n\n#### 1.2 Setup with an M1 Mac\n> The following steps are a workaround for using the original `demo_toolbox.py` without changing any code.\n>\n  >  The main issue is that the PyQt5 packages used in `demo_toolbox.py` are not compatible with M1 chips, so anyone training models on an M1 chip can either forgo `demo_toolbox.py` or try the `web.py` in the project.\n\n##### 1.2.1 Install `PyQt5`, with [ref](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F68038451\u002F20455983) here.\n  * Create and open a Rosetta Terminal, with [ref](https:\u002F\u002Fdev.to\u002Fcourier\u002Ftips-and-tricks-to-setup-your-apple-m1-for-development-547g) here.\n  * Use system Python to create a virtual environment for the project\n    ```\n    \u002Fusr\u002Fbin\u002Fpython3 -m venv \u002FPathToMockingBird\u002Fvenv\n    source \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002Factivate\n    ```\n  * Upgrade pip and install `PyQt5`\n    ```\n    pip install --upgrade pip\n    pip install pyqt5\n    ```\n##### 1.2.2 Install `pyworld` and `ctc-segmentation`\n\n> Both packages seem to be unique to this project and are not used in the original [Real-Time Voice Cloning](https:\u002F\u002Fgithub.com\u002FCorentinJ\u002FReal-Time-Voice-Cloning) project. When installing with `pip install`, both packages lack wheels, so pip tries to compile them directly from C code and cannot find `Python.h`.\n\n  * Install `pyworld`\n      * `brew install python` The Python installed by brew comes with `Python.h`.\n      * `export CPLUS_INCLUDE_PATH=\u002Fopt\u002Fhomebrew\u002FFrameworks\u002FPython.framework\u002FHeaders` The path of the brew-installed `Python.h` is specific to M1 macOS and is listed above. 
One needs to manually add the path to the environment variables.\n      * `pip install pyworld` This should now succeed.\n\n\n  * Install `ctc-segmentation`\n    > The same method does not work for `ctc-segmentation`; it has to be compiled from the source code on [GitHub](https:\u002F\u002Fgithub.com\u002Flumaku\u002Fctc-segmentation).\n    * `git clone https:\u002F\u002Fgithub.com\u002Flumaku\u002Fctc-segmentation.git`\n    * `cd ctc-segmentation`\n    * `source \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002Factivate` Activate the virtual environment if it is not already active.\n    * `cythonize -3 ctc_segmentation\u002Fctc_segmentation_dyn.pyx`\n    * `\u002Fusr\u002Fbin\u002Farch -x86_64 python setup.py build` Build with x86 architecture.\n    * `\u002Fusr\u002Fbin\u002Farch -x86_64 python setup.py install --optimize=1 --skip-build` Install with x86 architecture.\n\n##### 1.2.3 Other dependencies\n  * `\u002Fusr\u002Fbin\u002Farch -x86_64 pip install torch torchvision torchaudio` Taking the pip install of `PyTorch` as an example, explicitly specify that it is installed with x86 architecture.\n  * `pip install ffmpeg`  Install ffmpeg\n  * `pip install -r requirements.txt` Install other requirements.\n\n##### 1.2.4 Run the Inference Time (with Toolbox)\n  > To run the project on x86 architecture. 
[ref](https:\u002F\u002Fyoutrack.jetbrains.com\u002Fissue\u002FPY-46290\u002FAllow-running-Python-under-Rosetta-2-in-PyCharm-for-Apple-Silicon).\n  * `vim \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002FpythonM1` Create an executable file `pythonM1` in `\u002FPathToMockingBird\u002Fvenv\u002Fbin` that wraps the Python interpreter.\n  * Write the following content:\n    ```\n    #!\u002Fusr\u002Fbin\u002Fenv zsh\n    mydir=${0:a:h}\n    \u002Fusr\u002Fbin\u002Farch -x86_64 $mydir\u002Fpython \"$@\"\n    ```\n  * `chmod +x pythonM1` Set the file as executable.\n  * If using the PyCharm IDE, configure the project interpreter to `pythonM1` ([steps here](https:\u002F\u002Fwww.jetbrains.com\u002Fhelp\u002Fpycharm\u002Fconfiguring-python-interpreter.html#add-existing-interpreter)); if using command-line Python, run `\u002FPathToMockingBird\u002Fvenv\u002Fbin\u002FpythonM1 demo_toolbox.py`\n\n\n### 2. Prepare your models\n> Note that we are using the pretrained encoder\u002Fvocoder but not the synthesizer, since the original model is incompatible with Chinese symbols. This means demo_cli does not work at the moment, so additional synthesizer models are required.\n\nYou can either train your models or use existing ones:\n\n#### 2.1 Train encoder with your dataset (Optional)\n\n* Preprocess the audio and the mel spectrograms:\n`python encoder_preprocess.py \u003Cdatasets_root>` Pass the parameter `--dataset {dataset}` to select the datasets you want to preprocess. Only the train set of these datasets will be used. Possible names: librispeech_other, voxceleb1, voxceleb2. Use commas to separate multiple datasets.\n\n* Train the encoder: `python encoder_train.py my_run \u003Cdatasets_root>\u002FSV2TTS\u002Fencoder`\n> For training, the encoder uses visdom. You can disable it with `--no_visdom`, but it's nice to have. 
Run \"visdom\" in a separate CLI\u002Fprocess to start your visdom server.\n\n#### 2.2 Train synthesizer with your dataset\n* Download the dataset and unzip it: make sure you can access all the .wav files in the folder\n* Preprocess the audio and the mel spectrograms:\n`python pre.py \u003Cdatasets_root>`\nPass the parameter `--dataset {dataset}` to select aidatatang_200zh, magicdata, aishell3, data_aishell, etc. If this parameter is not passed, the default dataset will be aidatatang_200zh.\n\n* Train the synthesizer:\n`python train.py --type=synth mandarin \u003Cdatasets_root>\u002FSV2TTS\u002Fsynthesizer`\n\n* Go to the next step once the attention line appears and the loss meets your needs; check the training folder *synthesizer\u002Fsaved_models\u002F*.\n\n#### 2.3 Use pretrained model of synthesizer\n> Thanks to the community, some models will be shared:\n\n| author | Download link | Preview Video | Info |\n| --- | ----------- | ----- |----- |\n| @author | https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1iONvRxmkI-t1nHqxKytY3g  [Baidu](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1iONvRxmkI-t1nHqxKytY3g) 4j5d  |  | 75k steps trained by multiple datasets |\n| @author | https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1fMh9IlgKJlL2PIiRTYDUvw  [Baidu](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1fMh9IlgKJlL2PIiRTYDUvw) code: om7f  |  | 25k steps trained by multiple datasets, only works under version 0.0.1 |\n|@FawenYo | https:\u002F\u002Fyisiou-my.sharepoint.com\u002F:u:\u002Fg\u002Fpersonal\u002Flawrence_cheng_fawenyo_onmicrosoft_com\u002FEWFWDHzee-NNg9TWdKckCc4BC7bK2j9cCbOWn0-_tK0nOg?e=n0gGgC  | [input](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fwiki\u002Faudio\u002Fself_test.mp3) [output](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fwiki\u002Faudio\u002Fexport.wav) | 200k steps with local accent of Taiwan, only works under version 0.0.1 |\n|@miven| https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1PI-hM3sn5wbeChRryX-RCQ code: 2021 
https:\u002F\u002Fwww.aliyundrive.com\u002Fs\u002FAwPsbo8mcSP code: z2m0 | https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1uh411B7AD\u002F | only works under version 0.0.1\n\n#### 2.4 Train vocoder (Optional)\n> Note: the choice of vocoder makes little difference to the result, so you may not need to train a new one.\n* Preprocess the data:\n`python vocoder_preprocess.py \u003Cdatasets_root> -m \u003Csynthesizer_model_path>`\n> Replace `\u003Cdatasets_root>` with your dataset root, and replace `\u003Csynthesizer_model_path>` with the directory of your best trained synthesizer model, e.g. *synthesizer\\saved_models\\xxx*\n\n* Train the wavernn vocoder:\n`python vocoder_train.py mandarin \u003Cdatasets_root>`\n\n* Train the hifigan vocoder:\n`python vocoder_train.py mandarin \u003Cdatasets_root> hifigan`\n\n### 3. Launch\n#### 3.1 Using the web server\nYou can then run `python web.py` and open it in a browser; the default address is `http:\u002F\u002Flocalhost:8080`\n\n#### 3.2 Using the Toolbox\nYou can then try the toolbox:\n`python demo_toolbox.py -d \u003Cdatasets_root>`\n\n#### 3.3 Using the command line\nYou can then try the command:\n`python gen_voice.py \u003Ctext_file.txt> your_wav_file.wav`\nYou may need to install cn2an with \"pip install cn2an\" for better results with numbers.\n\n## Reference\n> This repository is forked from [Real-Time-Voice-Cloning](https:\u002F\u002Fgithub.com\u002FCorentinJ\u002FReal-Time-Voice-Cloning), which only supports English.\n\n| URL | Designation | Title | Implementation source |\n| --- | ----------- | ----- | --------------------- |\n| [1803.09017](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09017) | GlobalStyleToken (synthesizer)| Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | This repo |\n| [2010.05646](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646) | HiFi-GAN (vocoder)| Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | This repo |\n| 
[2106.02297](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.02297) | Fre-GAN (vocoder)| Fre-GAN: Adversarial Frequency-consistent Audio Synthesis | This repo |\n|[**1806.04558**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |\n|[1802.08435](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord\u002FWaveRNN](https:\u002F\u002Fgithub.com\u002Ffatchord\u002FWaveRNN) |\n|[1703.10135](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord\u002FWaveRNN](https:\u002F\u002Fgithub.com\u002Ffatchord\u002FWaveRNN)\n|[1710.10467](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |\n\n## F Q&A\n#### 1.Where can I download the dataset?\n| Dataset | Original Source | Alternative Sources |\n| --- | ----------- | ---------------|\n| aidatatang_200zh | [OpenSLR](http:\u002F\u002Fwww.openslr.org\u002F62\u002F) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F110A11KZoVe7vy6kXlLb6zVPLb_J91I_t\u002Fview?usp=sharing) |\n| magicdata | [OpenSLR](http:\u002F\u002Fwww.openslr.org\u002F68\u002F) | [Google Drive (Dev set)](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo\u002Fview?usp=sharing) |\n| aishell3 | [OpenSLR](https:\u002F\u002Fwww.openslr.org\u002F93\u002F) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ\u002Fview?usp=sharing) |\n| data_aishell | [OpenSLR](https:\u002F\u002Fwww.openslr.org\u002F33\u002F) |  |\n> After unzip aidatatang_200zh, you need to unzip all the files under `aidatatang_200zh\\corpus\\train`\n\n#### 2.What is`\u003Cdatasets_root>`?\nIf the dataset path is 
`D:\\data\\aidatatang_200zh`,then `\u003Cdatasets_root>` is`D:\\data`\n\n#### 3.Not enough VRAM\nTrain the synthesizer：adjust the batch_size in `synthesizer\u002Fhparams.py`\n```\n\u002F\u002FBefore\ntts_schedule = [(2,  1e-3,  20_000,  12),   # Progressive training schedule\n                (2,  5e-4,  40_000,  12),   # (r, lr, step, batch_size)\n                (2,  2e-4,  80_000,  12),   #\n                (2,  1e-4, 160_000,  12),   # r = reduction factor (# of mel frames\n                (2,  3e-5, 320_000,  12),   #     synthesized for each decoder iteration)\n                (2,  1e-5, 640_000,  12)],  # lr = learning rate\n\u002F\u002FAfter\ntts_schedule = [(2,  1e-3,  20_000,  8),   # Progressive training schedule\n                (2,  5e-4,  40_000,  8),   # (r, lr, step, batch_size)\n                (2,  2e-4,  80_000,  8),   #\n                (2,  1e-4, 160_000,  8),   # r = reduction factor (# of mel frames\n                (2,  3e-5, 320_000,  8),   #     synthesized for each decoder iteration)\n                (2,  1e-5, 640_000,  8)],  # lr = learning rate\n```\n\nTrain Vocoder-Preprocess the data：adjust the batch_size in `synthesizer\u002Fhparams.py`\n```\n\u002F\u002FBefore\n### Data Preprocessing\n        max_mel_frames = 900,\n        rescale = True,\n        rescaling_max = 0.9,\n        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.\n\u002F\u002FAfter\n### Data Preprocessing\n        max_mel_frames = 900,\n        rescale = True,\n        rescaling_max = 0.9,\n        synthesis_batch_size = 8,                  # For vocoder preprocessing and inference.\n```\n\nTrain Vocoder-Train the vocoder：adjust the batch_size in `vocoder\u002Fwavernn\u002Fhparams.py`\n```\n\u002F\u002FBefore\n# Training\nvoc_batch_size = 100\nvoc_lr = 1e-4\nvoc_gen_at_checkpoint = 5\nvoc_pad = 2\n\n\u002F\u002FAfter\n# Training\nvoc_batch_size = 6\nvoc_lr = 1e-4\nvoc_gen_at_checkpoint = 5\nvoc_pad =2\n```\n\n#### 4.If it happens 
`RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`\nPlease refer to issue [#37](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F37)\n\n#### 5. How to improve CPU and GPU utilization?\nAdjust the batch_size as appropriate to improve utilization.\n\n\n#### 6. What if I get `the page file is too small to complete the operation`?\nPlease refer to this [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Oh6dga-Oy10&ab_channel=CodeProf) and change the virtual memory to 100G (102400). For example, if the files are placed on the D drive, change the virtual memory of the D drive.\n\n#### 7. When should I stop during training?\nFYI, my attention alignment appeared after 18k steps and the loss dropped below 0.4 after 50k steps.\n![attention_step_20500_sample_1](https:\u002F\u002Fuser-images.githubusercontent.com\u002F7423248\u002F128587252-f669f05a-f411-4811-8784-222156ea5e9d.png)\n![step-135500-mel-spectrogram_sample_1](https:\u002F\u002Fuser-images.githubusercontent.com\u002F7423248\u002F128587255-4945faa0-5517-46ea-b173-928eff999330.png)\n","MockingBird is a project that can quickly clone a human voice and generate arbitrary speech in real time. It is built on deep learning, specifically the PyTorch framework, supports datasets in Mandarin Chinese and other languages, and runs on Windows and Linux (including macOS on M1 chips). By reusing the pretrained encoder\u002Fdecoder, the project simplifies the training of new models, and it also provides an easy-to-deploy web server interface for calling the generated results remotely. It suits applications that need personalized speech synthesis, such as virtual assistants and online education. Although the original author no longer actively maintains this repository, the underlying technology continues to develop.","2026-06-11 02:43:57","top_all"]