[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83839":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":12,"stars7d":12,"stars30d":12,"stars90d":12,"forks30d":12,"starsTrendScore":12,"compositeScore":14,"rankGlobal":9,"rankLanguage":9,"license":15,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":18,"hasPages":16,"topics":19,"createdAt":9,"pushedAt":9,"updatedAt":20,"readmeContent":21,"aiSummary":9,"trendingCount":12,"starSnapshotCount":12,"syncStatus":22,"lastSyncTime":23,"discoverSource":24},83839,"tachidubb","tachikomared\u002Ftachidubb","tachikomared","Local agent-controllable AI video dubbing — YouTube URL in, voice-cloned dub in 28 languages out. No cloud, no API keys.",null,"Python",55,0,157,40,"MIT License",false,"master",true,[],"2026-06-12 04:01:42","\u003Cdiv align=\"center\">\n\n# 🎙️ TachiDUBB Studio\n\n**Local, agent-controllable AI video dubbing.**\nYouTube link in → voice-cloned dub in 28 languages out. 
No cloud, no per-minute fees, no upload of your face to anyone's server.\n\n*by [@smolekoma](https:\u002F\u002Fx.com\u002Fsmolekoma) and [@smolemaru](https:\u002F\u002Fx.com\u002Fsmolemaru) &mdash; built with [Claude Opus 4.7](https:\u002F\u002Fclaude.ai)*\n\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-blue.svg)](LICENSE)\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![CUDA 12.0+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCUDA-12.0+-76B900.svg)](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)\n[![MCP enabled](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMCP-enabled-7B61FF.svg)](https:\u002F\u002Fmodelcontextprotocol.io)\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTachikomaRed\u002Ftachidubb?style=social)](https:\u002F\u002Fgithub.com\u002FTachikomaRed\u002Ftachidubb\u002Fstargazers)\n\n[**Quickstart**](#-30-second-quickstart) ·\n[**Demo**](#-demo) ·\n[**MCP \u002F Agent use**](#-agent-control-mcp--cli) ·\n[**Languages**](#-supported-languages) ·\n[**FAQ**](#-faq) ·\n[**Troubleshooting**](#-troubleshooting)\n\n![demo](docs\u002Fdemo.gif)\n\n\u003C\u002Fdiv>\n\n---\n\n## ✨ Why TachiDUBB\n\n| | TachiDUBB | ElevenLabs Dubbing | Heygen | Rask |\n|---|---|---|---|---|\n| **Cost** | Free (your GPU) | $0.30\u002Fmin and up | $0.15+\u002Fmin | $0.07+\u002Fmin |\n| **Runs offline** | ✅ 100% local | ❌ cloud | ❌ cloud | ❌ cloud |\n| **Voice cloning** | ✅ VoxCPM2 | ✅ | ✅ | ✅ |\n| **Languages** | 28 | 29 | 40+ | 130+ |\n| **Multi-speaker diarization** | ✅ (pyannote) | ✅ | ✅ | ✅ |\n| **Background music preservation** | ✅ (audio-separator) | ✅ | ✅ | ✅ |\n| **YouTube URL → MP4** | ✅ in one step | ❌ | ❌ | ❌ |\n| **Stitched multilingual reel** | ✅ built-in | ❌ | ❌ | ❌ |\n| **MCP \u002F agent control** | ✅ first-class | ❌ | ❌ | ❌ |\n| **Open source** | ✅ MIT | ❌ | ❌ | ❌ |\n| **No upload of your 
data** | ✅ | ❌ | ❌ | ❌ |\n| **API key required** | ❌ none | ✅ paid | ✅ paid | ✅ paid |\n\nIf you're dubbing a 10-minute video weekly across 5 languages, that is about 2,600 dubbed minutes a year, or roughly **$180 to $780\u002Fyear** at the per-minute rates above, and the dub never leaves your machine.\n\n---\n\n## 🚀 30-second quickstart\n\n### Windows (one click)\n\n```text\n1. Clone or unzip the repo\n2. Double-click install.bat   ← installs everything (~5-10 min)\n3. Double-click start.bat     ← browser opens at http:\u002F\u002Flocalhost:8910\n4. Paste YouTube URL → pick language → Start\n```\n\n### Linux \u002F macOS\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FTachikomaRed\u002Ftachidubb && cd tachidubb\nchmod +x install.sh\n.\u002Finstall.sh    # installs everything + creates start.sh\n.\u002Fstart.sh\n```\n\nFirst dubbing run downloads the VoxCPM2 model (~5 GB) — one time.\n\n---\n\n## 🤖 Agent control (MCP + CLI)\n\nThis is what makes TachiDUBB different. You don't have to touch the UI to use it.\n\n### Tell Claude Code (or any MCP-aware agent) what you want\n\n```text\nYou:    Dub https:\u002F\u002Fyoutu.be\u002Fabc into French, Spanish and Japanese,\n        then stitch them into one 60-second showcase reel.\n\nClaude: [calls tachidubb_showcase(...)]\n        [polls tachidubb_get_showcase(...)]\n        Done — http:\u002F\u002Flocalhost:8910\u002Foutputs\u002Fshowcase_sc_2f1a...\u002Fshowcase.mp4\n```\n\nAdd the MCP server in 10 seconds:\n\n```bash\nclaude mcp add tachidubb python \u002Fpath\u002Fto\u002Ftachidubb\u002Ftools\u002Ftachidubb_mcp.py\n```\n\nOr paste into `~\u002F.claude.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"tachidubb\": {\n      \"command\": \"\u002Fpath\u002Fto\u002Ftachidubb\u002Fvenv\u002FScripts\u002Fpython.exe\",\n      \"args\": [\"\u002Fpath\u002Fto\u002Ftachidubb\u002Ftools\u002Ftachidubb_mcp.py\"],\n      \"env\": { \"TACHIDUBB_URL\": \"http:\u002F\u002Flocalhost:8910\" }\n    }\n  }\n}\n```\n\nThe repo ships a Claude Code skill at 
[`.claude\u002Fskills\u002Ftachidubb\u002FSKILL.md`](.claude\u002Fskills\u002Ftachidubb\u002FSKILL.md). Copy it to `~\u002F.claude\u002Fskills\u002F` and Claude knows when and how to drive the pipeline.\n\n### CLI — works from any shell, any OS, any cron\n\n```bash\n# Single language, blocking\npython tools\u002Ftachidubb_cli.py dub https:\u002F\u002Fyoutu.be\u002Fabc --lang fr --wait\n\n# Compare 5 languages side-by-side\npython tools\u002Ftachidubb_cli.py compare .\u002Fclip.mp4 --langs es,fr,de,ja,pt --trim 60\n\n# Stitched multilingual showcase reel\npython tools\u002Ftachidubb_cli.py showcase https:\u002F\u002Fyoutu.be\u002Fabc \\\n  --langs es,fr,de,ja,pt --trim 60 --wait\n\n# Re-dub an existing job into new languages — skips re-upload\npython tools\u002Ftachidubb_cli.py redub 5038e404 --langs ja,it --mode showcase --wait\n\n# Health, status, history\npython tools\u002Ftachidubb_cli.py system\npython tools\u002Ftachidubb_cli.py jobs --limit 20\npython tools\u002Ftachidubb_cli.py status \u003Cjob_id>\n```\n\nDrive a remote box: `set TACHIDUBB_URL=http:\u002F\u002F192.168.0.10:8910`\n\nSee [`examples\u002F`](examples\u002F) for ready-to-run scripts.\n\n---\n\n## 🎬 Demo\n\n| What | Length | Languages | Time on RTX 3080 Ti |\n|---|---|---|---|\n| Single-speaker YouTube short → French | 60 s | 1 | ~2 min |\n| Compare 5 languages | 60 s × 5 | 5 | ~10-15 min |\n| Showcase reel (stitched) | 60 s | 5 | ~12-18 min |\n| Multi-speaker podcast (diarized) | 5 min | 1 | ~8-10 min |\n\n> 📺 [Watch the full demo](docs\u002Fdemo.mp4) (no audio, ~2 min) — submit a YouTube URL, pick 5 languages, get a stitched showcase reel.\n\n---\n\n## 🏗️ How it works\n\n```\nYouTube URL or local file\n        │\n        ▼\n   yt-dlp ───────────────────────► (downloads source)\n        │\n        ▼\n   FFmpeg ───────────────────────► (extracts audio)\n        │\n        ▼\n  faster-whisper ───────────────► (transcript + word timestamps)\n        │\n        ▼\n   pyannote ─────────────────────► 
(speaker diarization, optional)\n        │\n        ▼\n   Ollama (Qwen3 \u002F Gemma3 \u002F Aya) ► (translation, length-matched)\n        │\n        ▼\n   VoxCPM2 ──────────────────────► (voice cloning per speaker, 48 kHz)\n        │\n        ▼\n   FFmpeg ───────────────────────► (time-align, mix bg music, render)\n        │\n        ▼\n   Dubbed MP4 + SRT subtitles\n```\n\nEvery step is modular, swappable, and runs on your hardware.\n\n---\n\n## 🌍 Supported languages\n\n28 target languages out of the box (via VoxCPM2 + edge-tts fallback):\n\n| Code | Language |     | Code | Language |     | Code | Language |     | Code | Language |\n|---|---|---|---|---|---|---|---|---|---|---|\n| `en` | English | | `ru` | Russian | | `es` | Spanish | | `fr` | French |\n| `de` | German | | `it` | Italian | | `pt` | Portuguese | | `pl` | Polish |\n| `tr` | Turkish | | `ja` | Japanese | | `ko` | Korean | | `zh` | Chinese |\n| `ar` | Arabic | | `hi` | Hindi | | `nl` | Dutch | | `uk` | Ukrainian |\n| `sv` | Swedish | | `th` | Thai | | `vi` | Vietnamese | | `cs` | Czech |\n| `ro` | Romanian | | `hu` | Hungarian | | `bg` | Bulgarian | | `el` | Greek |\n| `fi` | Finnish | | `id` | Indonesian | | `no` | Norwegian | | `da` | Danish |\n\nSource detection is automatic (Whisper). Translation goes through whatever Ollama model you have — `aya-expanse:8b` is the default for best multilingual quality.\n\n---\n\n## 🖥️ Hardware\n\n| | Minimum | Recommended | Why |\n|---|---|---|---|\n| **VRAM** | 8 GB | 12 GB+ | VoxCPM2 + Whisper + a translation LLM coexist |\n| **RAM** | 16 GB | 32 GB | Audio-separator (background preservation) is hungry |\n| **Disk** | 20 GB | 40 GB+ | Models + outputs |\n| **GPU** | Any CUDA 12.0+ | RTX 30\u002F40 series | CPU fallback works but ~15× slower |\n| **Python** | 3.10–3.12 | 3.11 | |\n| **OS** | Win 10+, Linux, macOS | — | macOS requires CPU mode |\n\nNo GPU? It still runs — just expect long jobs. 
The pipeline auto-falls back to `edge-tts` (Microsoft cloud TTS) if VoxCPM2 won't load, which sacrifices voice cloning but produces intelligible output fast.\n\n### Disk budget (what gets downloaded)\n\n| Component | Size | When |\n|---|---|---|\n| Python deps (PyTorch + transformers + faster-whisper + ...) | ~4 GB | At `install.bat` \u002F `.\u002Finstall.sh` |\n| FFmpeg + yt-dlp (Windows static build) | ~100 MB | At install |\n| VoxCPM2 model weights | ~5 GB | First dubbing run, cached forever |\n| Whisper `large-v3` weights | ~3 GB | First dubbing run, cached forever |\n| Ollama translation model (e.g. `qwen3:8b`) | ~5 GB | At install (you pick it) |\n| pyannote diarization weights (optional) | ~500 MB | First multi-speaker run |\n| audio-separator UVR weights (optional) | ~250 MB | First background-preserve run |\n\n**Total for full setup: ~18 GB.** Skinny single-language setup without diarization or BGM preservation: ~12 GB.\n\n---\n\n## 🔑 Tokens & API keys\n\n**Required tokens: NONE.** The default install runs 100% offline once dependencies are downloaded. No OpenAI \u002F ElevenLabs \u002F Anthropic key needed — translation is local (Ollama), TTS is local (VoxCPM2), ASR is local (Whisper).\n\n| Token | Required? 
| What for | Where to get |\n|---|---|---|---|\n| Hugging Face token (`HF_TOKEN`) | Only for multi-speaker diarization | Downloading pyannote diarization weights — gated by free terms-of-use acceptance | [huggingface.co\u002Fsettings\u002Ftokens](https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens) — also accept terms at [pyannote\u002Fspeaker-diarization-3.1](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fspeaker-diarization-3.1) and [pyannote\u002Fsegmentation-3.0](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fsegmentation-3.0) |\n| YouTube cookies (`YT_DLP_COOKIES_FROM_BROWSER`) | Only for age-restricted \u002F member-only YouTube videos | yt-dlp downloads via your existing browser session | Auto — set to `chrome`, `firefox`, `edge` etc. |\n| OpenAI \u002F ElevenLabs \u002F Anthropic keys | **Never.** | — | — |\n\nWhat \"phones home\" by default:\n- `yt-dlp` reaches YouTube\u002FVimeo\u002Fetc. — only when you submit a URL\n- `huggingface.co` for model downloads — first run only, then cached\n- `ollama.com` for translation model pulls — first install only\n- `edge-tts` for the cloud TTS fallback — only triggers if VoxCPM2 fails to load on your GPU\n\nThere's no telemetry, no analytics, no phone-home from TachiDUBB itself. 
Audit the network calls: search the repo for `httpx.` \u002F `requests.` — only the integrations above.\n\n---\n\n## ⚙️ Configuration\n\nCopy `.env.example` to `.env` and edit as needed:\n\n```bash\n# Speaker diarization (multi-speaker videos)\nHF_TOKEN=hf_xxxxx                  # from huggingface.co\u002Fsettings\u002Ftokens\n\n# TTS model selection\nVOXCPM_MODEL=openbmb\u002FVoxCPM2       # or openbmb\u002FVoxCPM1.5 (lighter)\nVOXCPM_CFG=2.0                     # 1.5-3.0, higher = closer to reference voice\nVOXCPM_STEPS=10                    # 5-20, lower = faster\n\n# Translation backend\nOLLAMA_URL=http:\u002F\u002Flocalhost:11434\n\n# UI behavior\nTACHIDUBB_OPEN_BROWSER=1           # 0 to disable auto-open\nTACHIDUBB_QA_THRESHOLD=0.4         # stricter (lower) = more re-rolls on bad TTS\n```\n\n### Optional dependencies\n\n| Feature | Install | Notes |\n|---|---|---|\n| Multi-speaker diarization | `pip install pyannote.audio` + HF token | Auto-detects N speakers, clones each |\n| Background music preservation | `pip install audio-separator` | Demuxes vocals, keeps original BGM |\n| Faster Whisper on GPU | (already in requirements) | If CUDA isn't found, falls back to CPU |\n\n---\n\n## 🧠 The agent skill\n\nIf you use Claude Code, copy `.claude\u002Fskills\u002Ftachidubb\u002FSKILL.md` into your global skills folder (`~\u002F.claude\u002Fskills\u002Ftachidubb\u002F`). After that, just say:\n\n- *\"Dub this YouTube short into French and German\"*\n- *\"Make a showcase reel of this clip in 5 languages\"*\n- *\"Re-dub job 5038e404 into Japanese and Italian\"*\n- *\"What's the status of my dub?\"*\n\nThe skill teaches Claude which tool to call, what arguments to use, how to poll, how to recover from errors, and when to suggest a comparison vs a showcase. Read [`SKILL.md`](.claude\u002Fskills\u002Ftachidubb\u002FSKILL.md) for the full trigger map.\n\nWorks with any MCP-compatible agent — Cursor, Cline, Continue, custom agents. 
The MCP tool schema is auto-discovered.\n\n---\n\n## 🛟 Troubleshooting\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Ollama shows a red dot in the UI\u003C\u002Fb>\u003C\u002Fsummary>\n\nRun `ollama serve` in a separate terminal, or restart the app — `start.bat` auto-starts Ollama. If you've never installed Ollama, the System panel has an install button.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Ollama has no models installed\u003C\u002Fb>\u003C\u002Fsummary>\n\nOpen the System tab → Models → click \"Install\" on `aya-expanse:8b` (best multilingual, ~5 GB) or `qwen3:8b` (good general, ~5 GB). Or from CLI: `ollama pull aya-expanse:8b`.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>YouTube download fails \u002F SSL error\u003C\u002Fb>\u003C\u002Fsummary>\n\nUpdate yt-dlp: `venv\\Scripts\\activate && pip install -U yt-dlp`. If it's an age-restricted or region-blocked video, set `YT_DLP_COOKIES_FROM_BROWSER=chrome` in `.env`. For SSL errors, check firewall\u002FVPN\u002Fcorporate proxy.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>VoxCPM2 runs out of VRAM\u003C\u002Fb>\u003C\u002Fsummary>\n\nThree knobs, easiest first:\n\n1. System tab → switch Whisper to `small` (frees ~3 GB)\n2. `.env` → `VOXCPM_STEPS=6` (faster, less VRAM)\n3. `.env` → `VOXCPM_MODEL=openbmb\u002FVoxCPM1.5` (smaller model, slight quality drop)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Voice sounds like two different people mid-video\u003C\u002Fb>\u003C\u002Fsummary>\n\nThis was a real bug we fixed: in cross-lingual cloning, QA retries were mutating the random seed mid-job, producing different timbres for failed-then-retried segments. 
Make sure you're on the latest commit — the fix is in `pipeline\u002Ftts_worker.py`.\n\nIf you still hit it: try `VOXCPM_CFG=2.5` (more reference-anchored) or upload a longer, cleaner reference voice in the speaker tab.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>First VoxCPM2 call is slow\u003C\u002Fb>\u003C\u002Fsummary>\n\nNormal. The model downloads ~5 GB on first use; progress is in the terminal. Subsequent runs use the cached weights.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Hugging Face 401 \u002F \"access denied\"\u003C\u002Fb>\u003C\u002Fsummary>\n\nYou need to (1) create a token at https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens, (2) accept terms at https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fspeaker-diarization-3.1 (and https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fsegmentation-3.0), (3) put `HF_TOKEN=hf_…` in `.env`.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>No GPU detected even though I have one\u003C\u002Fb>\u003C\u002Fsummary>\n\nVerify CUDA is visible: `python -c \"import torch; print(torch.cuda.is_available())\"`. If it prints `False`, reinstall PyTorch matching your CUDA — see https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F. On Windows make sure you're using the venv Python, not the system one.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Audio is out of sync with video\u003C\u002Fb>\u003C\u002Fsummary>\n\nUsually a duration-mismatch in translation (target language is much longer\u002Fshorter than source). The pipeline time-aligns automatically, but extreme cases (German → Japanese, etc.) can drift. 
Try:\n\n- Confirm the length-aware translation prompt is still enabled in the UI (it is on by default)\n- Use a higher-quality translation model (`qwen3:14b` if you have the VRAM)\n- For very long videos, dub in 2-3 minute chunks\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>FFmpeg not found\u003C\u002Fb>\u003C\u002Fsummary>\n\nLinux (Debian\u002FUbuntu): `sudo apt install ffmpeg`. macOS: `brew install ffmpeg`. Windows: the installer downloads a static build into `bin\u002F` automatically — if it failed, re-run `install.bat`.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Showcase reel renders all black \u002F no audio\u003C\u002Fb>\u003C\u002Fsummary>\n\nUsually one of the child dubs failed silently. `python tools\u002Ftachidubb_cli.py showcase-status \u003Cbatch_id>` shows which language failed. After fixing the failing job, rerun with `python tools\u002Ftachidubb_cli.py showcase-rebuild \u003Cbatch_id>`; it skips re-dubbing the successful ones.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Background-preserve toggle does nothing\u003C\u002Fb>\u003C\u002Fsummary>\n\nInstall the optional dep: `pip install audio-separator`. The UI shows a yellow warning if it's missing. First demux is slow (~30 s on GPU); subsequent ones are cached.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Linux ALSA \u002F pulse errors during TTS\u003C\u002Fb>\u003C\u002Fsummary>\n\nWe don't play audio — these are warnings from a transitive dep. Ignore unless they actually break the run. `export ALSA_CARD=-1` silences them.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>The server is on a different machine — how do I point the CLI at it?\u003C\u002Fb>\u003C\u002Fsummary>\n\n`export TACHIDUBB_URL=http:\u002F\u002F192.168.0.10:8910` (or set `TACHIDUBB_URL` in your MCP config `env` block). 
The CLI and MCP server respect the same variable.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>How do I run it headless \u002F on a server?\u003C\u002Fb>\u003C\u002Fsummary>\n\n`python server.py --host 0.0.0.0 --port 8910` and point your browser (or CLI \u002F MCP) at it. Make sure port 8910 is accessible. There's no auth out of the box — put it behind nginx\u002FTailscale\u002FCloudflare Tunnel if exposed publicly.\n\n\u003C\u002Fdetails>\n\n---\n\n## ❓ FAQ\n\n**Is this really free?**\nYes. MIT licensed. The only \"cost\" is your electricity and GPU. No telemetry, no phone-home.\n\n**Do I need an NVIDIA GPU?**\nFor reasonable speeds, yes. CPU works but a 1-minute dub takes ~30 minutes instead of ~2.\n\n**Does it work on Apple Silicon (M1\u002FM2\u002FM3)?**\nYes via CPU + MPS fallback. Expect about 4-8× slower than a discrete GPU. PyTorch MPS support for VoxCPM2 is experimental — `edge-tts` fallback is reliable.\n\n**Can I voice-clone a specific person?**\nYes — drop a 5-30 second clean WAV\u002FMP3 into `presets\u002Fvoices\u002F` and pick it as the reference. Please don't do this without that person's consent. See [SECURITY.md](SECURITY.md).\n\n**What's the quality vs ElevenLabs?**\nOn clean source audio, VoxCPM2 is genuinely close. On noisy \u002F multi-speaker content, ElevenLabs still wins (their diarization is better). For 95% of one-speaker YouTube content, you won't tell the difference.\n\n**Does it preserve emotion \u002F tone?**\nPartially. VoxCPM2 picks up energy and pacing from the reference. It doesn't model fine emotion the way some closed models do. If the source is a calm explainer, the dub is calm; if it's a hype reel, the dub is hype.\n\n**Can I run multiple dubs in parallel?**\nThe server queues GPU work serially (one VoxCPM2 invocation at a time) to avoid OOM. 
CPU stages (download, transcribe with CPU Whisper, ffmpeg) overlap automatically.\n\n**Does it work for animated content \u002F games \u002F non-real voices?**\nYes — anything VoxCPM2 can fit as a reference (usually 5+ s of clean speech) clones fine. Singing is not supported.\n\n**Why VoxCPM2 instead of XTTS \u002F OpenVoice \u002F F5-TTS?**\nVoxCPM2 has the best cross-lingual cloning quality we tested at the 5 GB weight class. The architecture is swappable — `pipeline\u002Fsynthesizer.py` has a base class; PRs for other backends are welcome.\n\n**Can agents trigger this without my approval?**\nEach MCP tool call requires user confirmation by default (per the MCP spec). TachiDUBB doesn't bypass that.\n\n---\n\n## 🗺️ Roadmap\n\n- [x] MCP server + CLI\n- [x] Stitched multilingual showcase reels\n- [x] Multi-speaker diarization\n- [x] Background music preservation\n- [x] Deterministic voice across cross-lingual segments\n- [ ] Subtitle burn-in toggle (currently SRT sidecar only)\n- [ ] Speaker labelling UI (assign names to detected speakers)\n- [ ] Browser-only mode (no Ollama dependency, use llama.cpp WASM)\n- [ ] Batch processing folder watcher\n- [ ] Docker image with everything pre-baked\n- [ ] Hardware-accelerated diarization (NVIDIA NeMo)\n- [ ] Apple Silicon MLX backend\n\nVote \u002F suggest features in [Discussions](https:\u002F\u002Fgithub.com\u002FTachikomaRed\u002Ftachidubb\u002Fdiscussions).\n\n---\n\n## 🛡️ Responsible use\n\nVoice cloning is powerful and easily misused. 
**TachiDUBB is built for legitimate creators dubbing their own content or content they have rights to.** Please:\n\n- Don't clone someone's voice without their explicit, informed consent.\n- Don't impersonate real people (politicians, celebrities, your boss) for deception, fraud, or harassment.\n- Disclose AI-generated speech when publishing — most platforms now require this, and it's the right thing to do.\n- Comply with your local laws on synthetic media (EU AI Act, US state laws, etc.).\n\nWe refuse to add features that defeat watermarking, anti-cloning safeguards, or platform AI-disclosure requirements. See [SECURITY.md](SECURITY.md) for the threat model and how to report abuse.\n\n---\n\n## 🤝 Contributing\n\nPRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, code style, and the modular pipeline design — most contributions are a single drop-in file in `pipeline\u002F`.\n\nGood first issues:\n- Add a TTS backend (XTTS, F5-TTS, OpenVoice)\n- Add a translation backend (OpenAI-compatible HTTP, vLLM, mlx_lm)\n- New language voices in the edge-tts fallback map\n- Improve the duration-matching prompt for hard language pairs\n\n---\n\n## 💖 Credits\n\nBuilt by **[TachikomaRed](https:\u002F\u002Fx.com\u002Fsmolekoma)** and **[smolemaru](https:\u002F\u002Fx.com\u002Fsmolemaru)** &mdash; in collaboration with **[Claude](https:\u002F\u002Fclaude.ai)** (Anthropic).\n\nFollow the build on X: [@smolekoma](https:\u002F\u002Fx.com\u002Fsmolekoma) &middot; [@smolemaru](https:\u002F\u002Fx.com\u002Fsmolemaru)\n\nStanding on shoulders:\n- [VoxCPM2](https:\u002F\u002Fhuggingface.co\u002Fopenbmb\u002FVoxCPM2) — voice cloning TTS (Apache-2.0)\n- [faster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper) — ASR (MIT)\n- [pyannote.audio](https:\u002F\u002Fgithub.com\u002Fpyannote\u002Fpyannote-audio) — diarization (MIT)\n- [Ollama](https:\u002F\u002Follama.com) — local LLM serving (MIT)\n- [yt-dlp](https:\u002F\u002Fgithub.com\u002Fyt-dlp\u002Fyt-dlp) — 
universal downloader (Unlicense)\n- [edge-tts](https:\u002F\u002Fgithub.com\u002Frany2\u002Fedge-tts) — cloud TTS fallback (GPL-3.0)\n- [audio-separator](https:\u002F\u002Fgithub.com\u002Fkaraokenerds\u002Fpython-audio-separator) — stem separation (MIT)\n- [Model Context Protocol](https:\u002F\u002Fmodelcontextprotocol.io) — agent integration (Anthropic)\n\n## 📜 License\n\nMIT — see [LICENSE](LICENSE). VoxCPM2 is Apache-2.0. edge-tts is GPL-3.0; using it doesn't require this project to be GPL because it's a runtime dependency invoked as a process.\n\n---\n\n\u003Cdiv align=\"center\">\n\n**If TachiDUBB saved you a Heygen subscription, smash that ⭐ — that's how more people find it.**\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=TachikomaRed\u002Ftachidubb&type=Date)](https:\u002F\u002Fstar-history.com\u002F#TachikomaRed\u002Ftachidubb&Date)\n\n\u003C\u002Fdiv>\n",2,"2026-06-11 04:11:36","CREATED_QUERY"]