[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-960":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":42,"readmeContent":43,"aiSummary":44,"trendingCount":16,"starSnapshotCount":16,"syncStatus":45,"lastSyncTime":46,"discoverSource":47},960,"OmniVoice-Studio","debpalash\u002FOmniVoice-Studio","debpalash","The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing and dictation Desktop App","",null,"Python",6771,1044,39,21,0,379,1399,2749,1137,40.06,"Other",false,"main",[26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41],"asr","docker","dubbing","dubbing-ai","elevenlabs","local-ai","self-hosted","speech-recognition","speech-to-text","text-to-speech","transcription","tts","video-editing","voice-ai","voice-cloning","voice-generation","2026-06-12 02:00:21","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"docs\u002Flogo.png\" alt=\"OmniVoice Logo\" width=\"120\" \u002F>\n  \u003Ch1>OmniVoice Studio\u003C\u002Fh1>\n  \u003Ch3>The open-source ElevenLabs alternative.\u003C\u002Fh3>\n  \u003Cp>Real-time dictation, zero-shot voice cloning, and cinematic video dubbing — all on your desktop.\u003Cbr\u002F>Open-source, no API keys, fully local. 
\u003Cb>646 languages.\u003C\u002Fb>\u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdebpalash\u002FOmniVoice-Studio?style=flat-square&color=f59e0b\" alt=\"Stars\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Flatest\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fdebpalash\u002FOmniVoice-Studio?style=flat-square&color=10b981\" alt=\"Release\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-FSL--1.1--ALv2-blue?style=flat-square\" alt=\"License\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Fissues\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fdebpalash\u002FOmniVoice-Studio?style=flat-square&color=ef4444\" alt=\"Issues\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FaRRdVj3de7\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join_Community-5865F2?style=flat-square&logo=discord&logoColor=white\" alt=\"Discord\" \u002F>\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca href=\"#quickstart\">Quickstart\u003C\u002Fa> ·\n    \u003Ca href=\"#features\">Features\u003C\u002Fa> ·\n    \u003Ca href=\"#why-omnivoice-studio\">Why OmniVoice Studio?\u003C\u002Fa> ·\n    \u003Ca href=\"#tts-engines\">TTS Engines\u003C\u002Fa> ·\n    \u003Ca href=\"#contributing\">Contributing\u003C\u002Fa> ·\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FaRRdVj3de7\">Discord\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Fdownload\u002Fv0.2.7\u002FOmniVoice.Studio_0.2.7_aarch64.dmg\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FmacOS-DMG_(Apple_Silicon)-000?style=for-the-badge&logo=apple&logoColor=white\" alt=\"Download macOS DMG\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Fdownload\u002Fv0.2.7\u002FOmniVoice.Studio_0.2.7_x64_en-US.msi\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWindows-MSI_(x64)-0078D4?style=for-the-badge&logo=windows&logoColor=white\" alt=\"Download Windows MSI\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Fdownload\u002Fv0.2.7\u002FOmniVoice.Studio_0.2.7_amd64.AppImage\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinux-AppImage_(x64)-FCC624?style=for-the-badge&logo=linux&logoColor=black\" alt=\"Download Linux AppImage\" \u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Fdownload\u002Fv0.2.7\u002FOmniVoice.Studio_0.2.7_amd64.deb\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDebian-.deb-A81D33?style=for-the-badge&logo=debian&logoColor=white\" alt=\"Download Debian .deb\" \u002F>\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cbr\u002F>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".github\u002Fassets\u002Fsocial-preview.png\" alt=\"OmniVoice Studio — The open-source ElevenLabs alternative\" width=\"100%\"\u002F>\n\u003C\u002Fdiv>\n\n> [!WARNING]\n> **OmniVoice Studio is in active beta.** Things may break between releases. For the latest features and fixes, clone the repo and run from source rather than using pre-built installers. 
Bug reports and PRs are very welcome — [open an issue](https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Fissues) or [join Discord](https:\u002F\u002Fdiscord.gg\u002FaRRdVj3de7).\n\n\u003Cbr\u002F>\n\n## Features\n\n\u003Ctable>\n\u003Ctr>\n  \u003Ctd align=\"center\" width=\"33%\">\n    \u003Ch3>🎙️ Voice Cloning\u003C\u002Fh3>\n    \u003Cp>3-second clip → mirror any voice.\u003Cbr\u002F>\u003Cb>646 languages\u003C\u002Fb>, zero-shot.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" width=\"33%\">\n    \u003Ch3>🎨 Voice Design\u003C\u002Fh3>\n    \u003Cp>Gender, age, accent, pitch, speed,\u003Cbr\u002F>emotion, dialect — \u003Cb>dial it in\u003C\u002Fb>.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" width=\"33%\">\n    \u003Ch3>🎬 Video Dubbing\u003C\u002Fh3>\n    \u003Cp>YouTube URL or file → transcribe →\u003Cbr\u002F>translate → re-voice → \u003Cb>MP4\u003C\u002Fb>.\u003C\u002Fp>\n  \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>⌨️ Dictation Widget\u003C\u002Fh3>\n    \u003Cp>\u003Ccode>⌘+⇧+Space\u003C\u002Fcode> from \u003Cb>any app\u003C\u002Fb>.\u003Cbr\u002F>Transcribes, auto-pastes, disappears.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>🔊 Vocal Isolation\u003C\u002Fh3>\n    \u003Cp>Demucs-powered. 
Splits speech\u003Cbr\u002F>from music, \u003Cb>keeps the background\u003C\u002Fb>.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>👥 Speaker Diarization\u003C\u002Fh3>\n    \u003Cp>Pyannote + WhisperX.\u003Cbr\u002F>\u003Cb>Auto-identifies\u003C\u002Fb> who said what.\u003C\u002Fp>\n  \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>📦 Batch Queue\u003C\u002Fh3>\n    \u003Cp>Drop \u003Cb>50 videos\u003C\u002Fb>, walk away.\u003Cbr\u002F>Progress bars per job.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>🤖 MCP Server\u003C\u002Fh3>\n    \u003Cp>Use OmniVoice from \u003Cb>Claude\u003C\u002Fb>,\u003Cbr\u002F>Cursor, or any MCP client.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>🛡️ AI Watermark\u003C\u002Fh3>\n    \u003Cp>AudioSeal (Meta). \u003Cb>Invisible\u003C\u002Fb>,\u003Cbr\u002F>survives compression.\u003C\u002Fp>\n  \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>🔐 100% Local\u003C\u002Fh3>\n    \u003Cp>No keys, no cloud, no accounts.\u003Cbr\u002F>\u003Cb>Your machine only\u003C\u002Fb>.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>⚡ GPU Auto-Detect\u003C\u002Fh3>\n    \u003Cp>CUDA · MPS · ROCm · CPU.\u003Cbr\u002F>≤8 GB? 
\u003Cb>Auto-offloads\u003C\u002Fb>.\u003C\u002Fp>\n  \u003C\u002Ftd>\n  \u003Ctd align=\"center\" valign=\"top\">\n    \u003Ch3>🧩 Extensible\u003C\u002Fh3>\n    \u003Cp>Subclass \u003Ccode>TTSBackend\u003C\u002Fcode>,\u003Cbr\u002F>add any engine in \u003Cb>~50 lines\u003C\u002Fb>.\u003C\u002Fp>\n  \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n## Quickstart\n\nPick your path — from zero-install to full developer setup:\n\n\u003Ctable>\n\u003Ctr>\n\u003Ctd width=\"33%\" align=\"center\">\n\u003Ch3>🖥️ Desktop App\u003C\u002Fh3>\n\u003Csub>\u003Cb>Easiest\u003C\u002Fb> · ~2 min · No dependencies\u003C\u002Fsub>\n\u003Cbr\u002F>\u003Cbr\u002F>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Flatest\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDownload-Installer-10b981?style=for-the-badge&logo=github&logoColor=white\" alt=\"Download\"\u002F>\u003C\u002Fa>\n\u003Cbr\u002F>\u003Cbr\u002F>\n\u003Csub>macOS DMG · Windows MSI · Linux AppImage\u002Fdeb\u003Cbr\u002F>Auto-bootstraps Python + models on first launch.\u003C\u002Fsub>\n\u003C\u002Ftd>\n\u003Ctd width=\"33%\" align=\"center\">\n\u003Ch3>🐳 Docker\u003C\u002Fh3>\n\u003Csub>\u003Cb>One command\u003C\u002Fb> · ~3 min · Needs Docker\u003C\u002Fsub>\n\u003Cbr\u002F>\u003Cbr\u002F>\n\u003Ccode>docker pull ghcr.io\u002Fdebpalash\u002Fomnivoice-studio\u003C\u002Fcode>\n\u003Cbr\u002F>\u003Cbr\u002F>\n\u003Csub>Pre-built image from GHCR.\u003Cbr\u002F>CPU + NVIDIA GPU supported.\u003C\u002Fsub>\n\u003C\u002Ftd>\n\u003Ctd width=\"33%\" align=\"center\">\n\u003Ch3>⚡ From Source\u003C\u002Fh3>\n\u003Csub>\u003Cb>Full control\u003C\u002Fb> · ~5 min · Needs Bun + Python\u003C\u002Fsub>\n\u003Cbr\u002F>\u003Cbr\u002F>\n\u003Ccode>git clone → bun install → bun run dev\u003C\u002Fcode>\n\u003Cbr\u002F>\u003Cbr\u002F>\n\u003Csub>Hot reload, full codebase access.\u003Cbr\u002F>Best for 
contributors.\u003C\u002Fsub>\n\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n### 🖥️ Option 1 — Desktop App\n\nPre-built installers (~6–8 MB) are on the [**Releases**](https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Freleases\u002Flatest) page. Download, install, launch. The app bootstraps a Python environment and downloads model weights automatically — the splash screen shows progress.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>macOS — \"app is damaged and can't be opened\"\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\nmacOS quarantines apps downloaded outside the App Store. After dragging to `\u002FApplications`:\n\n```bash\nxattr -cr \u002FApplications\u002FOmniVoice\\ Studio.app\n```\n\nOpen normally after. One-time fix.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Windows — first launch takes 5–10 minutes\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\nThe app bootstraps a Python virtual environment, installs dependencies, and downloads ffmpeg on first run. The splash screen shows each step. 
Subsequent launches start in seconds.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Linux — AppImage needs FUSE\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\nIf FUSE isn't available, use the `.deb` package or extract-and-run:\n\n```bash\nchmod +x OmniVoice.Studio_*.AppImage\n.\u002FOmniVoice.Studio_*.AppImage --appimage-extract-and-run\n```\n\u003C\u002Fdetails>\n\n---\n\n### 🐳 Option 2 — Docker\n\nPull the pre-built image from **GitHub Container Registry**:\n\n```bash\ndocker pull ghcr.io\u002Fdebpalash\u002Fomnivoice-studio:latest\n```\n\n**Run it:**\n\n```bash\n# CPU mode\ndocker run -d --name omnivoice \\\n  -p 127.0.0.1:3900:3900 \\\n  -v omnivoice-data:\u002Fapp\u002Fomnivoice_data \\\n  ghcr.io\u002Fdebpalash\u002Fomnivoice-studio:latest\n\n# NVIDIA GPU mode\ndocker run -d --name omnivoice --gpus all \\\n  -p 127.0.0.1:3900:3900 \\\n  -v omnivoice-data:\u002Fapp\u002Fomnivoice_data \\\n  ghcr.io\u002Fdebpalash\u002Fomnivoice-studio:latest\n```\n\n**Or use Docker Compose:**\n\n```bash\n# CPU\ndocker compose -f deploy\u002Fdocker-compose.yml up -d\n\n# GPU\ndocker compose -f deploy\u002Fdocker-compose.yml --profile gpu up -d\n```\n\nOpen [localhost:3900](http:\u002F\u002Flocalhost:3900) once the health check passes. First run downloads ~4 GB of model weights — progress in `docker compose logs -f`.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Build from source instead of pulling\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```bash\ndocker compose -f deploy\u002Fdocker-compose.yml up --build -d\n```\n\n\u003C\u002Fdetails>\n\n> **Network access:** the container binds to `127.0.0.1` only. To expose on your LAN, change the port mapping to `\"0.0.0.0:3900:3900\"`. 
OmniVoice ships no authentication — put it behind a reverse proxy with auth (Caddy `basic_auth`, nginx + htpasswd, Tailscale, etc.).\n\n---\n\n### ⚡ Option 3 — From Source\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio.git && cd OmniVoice-Studio\nbun install && bun run dev\n```\n\nOpen [localhost:3901](http:\u002F\u002Flocalhost:3901) and start cloning voices. Hot-reload enabled for both frontend and backend.\n\n```bash\nbun run desktop    # Build the native desktop app from source\n```\n\n| Service | URL | Stack |\n|---------|-----|-------|\n| **Backend** | `localhost:3900` | FastAPI · 97 endpoints · WhisperX · Demucs · OmniVoice |\n| **Frontend** | `localhost:3901` | React · Vite · Waveform timeline · Glassmorphism UI |\n| **API Docs** | [`localhost:3900\u002Fdocs`](http:\u002F\u002Flocalhost:3900\u002Fdocs) | Scalar — interactive API reference |\n\n> [!NOTE]\n> First run downloads model weights (~2.4 GB). No account needed. For faster downloads, optionally set `HF_TOKEN=hf_...` in your environment ([get a free token here](https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens)).\n>\n> **Having issues?** Join our [Discord](https:\u002F\u002Fdiscord.gg\u002FaRRdVj3de7) for setup help and troubleshooting.\n\n---\n\n## Screenshots\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"50%\">\n      \u003Cimg src=\"docs\u002Fscreenshot-clone.png\" alt=\"Voice Clone\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Voice Clone\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>Drop a 3-second clip → mirror any voice. 
646 languages, zero-shot.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" width=\"50%\">\n      \u003Cimg src=\"docs\u002Fscreenshot-design.png\" alt=\"Voice Design\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Voice Design\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>Build new voices from scratch — gender, age, accent, pitch, style.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cimg src=\"docs\u002Fscreenshot-dub.png\" alt=\"Video Dubbing\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Video Dubbing\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>Upload or paste a YouTube URL. Transcribe, translate, re-voice, export.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n      \u003Cimg src=\"docs\u002Fscreenshot-gallery.png\" alt=\"Voice Gallery\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Voice Gallery\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>Search YouTube, browse categories, download clips, build your library.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cimg src=\"docs\u002Fscreenshot-settings.png\" alt=\"Settings — Models\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Settings → Models\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>15 models. One-click install. 
Auto-detects your platform (CUDA \u002F MPS \u002F CPU).\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n      \u003Cimg src=\"docs\u002Fscreenshot-libraryprojects.png\" alt=\"Projects\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Projects\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>Dub projects, voice profiles, generation history, exports — all searchable.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\" colspan=\"2\">\n      \u003Cimg src=\"docs\u002Fscreenshot-logs.png\" alt=\"Settings — Logs\" width=\"100%\"\u002F>\n      \u003Cbr\u002F>\u003Cb>Settings → Logs\u003C\u002Fb>\u003Cbr\u002F>\n      \u003Csub>Live backend, frontend, and Tauri runtime logs. Filter, refresh, clear.\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n## Why OmniVoice Studio?\n\nElevenLabs charges **$5–$330\u002Fmo** and processes your audio on their servers. OmniVoice Studio runs **on your hardware, with no usage limits.**\n\n| | **ElevenLabs** | **OmniVoice Studio** |\n|---|---|---|\n| **Pricing** | $5–$330\u002Fmo, per-character billing | Free for personal use · [Commercial license](#license) for business |\n| **Voice Cloning** | ✅ 3s clip | ✅ 3s clip, zero-shot |\n| **Voice Design** | ✅ Gender, age | ✅ Gender, age, accent, pitch, style, dialect |\n| **Languages** | 32 | **646** |\n| **Video Dubbing** | ✅ Cloud-only | ✅ Fully local |\n| **Data Privacy** | Audio sent to cloud | **Nothing leaves your machine** |\n| **API Keys** | Required | Not needed |\n| **GPU Support** | N\u002FA (cloud) | CUDA · Apple Silicon · ROCm · CPU |\n| **Desktop App** | ❌ | ✅ macOS · Windows · Linux |\n| **Customizable** | ❌ Closed | ✅ Fork it, extend it, ship it |\n\nOmniVoice Studio gives you professional-grade AI tools without the subscription or the cloud.\n\n---\n\n## System Requirements\n\n| | **Minimum** | **Recommended** |\n|---|---|---|\n| **OS** | Windows 10, macOS 12+, Ubuntu 
20.04+ | Any modern 64-bit OS |\n| **RAM** | 8 GB | 16 GB+ |\n| **VRAM (GPU)** | 4 GB (auto-offloads TTS to CPU) | 8 GB+ (NVIDIA RTX 3060+) |\n| **Disk** | 10 GB free (models + cache) | 20 GB+ SSD |\n| **Python** | 3.10+ (managed by `uv`) | 3.11–3.12 |\n| **GPU** | Optional — CPU works | NVIDIA CUDA · Apple Silicon MPS · AMD ROCm |\n\n> [!TIP]\n> On GPUs with **≤8 GB VRAM**, OmniVoice automatically offloads TTS to CPU during transcription — no config needed. A dedicated GPU is not required; the entire pipeline runs on CPU (just slower).\n\n### TTS Engines\n\nOmniVoice ships a multi-engine TTS backend. The default engine (OmniVoice) is always available; additional engines are opt-in and auto-detected. Switch engines in **Settings → TTS Engine** or via the `OMNIVOICE_TTS_BACKEND` env var.\n\n| Engine | Languages | Clone | Instruct | Linux | macOS ARM | Windows | License |\n|--------|:---------:|:-----:|:--------:|:-----:|:---------:|:-------:|:-------:|\n| **OmniVoice** (default) | 600+ | ✅ | ✅ | ✅ CUDA\u002FCPU | ✅ MPS | ✅ CUDA\u002FCPU | Built-in |\n| **CosyVoice 3** | 9 + 18 dialects | ✅ | ✅ | ✅ CUDA\u002FCPU | ✅ MPS | ✅ CUDA\u002FCPU | Apache-2.0 |\n| **MLX-Audio** (Kokoro, Qwen3-TTS, CSM, Dia, …) | Multi | Varies | Varies | ❌ | ✅ Native | ❌ | Varies |\n| **VoxCPM2** | 30 | ✅ | ✅ | ✅ CUDA\u002FCPU | ✅ MPS | ✅ CUDA\u002FCPU | Apache-2.0 |\n| **MOSS-TTS-Nano** | 20 | ✅ | ❌ | ✅ CUDA\u002FCPU | ✅ CPU | ✅ CUDA\u002FCPU | Apache-2.0 |\n| **KittenTTS** | English | ❌ | ❌ | ✅ CPU | ✅ CPU | ✅ CPU | MIT |\n\n> **CUDA** = GPU-accelerated · **MPS** = Apple Silicon Metal · **CPU** = runs everywhere, slower for large models · KittenTTS and MOSS-TTS-Nano run realtime on CPU · MLX-Audio is Apple Silicon only.\n\n---\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────┐\n│                  Frontend (React)                │\n│  DubTab · VoicePreview · BatchQueue · Gallery    │\n├─────────────────────────────────────────────────┤\n│                Backend 
(FastAPI)                  │\n│  97 API endpoints · SSE streaming · SQLite       │\n├──────────┬──────────┬──────────┬────────────────┤\n│ WhisperX │  Demucs  │OmniVoice │   Pyannote     │\n│   ASR    │  Source  │   TTS    │  Diarization   │\n│          │  Sep.    │          │                │\n└──────────┴──────────┴──────────┴────────────────┘\n        CUDA \u002F MPS \u002F ROCm \u002F CPU (auto-detected)\n```\n\n---\n\n## Roadmap\n\n### ✅ Shipped\n\n| Category | Features |\n|----------|----------|\n| **Dubbing** | Full pipeline (transcribe→translate→synthesize→mux), scene-aware splitting, lip-sync scoring, streaming TTS |\n| **Voice** | Zero-shot cloning, voice design, A\u002FB comparison, voice preview widget, gallery with favorites\u002Ftags |\n| **Audio** | Demucs vocal isolation, per-segment gain, selective track export, stem\u002FSRT\u002FVTT\u002FMP3 export |\n| **Multi-Lang** | Multi-language batch picker, batch dubbing queue with sequential GPU execution |\n| **Diarization** | Pyannote ML diarization, auto speaker clone extraction, per-speaker voice assignment |\n| **Infra** | Docker deployment, CUDA\u002FMPS\u002FROCm auto-detect, cuDNN 8 compat, VRAM-aware model offloading |\n| **AI Provenance** | AudioSeal invisible watermarking (SynthID-like), video logo overlay, watermark detection API |\n| **UX** | Undo\u002Fredo, keyboard shortcuts, drag-and-drop, session persistence, glassmorphism design system |\n| **Real-time Events** | WebSocket event bus — instant sidebar refresh on data mutations, exponential backoff reconnect |\n| **State Management** | Zustand store migration — `uiSlice`, `pillSlice`, `dubSlice`, `generateSlice`, `prefsSlice`, `glossarySlice` |\n| **Desktop** | Cross-platform Tauri installers (macOS DMG, Windows MSI, Linux deb\u002FAppImage), auto-update infrastructure |\n| **Windows Hardening** | Cross-platform log paths, Triton workaround, HF symlink bypass, 300s health check timeout |\n| **Dictation** | Global system-wide hotkey 
(`⌘+⇧+Space`), frameless floating widget, streaming ASR via WebSocket, auto-paste |\n| **Batch Pipeline** | Full batch TTS: extract → transcribe → translate → generate → mix → export, with live progress tracking |\n\n### 🔜 Up Next\n\n- 🎬 **Lip-sync v2** — visual speech timing with wav2lip\n- 📖 **Audiobook Editor** — chapter-aware long-form narration\n- 🌐 **Hosted Demo** — try OmniVoice without installing anything\n- 🔌 **Plugin Marketplace** — community-contributed TTS engines and effects\n\n---\n\n## Contributing\n\nWe welcome contributions of all kinds — bug fixes, new TTS engine adapters, UI improvements, docs, and translations.\n\n- 📖 Read the **[Contributing Guide](CONTRIBUTING.md)** for setup, code style, and PR workflow\n- 🐛 Browse [good first issues](https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio\u002Flabels\u002Fgood%20first%20issue)\n- 💬 Join our [Discord](https:\u002F\u002Fdiscord.gg\u002FaRRdVj3de7) to discuss ideas or ask for help\n\n---\n\n## FAQ\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Is this really as good as ElevenLabs?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\nFor voice cloning and dubbing, yes — OmniVoice uses a state-of-the-art diffusion TTS model with 646 languages (ElevenLabs supports 32). Quality is comparable for most use cases. Where ElevenLabs wins is in their polished cloud API and pre-made voice library. OmniVoice wins on privacy, cost, language coverage, and customizability.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Does it work on Apple Silicon (M1\u002FM2\u002FM3\u002FM4)?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\nYes. MPS acceleration is auto-detected. 
MLX-optimized Whisper models are available for faster transcription on Apple hardware.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>How much VRAM do I need?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\u003Cb>4 GB minimum.\u003C\u002Fb> With ≤8 GB, the TTS model is automatically offloaded to CPU during transcription. With 8+ GB, everything runs on GPU simultaneously. No GPU at all? CPU mode works — just slower (~3× for TTS).\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Can I use this commercially?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\nPersonal, educational, internal-team, and non-commercial use is free under \u003Ca href=\"https:\u002F\u002Ffsl.software\u002F\">FSL-1.1-ALv2\u003C\u002Fa>. Building a competing product or service on top of OmniVoice Studio requires a commercial license — see \u003Ca href=\"#license\">License\u003C\u002Fa>. Pricing tiers coming soon. Each release converts to Apache 2.0 two years after publication.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>What languages are supported?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n646 languages for TTS via the OmniVoice model. Transcription (WhisperX) supports 99 languages. Translation coverage depends on the target language pair.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Can I add my own TTS engine?\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\nYes. OmniVoice uses a \u003Cb>built-in backend registry\u003C\u002Fb>. To add an engine in ~50 lines, subclass \u003Ccode>TTSBackend\u003C\u002Fcode> in \u003Ccode>backend\u002Fservices\u002Ftts_backend.py\u003C\u002Fcode> and add it to the \u003Ccode>_REGISTRY\u003C\u002Fcode> dictionary at the bottom. Six engines are built in: OmniVoice, CosyVoice, MLX-Audio (14+ sub-engines), VoxCPM2, MOSS-TTS-Nano, and KittenTTS. 
See the \u003Ca href=\"#tts-engines\">TTS Engines\u003C\u002Fa> section for details.\n\u003C\u002Fdetails>\n\n---\n\n## License\n\nOmniVoice Studio is source-available under the [**Functional Source License (FSL-1.1-ALv2)**](https:\u002F\u002Ffsl.software\u002F).\n\n**Free** for personal, educational, research, internal team, and non-commercial use. Each release **converts to Apache 2.0 automatically two years after publication**.\n\n**Business \u002F enterprise** users building a competing product or service on top of OmniVoice Studio need a commercial license. **Pricing tiers coming soon.** For inquiries in the meantime, reach out at **OmniVoice@palash.dev**.\n\nSee [`LICENSE`](LICENSE) for the full terms.\n\n---\n\n## Acknowledgments\n\nOmniVoice Studio is built on the shoulders of exceptional open-source work:\n\n| Project | Role |\n|---------|------|\n| [**OmniVoice (k2-fsa)**](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002FOmniVoice) | Zero-shot diffusion TTS engine — the core voice synthesis model |\n| [**WhisperX**](https:\u002F\u002Fgithub.com\u002Fm-bain\u002FwhisperX) | Word-level speech recognition and alignment |\n| [**Demucs (Meta)**](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdemucs) | Music source separation for vocal isolation |\n| [**Pyannote**](https:\u002F\u002Fgithub.com\u002Fpyannote\u002Fpyannote-audio) | Speaker diarization — who said what |\n| [**CTranslate2**](https:\u002F\u002Fgithub.com\u002FOpenNMT\u002FCTranslate2) | Optimized Transformer inference on CPU and GPU |\n| [**AudioSeal (Meta)**](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudioseal) | Invisible neural audio watermarking for AI provenance |\n| [**Tauri**](https:\u002F\u002Ftauri.app) | Native desktop app framework |\n\n---\n\n\u003Cdiv align=\"center\">\n\n\u003Cbr\u002F>\n\nIf you read this far, you're our kind of person.\u003Cbr\u002F>\n**[⭐ Star this repo](https:\u002F\u002Fgithub.com\u002Fdebpalash\u002FOmniVoice-Studio)** so others can find it 
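too.\n\n\u003Cbr\u002F>\n\n  \u003Ca href=\"https:\u002F\u002Fstar-history.com\u002F#debpalash\u002FOmniVoice-Studio&Date\">\n    \u003Cpicture>\n      \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=debpalash\u002FOmniVoice-Studio&type=Date&theme=dark\" \u002F>\n      \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=debpalash\u002FOmniVoice-Studio&type=Date\" \u002F>\n      \u003Cimg alt=\"Star History\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=debpalash\u002FOmniVoice-Studio&type=Date&theme=dark\" width=\"600\" \u002F>\n    \u003C\u002Fpicture>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n","OmniVoice Studio is an open-source ElevenLabs alternative that provides local voice cloning, voice design, voice creation, and cinematic video dubbing, with support for real-time dictation. The project is written in Python and offers core features such as zero-shot voice cloning, support for 646 languages, and real-time text-to-speech; all processing is done locally, with no API keys required. It is particularly suited to scenarios that need high-quality speech synthesis and video dubbing while keeping data private, such as individual creators, small studios, or enterprises with strict data-security requirements. It also supports ASR (automatic speech recognition) and TTS (text-to-speech), and integrates seamlessly into a variety of audio and video editing workflows.",2,"2026-06-11 02:40:31","CREATED_QUERY"]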