[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71955":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},71955,"WhisperLiveKit","QuentinFuxa\u002FWhisperLiveKit","QuentinFuxa","Simultaneous speech-to-text models","",null,"Python",10437,1078,61,17,0,21,58,147,63,119.1,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:02","\u003Ch1 align=\"center\">WLK\u003C\u002Fh1>\n\u003Cp align=\"center\">\u003Cb>WhisperLiveKit: Ultra-low-latency, self-hosted speech-to-text with speaker identification\u003C\u002Fb>\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Frefs\u002Fheads\u002Fmain\u002Fdemo.png\" alt=\"WhisperLiveKit Demo\" width=\"730\">\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fwhisperlivekit\u002F\">\u003Cimg alt=\"PyPI Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fwhisperlivekit?color=g\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fwhisperlivekit\">\u003Cimg alt=\"PyPI Downloads\" src=\"https:\u002F\u002Fstatic.pepy.tech\u002Fpersonalized-badge\u002Fwhisperlivekit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=installations\">\u003C\u002Fa>\n\u003Ca 
href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fwhisperlivekit\u002F\">\u003Cimg alt=\"Python Versions\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11--3.13-dark_green\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fqfuxa\u002Fwhisper-base-french-lora\">\n  \u003Cimg alt=\"Hugging Face Weights\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Hugging%20Face%20Weights-yellow\" \u002F>\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache 2.0-dark_green\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n### Powered by Leading Research:\n\n- Simul-[Whisper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.10052)\u002F[Streaming](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.17077) (SOTA 2025) - Ultra-low latency transcription using [AlignAtt policy](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.11408). 
\n- [NLLW](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FNoLanguageLeftWaiting) (2025), based on [distilled](https:\u002F\u002Fhuggingface.co\u002Fentai2965\u002Fnllb-200-distilled-600M-ctranslate2) [NLLB](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.04672) (2022, 2024) - Simultaneous translation from & to 200 languages.\n- [WhisperStreaming](https:\u002F\u002Fgithub.com\u002Fufal\u002Fwhisper_streaming) (SOTA 2023) - Low latency transcription using [LocalAgreement policy](https:\u002F\u002Fwww.isca-archive.org\u002Finterspeech_2020\u002Fliu20s_interspeech.pdf)\n- [Streaming Sortformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.18446) (SOTA 2025) - Advanced real-time speaker diarization\n- [Diart](https:\u002F\u002Fgithub.com\u002Fjuanmc2005\u002Fdiart) (SOTA 2021) - Real-time speaker diarization\n- [Voxtral Mini](https:\u002F\u002Fhuggingface.co\u002Fmistralai\u002FVoxtral-Mini-4B-Realtime-2602) (2025) - 4B-parameter multilingual speech model by Mistral AI\n- [Silero VAD](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad) (2024) - Enterprise-grade Voice Activity Detection\n\n\n> **Why not just run a simple Whisper model on every audio batch?** Whisper is designed for complete utterances, not real-time chunks. Processing small segments loses context, cuts off words mid-syllable, and produces poor transcription. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.\n\n\n### Architecture\n\n\u003Cimg alt=\"Architecture\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Frefs\u002Fheads\u002Fmain\u002Farchitecture.png\" \u002F>\n\n*The backend supports multiple concurrent users. 
Voice Activity Detection reduces overhead when no voice is detected.*\n\n### Installation & Quick Start\n\n```bash\npip install whisperlivekit\n```\n\n#### Quick Start\n\n```bash\n\n# Start the server — open http:\u002F\u002Flocalhost:8000 and start talking\nwlk --model base --language en\n\n\n# Auto-pull model and start server\nwlk run whisper:tiny\n\n# Transcribe a file (no server needed)\nwlk transcribe meeting.wav\n\n# Generate subtitles\nwlk transcribe --format srt podcast.mp3 -o podcast.srt\n\n# Manage models\nwlk models                             # See what's installed\nwlk pull large-v3                      # Download a model\nwlk rm large-v3                        # Delete a model\n\n# Benchmark speed and accuracy\nwlk bench\n```\n\n#### API Compatibility\n\nWhisperLiveKit exposes multiple APIs so you can use it as a drop-in replacement:\n\n```bash\n# OpenAI-compatible REST API\ncurl http:\u002F\u002Flocalhost:8000\u002Fv1\u002Faudio\u002Ftranscriptions -F file=@audio.wav\n\n# Works with the OpenAI Python SDK\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv1\", api_key=\"unused\")\n\n# Deepgram-compatible WebSocket (use any Deepgram SDK)\n# Just point your Deepgram client at localhost:8000\n\n# Native WebSocket for real-time streaming\nws:\u002F\u002Flocalhost:8000\u002Fasr\n```\n\nSee [docs\u002FAPI.md](docs\u002FAPI.md) for the complete API reference.\n\n> - See [here](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Fblob\u002Fmain\u002Fwhisperlivekit\u002Fsimul_whisper\u002Fwhisper\u002Ftokenizer.py) for the list of all available languages.\n> - Check the [troubleshooting guide](docs\u002Ftroubleshooting.md) for step-by-step fixes collected from recent GPU setup\u002Fenv issues.\n> - For HTTPS requirements, see the **Parameters** section for SSL configuration options.\n\n\n\n\n#### Optional Dependencies\n\n| Feature | `uv sync` | `pip install -e` |\n|-----------|-------------|-------------|\n| **Apple Silicon MLX 
Whisper backend** | `uv sync --extra mlx-whisper` | `pip install -e \".[mlx-whisper]\"` |\n| **Voxtral (MLX backend, Apple Silicon)** | `uv sync --extra voxtral-mlx` | `pip install -e \".[voxtral-mlx]\"` |\n| **CPU PyTorch stack** | `uv sync --extra cpu` | `pip install -e \".[cpu]\"` |\n| **CUDA 12.9 PyTorch stack** | `uv sync --extra cu129` | `pip install -e \".[cu129]\"` |\n| **Translation** | `uv sync --extra translation` | `pip install -e \".[translation]\"` |\n| **Sentence tokenizer** | `uv sync --extra sentence_tokenizer` | `pip install -e \".[sentence_tokenizer]\"` |\n| **Voxtral (HF backend)** | `uv sync --extra voxtral-hf` | `pip install -e \".[voxtral-hf]\"` |\n| **Speaker diarization (Sortformer \u002F NeMo)** | `uv sync --extra diarization-sortformer` | `pip install -e \".[diarization-sortformer]\"` |\n| *[Not recommended]* Speaker diarization with Diart | `uv sync --extra diarization-diart` | `pip install -e \".[diarization-diart]\"` |\n\nSupported GPU profiles:\n\n```bash\n# Profile A: Sortformer diarization\nuv sync --extra cu129 --extra diarization-sortformer\n\n# Profile B: Voxtral HF + translation\nuv sync --extra cu129 --extra voxtral-hf --extra translation\n```\n\n`voxtral-hf` and `diarization-sortformer` are intentionally incompatible extras and must be installed in separate environments.\n\nSee **Parameters & Configuration** below on how to use them.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"benchmark_scatter_en_aware.png\" alt=\"Speed vs Accuracy — English\" width=\"700\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n\u003Cimg src=\"benchmark_scatter_fr_aware.png\" alt=\"Speed vs Accuracy — French\" width=\"700\">\n\u003C\u002Fp>\n\nBenchmarks use 6 minutes of public [LibriVox](https:\u002F\u002Flibrivox.org\u002F) audiobook recordings per language (30s + 60s + 120s + 180s), with ground truth from [Project Gutenberg](https:\u002F\u002Fwww.gutenberg.org\u002F). 
Fully reproducible with `python scripts\u002Frun_scatter_benchmark.py`.\nWe are actively looking for benchmark results on other hardware (NVIDIA GPUs, different Apple Silicon chips, cloud instances). If you run the benchmarks on your machine, please share your results via an issue or PR!\n\n\n#### Use it to capture audio from web pages.\n\nGo to `chrome-extension` for instructions.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Frefs\u002Fheads\u002Fmain\u002Fchrome-extension\u002Fdemo-extension.png\" alt=\"WhisperLiveKit Demo\" width=\"600\">\n\u003C\u002Fp>\n\n\n### Voxtral Backend\n\nWhisperLiveKit supports [Voxtral Mini](https:\u002F\u002Fhuggingface.co\u002Fmistralai\u002FVoxtral-Mini-4B-Realtime-2602),\na 4B-parameter speech model from Mistral AI that natively handles 100+ languages with automatic\nlanguage detection. Whisper also supports auto-detection (`--language auto`), but Voxtral's per-chunk\ndetection is more reliable and does not bias towards English.\n\n```bash\n# Apple Silicon (native MLX, recommended)\npip install -e \".[voxtral-mlx]\"\nwlk --backend voxtral-mlx\n\n# Linux\u002FGPU (HuggingFace transformers)\npip install transformers torch\nwlk --backend voxtral\n```\n\nVoxtral uses its own streaming policy and does not use LocalAgreement or SimulStreaming.\nSee [BENCHMARK.md](BENCHMARK.md) for performance numbers.\n\n### Usage Examples\n\n**Command-line Interface**: Start the transcription server with various options:\n\n```bash\n# Large model and translate from french to danish\nwlk --model large-v3 --language fr --target-language da\n\n# Diarization and server listening on *\u002F80\nwlk --host 0.0.0.0 --port 80 --model medium --diarization --language fr\n\n# Voxtral multilingual (auto-detects language)\nwlk --backend voxtral-mlx\n```\n\n\n**Python API Integration**: Check 
[basic_server](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Fblob\u002Fmain\u002Fwhisperlivekit\u002Fbasic_server.py) for a more complete example of how to use the functions and classes.\n\n```python\nimport asyncio\nfrom contextlib import asynccontextmanager\n\nfrom fastapi import FastAPI, WebSocket, WebSocketDisconnect\n\nfrom whisperlivekit import AudioProcessor, TranscriptionEngine\n\ntranscription_engine = None\n\n@asynccontextmanager\nasync def lifespan(app: FastAPI):\n    # Load the model once at startup and share it across connections\n    global transcription_engine\n    transcription_engine = TranscriptionEngine(model_size=\"medium\", diarization=True, lan=\"en\")\n    yield\n\napp = FastAPI(lifespan=lifespan)\n\nasync def handle_websocket_results(websocket: WebSocket, results_generator):\n    # Forward each transcription update to the client as JSON\n    async for response in results_generator:\n        await websocket.send_json(response)\n    await websocket.send_json({\"type\": \"ready_to_stop\"})\n\n@app.websocket(\"\u002Fasr\")\nasync def websocket_endpoint(websocket: WebSocket):\n    global transcription_engine\n\n    # Accept the connection before starting any task that sends on it\n    await websocket.accept()\n\n    # Create a new AudioProcessor for each connection, passing the shared engine\n    audio_processor = AudioProcessor(transcription_engine=transcription_engine)\n    results_generator = await audio_processor.create_tasks()\n    results_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))\n    try:\n        while True:\n            message = await websocket.receive_bytes()\n            await audio_processor.process_audio(message)\n    except WebSocketDisconnect:\n        # Client closed the connection: stop forwarding results\n        results_task.cancel()\n```\n\n**Frontend Implementation**: The package includes an HTML\u002FJavaScript implementation [here](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Fblob\u002Fmain\u002Fwhisperlivekit\u002Fweb\u002Flive_transcription.html). 
You can also load it in Python with `from whisperlivekit import get_inline_ui_html` and `page = get_inline_ui_html()`.\n\n\n## Parameters & Configuration\n\n\n| Parameter | Description | Default |\n|-----------|-------------|---------|\n| `--model` | Whisper model size. List and recommendations [here](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Fblob\u002Fmain\u002Fdocs\u002Fdefault_and_custom_models.md) | `small` |\n| `--model-path` | Local .pt file\u002Fdirectory **or** Hugging Face repo ID containing the Whisper model. Overrides `--model`. Recommendations [here](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FWhisperLiveKit\u002Fblob\u002Fmain\u002Fdocs\u002Fdefault_and_custom_models.md) | `None` |\n| `--language` | List [here](docs\u002Fsupported_languages.md). If you use `auto`, the model attempts to detect the language automatically, but it tends to bias towards English. | `auto` |\n| `--target-language` | If set, translates using [NLLW](https:\u002F\u002Fgithub.com\u002FQuentinFuxa\u002FNoLanguageLeftWaiting). [200 languages available](docs\u002Fsupported_languages.md). If you want to translate to English, you can also use `--direct-english-translation`, in which case the STT model tries to output the translation directly. | `None` |\n| `--diarization` | Enable speaker identification | `False` |\n| `--backend-policy` | Streaming strategy: `1`\u002F`simulstreaming` uses AlignAtt SimulStreaming, `2`\u002F`localagreement` uses the LocalAgreement policy | `simulstreaming` |\n| `--backend` | ASR backend selector. `auto` picks MLX on macOS (if installed), otherwise Faster-Whisper, otherwise vanilla Whisper. Options: `mlx-whisper`, `faster-whisper`, `whisper`, `openai-api` (LocalAgreement only), `voxtral-mlx` (Apple Silicon), `voxtral` (HuggingFace) | `auto` |\n| `--no-vac` | Disable Voice Activity Controller. NOT ADVISED | `False` |\n| `--no-vad` | Disable Voice Activity Detection. 
NOT ADVISED | `False` |\n| `--warmup-file` | Audio file path for model warmup | `jfk.wav` |\n| `--host` | Server host address | `localhost` |\n| `--port` | Server port | `8000` |\n| `--ssl-certfile` | Path to the SSL certificate file (for HTTPS support) | `None` |\n| `--ssl-keyfile` | Path to the SSL private key file (for HTTPS support) | `None` |\n| `--forwarded-allow-ips` | IP address or addresses allowed to reverse proxy the whisperlivekit-server. Supported types are IP addresses (e.g. 127.0.0.1), IP networks (e.g. 10.100.0.0\u002F16), or literals (e.g. \u002Fpath\u002Fto\u002Fsocket.sock) | `None` |\n| `--pcm-input` | Expect raw PCM (s16le) data as input and bypass FFmpeg. The frontend then uses AudioWorklet instead of MediaRecorder | `False` |\n| `--lora-path` | Path or Hugging Face repo ID for LoRA adapter weights (e.g., `qfuxa\u002Fwhisper-base-french-lora`). Only works with the native Whisper backend (`--backend whisper`) | `None` |\n\n| Translation options | Description | Default |\n|-----------|-------------|---------|\n| `--nllb-backend` | `transformers` or `ctranslate2` | `transformers` |\n| `--nllb-size` | `600M` or `1.3B` | `600M` |\n\n| Diarization options | Description | Default |\n|-----------|-------------|---------|\n| `--diarization-backend` |  `diart` or `sortformer` | `sortformer` |\n| `--disable-punctuation-split` | [NOT FUNCTIONAL IN 0.2.15 \u002F 0.2.16] Disable punctuation based splits. See #214 | `False` |\n| `--segmentation-model` | Hugging Face model ID for Diart segmentation model. [Available models](https:\u002F\u002Fgithub.com\u002Fjuanmc2005\u002Fdiart\u002Ftree\u002Fmain?tab=readme-ov-file#pre-trained-models) | `pyannote\u002Fsegmentation-3.0` |\n| `--embedding-model` | Hugging Face model ID for Diart embedding model. 
[Available models](https:\u002F\u002Fgithub.com\u002Fjuanmc2005\u002Fdiart\u002Ftree\u002Fmain?tab=readme-ov-file#pre-trained-models) | `pyannote\u002Fembedding` |\n\n| SimulStreaming backend options | Description | Default |\n|-----------|-------------|---------|\n| `--disable-fast-encoder` | Disable Faster Whisper or MLX Whisper backends for the encoder (if installed). Inference can be slower but helpful when GPU memory is limited | `False` |\n| `--custom-alignment-heads` | Use your own alignment heads, useful when `--model-dir` is used. Use `scripts\u002Fdetermine_alignment_heads.py` to extract them. \u003Cimg src=\"scripts\u002Falignment_heads_qwen3_asr_1.7B.png\" alt=\"WhisperLiveKit Demo\" width=\"300\">\n | `None` |\n| `--frame-threshold` | AlignAtt frame threshold (lower = faster, higher = more accurate) | `25` |\n| `--beams` | Number of beams for beam search (1 = greedy decoding) | `1` |\n| `--decoder` | Force decoder type (`beam` or `greedy`) | `auto` |\n| `--audio-max-len` | Maximum audio buffer length (seconds) | `30.0` |\n| `--audio-min-len` | Minimum audio length to process (seconds) | `0.0` |\n| `--cif-ckpt-path` | Path to CIF model for word boundary detection | `None` |\n| `--never-fire` | Never truncate incomplete words | `False` |\n| `--init-prompt` | Initial prompt for the model | `None` |\n| `--static-init-prompt` | Static prompt that doesn't scroll | `None` |\n| `--max-context-tokens` | Maximum context tokens | Depends on model used, but usually 448. 
|\n\n\n\n| WhisperStreaming backend options | Description | Default |\n|-----------|-------------|---------|\n| `--confidence-validation` | Use confidence scores for faster validation | `False` |\n| `--buffer_trimming` | Buffer trimming strategy (`sentence` or `segment`) | `segment` |\n\n\n\n\n> For diarization using Diart, you need to accept user conditions [here](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fsegmentation) for the `pyannote\u002Fsegmentation` model, [here](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fsegmentation-3.0) for the `pyannote\u002Fsegmentation-3.0` model and [here](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fembedding) for the `pyannote\u002Fembedding` model. **Then**, log in to Hugging Face: `huggingface-cli login`\n\n### 🚀 Deployment Guide\n\nTo deploy WhisperLiveKit in production:\n\n1. **Server Setup**: Install production ASGI server & launch with multiple workers\n   ```bash\n   pip install uvicorn gunicorn\n   gunicorn -k uvicorn.workers.UvicornWorker -w 4 your_app:app\n   ```\n\n2. **Frontend**: Host your customized version of the `html` example & ensure the WebSocket connection points to your server\n\n3. **Nginx Configuration** (recommended for production):\n   ```nginx\n   server {\n       listen 80;\n       server_name your-domain.com;\n       location \u002F {\n           proxy_pass http:\u002F\u002Flocalhost:8000;\n           # WebSocket upgrades require HTTP\u002F1.1 towards the upstream\n           proxy_http_version 1.1;\n           proxy_set_header Upgrade $http_upgrade;\n           proxy_set_header Connection \"upgrade\";\n           proxy_set_header Host $host;\n       }\n   }\n   ```\n\n4. 
**HTTPS Support**: For secure deployments, use \"wss:\u002F\u002F\" instead of \"ws:\u002F\u002F\" in WebSocket URL\n\n## 🐋 Docker\n\nDeploy the application easily using Docker with GPU or CPU support.\n\n### Prerequisites\n- Docker installed on your system\n- For GPU support: NVIDIA Docker runtime installed\n\n### Quick Start\n\n**With GPU acceleration (recommended):**\n```bash\ndocker build -t wlk .\ndocker run --gpus all -p 8000:8000 --name wlk wlk\n```\n\n**CPU only:**\n```bash\ndocker build -f Dockerfile.cpu -t wlk --build-arg EXTRAS=\"cpu\" .\ndocker run -p 8000:8000 --name wlk wlk\n```\n\n### Advanced Usage\n\n**Custom configuration:**\n```bash\n# Example with custom model and language\ndocker run --gpus all -p 8000:8000 --name wlk wlk --model large-v3 --language fr\n```\n\n**Compose (recommended for cache + token wiring):**\n```bash\n# GPU Sortformer profile\ndocker compose up --build wlk-gpu-sortformer\n\n# GPU Voxtral profile\ndocker compose up --build wlk-gpu-voxtral\n\n# CPU service\ndocker compose up --build wlk-cpu\n```\n\n### Memory Requirements\n- **Large models**: Ensure your Docker runtime has sufficient memory allocated\n\n\n#### Customization\n\n- `--build-arg` Options:\n  - `EXTRAS=\"cu129,diarization-sortformer\"` - GPU Sortformer profile extras.\n  - `EXTRAS=\"cu129,voxtral-hf,translation\"` - GPU Voxtral profile extras.\n  - `EXTRAS=\"cpu,diarization-diart,translation\"` - CPU profile extras.\n  - Hugging Face cache + token are configured in `compose.yml` using a named volume and `HF_TKN_FILE` (default: `.\u002Ftoken`).\n\n## Testing & Benchmarks\n\n```bash\n# Quick benchmark with the CLI\nwlk bench\nwlk bench --backend faster-whisper --model large-v3\nwlk bench --languages all --json results.json\n\n# Install test dependencies for full suite\npip install -e \".[test]\"\n\n# Run unit tests (no model download required)\npytest tests\u002F -v\n\n# Speed vs Accuracy scatter plot (all backends, compute-aware + unaware)\npython 
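scripts\u002Fcreate_long_samples.py        # generate ~90s test samples (cached)\npython scripts\u002Frun_scatter_benchmark.py      # English (both modes)\npython scripts\u002Frun_scatter_benchmark.py --lang fr  # French\n```\n\n## Use Cases\nCapture discussions in real time for meeting transcription, help hearing-impaired users follow conversations through accessibility tools, transcribe podcasts or videos automatically for content creation, transcribe support calls with speaker identification for customer service...\n","WhisperLiveKit is a self-hosted system for real-time speech-to-text with ultra-low latency and speaker identification. The project builds on recent Simul-Whisper\u002FStreaming work, using the AlignAtt policy for efficient low-latency transcription, and integrates NLLW for simultaneous translation between 200 languages. It also uses Streaming Sortformer and Diart for real-time speaker diarization, keeping output accurate and fluent in multi-speaker settings. In addition, WhisperLiveKit extends its multilingual support through the Voxtral Mini model and uses Silero VAD to improve the accuracy of voice activity detection. The project is particularly suited to scenarios that need fast responses and speaker separation, such as online meetings, live-stream captioning, and remote education.",2,"2026-06-11 03:39:41","high_star"]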