[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2355":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},2355,"faster-whisper","SYSTRAN\u002Ffaster-whisper","SYSTRAN","Faster Whisper transcription with CTranslate2","",null,"Python",23546,1929,164,289,0,18,159,715,108,44.86,"MIT License",false,"master",[26,27,28,29,30,31,32,33],"deep-learning","inference","openai","quantization","speech-recognition","speech-to-text","transformer","whisper","2026-06-12 02:00:40","[![CI](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Fworkflows\u002FCI\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Factions?query=workflow%3ACI) [![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ffaster-whisper.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ffaster-whisper)\n\n# Faster Whisper transcription with CTranslate2\n\n**faster-whisper** is a reimplementation of OpenAI's Whisper model using [CTranslate2](https:\u002F\u002Fgithub.com\u002FOpenNMT\u002FCTranslate2\u002F), which is a fast inference engine for Transformer models.\n\nThis implementation is up to 4 times faster than [openai\u002Fwhisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper) for the same accuracy while using less memory. 
The efficiency can be further improved with 8-bit quantization on both CPU and GPU.\n\n## Benchmark\n\n### Whisper\n\nFor reference, here's the time and memory usage that are required to transcribe [**13 minutes**](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0u7tTptBo9I) of audio using different implementations:\n\n* [openai\u002Fwhisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)@[v20240930](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper\u002Ftree\u002Fv20240930)\n* [whisper.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fwhisper.cpp)@[v1.7.2](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fwhisper.cpp\u002Ftree\u002Fv1.7.2)\n* [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)@[v4.46.3](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv4.46.3)\n* [faster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper)@[v1.1.0](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Ftree\u002Fv1.1.0)\n\n### Large-v2 model on GPU\n\n| Implementation | Precision | Beam size | Time | VRAM Usage |\n| --- | --- | --- | --- | --- |\n| openai\u002Fwhisper | fp16 | 5 | 2m23s | 4708MB |\n| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |\n| transformers (SDPA)[^1] | fp16 | 5 | 1m52s | 4960MB |\n| faster-whisper | fp16 | 5 | 1m03s | 4525MB |\n| faster-whisper (`batch_size=8`) | fp16 | 5 | 17s | 6090MB |\n| faster-whisper | int8 | 5 | 59s | 2926MB |\n| faster-whisper (`batch_size=8`) | int8 | 5 | 16s | 4500MB |\n\n### distil-whisper-large-v3 model on GPU\n\n| Implementation | Precision | Beam size | Time | YT Commons WER |\n| --- | --- | --- | --- | --- |\n| transformers (SDPA) (`batch_size=16`) | fp16 | 5 | 46m12s | 14.801 |\n| faster-whisper (`batch_size=16`) | fp16 | 5 | 25m50s | 13.527 |\n\n*GPU Benchmarks are Executed with CUDA 12.4 on a NVIDIA RTX 3070 Ti 8GB.*\n[^1]: transformers OOM for any batch size > 1\n\n### Small model on CPU\n\n| Implementation | 
Precision | Beam size | Time | RAM Usage |\n| --- | --- | --- | --- | --- |\n| openai\u002Fwhisper | fp32 | 5 | 6m58s | 2335MB |\n| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |\n| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |\n| faster-whisper | fp32 | 5 | 2m37s | 2257MB |\n| faster-whisper (`batch_size=8`) | fp32 | 5 | 1m06s | 4230MB |\n| faster-whisper | int8 | 5 | 1m42s | 1477MB |\n| faster-whisper (`batch_size=8`) | int8 | 5 | 51s | 3608MB |\n\n*Executed with 8 threads on an Intel Core i7-12700K.*\n\n\n## Requirements\n\n* Python 3.9 or greater\n\nUnlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https:\u002F\u002Fgithub.com\u002FPyAV-Org\u002FPyAV), which bundles the FFmpeg libraries in its package.\n\n### GPU\n\nGPU execution requires the following NVIDIA libraries to be installed:\n\n* [cuBLAS for CUDA 12](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcublas)\n* [cuDNN 9 for CUDA 12](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcudnn)\n\n**Note**: The latest versions of `ctranslate2` only support CUDA 12 and cuDNN 9. For CUDA 11 and cuDNN 8, the current workaround is to downgrade to version `3.24.0` of `ctranslate2`; for CUDA 12 and cuDNN 8, downgrade to version `4.4.0`. This can be done with `pip install --force-reinstall ctranslate2==4.4.0` or by specifying the version in a `requirements.txt`.\n\nThere are multiple ways to install the NVIDIA libraries mentioned above. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below. \n\n\u003Cdetails>\n\u003Csummary>Other installation methods (click to expand)\u003C\u002Fsummary>\n\n\n**Note:** For all these methods below, keep in mind the above note regarding CUDA versions. 
Depending on your setup, you may need to install the _CUDA 11_ versions of libraries that correspond to the CUDA 12 libraries listed in the instructions below.\n\n#### Use Docker\n\nThe libraries (cuBLAS, cuDNN) are installed in the official NVIDIA CUDA Docker image: `nvidia\u002Fcuda:12.3.2-cudnn9-runtime-ubuntu22.04`.\n\n#### Install with `pip` (Linux only)\n\nOn Linux, these libraries can be installed with `pip`. Note that `LD_LIBRARY_PATH` must be set before launching Python.\n\n```bash\npip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*\n\nexport LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + \":\" + os.path.dirname(nvidia.cudnn.lib.__file__))'`\n```\n\n#### Download the libraries from Purfview's repository (Windows & Linux)\n\nPurfview's [whisper-standalone-win](https:\u002F\u002Fgithub.com\u002FPurfview\u002Fwhisper-standalone-win) provides the required NVIDIA libraries for Windows & Linux in a [single archive](https:\u002F\u002Fgithub.com\u002FPurfview\u002Fwhisper-standalone-win\u002Freleases\u002Ftag\u002Flibs). 
Decompress the archive and place the libraries in a directory included in the `PATH`.\n\n\u003C\u002Fdetails>\n\n## Installation\n\nThe module can be installed from [PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Ffaster-whisper\u002F):\n\n```bash\npip install faster-whisper\n```\n\n\u003Cdetails>\n\u003Csummary>Other installation methods (click to expand)\u003C\u002Fsummary>\n\n### Install the master branch\n\n```bash\npip install --force-reinstall \"faster-whisper @ https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Farchive\u002Frefs\u002Fheads\u002Fmaster.tar.gz\"\n```\n\n### Install a specific commit\n\n```bash\npip install --force-reinstall \"faster-whisper @ https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Farchive\u002Fa4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz\"\n```\n\n\u003C\u002Fdetails>\n\n## Usage\n\n### Faster-whisper\n\n```python\nfrom faster_whisper import WhisperModel\n\nmodel_size = \"large-v3\"\n\n# Run on GPU with FP16\nmodel = WhisperModel(model_size, device=\"cuda\", compute_type=\"float16\")\n\n# or run on GPU with INT8\n# model = WhisperModel(model_size, device=\"cuda\", compute_type=\"int8_float16\")\n# or run on CPU with INT8\n# model = WhisperModel(model_size, device=\"cpu\", compute_type=\"int8\")\n\nsegments, info = model.transcribe(\"audio.mp3\", beam_size=5)\n\nprint(\"Detected language '%s' with probability %f\" % (info.language, info.language_probability))\n\nfor segment in segments:\n    print(\"[%.2fs -> %.2fs] %s\" % (segment.start, segment.end, segment.text))\n```\n\n**Warning:** `segments` is a *generator* so the transcription only starts when you iterate over it. 
The transcription can be run to completion by gathering the segments in a list or iterating over them in a `for` loop:\n\n```python\nsegments, _ = model.transcribe(\"audio.mp3\")\nsegments = list(segments)  # The transcription will actually run here.\n```\n\n### Batched Transcription\n\nThe following code snippet illustrates how to run batched transcription on an example audio file. `BatchedInferencePipeline.transcribe` is a drop-in replacement for `WhisperModel.transcribe`.\n\n```python\nfrom faster_whisper import WhisperModel, BatchedInferencePipeline\n\nmodel = WhisperModel(\"turbo\", device=\"cuda\", compute_type=\"float16\")\nbatched_model = BatchedInferencePipeline(model=model)\nsegments, info = batched_model.transcribe(\"audio.mp3\", batch_size=16)\n\nfor segment in segments:\n    print(\"[%.2fs -> %.2fs] %s\" % (segment.start, segment.end, segment.text))\n```\n\n### Faster Distil-Whisper\n\nThe Distil-Whisper checkpoints are compatible with the Faster-Whisper package. In particular, the latest [distil-large-v3](https:\u002F\u002Fhuggingface.co\u002Fdistil-whisper\u002Fdistil-large-v3)\ncheckpoint is designed to work with the Faster-Whisper transcription algorithm. 
The following code snippet \ndemonstrates how to run inference with distil-large-v3 on a specified audio file:\n\n```python\nfrom faster_whisper import WhisperModel\n\nmodel_size = \"distil-large-v3\"\n\nmodel = WhisperModel(model_size, device=\"cuda\", compute_type=\"float16\")\nsegments, info = model.transcribe(\"audio.mp3\", beam_size=5, language=\"en\", condition_on_previous_text=False)\n\nfor segment in segments:\n    print(\"[%.2fs -> %.2fs] %s\" % (segment.start, segment.end, segment.text))\n```\n\nFor more information about the distil-large-v3 model, refer to the original [model card](https:\u002F\u002Fhuggingface.co\u002Fdistil-whisper\u002Fdistil-large-v3).\n\n### Word-level timestamps\n\n```python\nsegments, _ = model.transcribe(\"audio.mp3\", word_timestamps=True)\n\nfor segment in segments:\n    for word in segment.words:\n        print(\"[%.2fs -> %.2fs] %s\" % (word.start, word.end, word.word))\n```\n\n### VAD filter\n\nThe library integrates the [Silero VAD](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad) model to filter out parts of the audio without speech:\n\n```python\nsegments, _ = model.transcribe(\"audio.mp3\", vad_filter=True)\n```\n\nThe default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the [source code](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Fblob\u002Fmaster\u002Ffaster_whisper\u002Fvad.py). 
They can be customized with the dictionary argument `vad_parameters`:\n\n```python\nsegments, _ = model.transcribe(\n    \"audio.mp3\",\n    vad_filter=True,\n    vad_parameters=dict(min_silence_duration_ms=500),\n)\n```\n\nThe VAD filter is enabled by default for batched transcription.\n\n### Logging\n\nThe library logging level can be configured like this:\n\n```python\nimport logging\n\nlogging.basicConfig()\nlogging.getLogger(\"faster_whisper\").setLevel(logging.DEBUG)\n```\n\n### Going further\n\nSee more model and transcription options in the [`WhisperModel`](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper\u002Fblob\u002Fmaster\u002Ffaster_whisper\u002Ftranscribe.py) class implementation.\n\n## Community integrations\n\nHere is a non-exhaustive list of open-source projects using faster-whisper. Feel free to add your project to the list!\n\n\n* [speaches](https:\u002F\u002Fgithub.com\u002Fspeaches-ai\u002Fspeaches) is an OpenAI-compatible server using `faster-whisper`. It's easily deployable with Docker, works with OpenAI SDKs\u002FCLI, and supports streaming and live transcription.\n* [WhisperX](https:\u002F\u002Fgithub.com\u002Fm-bain\u002FwhisperX) is an award-winning Python library that offers speaker diarization and accurate word-level timestamps using wav2vec2 alignment.\n* [whisper-ctranslate2](https:\u002F\u002Fgithub.com\u002FSoftcatala\u002Fwhisper-ctranslate2) is a command line client based on faster-whisper and compatible with the original client from openai\u002Fwhisper.\n* [whisper-diarize](https:\u002F\u002Fgithub.com\u002FMahmoudAshraf97\u002Fwhisper-diarization) is a speaker diarization tool that is based on faster-whisper and NVIDIA NeMo.\n* [whisper-standalone-win](https:\u002F\u002Fgithub.com\u002FPurfview\u002Fwhisper-standalone-win) provides standalone CLI executables of faster-whisper for Windows, Linux & macOS. 
\n* [asr-sd-pipeline](https:\u002F\u002Fgithub.com\u002Fhedrergudene\u002Fasr-sd-pipeline) provides a scalable, modular, end-to-end multi-speaker speech-to-text solution implemented using AzureML pipelines.\n* [Open-Lyrics](https:\u002F\u002Fgithub.com\u002Fzh-plus\u002FOpen-Lyrics) is a Python library that transcribes voice files using faster-whisper, and translates\u002Fpolishes the resulting text into `.lrc` files in the desired language using OpenAI-GPT.\n* [wscribe](https:\u002F\u002Fgithub.com\u002Fgeekodour\u002Fwscribe) is a flexible transcript generation tool supporting faster-whisper. It can export word-level transcripts, which can then be edited with [wscribe-editor](https:\u002F\u002Fgithub.com\u002Fgeekodour\u002Fwscribe-editor).\n* [aTrain](https:\u002F\u002Fgithub.com\u002FBANDAS-Center\u002FaTrain) is a graphical user interface implementation of faster-whisper developed at the BANDAS-Center at the University of Graz for transcription and diarization in Windows ([Windows Store App](https:\u002F\u002Fapps.microsoft.com\u002Fdetail\u002Fatrain\u002F9N15Q44SZNS2)) and Linux.\n* [Whisper-Streaming](https:\u002F\u002Fgithub.com\u002Fufal\u002Fwhisper_streaming) implements real-time mode for offline Whisper-like speech-to-text models with faster-whisper as the most recommended back-end. 
It implements a streaming policy with self-adaptive latency based on the actual source complexity, and demonstrates the state of the art.\n* [WhisperLive](https:\u002F\u002Fgithub.com\u002Fcollabora\u002FWhisperLive) is a nearly-live implementation of OpenAI's Whisper which uses faster-whisper as the backend to transcribe audio in real-time.\n* [Faster-Whisper-Transcriber](https:\u002F\u002Fgithub.com\u002FBBC-Esq\u002Fctranslate2-faster-whisper-transcriber) is a simple but reliable voice transcriber that provides a user-friendly interface.\n* [Open-dubbing](https:\u002F\u002Fgithub.com\u002Fsoftcatala\u002Fopen-dubbing) is an AI dubbing system which uses machine learning models to automatically translate and synchronize audio dialogue into different languages.\n* [Whisper-FastAPI](https:\u002F\u002Fgithub.com\u002Fheimoshuiyu\u002Fwhisper-fastapi) is a very simple script that provides an API backend compatible with OpenAI, HomeAssistant, and Konele (Android voice typing) formats.\n\n## Model conversion\n\nWhen loading a model from its size such as `WhisperModel(\"large-v3\")`, the corresponding CTranslate2 model is automatically downloaded from the [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002FSystran).\n\nWe also provide a script to convert any Whisper model compatible with the Transformers library. 
These can be the original OpenAI models or user fine-tuned models.\n\nFor example, the command below converts the [original \"large-v3\" Whisper model](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large-v3) and saves the weights in FP16:\n\n```bash\npip install \"transformers[torch]>=4.23\"\n\nct2-transformers-converter --model openai\u002Fwhisper-large-v3 --output_dir whisper-large-v3-ct2 \\\n    --copy_files tokenizer.json preprocessor_config.json --quantization float16\n```\n\n* The option `--model` accepts a model name on the Hub or a path to a model directory.\n* If the option `--copy_files tokenizer.json` is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.\n\nModels can also be converted from the code. See the [conversion API](https:\u002F\u002Fopennmt.net\u002FCTranslate2\u002Fpython\u002Fctranslate2.converters.TransformersConverter.html).\n\n### Load a converted model\n\n1. Directly load the model from a local directory:\n```python\nimport faster_whisper\nmodel = faster_whisper.WhisperModel(\"whisper-large-v3-ct2\")\n```\n\n2. [Upload your model to the Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_sharing#upload-with-the-web-interface) and load it from its name:\n```python\nimport faster_whisper\nmodel = faster_whisper.WhisperModel(\"username\u002Fwhisper-large-v3-ct2\")\n```\n\n## Comparing performance against other implementations\n\nIf you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:\n\n* Verify that the same transcription options are used, especially the same beam size. 
For example, in openai\u002Fwhisper, `model.transcribe` uses a default beam size of 1, but here we use a default beam size of 5.\n* Transcription speed is strongly affected by the number of words in the transcript, so ensure that other implementations have a similar WER (Word Error Rate) to this one.\n* When running on CPU, make sure to set the same number of threads. Many frameworks will read the environment variable `OMP_NUM_THREADS`, which can be set when running your script:\n\n```bash\nOMP_NUM_THREADS=4 python3 my_script.py\n```\n","faster-whisper is a reimplementation of OpenAI's Whisper model built on CTranslate2, focused on faster speech-to-text transcription. By using an efficient inference engine and 8-bit quantization, it runs up to 4 times faster than the original Whisper at the same accuracy while using less memory. It supports multiple precision modes (such as fp16 and int8) and batched processing, which suits applications that need fast audio transcription, especially in resource-constrained environments. Whether used for real-time subtitle generation, meeting notes, or large-scale audio file processing, faster-whisper offers an efficient and economical solution.",2,"2026-06-11 02:49:36","top_language"]