[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72815":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":8,"pushedAt":8,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":28,"discoverSource":29},72815,"insanely-fast-whisper","Vaibhavs10\u002Finsanely-fast-whisper","Vaibhavs10",null,"Jupyter Notebook",12969,954,89,101,0,2,20,88,6,78.74,"Apache License 2.0",false,"main",true,[],"2026-06-17 04:01:05","# Insanely Fast Whisper\n\nAn opinionated CLI to transcribe Audio files w\u002F Whisper on-device! Powered by 🤗 *Transformers*, *Optimum* & *flash-attn*\n\n**TL;DR** - Transcribe **150** minutes (2.5 hours) of audio in less than **98** seconds - with [OpenAI's Whisper Large v3](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large-v3). Blazingly fast transcription is now a reality!⚡️\n\n```\npipx install insanely-fast-whisper==0.0.15 --force\n```\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Freach-vb\u002Frandom-images\u002Fresolve\u002Fmain\u002Finsanely-fast-whisper-img.png\" width=\"615\" height=\"308\">\n\u003C\u002Fp>\n\nNot convinced? 
Here are some benchmarks we ran on an Nvidia A100 - 80GB 👇\n\n| Optimisation type    | Time to Transcribe 150 mins of Audio (minutes) |\n|------------------|------------------|\n| large-v3 (Transformers) (`fp32`)             | ~31 (*31 min 1 sec*)             |\n| large-v3 (Transformers) (`fp16` + `batching [24]` + `bettertransformer`) | ~5 (*5 min 2 sec*)            |\n| **large-v3 (Transformers) (`fp16` + `batching [24]` + `Flash Attention 2`)** | **~2 (*1 min 38 sec*)**            |\n| distil-large-v2 (Transformers) (`fp16` + `batching [24]` + `bettertransformer`) | ~3 (*3 min 16 sec*)            |\n| **distil-large-v2 (Transformers) (`fp16` + `batching [24]` + `Flash Attention 2`)** | **~1 (*1 min 18 sec*)**           |\n| large-v2 (Faster Whisper) (`fp16` + `beam_size [1]`) | ~9 (*9 min 23 sec*)            |\n| large-v2 (Faster Whisper) (`8-bit` + `beam_size [1]`) | ~8 (*8 min 15 sec*)            |\n\nP.S. We also ran the benchmarks on a [Google Colab T4 GPU](\u002Fnotebooks\u002F) instance!\n\nP.P.S. This project originally started as a way to showcase benchmarks for Transformers, but has since evolved into a lightweight CLI for people to use. It is purely community-driven: we add whatever the community has a strong demand for! \n\n## 🆕 Blazingly fast transcriptions via your terminal! ⚡️\n\nWe've added a CLI to enable fast transcriptions. Here's how you can use it:\n\nInstall `insanely-fast-whisper` with `pipx` (`pip install pipx` or `brew install pipx`):\n\n```bash\npipx install insanely-fast-whisper\n```\n\n⚠️ If you have Python 3.11.XX installed, `pipx` may parse the version incorrectly and silently install a very old version of `insanely-fast-whisper` (version `0.0.8`, which no longer works with the current `BetterTransformer`). 
In that case, you can install the latest version by passing `--ignore-requires-python` to `pip`:\n\n```bash\npipx install insanely-fast-whisper --force --pip-args=\"--ignore-requires-python\"\n```\n\nIf you're installing with `pip`, you can pass the argument directly: `pip install insanely-fast-whisper --ignore-requires-python`.\n\nRun inference from any path on your computer:\n\n```bash\ninsanely-fast-whisper --file-name \u003Cfilename or URL>\n```\n*Note: if you are running on macOS, you also need to add the `--device-id mps` flag.*\n\n🔥 You can run [Whisper-large-v3](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large-v3) w\u002F [Flash Attention 2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention) from this CLI too:\n\n```bash\ninsanely-fast-whisper --file-name \u003Cfilename or URL> --flash True \n```\n\n🌟 You can run [distil-whisper](https:\u002F\u002Fhuggingface.co\u002Fdistil-whisper) directly from this CLI too:\n\n```bash\ninsanely-fast-whisper --model-name distil-whisper\u002Flarge-v2 --file-name \u003Cfilename or URL> \n```\n\nDon't want to install `insanely-fast-whisper`? Just use `pipx run`:\n\n```bash\npipx run insanely-fast-whisper --file-name \u003Cfilename or URL>\n```\n\n> [!NOTE]\n> The CLI is highly opinionated and only works on NVIDIA GPUs & Mac. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run `insanely-fast-whisper --help` or `pipx run insanely-fast-whisper --help` to get all the CLI arguments along with their defaults. \n\n## CLI Options\n\nThe `insanely-fast-whisper` repo provides all-round support for running Whisper in various settings. 
Note that, as of 26th Nov, `insanely-fast-whisper` works on both CUDA- and mps-enabled (Mac) devices.\n\n```\n  -h, --help            show this help message and exit\n  --file-name FILE_NAME\n                        Path or URL to the audio file to be transcribed.\n  --device-id DEVICE_ID\n                        Device ID for your GPU. Just pass the device number when using CUDA, or \"mps\" for Macs with Apple Silicon. (default: \"0\")\n  --transcript-path TRANSCRIPT_PATH\n                        Path to save the transcription output. (default: output.json)\n  --model-name MODEL_NAME\n                        Name of the pretrained model\u002F checkpoint to perform ASR. (default: openai\u002Fwhisper-large-v3)\n  --task {transcribe,translate}\n                        Task to perform: transcribe or translate to another language. (default: transcribe)\n  --language LANGUAGE   \n                        Language of the input audio. (default: \"None\" (Whisper auto-detects the language))\n  --batch-size BATCH_SIZE\n                        Number of parallel batches you want to compute. Reduce if you face OOMs. (default: 24)\n  --flash FLASH         \n                        Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly. (default: False)\n  --timestamp {chunk,word}\n                        Whisper supports both chunked as well as word level timestamps. (default: chunk)\n  --hf-token HF_TOKEN\n                        Provide a hf.co\u002Fsettings\u002Ftoken for Pyannote.audio to diarise the audio clips\n  --diarization_model DIARIZATION_MODEL\n                        Name of the pretrained model\u002F checkpoint to perform diarization. (default: pyannote\u002Fspeaker-diarization)\n  --num-speakers NUM_SPEAKERS\n                        Specifies the exact number of speakers present in the audio file. Useful when the exact number of participants in the conversation is known. Must be at least 1. 
Cannot be used together with --min-speakers or --max-speakers. (default: None)\n  --min-speakers MIN_SPEAKERS\n                        Sets the minimum number of speakers that the system should consider during diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be less than or equal to --max-speakers if both are specified. (default: None)\n  --max-speakers MAX_SPEAKERS\n                        Defines the maximum number of speakers that the system should consider in diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be greater than or equal to --min-speakers if both are specified. (default: None)\n```\n\n## Frequently Asked Questions\n\n**How to correctly install flash-attn to make it work with `insanely-fast-whisper`?**\n\nMake sure to install it via `pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation`. Massive kudos to @li-yifei for helping with this.\n\n**How to solve an `AssertionError: Torch not compiled with CUDA enabled` error on Windows?**\n\nThe root cause of this problem is still unknown; however, you can resolve it by manually installing torch in the virtualenv, e.g. `python -m pip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121`. Thanks to @pto2k for all the debugging on this.\n\n**How to avoid Out-Of-Memory (OOM) exceptions on Mac?**\n\nThe *mps* backend isn't as optimised as CUDA and is therefore much more memory-hungry. Typically, you can run with `--batch-size 4` without any issues (this should use roughly 12GB of GPU VRAM). 
Don't forget to set `--device-id mps`.\n\n## How to use Whisper without a CLI?\n\n\u003Cdetails>\n\u003Csummary>All you need to run is the below snippet:\u003C\u002Fsummary>\n\n```bash\npip install --upgrade transformers optimum accelerate\n```\n\n```python\nimport torch\nfrom transformers import pipeline\nfrom transformers.utils import is_flash_attn_2_available\n\npipe = pipeline(\n    \"automatic-speech-recognition\",\n    model=\"openai\u002Fwhisper-large-v3\", # select checkpoint from https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large-v3#model-details\n    torch_dtype=torch.float16,\n    device=\"cuda:0\", # or \"mps\" for Mac devices\n    model_kwargs={\"attn_implementation\": \"flash_attention_2\"} if is_flash_attn_2_available() else {\"attn_implementation\": \"sdpa\"},\n)\n\noutputs = pipe(\n    \"\u003CFILE_NAME>\",\n    chunk_length_s=30,\n    batch_size=24,\n    return_timestamps=True,\n)\n\noutputs  # dict with the full \"text\" and, since return_timestamps=True, timestamped \"chunks\"\n```\n\u003C\u002Fdetails>\n\n## Acknowledgements\n\n1. [OpenAI Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper) team for open-sourcing such a brilliant checkpoint.\n2. Hugging Face Transformers team, specifically [Arthur](https:\u002F\u002Fgithub.com\u002FArthurZucker), [Patrick](https:\u002F\u002Fgithub.com\u002Fpatrickvonplaten), [Sanchit](https:\u002F\u002Fgithub.com\u002Fsanchit-gandhi) & [Yoach](https:\u002F\u002Fgithub.com\u002Fylacombe) (alphabetical order) for continuing to maintain Whisper in Transformers.\n3. Hugging Face [Optimum](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Foptimum) team for making the BetterTransformer API so easily accessible.\n4. [Patrick Arminio](https:\u002F\u002Fgithub.com\u002Fpatrick91) for helping me tremendously in putting together this CLI.\n\n## Community showcase\n\n1. @ochen1 created a brilliant MVP for a CLI here: https:\u002F\u002Fgithub.com\u002Fochen1\u002Finsanely-fast-whisper-cli (Try it out now!)\n2. 
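@arihanv created an app (Shush) using Next.js (Frontend) & Modal (Backend): https:\u002F\u002Fgithub.com\u002Farihanv\u002FShush (Check it outtt!)\n3. @kadirnar created a Python package on top of Transformers with optimisations: https:\u002F\u002Fgithub.com\u002Fkadirnar\u002Fwhisper-plus (Go go go!!!)\n","Insanely Fast Whisper is a command-line tool for transcribing audio files that runs fast on local devices. Built on Hugging Face Transformers, Optimum, and Flash Attention, it completes transcription tasks at remarkable speed: for example, it transcribes 150 minutes of audio in about 98 seconds on an Nvidia A100 GPU. The tool supports several optimisation options, including FP16, batching, and Flash Attention 2, which significantly improve transcription efficiency. It suits scenarios that require efficient processing of large amounts of audio, such as meeting notes and podcast transcription. The project is community-driven and is continuously updated to meet user needs.","2026-06-17 03:42:41","high_star"]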