[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73375":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":45,"readmeContent":46,"aiSummary":47,"trendingCount":16,"starSnapshotCount":16,"syncStatus":48,"lastSyncTime":49,"discoverSource":50},73375,"FluidAudio","FluidInference\u002FFluidAudio","FluidInference","Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source. ","https:\u002F\u002Fdocs.fluidinference.com\u002Fintroduction",null,"Swift",2161,303,42,4,0,20,54,150,60,109.45,"Apache License 2.0",false,"main",[26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],"ane","asr","audio","automatic-speech-recognition","avfoundation","coreml","ios","macos","nvidia","parakeet","real-time","speaker-diarization","speaker-embedding","speaker-identification","speaker-recognition","speech-to-text","swift","vad","voice-activity-detection","2026-06-12 04:01:09","![banner.png](banner.png)\n\n# FluidAudio - Transcription, Text-to-speech, VAD, Speaker diarization with CoreML 
Models\n\n[![Swift](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSwift-6.0+-orange.svg)](https:\u002F\u002Fswift.org)\n[![Platform](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlatform-macOS%20%7C%20iOS-blue.svg)](https:\u002F\u002Fdeveloper.apple.com)\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-docs.fluidinference.com-008574.svg)](https:\u002F\u002Fdocs.fluidinference.com\u002Fintroduction)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join%20Chat-7289da.svg)](https:\u002F\u002Fdiscord.gg\u002FWNsvaCtmDe)\n[![Hugging Face Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face%20Models-800k%2B%20downloads-brightgreen?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002FFluidInference)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FFluidInference\u002FFluidAudio)\n\nFluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices, with inference offloaded to the Apple Neural Engine (ANE), resulting in less memory and generally faster inference.\n\nThe SDK includes state-of-the-art speaker diarization, transcription, and voice activity detection via open-source models (MIT\u002FApache 2.0) that can be integrated with just a few lines of code. Models are optimized for background processing, ambient computing and always on workloads by running inference on the ANE, minimizing CPU usage and avoiding GPU\u002FMPS entirely.\n\nFor custom use cases, feedback, additional model support, or platform requests, join our [Discord](https:\u002F\u002Fdiscord.gg\u002FWNsvaCtmDe). 
We're also bringing visual, language, and TTS models to device and will share updates there.\n\nBelow are some featured local AI apps using Fluid Audio models on macOS and iOS:\n\n\u003Cp align=\"left\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FBeingpax\u002FVoiceInk\u002F\">\u003Cimg src=\"Documentation\u002Fassets\u002Fvoiceink.png\" height=\"40\" alt=\"Voice Ink\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fspokenly.app\u002F\">\u003Cimg src=\"Documentation\u002Fassets\u002Fspokenly.png\" height=\"40\" alt=\"Spokenly\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fslipbox.ai\u002F\">\u003Cimg src=\"Documentation\u002Fassets\u002Fslipbox.png\" height=\"40\" alt=\"Slipbox\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhex.kitlangton.com\u002F\">\u003Cimg src=\"Documentation\u002Fassets\u002Fhex.png\" height=\"40\" alt=\"Hex\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fboltai.com\u002F\">\u003Cimg src=\"Documentation\u002Fassets\u002Fboltai.png\" height=\"40\" alt=\"BoltAI\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fparaspeech.com\">\u003Cimg src=\"Documentation\u002Fassets\u002Fparaspeech.png\" height=\"40\" alt=\"Paraspeech\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Faltic.dev\u002Ffluid\">\u003Cimg src=\"Documentation\u002Fassets\u002Ffluidvoice.png\" height=\"40\" alt=\"Fluid Voice\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fsnaply.ai\">\u003Cimg src=\"Documentation\u002Fassets\u002Fsnaply.png\" height=\"40\" alt=\"Snaply\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fyazinsai\u002FOpenOats\">\u003Cimg src=\"Documentation\u002Fassets\u002Fopenoats.png\" height=\"40\" alt=\"OpenOats\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftalat.app\">\u003Cimg src=\"Documentation\u002Fassets\u002Ftalat.png\" height=\"40\" alt=\"Talat\">\u003C\u002Fa>\n\u003C!-- Add your app: submit logo via PR. 
The Fluid Inference team works to curate this and add new apps to the showcase section every couple of weeks. We appreciate your patience. -->\n\u003C\u002Fp>\n\nWant to convert your own model? Check [möbius](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Fmobius)\n\n## Highlights\n\n- **Automatic Speech Recognition (ASR)**: [Parakeet TDT v3](Documentation\u002FModels.md#batch-transcription-near-real-time) (0.6b) and other TDT\u002FCTC models for batch transcription supporting 25 European languages, Japanese, and Chinese; [Parakeet EOU](Documentation\u002FModels.md#streaming-transcription-true-real-time) (120m) for streaming ASR with end-of-utterance detection (English only). See all [ASR models](Documentation\u002FModels.md#asr-models).\n- **Inverse Text Normalization (ITN)**: Post-process ASR output to convert spoken-form to written-form (\"two hundred\" → \"200\"). See [text-processing-rs](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Ftext-processing-rs)\n- **Text-to-Speech (TTS)**: Kokoro (82m) for parallel synthesis with SSML and pronunciation control across 9 languages (EN, ES, FR, HI, IT, JA, PT, ZH); PocketTTS for streaming TTS with voice cloning support (EN, DE, ES, FR, IT, PT — 6L and 24L variants); **Magpie (357m, experimental)** autoregressive multilingual TTS with 5 speakers, `|…|` IPA override, and 8-language coverage (EN, ES, DE, FR, IT, VI, ZH, HI) — note: quite slow (~0.04 RTFx on Apple Silicon, ~25× slower than realtime) and needs further perf work, see [Magpie docs](Documentation\u002FTTS\u002FMagpie.md) before adopting\n- **Speaker Diarization (Online + Offline)**: Speaker separation and identification across audio streams. 
Streaming pipeline for real-time processing and offline batch pipeline with advanced clustering.\n- **Speaker Embedding Extraction**: Generate speaker embeddings for voice comparison and clustering; these can also be used for speaker identification\n- **Voice Activity Detection (VAD)**: Voice activity detection with Silero models\n- **Apple Neural Engine**: Models run efficiently on Apple's ANE for maximum performance with minimal power consumption\n- **Open-Source Models**: All models are publicly available on HuggingFace — converted and optimized by our team; permissive licenses. See [full model catalog](Documentation\u002FModels.md).\n\n## Video Demos\n\n| Link | Description |\n| --- | --- |\n| **[Spokenly Real-time ASR](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9fXKKkyL8JE)** | Video demonstration of FluidAudio's transcription accuracy and speed |\n| **[Senko Integration](https:\u002F\u002Fx.com\u002Fhamza_q_\u002Fstatus\u002F1970228971657928995)** | Python speaker diarization on Mac using FluidAudio's segmentation model |\n| **[Kokoro TTS](https:\u002F\u002Fx.com\u002Fsach1n\u002Fstatus\u002F1977817056507793521)** | Text-to-speech demo using FluidAudio's Kokoro and Silero models on iOS |\n| **[Parakeet Realtime EOU](https:\u002F\u002Fx.com\u002Fsach1n\u002Fstatus\u002F2003210626659680762)** | Parakeet streaming ASR with end-of-utterance detection on iOS |\n| **[Sortformer Diarization](https:\u002F\u002Fx.com\u002FAlex_tra_memory\u002Fstatus\u002F2010530705667661843)** | Sortformer for speaker diarization with overlapping speech on iOS |\n| **[PocketTTS](https:\u002F\u002Fx.com\u002Fsach1n\u002Fstatus\u002F2017627657006158296)** | Streaming text-to-speech using PocketTTS on iOS |\n| **[Parakeet EOU Ultra-Low Latency](https:\u002F\u002Fx.com\u002Fy_earu\u002Fstatus\u002F2038654262608064967)** | Real-time Parakeet EOU transcription on iOS demonstrating ultra-low latency speech-to-text |\n| **[Action Phrase Live Production 
Control](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ykcvdTHHmrk)** | Voice-controlled live production workflow using FluidAudio's ASR and speaker diarization to trigger cameras, graphics, and layouts with natural voice commands |\n| **[talat - VAD, ASR, Speaker ID](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OjP4Adrv9_E)** | A video demo showcasing FluidAudio's VAD, two different ASR models, and speaker diarization during a talat.app meeting recording |\n\n## Showcase\n\nOpen a PR if you want to add your app, and please keep the list in chronological order.\n\n| App | Description |\n| --- | --- |\n| **[Voice Ink](https:\u002F\u002Ftryvoiceink.com\u002F)** | Local AI for instant, private transcription with near-perfect accuracy. Uses Parakeet ASR. |\n| **[Spokenly](https:\u002F\u002Fspokenly.app\u002F)** | Mac dictation app for fast, accurate voice-to-text; supports real-time dictation and file transcription. Uses Parakeet ASR and speaker diarization. |\n| **[Senko](https:\u002F\u002Fgithub.com\u002Fnarcotic-sh\u002Fsenko)** | A very fast and accurate speaker diarization pipeline. A [good example](https:\u002F\u002Fgithub.com\u002Fnarcotic-sh\u002Fsenko\u002Fcommit\u002F51dbd8bde764c3c6648dbbae57d6aff66c5ca15c) of how to integrate FluidAudio into a Python app. |\n| **[Slipbox](https:\u002F\u002Fslipbox.ai\u002F)** | Privacy-first meeting assistant for real-time conversation intelligence. Uses Parakeet ASR (iOS) and speaker diarization across platforms. |\n| **[Whisper Mate](https:\u002F\u002Fwhisper.marksdo.com)** | Transcribes movies and audio locally; records and transcribes in real time from speakers or system apps. Uses speaker diarization. |\n| **[Altic\u002FFluid Voice](https:\u002F\u002Fgithub.com\u002Faltic-dev\u002FFluid-oss)** | Lightweight, fully free, open-source voice-to-text dictation for macOS, built using FluidAudio. Never pay for dictation apps. |\n| **[Paraspeech](https:\u002F\u002Fparaspeech.com)** | AI-powered voice-to-text. Fully offline. 
No subscriptions. |\n| **[mac-whisper-speedtest](https:\u002F\u002Fgithub.com\u002Fanvanvan\u002Fmac-whisper-speedtest)** | Comparison of different local ASR, including one of the first versions of FluidAudio's ASR models |\n| **[Starling](https:\u002F\u002Fgithub.com\u002FRyandonofrio3\u002FStarling)** | Open Source, fully local voice-to-text transcription with auto-paste at your cursor. |\n| **[BoltAI](https:\u002F\u002Fboltai.com\u002F)** | Write content 10x faster using parakeet models |\n| **[Voxeoflow](https:\u002F\u002Fwww.voxeoflow.app)** | Mac dictation app with real-time translation. Lightning-fast transcription in over 100 languages, instantly translated to your target language. |\n| **[Speakmac](https:\u002F\u002Fspeakmac.app)** | Mac app that lets you type anywhere on your Mac using your voice. Fully local, private dictation built on FluidAudio. |\n| **[SamScribe](https:\u002F\u002Fgithub.com\u002FSteven-Weng\u002FSamScribe)** | An open-source macOS app that captures and transcribes audio from your microphone and meeting applications (Zoom, Teams, Chrome) in real-time, with cross-session speaker recognition. |\n| **[WhisKey](https:\u002F\u002Fwhiskey.asktobuild.app\u002F)** | Privacy-first voice dictation keyboard for iOS and macOS. On-device transcription with 12+ languages, AI meeting summaries, and mindmap generation. Great for daily use and vibe-coding. Uses speaker diarization. |\n| **[Dictate Anywhere](https:\u002F\u002Fgithub.com\u002Fhoomanaskari\u002Fmac-dictate-anywhere)** | Native macOS dictation app with global Fn key activation. Dictate into any app with 25 language support. Uses Parakeet ASR. |\n| **[hongbomiao.com](https:\u002F\u002Fgithub.com\u002Fhongbo-miao\u002Fhongbomiao.com)** | A personal R&D lab that facilitates knowledge sharing. Uses Parakeet ASR. 
|\n| **[Hex](https:\u002F\u002Fgithub.com\u002Fkitlangton\u002FHex)** | macOS app that lets you press-and-hold a hotkey to record your voice, transcribe it, and paste into any application. Uses Parakeet ASR. |\n| **[Super Voice Assistant](https:\u002F\u002Fgithub.com\u002Fykdojo\u002Fsuper-voice-assistant)** | Open-source macOS voice assistant with local transcription. Uses Parakeet ASR. |\n| **[VoiceTypr](https:\u002F\u002Fgithub.com\u002Fmoinulmoin\u002Fvoicetypr)** | Open-source voice-to-text dictation for macOS and Windows. Uses Parakeet ASR. |\n| **[Summit AI Notes](https:\u002F\u002Fsummitnotes.app\u002F)** | Local meeting transcription and summarization with speaker identification. Supports 100+ languages. |\n| **[Ora](https:\u002F\u002Ffuturelab.studio\u002Fora)** | Local voice assistant for macOS with speech recognition and text-to-speech. |\n| **[Flowstay](https:\u002F\u002Fflowstay.app)** | Easy text-to-speech, local post-processing and Claude Code integration for macOS. Free forever. |\n| **[macos-speech-server](https:\u002F\u002Fgithub.com\u002Fdokterbob\u002Fmacos-speech-server)** | OpenAI compatible STT\u002Ftranscription and TTS\u002Fspeech API server. |\n| **[Snaply](https:\u002F\u002Fsnaply.ai)** |Free, Fast, 100% local AI dictation for Mac. |\n| **[OpenOats](https:\u002F\u002Fgithub.com\u002Fyazinsai\u002FOpenOats)** | Open-source meeting note-taker that transcribes conversations in real time and surfaces relevant notes from your knowledge base. Uses FluidAudio for local transcription. |\n| **[Enconvo](https:\u002F\u002Fenconvo.com)** | AI Agent Launcher for macOS with voice input, live captions, and text-to-speech. Uses Parakeet ASR for local speech recognition. |\n| **[Meeting Transcriber](https:\u002F\u002Fgithub.com\u002Fpasrom\u002Fmeeting-transcriber)** | macOS menu bar app that auto-detects, records, and transcribes meetings (Teams, Zoom, Webex) with dual-track speaker diarization. Uses Parakeet ASR, Qwen3-ASR, and speaker diarization. 
|\n| **[Hitoku Draft](https:\u002F\u002Fhitoku.me\u002Fdraft)** | A local, private, voice writing assistant on your macOS menu bar. Uses Parakeet ASR. |\n| **[Audite](https:\u002F\u002Fgithub.com\u002Fzachatrocity\u002Faudite)** | macOS menu-bar app that records meetings and transcribes them locally into Markdown notes for Obsidian. Uses Parakeet ASR via FluidAudio on the Apple Neural Engine. |\n| **[Muesli](https:\u002F\u002Fgithub.com\u002FpHequals7\u002Fmuesli)** | Native macOS dictation and meeting transcription with ~0.13s latency. Captures microphone and system audio with automatic speaker diarization. Uses Parakeet TDT and Qwen3 ASR. |\n| **[NanoVoice](https:\u002F\u002Fapps.apple.com\u002Fkz\u002Fapp\u002Fnanovoice\u002Fid6760539688)** | Free iOS voice keyboard for fast, private dictation in any app. Uses Parakeet ASR. |\n| **[MiniWhisper](https:\u002F\u002Fgithub.com\u002Fandyhtran\u002FMiniWhisper)** | Open-source macOS menu bar for quick local voice-to-text with minimal setup. Pick a shortcut, start talking. Uses Parakeet ASR. |\n| **[Talat](https:\u002F\u002Ftalat.app)** | Privacy-focused AI meeting notes app. Records and transcribes meetings locally on your Mac with speaker identification and LLM-powered summaries. Featured in [TechCrunch](https:\u002F\u002Ftechcrunch.com\u002F2026\u002F03\u002F24\u002Ftalats-ai-meeting-notes-stay-on-your-machine-not-in-the-cloud\u002F). Uses Parakeet ASR. |\n| **[Volocal](https:\u002F\u002Fgithub.com\u002Ffikrikarim\u002Fvolocal)** | Fully local voice AI on iOS. Uses streaming Parakeet EOU ASR and streaming PocketTTS. |\n| **[VivaDicta](https:\u002F\u002Fgithub.com\u002Fn0an\u002FVivaDicta)** | Open-source iOS voice-to-text app with system-wide AI voice keyboard — dictate and AI-process text in any app. 15+ AI providers, 40+ AI presets. Uses Parakeet ASR. 
|\n| **[MimicScribe](https:\u002F\u002Fmimicscribe.app\u002F)** | macOS menu bar app combining Parakeet TDT streaming ASR, Pyannote Community-1 speaker diarization, and cloud LLMs to provide AI-generated talking points during meetings, derived from the live transcript and user-provided instructions. Features meeting summarization, natural language search, an MCP server for agent integration, and a keyboard- and voice-forward UI. |\n| **[Action Phrase](https:\u002F\u002Factionphrase.com\u002F)** | Voice-controlled live production app for iOS, iPadOS, and macOS. Control cameras, graphics, layouts, and production workflows with natural voice commands. Integrates with popular tools including OBS, vMix, ProPresenter, Bitfocus Companion, and more. Uses Parakeet TDT ASR and Sortformer diarization. |\n| **[Sayboard](https:\u002F\u002Fgithub.com\u002Fstanlsv\u002Fsayboard)** | Privacy-first AI voice keyboard for iOS. Local models, no servers, no tracking, no subscriptions, no ads, no in-app purchases. Fully offline and open-source. |\n| **[Kesha Voice Kit](https:\u002F\u002Fgithub.com\u002Fdrakulavich\u002Fkesha-voice-kit)** | Open-source voice toolkit for Apple Silicon. CLI tool and [OpenClaw](https:\u002F\u002Fgithub.com\u002Fopenclaw\u002Fopenclaw) skill that gives LLM agents local speech-to-text in 25 languages. Uses Parakeet TDT ASR via FluidAudio. |\n| **[Dictato](https:\u002F\u002Fdicta.to)** | Turn your voice into text anywhere on your Mac. Fully local, private, and offline — boost your own vocabulary and dictate in multiple languages. Uses Parakeet TDT ASR. |\n| **[Utter](https:\u002F\u002Fgithub.com\u002Fjoepetrakovich\u002Futter)** | An ultra-minimal speech-to-text status bar utility for Mac. Register a hotkey and go. |\n| **[Resonant](https:\u002F\u002Fonresonant.com)** | macOS voice workspace for dictation, meetings, and ambient work context. Uses FluidAudio for local transcription and speaker diarization. 
|\n| **[Thoth](https:\u002F\u002Fthoth-app.com)** | Privacy-first meeting recorder for Mac. Records both sides of any call with dual-channel audio, transcribes locally with speaker diarization, and summarizes with on-device AI or BYOK cloud. Available on the Mac App Store. Featured in [MacGeneration](https:\u002F\u002Fwww.macg.co\u002Flogiciels\u002F2026\u002F05\u002Fthoth-une-nouvelle-app-de-transcription-axee-sur-les-reunions-et-le-temps-reel-308471). Uses Parakeet EOU and Parakeet TDT ASR. |\n| **[Dettivo](https:\u002F\u002Fdettivo.com)** | Local-first Mac app for private dictation, transcripts, and meeting workflows in one place, with developer tooling across CLI, MCP, REST, and app automation. Uses FluidAudio Parakeet TDT ASR and offline speaker diarization. |\n\n## Installation\n\nAdd FluidAudio to your project using Swift Package Manager:\n\n```swift\ndependencies: [\n    .package(url: \"https:\u002F\u002Fgithub.com\u002FFluidInference\u002FFluidAudio.git\", from: \"0.12.4\"),\n],\n```\n\n**In Xcode:**\n1. Add the FluidAudio package to your project\n2. In the \"Add Package\" dialog, select `FluidAudio`\n3. Add it to your app target\n\n**In Package.swift:**\n```swift\n.product(name: \"FluidAudio\", package: \"FluidAudio\")\n```\n\n**CocoaPods:** We recommend using [cocoapods-spm](https:\u002F\u002Fgithub.com\u002Ftrinhngocthuyen\u002Fcocoapods-spm) for better SPM integration, but if needed, you can also use our podspec: `pod 'FluidAudio', '~> 0.12.4'`\n\n### Other Frameworks\n\nBuilding with a different framework? 
Use one of our official wrappers:\n\n| Platform | Package | Install |\n|----------|---------|---------|\n| **React Native \u002F Expo** | [@fluidinference\u002Freact-native-fluidaudio](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Freact-native-fluidaudio) | `npm install @fluidinference\u002Freact-native-fluidaudio` |\n| **Rust \u002F Tauri** | [fluidaudio-rs](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Ffluidaudio-rs) | `cargo add fluidaudio-rs` |\n\n### Post-Processing Tools\n\nEnhance ASR output with post-processing:\n\n| Tool | Description | Language |\n|------|-------------|----------|\n| **[text-processing-rs](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Ftext-processing-rs)** | Inverse Text Normalization (ITN) and Text Normalization (TN) across 7 languages (EN, DE, ES, FR, HI, JA, ZH). 100% NeMo test compatibility (3,011 tests). Converts spoken-form ASR output to written form (\"two hundred\" → \"200\", \"five dollars\" → \"$5\"). Rust port of [NVIDIA NeMo Text Processing](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNeMo-text-processing) with Swift wrapper. | Rust, Swift |\n\n## Configuration\n\n### Quick Reference\n\nBoth solve the same problem: **\"I can't reach HuggingFace directly.\"** They're alternative approaches - pick whichever matches your setup:\n\n| Scenario | Solution | Configuration |\n|----------|----------|---|\n| You have a **local mirror or internal model server** | Use Registry URL override | `REGISTRY_URL=https:\u002F\u002Fyour-mirror.com` |\n| You're **behind a corporate firewall** with a proxy that can reach HuggingFace | Use Proxy configuration | `https_proxy=http:\u002F\u002Fproxy.company.com:8080` |\n\n**How they work:**\n- **Registry URL** - App requests from `your-mirror.com` instead of `huggingface.co`\n- **Proxy** - App still requests `huggingface.co`, but traffic routes through proxy to reach it\n\nIn most cases, you only need one. 
(You'd use both only if your mirror is behind the proxy and unreachable without it.)\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Model Registry URL\u003C\u002Fb> - Change download destination\u003C\u002Fsummary>\n\nBy default, FluidAudio downloads models from HuggingFace. You can override this to use a mirror, local server, or air-gapped environment.\n\n**Programmatic override (recommended for apps):**\n```swift\nimport FluidAudio\n\n\u002F\u002F Set custom registry before using any managers\nModelRegistry.baseURL = \"https:\u002F\u002Fyour-mirror.example.com\"\n\n\u002F\u002F Models will now download from the custom registry\nlet diarizer = DiarizerManager()\n```\n\n**Environment Variables (recommended for CLI\u002Ftesting):**\n```bash\n# Use custom registry\nexport REGISTRY_URL=https:\u002F\u002Fyour-mirror.example.com\nswift run fluidaudiocli transcribe audio.wav\n\n# Or use the MODEL_REGISTRY_URL alias\nexport MODEL_REGISTRY_URL=https:\u002F\u002Fmodels.internal.corp\nswift run fluidaudiocli diarization-benchmark --auto-download\n```\n\n**Xcode Scheme Configuration:**\n1. Edit Scheme → Run → Arguments\n2. Go to **Environment Variables** tab\n3. Click `+` and add: `REGISTRY_URL` = `https:\u002F\u002Fyour-mirror.example.com`\n4. The custom registry will apply to all debug runs\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Proxy Configuration\u003C\u002Fb> - Route downloads through a proxy server\u003C\u002Fsummary>\n\nIf you're behind a corporate firewall and cannot reach HuggingFace (or your registry) directly, configure a proxy to forward requests:\n\nSet the `https_proxy` environment variable:\n\n```bash\nexport https_proxy=http:\u002F\u002Fproxy.company.com:8080\n# or for authenticated proxies:\nexport https_proxy=http:\u002F\u002Fuser:password@proxy.company.com:8080\n\nswift run fluidaudiocli transcribe audio.wav\n```\n\n**Xcode Scheme Configuration for Proxy:**\n1. Edit Scheme → Run → Arguments\n2. Go to **Environment Variables** tab\n3. 
Click `+` and add: `https_proxy` = `http:\u002F\u002Fproxy.company.com:8080`\n4. FluidAudio will automatically route downloads through the proxy\n\n\u003C\u002Fdetails>\n\n## Documentation\n\nSee **[DeepWiki](https:\u002F\u002Fdeepwiki.com\u002FFluidInference\u002FFluidAudio)** for auto-generated docs for this repo.\n\n### Documentation Index\n\n- Guides\n  - [Audio Conversion for Inference](Documentation\u002FGuides\u002FAudioConversion.md)\n  - Manual model download & loading options: [ASR](Documentation\u002FASR\u002FManualModelLoading.md), [Diarizer](Documentation\u002FDiarization\u002FGettingStarted.md#manual-model-loading), [VAD](Documentation\u002FVAD\u002FGettingStarted.md#manual-model-loading)\n  - Routing Hugging Face (or compatible) requests through a proxy? Set `https_proxy` before running the download helpers (see [Documentation\u002FAPI.md](Documentation\u002FAPI.md)).\n- Models\n  - Automatic Speech Recognition\u002FTranscription\n    - [Getting Started](Documentation\u002FASR\u002FGettingStarted.md)\n    - [Last Chunk Handling](Documentation\u002FASR\u002FLastChunkHandling.md)\n  - Speaker Diarization\n    - [Speaker Diarization Guide](Documentation\u002FDiarization\u002FGettingStarted.md)\n  - VAD: [Getting Started](Documentation\u002FVAD\u002FGettingStarted.md)\n    - [Segmentation](Documentation\u002FVAD\u002FSegmentation.md)\n    - [Model Conversion Code](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Fmobius)\n- [Benchmarks](Documentation\u002FBenchmarks.md)\n- [API Reference](Documentation\u002FAPI.md)\n- [Command Line Guide](Documentation\u002FCLI.md)\n\n### MCP Server\n\nThe repo is indexed by the DeepWiki MCP server, so your coding tool can access the docs:\n\n```json\n{\n  \"mcpServers\": {\n    \"deepwiki\": {\n      \"url\": \"https:\u002F\u002Fmcp.deepwiki.com\u002Fmcp\"\n    }\n  }\n}\n```\n\nFor Claude Code:\n\n```bash\nclaude mcp add -s user -t http deepwiki https:\u002F\u002Fmcp.deepwiki.com\u002Fmcp\n```\n\n## Automatic Speech 
Recognition (ASR) \u002F Transcription\n\n- **Models**:\n  - `FluidInference\u002Fparakeet-tdt-0.6b-v3-coreml` (multilingual, 25 European languages)\n  - `FluidInference\u002Fparakeet-tdt-0.6b-v2-coreml` (English-only, highest recall)\n- **Processing Mode**: Batch transcription for complete audio files\n- **Real-time Factor**: ~190x on M4 Pro (processes 1 hour of audio in ~19 seconds)\n- **Streaming Support**: Real-time streaming via `SlidingWindowAsrManager` with sliding window processing and cancellation support\n- **Backend**: Same Parakeet TDT v3 model powers our backend ASR\n\n### ASR Quick Start\n\n```swift\nimport FluidAudio\n\n\u002F\u002F Batch transcription from an audio file\nTask {\n    \u002F\u002F 1) Initialize ASR manager and load models\n    let models = try await AsrModels.downloadAndLoad(version: .v3)  \u002F\u002F Switch to .v2 for English-only work\n    let asrManager = AsrManager(config: .default)\n    try await asrManager.loadModels(models)\n\n    \u002F\u002F 2) Transcribe 16 kHz audio samples (already converted)\n    let result = try await asrManager.transcribe(samples)\n\n    \u002F\u002F Alternative: transcribe a file\n    \u002F\u002F let url = URL(fileURLWithPath: sample.audioPath)\n\n    \u002F\u002F Alternative: transcribe an AVAudioPCMBuffer\n    \u002F\u002F let result = try await asrManager.transcribe(audioBuffer)\n\n    \u002F\u002F 3) Use the transcription result\n    print(\"Transcription: \\(result.text)\")\n}\n```\n\n```bash\n# Transcribe an audio file (batch)\nswift run fluidaudiocli transcribe audio.wav\n\n# English-only run with higher recall\nswift run fluidaudiocli transcribe audio.wav --model-version v2\n```\n\n## Speaker Diarization\n\n### Offline Speaker Diarization Pipeline\n\nPyannote Community-1 pipeline (powerset segmentation + WeSpeaker + VBx) for offline speaker diarization. 
Use this for most use cases; see [Benchmarks](Documentation\u002FBenchmarks.md) for results.\n\n```swift\nimport FluidAudio\n\nlet config = OfflineDiarizerConfig()\nlet manager = OfflineDiarizerManager(config: config)\ntry await manager.prepareModels()  \u002F\u002F Downloads + compiles Core ML bundles if they are missing\n\nlet samples = try AudioConverter().resampleAudioFile(path: \"meeting.wav\")\nlet result = try await manager.process(audio: samples)\n\nfor segment in result.segments {\n    print(\"\\(segment.speakerId) \\(segment.startTimeSeconds)s → \\(segment.endTimeSeconds)s\")\n}\n```\n\nFor processing audio files, use the file-based API, which automatically uses memory-mapped streaming for efficiency:\n\n```swift\nlet url = URL(fileURLWithPath: \"meeting.wav\")\nlet result = try await manager.process(url)\n\nfor segment in result.segments {\n    print(\"\\(segment.speakerId) \\(segment.startTimeSeconds)s → \\(segment.endTimeSeconds)s\")\n}\n```\n\n```bash\n# Process a meeting with full VBx clustering\nswift run fluidaudiocli process ~\u002FFluidAudioDatasets\u002Fami_official\u002Fsdm\u002FES2004a.Mix-Headset.wav \\\n  --mode offline --threshold 0.6 --output es2004a_offline.json\n\n# Run the AMI single-file benchmark with automatic downloads\nswift run fluidaudiocli diarization-benchmark --mode offline --auto-download \\\n  --single-file ES2004a --threshold 0.6 --output offline_results.json\n```\n\n`offline_results.json` contains DER\u002FJER\u002FRTFx along with timing breakdowns for segmentation, embedding extraction, and VBx clustering. CI now runs this workflow on every PR to ensure the offline models stay healthy and the Hugging Face assets remain accessible.\n\n### LS-EEND (LongForm Streaming End-to-End Neural Diarization)\n\nEnd-to-end streaming diarization with CoreML inference. Default choice for online diarization — single model, no clustering pipeline, up to 10 speakers, 100 ms frame updates with 900 ms tentative preview. Supports both streaming and complete-buffer processing. 
See [Documentation\u002FDiarization\u002FGettingStarted.md](Documentation\u002FDiarization\u002FGettingStarted.md) for details.\n\n```swift\nimport FluidAudio\n\nTask {\n    let diarizer = try await LSEENDDiarizer(variant: .dihard3)\n\n    let samples = try await loadSamples16kMono(path: \"path\u002Fto\u002Fmeeting.wav\")\n    let timeline = try diarizer.processComplete(samples, sourceSampleRate: 16_000)\n\n    for speaker in timeline.speakers.values {\n        for segment in speaker.finalizedSegments {\n            print(\"Speaker \\(speaker.index): \\(segment.startTime)s - \\(segment.endTime)s\")\n        }\n    }\n}\n```\n\n### Sortformer (End-to-End Neural Diarization)\n\nEnd-to-end neural diarization using [NVIDIA's Sortformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.06656). Secondary streaming diarizer — trades LS-EEND's higher speaker capacity and benchmark results for better speaker identity stability. Limited to 4 speakers. No separate VAD, segmentation, or clustering needed. Licensed under NVIDIA Open Model License.\n\nBoth LS-EEND and Sortformer emit results into a `DiarizerTimeline` with ultra-low-latency updates. See [Documentation\u002FDiarization\u002FSortformer.md](Documentation\u002FDiarization\u002FSortformer.md) for usage and comparison.\n\n### Streaming\u002FOnline Speaker Diarization (Pyannote)\n\nPyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online\u002Fstreaming diarization. This is the third choice behind LS-EEND and Sortformer. 
It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.\n\nWhy use the WeSpeaker\u002FPyannote pipeline:\n- More modular pipeline if you want separate segmentation and embedding stages\n- Better fit when you need to integrate external speaker identification or clustering logic\n- Speaker pre-enrollment is reliable\n- Speaker database management is much easier\n- Purging or updating individual speakers is straightforward\n- Not recommended when low-latency live diarization is the priority\n\nIn most applications:\n- Use LS-EEND as the default online diarizer\n- Use Sortformer as the second choice when its stronger identity stability and participant focus matter more than the 4-speaker limit\n- Use the WeSpeaker\u002FPyannote pipeline only when you specifically need its modular design despite the speed cost\n\nTradeoffs:\n- Slower in both inference time and practical latency than LS-EEND or Sortformer\n- Needs larger chunks, with at least 5 seconds usually required for decent results\n- Unlike LS-EEND and Sortformer, speaker state is much easier to manipulate explicitly\n\n```swift\nimport FluidAudio\n\n\u002F\u002F Diarize an audio file\nTask {\n    let models = try await DiarizerModels.downloadIfNeeded()\n    let diarizer = DiarizerManager() \n    diarizer.initialize(models: models)\n\n    \u002F\u002F Prepare 16 kHz mono samples (see: Audio Conversion)\n    let samples = try await loadSamples16kMono(path: \"path\u002Fto\u002Fmeeting.wav\")\n\n    \u002F\u002F Run diarization\n    let result = try diarizer.performCompleteDiarization(samples)\n    for segment in result.segments {\n        print(\"Speaker \\(segment.speakerId): \\(segment.startTimeSeconds)s - \\(segment.endTimeSeconds)s\")\n    }\n}\n```\n\nFor diarization streaming see [Documentation\u002FDiarization\u002FGettingStarted.md](Documentation\u002FDiarization\u002FGettingStarted.md)\n\n```bash\nswift run 
fluidaudiocli diarization-benchmark --single-file ES2004a \\\n  --chunk-seconds 3 --overlap-seconds 2\n```\n\n### CLI\n\n```bash\n# Process an individual file and save JSON\nswift run fluidaudiocli process meeting.wav --output results.json --threshold 0.6\n```\n\n## Voice Activity Detection (VAD)\n\nSilero VAD powers our on-device detector. The latest release surfaces the same\ntimestamp extraction and streaming heuristics as the upstream PyTorch\nimplementation. Ping us on Discord if you need help tuning it for your\nenvironment.\n\n### VAD Quick Start (Offline Segmentation)\n\nA simple call returns chunk-level probabilities, one per 256 ms hop:\n\n```swift\nlet results = try await manager.process(samples)\nfor (index, chunk) in results.enumerated() {\n    print(\n        String(\n            format: \"Chunk %02d: prob=%.3f, inference=%.4fs\",\n            index,\n            chunk.probability,\n            chunk.processingTime\n        )\n    )\n}\n```\n\nThe following higher-level APIs are better suited to integrating with other systems:\n\n```swift\nimport FluidAudio\n\nTask {\n    let manager = try await VadManager(\n        config: VadConfig(defaultThreshold: 0.75)\n    )\n\n    let audioURL = URL(fileURLWithPath: \"path\u002Fto\u002Faudio.wav\")\n    let samples = try AudioConverter().resampleAudioFile(audioURL)\n\n    var segmentation = VadSegmentationConfig.default\n    segmentation.minSpeechDuration = 0.25\n    segmentation.minSilenceDuration = 0.4\n\n    let segments = try await manager.segmentSpeech(samples, config: segmentation)\n    for segment in segments {\n        print(\n            String(format: \"Speech %.2f–%.2fs\", segment.startTime, segment.endTime)\n        )\n    }\n}\n```\n\n### Streaming\n\n```swift\nimport FluidAudio\n\nTask {\n    let manager = try await VadManager()\n    var state = await manager.makeStreamState()\n\n    for chunk in microphoneChunks {\n        let result = try await manager.processStreamingChunk(\n            chunk,\n      
      state: state,\n            config: .default,\n            returnSeconds: true,\n            timeResolution: 2\n        )\n\n        state = result.state\n\n        \u002F\u002F Access raw probability (0.0-1.0) for custom logic\n        print(String(format: \"Probability: %.3f\", result.probability))\n\n        if let event = result.event {\n            let label = event.kind == .speechStart ? \"Start\" : \"End\"\n            print(\"\\(label) @ \\(event.time ?? 0)s\")\n        }\n    }\n}\n```\n\n### CLI\n\nStart with the general-purpose `process` command, which runs the diarization\npipeline (and therefore VAD) end-to-end on a single file:\n\n```bash\nswift run fluidaudiocli process path\u002Fto\u002Faudio.wav\n```\n\nOnce you need to experiment with VAD-specific knobs directly, reach for:\n\n```bash\n# Inspect offline segments (default mode)\nswift run fluidaudiocli vad-analyze path\u002Fto\u002Faudio.wav\n\n# Streaming simulation only (timestamps printed in seconds by default)\nswift run fluidaudiocli vad-analyze path\u002Fto\u002Faudio.wav --streaming\n\n# Benchmark accuracy\u002Fprecision trade-offs\nswift run fluidaudiocli vad-benchmark --num-files 50 --threshold 0.3\n```\n\n`swift run fluidaudiocli vad-analyze --help` lists every tuning option, including\nnegative-threshold overrides, max-speech splitting, padding, and chunk size.\nOffline mode also reports RTFx using the model's per-chunk processing time.\n\n## Text‑To‑Speech (TTS)\n\n> **⚠️ Beta:** TTS currently supports American English only. 
Additional language support is planned.\n\nFluidAudio ships two TTS backends:\n\n| | PocketTTS | Kokoro |\n|---|---|---|\n| **GPL dependencies** | None | None |\n| **Tokenizer** | SentencePiece | CoreML G2P → IPA phonemes |\n| **Generation** | Frame-by-frame autoregressive (80ms) | Parallel (all frames at once) |\n| **Streaming** | Yes | No |\n| **Voice cloning** | Yes (1–30s audio sample) | No |\n| **Pronunciation control** | No | Yes (SSML, custom lexicon) |\n| **Output** | 24 kHz mono WAV | 24 kHz mono WAV |\n\n### PocketTTS\n\nStreaming-friendly TTS with voice cloning support from short audio samples.\nAvailable language packs: `english` (default), `german`, `german_24l`,\n`italian`, `italian_24l`, `portuguese`, `portuguese_24l`, `spanish`,\n`spanish_24l`, `french_24l` (24-layer only — no 6-layer French upstream).\n\n```swift\nimport FluidAudio\n\nTask {\n    let manager = PocketTtsManager(language: .spanish)\n    try await manager.initialize()\n    let audioData = try await manager.synthesize(text: \"Hola, mundo.\")\n    try audioData.write(to: URL(fileURLWithPath: \"out.wav\"))\n}\n```\n\n```bash\n# English (default)\nswift run fluidaudiocli tts \"Hello from FluidAudio.\" --output out.wav --backend pocket\n\n# Other languages\nswift run fluidaudiocli tts \"Hola mundo\" --backend pocket --language spanish --output es.wav\nswift run fluidaudiocli tts \"Bonjour\" --backend pocket --language french_24l --output fr.wav\n\n# Clone a voice from an audio sample (works with any language pack)\nswift run fluidaudiocli tts \"Hello world.\" --output out.wav --backend pocket --clone-voice speaker.wav\n```\n\nSee [Documentation\u002FTTS\u002FPocketTTS.md](Documentation\u002FTTS\u002FPocketTTS.md#languages)\nfor the full language table.\n\n### KokoroAne\n\nANE-resident Kokoro 82M (4-stage on Neural Engine, 3-stage on GPU). Yields\n3-11× RTFx on Apple Silicon vs. the prior single-graph Kokoro path. 
English\n(`af_heart`) and Mandarin variants ship with a built-in G2P pipeline (BART\nCoreML for English OOV, jieba + sandhi + G2pW for Mandarin).\n\n```swift\nimport FluidAudio\n\nTask {\n    let manager = KokoroAneManager()\n    try await manager.initialize()\n    let samples = try await manager.synthesize(text: \"Hello from FluidAudio.\")\n    \u002F\u002F `samples` is 24 kHz mono Float32 PCM\n}\n```\n\n```bash\nswift run fluidaudiocli tts \"Hello from FluidAudio.\" --backend kokoroAne --output out.wav\n```\n\nModel assets are cached under `~\u002F.cache\u002Ffluidaudio\u002FModels\u002Fkokoro\u002F`.\n\n### Magpie (Multilingual) — experimental\n\n> ⚠️ **Quite slow on Apple Silicon — needs significant perf work; not for\n> real-time \u002F latency-sensitive use.** First synth on a fresh process is\n> dominated by CoreML model load + first-call ANE compile (~30 s). Warm\n> synths run at **~96 s wall for an 8-word English sentence** on M-series\n> (RTFx ≈ **0.04**, i.e. ~25× slower than realtime). Output is\n> perceptually clean \u002F ASR-clean across 4 of the 5 speakers; speaker 0\n> has a single trailing-word artifact attributable to fp16\n> sampler-trajectory drift (not a structural bug). Whether the throughput\n> ceiling is a model characteristic, a CoreML conversion limitation, or\n> both is still being investigated and is expected to improve in\n> subsequent iterations. **Use Kokoro (~20× RTFx) or PocketTTS\n> (~1.5–2× RTFx) for real-time use.** Magpie ships for multilingual\n> coverage and the 5 speaker contexts, not throughput.\n\nMagpie TTS Multilingual (357M) is NVIDIA's autoregressive encoder-decoder TTS with 8-codebook NanoCodec vocoder output at 22.05 kHz. It exposes 5 built-in speakers and supports 8 languages (English, Spanish, German, French, Italian, Vietnamese, Mandarin, Hindi) with a `|…|` IPA override that routes inline phoneme sequences directly to the tokenizer. 
Japanese is deferred pending OpenJTalk integration.\n\n```swift\nimport FluidAudio\n\nTask {\n    let manager = try await MagpieTtsManager.downloadAndCreate(\n        languages: [.english, .spanish]\n    )\n    let result = try await manager.synthesize(\n        text: \"Hello | ˈ n ɛ m o ʊ | from FluidAudio.\",\n        speaker: .john,\n        language: .english\n    )\n    let wav = AudioWAV.data(from: result.samples, sampleRate: result.sampleRate)\n    try wav.write(to: URL(fileURLWithPath: \"hello.wav\"))\n}\n```\n\n```bash\n# Pre-download assets for selected languages\nswift run fluidaudiocli magpie download --languages en,es\n\n# Synthesize with IPA override enabled (default)\nswift run fluidaudiocli magpie text --text \"Hello | ˈ n ɛ m o ʊ |.\" \\\n    --speaker 0 --language en --output hello.wav\n\n# Classifier-free guidance and sampling controls\nswift run fluidaudiocli magpie text --text \"Bonjour.\" --language fr \\\n    --cfg 2.5 --temperature 0.6 --topk 80 --seed 42 --output bonjour.wav\n```\n\nParity \u002F probe \u002F compute-plan tooling lives upstream in `mobius` (Python).\n\nAssets (4 CoreML models + `constants\u002F` + per-language tokenizer files) are fetched from [`FluidInference\u002Fmagpie-tts-multilingual-357m-coreml`](https:\u002F\u002Fhuggingface.co\u002FFluidInference\u002Fmagpie-tts-multilingual-357m-coreml) on first use. 
The 1-layer local transformer (256d, top-k + temperature sampling, forbidden-token mask) runs on CPU via Accelerate\u002FBNNS; the 12-layer decoder KV cache is rolled stateful across steps.\n\nWhen `--seed N` is supplied, sampling is driven by a NumPy-compatible MT19937 RNG so the Swift output is bit-reproducible against the Python reference seeded with `np.random.seed(N)`.\n\n## Continuous Integration\n\n- `tests.yml`: Default build matrix covering SwiftPM tests and an iOS archive smoke test.\n- `diarizer-benchmark.yml`: Runs the streaming diarization benchmark on ES2004a for regression tracking.\n- `offline-pipeline.yml`: Executes the VBx offline pipeline end-to-end (`fluidaudio diarization-benchmark --mode offline`) and fails if DER\u002FJER drift beyond guardrails or if models fail to download. Use this workflow as a reference for provisioning model caches in your own CI.\n\n## Everything Else\n\n### FAQs\n\n- CLI is available on macOS only. For iOS, use the library programmatically.\n- Models auto-download on first use. 
If your network restricts Hugging Face access, set an HTTPS proxy: `export https_proxy=http:\u002F\u002F127.0.0.1:7890`.\n- Windows alternative in development: [fluid-server](https:\u002F\u002Fgithub.com\u002FFluidInference\u002Ffluid-server)\n- To capture system audio on a Mac, see [AudioCap](https:\u002F\u002Fgithub.com\u002Finsidegui\u002FAudioCap\u002Ftree\u002Fmain) for reference.\n\n### License\n\nApache 2.0 — see `LICENSE` for details.\n\n### Acknowledgments\n\nThis project builds upon the excellent work of the [sherpa-onnx](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Fsherpa-onnx) project for speaker diarization algorithms and techniques.\n\nPyannote: \u003Chttps:\u002F\u002Fgithub.com\u002Fpyannote\u002Fpyannote-audio>\n\nWeSpeaker: \u003Chttps:\u002F\u002Fgithub.com\u002Fwenet-e2e\u002Fwespeaker>\n\nParakeet-mlx: \u003Chttps:\u002F\u002Fgithub.com\u002Fsenstella\u002Fparakeet-mlx>\n\nsilero-vad: \u003Chttps:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad>\n\nKokoro-82M: \u003Chttps:\u002F\u002Fhuggingface.co\u002Fhexgrad\u002FKokoro-82M>\n\n### Citation\n\nIf you use FluidAudio in your work, please cite:\n\nFluidInference Team. (2025). FluidAudio: Local Speaker Diarization, ASR, and VAD for Apple Platforms (Version 0.12.4) [Computer software]. GitHub. 
\u003Chttps:\u002F\u002Fgithub.com\u002FFluidInference\u002FFluidAudio>\n\n```bibtex\n@software{FluidInferenceTeam_FluidAudio_2025,\n  author = {{FluidInference Team}},\n  title = {{FluidAudio: Local Speaker Diarization, ASR, and VAD for Apple Platforms}},\n  year = {2025},\n  month = {3},\n  version = {0.12.4},\n  url = {https:\u002F\u002Fgithub.com\u002FFluidInference\u002FFluidAudio},\n  note = {Computer software}\n}\n```\n\n---\n\n## Show Your Support\n\nHelp the Fluid Inference community grow by adding a \"Powered by Fluid Inference\" badge to your project!\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Ffluidinference.com\">\n    \u003Cimg src=\"https:\u002F\u002Fassets.inference.plus\u002Ffi-badge.png\" alt=\"Powered by Fluid Inference\" height=\"80\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\nCopy and paste this prompt to your coding agent where you host your homepage:\n\n```text\nAdd a centered 'Powered by Fluid Inference' badge to the footer linking to fluidinference.com. Image: https:\u002F\u002Fassets.inference.plus\u002Ffi-badge.png. 
Whitelist the image hostname in your framework's config.\n```\n\nOr use one of these code snippets:\n\n\u003Cdetails>\n\u003Csummary>React\u002FNext.js\u003C\u002Fsummary>\n\n```jsx\n\u003Cdiv className=\"flex justify-center py-8\">\n  \u003Ca href=\"https:\u002F\u002Ffluidinference.com\">\n    \u003Cimg\n      src=\"https:\u002F\u002Fassets.inference.plus\u002Ffi-badge.png\"\n      alt=\"Powered by Fluid Inference\"\n      height={80}\n    \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>HTML\u003C\u002Fsummary>\n\n```html\n\u003Cdiv style=\"text-align: center; padding: 20px;\">\n  \u003Ca href=\"https:\u002F\u002Ffluidinference.com\">\n    \u003Cimg src=\"https:\u002F\u002Fassets.inference.plus\u002Ffi-badge.png\" alt=\"Powered by Fluid Inference\" height=\"80\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Markdown\u003C\u002Fsummary>\n\n```markdown\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Ffluidinference.com\">\n    \u003Cimg src=\"https:\u002F\u002Fassets.inference.plus\u002Ffi-badge.png\" alt=\"Powered by Fluid Inference\" height=\"80\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n```\n\n\u003C\u002Fdetails>\n","FluidAudio 是一个用于在苹果设备上实现全本地、低延迟音频AI的Swift SDK，支持文本转语音、语音转文本、语音活动检测和说话人分割等功能。该SDK利用开源模型并通过Apple Neural Engine (ANE)进行推理，从而减少内存占用并提高处理速度。其核心功能包括先进的说话人分割、转录及语音活动检测，并且这些功能可以通过几行代码轻松集成到应用中。FluidAudio 适用于需要后台处理、环境计算或持续运行的工作负载场景，特别适合iOS和macOS平台上的开发者使用。",2,"2026-06-11 03:45:14","high_star"]