[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72251":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72251,"RealtimeVoiceChat","KoljaB\u002FRealtimeVoiceChat","KoljaB","Have a natural, spoken conversation with AI!","",null,"Python",3752,439,35,40,0,1,5,103,3,71.43,false,"main",true,[],"2026-06-12 04:01:04","\n# Real-Time AI Voice Chat 🎤💬🧠🔊\n\n**Have a natural, spoken conversation with an AI!**  \n\nThis project lets you chat with a Large Language Model (LLM) using just your voice, receiving spoken responses in near real-time. Think of it as your own digital conversation partner.\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F16cc29a7-bec2-4dd0-a056-d213db798d8f\n\n*(early preview - first reasonably stable version)*\n\n> ❗ **Project Status: Community-Driven**\n> \n> This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.\n>\n> I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!\n\n## What's Under the Hood?\n\nA sophisticated client-server system built for low-latency interaction:\n\n1.  🎙️ **Capture:** Your voice is captured by your browser.\n2.  
➡️ **Stream:** Audio chunks are whisked away via WebSockets to a Python backend.\n3.  ✍️ **Transcribe:** `RealtimeSTT` rapidly converts your speech to text.\n4.  🤔 **Think:** The text is sent to an LLM (like Ollama or OpenAI) for processing.\n5.  🗣️ **Synthesize:** The AI's text response is turned back into speech using `RealtimeTTS`.\n6.  ⬅️ **Return:** The generated audio is streamed back to your browser for playback.\n7.  🔄 **Interrupt:** Jump in anytime! The system handles interruptions gracefully.\n\n## Key Features ✨\n\n*   **Fluid Conversation:** Speak and listen, just like a real chat.\n*   **Real-Time Feedback:** See partial transcriptions and AI responses as they happen.\n*   **Low Latency Focus:** Optimized architecture using audio chunk streaming.\n*   **Smart Turn-Taking:** Dynamic silence detection (`turndetect.py`) adapts to the conversation pace.\n*   **Flexible AI Brains:** Pluggable LLM backends (Ollama default, OpenAI support via `llm_module.py`).\n*   **Customizable Voices:** Choose from different Text-to-Speech engines (Kokoro, Coqui, Orpheus via `audio_module.py`).\n*   **Web Interface:** Clean and simple UI using Vanilla JS and the Web Audio API.\n*   **Dockerized Deployment:** Recommended setup using Docker Compose for easier dependency management.\n\n## Technology Stack 🛠️\n\n*   **Backend:** Python \u003C 3.13, FastAPI\n*   **Frontend:** HTML, CSS, JavaScript (Vanilla JS, Web Audio API, AudioWorklets)\n*   **Communication:** WebSockets\n*   **Containerization:** Docker, Docker Compose\n*   **Core AI\u002FML Libraries:**\n    *   `RealtimeSTT` (Speech-to-Text)\n    *   `RealtimeTTS` (Text-to-Speech)\n    *   `transformers` (Turn detection, Tokenization)\n    *   `torch` \u002F `torchaudio` (ML Framework)\n    *   `ollama` \u002F `openai` (LLM Clients)\n*   **Audio Processing:** `numpy`, `scipy`\n\n## Before You Dive In: Prerequisites 🏊‍♀️\n\nThis project leverages powerful AI models, which have some requirements:\n\n*   **Operating 
System:**\n    *   **Docker:** Linux is recommended for the best GPU integration with Docker.\n    *   **Manual:** The provided script (`install.bat`) is for Windows. Manual steps are possible on Linux\u002FmacOS but may require more troubleshooting (especially for DeepSpeed).\n*   **🐍 Python:** 3.9 or higher (if setting up manually).\n*   **🚀 GPU:** **A powerful CUDA-enabled NVIDIA GPU is *highly recommended***, especially for faster STT (Whisper) and TTS (Coqui). Performance on CPU-only or weaker GPUs will be significantly slower.\n    *   The setup assumes **CUDA 12.1**. Adjust PyTorch installation if you have a different CUDA version.\n    *   **Docker (Linux):** Requires [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html).\n*   **🐳 Docker (Optional but Recommended):** Docker Engine and Docker Compose v2+ for the containerized setup.\n*   **🧠 Ollama (Optional):** If using the Ollama backend *without* Docker, install it separately and pull your desired models. The Docker setup includes an Ollama service.\n*   **🔑 OpenAI API Key (Optional):** If using the OpenAI backend, set the `OPENAI_API_KEY` environment variable (e.g., in a `.env` file or passed to Docker).\n\n---\n\n## Getting Started: Installation & Setup ⚙️\n\n**Clone the repository first:**\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeVoiceChat.git\ncd RealtimeVoiceChat\n```\n\nNow, choose your adventure:\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🚀 Option A: Docker Installation (Recommended for Linux\u002FGPU)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nThis is the most straightforward method, bundling the application, dependencies, and even Ollama into manageable containers.\n\n1.  **Build the Docker images:**\n    *(This takes time! 
It downloads base images, installs Python\u002FML dependencies, and pre-downloads the default STT model.)*\n    ```bash\n    docker compose build\n    ```\n    *(If you want to customize models\u002Fsettings in `code\u002F*.py`, do it **before** this step!)*\n\n2.  **Start the services (App & Ollama):**\n    *(Runs containers in the background. GPU access is configured in `docker-compose.yml`.)*\n    ```bash\n    docker compose up -d\n    ```\n    Give them a minute to initialize.\n\n3.  **(Crucial!) Pull your desired Ollama Model:**\n    *(This is done *after* startup to keep the main app image smaller and allow model changes without rebuilding. Execute this command to pull the default model into the running Ollama container.)*\n    ```bash\n    # Pull the default model (adjust if you configured a different one in server.py)\n    docker compose exec ollama ollama pull hf.co\u002Fbartowski\u002Fhuihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M\n\n    # (Optional) Verify the model is available\n    docker compose exec ollama ollama list\n    ```\n\n4.  **Stopping the Services:**\n    ```bash\n    docker compose down\n    ```\n\n5.  **Restarting:**\n    ```bash\n    docker compose up -d\n    ```\n\n6.  **Viewing Logs \u002F Debugging:**\n    *   Follow app logs: `docker compose logs -f app`\n    *   Follow Ollama logs: `docker compose logs -f ollama`\n    *   Save logs to file: `docker compose logs app > app_logs.txt`\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🛠️ Option B: Manual Installation (Windows Script \u002F venv)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nThis method requires managing the Python environment yourself. It offers more direct control but can be trickier, especially regarding ML dependencies.\n\n**B1) Using the Windows Install Script:**\n\n1.  Ensure you meet the prerequisites (Python, potentially CUDA drivers).\n2.  Run the script. 
It attempts to create a venv, install PyTorch for CUDA 12.1, a compatible DeepSpeed wheel, and other requirements.\n    ```batch\n    install.bat\n    ```\n    *(This opens a new command prompt within the activated virtual environment.)*\n    Proceed to the **\"Running the Application\"** section.\n\n**B2) Manual Steps (Linux\u002FmacOS\u002FWindows):**\n\n1.  **Create & Activate Virtual Environment:**\n    ```bash\n    python -m venv venv\n    # Linux\u002FmacOS:\n    source venv\u002Fbin\u002Factivate\n    # Windows:\n    .\\venv\\Scripts\\activate\n    ```\n\n2.  **Upgrade Pip:**\n    ```bash\n    python -m pip install --upgrade pip\n    ```\n\n3.  **Navigate to Code Directory:**\n    ```bash\n    cd code\n    ```\n\n4.  **Install PyTorch (Crucial Step - Match Your Hardware!):**\n    *   **With NVIDIA GPU (CUDA 12.1 Example):**\n        ```bash\n        # Verify your CUDA version! Adjust 'cu121' and the URL if needed.\n        pip install torch==2.5.1+cu121 torchaudio==2.5.1+cu121 torchvision --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n        ```\n    *   **CPU Only (Expect Slow Performance):**\n        ```bash\n        # pip install torch torchaudio torchvision\n        ```\n    *   *Find other PyTorch versions:* [https:\u002F\u002Fpytorch.org\u002Fget-started\u002Fprevious-versions\u002F](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Fprevious-versions\u002F)\n\n5.  **Install Other Requirements:**\n    ```bash\n    pip install -r requirements.txt\n    ```\n    *   **Note on DeepSpeed:** The `requirements.txt` may include DeepSpeed. Installation can be complex, especially on Windows. The `install.bat` tries a precompiled wheel. If manual installation fails, you might need to build it from source or consult resources like [deepspeedpatcher](https:\u002F\u002Fgithub.com\u002Ferew123\u002Fdeepspeedpatcher) (use at your own risk). 
Coqui TTS performance benefits most from DeepSpeed.\n\n\u003C\u002Fdetails>\n\n---\n\n## Running the Application ▶️\n\n**If using Docker:**\nYour application is already running via `docker compose up -d`! Check logs using `docker compose logs -f app`.\n\n**If using Manual\u002FScript Installation:**\n\n1.  **Activate your virtual environment** (if not already active):\n    ```bash\n    # Linux\u002FmacOS: source ..\u002Fvenv\u002Fbin\u002Factivate\n    # Windows: ..\\venv\\Scripts\\activate\n    ```\n2.  **Navigate to the `code` directory** (if not already there):\n    ```bash\n    cd code\n    ```\n3.  **Start the FastAPI server:**\n    ```bash\n    python server.py\n    ```\n\n**Accessing the Client (Both Methods):**\n\n1.  Open your web browser to `http:\u002F\u002Flocalhost:8000` (or your server's IP if running remotely\u002Fin Docker on another machine).\n2.  **Grant microphone permissions** when prompted.\n3.  Click **\"Start\"** to begin chatting! Use \"Stop\" to end and \"Reset\" to clear the conversation.\n\n---\n\n## Configuration Deep Dive 🔧\n\nWant to tweak the AI's voice, brain, or how it listens? Modify the Python files in the `code\u002F` directory.\n\n**⚠️ Important Docker Note:** If using Docker, make any configuration changes *before* running `docker compose build` to ensure they are included in the image.\n\n*   **TTS Engine & Voice (`server.py`, `audio_module.py`):**\n    *   Change `START_ENGINE` in `server.py` to `\"coqui\"`, `\"kokoro\"`, or `\"orpheus\"`.\n    *   Adjust engine-specific settings (e.g., voice model path for Coqui, speaker ID for Orpheus, speed) within `AudioProcessor.__init__` in `audio_module.py`.\n*   **LLM Backend & Model (`server.py`, `llm_module.py`):**\n    *   Set `LLM_START_PROVIDER` (`\"ollama\"` or `\"openai\"`) and `LLM_START_MODEL` (e.g., `\"hf.co\u002F...\"` for Ollama, model name for OpenAI) in `server.py`. 
Remember to pull the Ollama model if using Docker (see Installation Step A3).\n    *   Customize the AI's personality by editing `system_prompt.txt`.\n*   **STT Settings (`transcribe.py`):**\n    *   Modify `DEFAULT_RECORDER_CONFIG` to change the Whisper model (`model`), language (`language`), silence thresholds (`silence_limit_seconds`), etc. The default `base.en` model is pre-downloaded during the Docker build.\n*   **Turn Detection Sensitivity (`turndetect.py`):**\n    *   Adjust pause duration constants within the `TurnDetector.update_settings` method.\n*   **SSL\u002FHTTPS (`server.py`):**\n    *   Set `USE_SSL = True` and provide paths to your certificate (`SSL_CERT_PATH`) and key (`SSL_KEY_PATH`) files.\n    *   **Docker Users:** You'll need to adjust `docker-compose.yml` to map the SSL port (e.g., 443) and potentially mount your certificate files as volumes.\n    \u003Cdetails>\n    \u003Csummary>\u003Cstrong>Generating Local SSL Certificates (Windows Example w\u002F mkcert)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n    1.  Install Chocolatey package manager if you haven't already.\n    2.  Install mkcert: `choco install mkcert`\n    3.  Run Command Prompt *as Administrator*.\n    4.  Install a local Certificate Authority: `mkcert -install`\n    5.  Generate certs (replace `your.local.ip`): `mkcert localhost 127.0.0.1 ::1 your.local.ip`\n        *   This creates `.pem` files (e.g., `localhost+3.pem` and `localhost+3-key.pem`) in the current directory. Update `SSL_CERT_PATH` and `SSL_KEY_PATH` in `server.py` accordingly. Remember to potentially mount these into your Docker container.\n    \u003C\u002Fdetails>\n\n---\n\n## Contributing 🤝\n\nGot ideas or found a bug? Contributions are welcome! 
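Feel free to open issues or submit pull requests.\n\n## License 📜\n\nThe core codebase of this project is released under the **MIT License** (see the [LICENSE](.\u002FLICENSE) file for details).\n\nThis project relies on specific external TTS engines (like `Coqui XTTSv2`) and LLM providers which have their **own licensing terms**. Please ensure you comply with the licenses of all components you use.\n","RealtimeVoiceChat is a project that lets you hold natural spoken conversations with large language models. It uses a client-server architecture that supports low-latency voice interaction, covering the key steps of voice capture, real-time transcription, AI processing, and speech synthesis. The backend is built in Python and combines FastAPI, WebSockets, and Docker containerization for efficient operation. The frontend provides a clean, intuitive user interface built with HTML, CSS, and vanilla JavaScript. The project also offers a flexible choice of AI backends (such as Ollama or OpenAI) and multiple TTS engine configuration options to suit different scenarios. It is well suited to applications that need spoken human-computer interaction, such as virtual assistants, online education platforms, or customer service systems.",2,"2026-06-11 03:41:02","high_star"]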