speech-to-speech

huggingface

Build local voice agents with open-source models

AI Summary

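This is an open-source toolkit for building local voice agents that performs end-to-end speech-to-speech conversion. The project uses a modular cascaded architecture that combines four components: voice activity detection (VAD), speech-to-text (STT), large language model (LLM) inference, and text-to-speech (TTS). It supports a range of open-source models from the Hugging Face Hub (such as Whisper, ChatTTS, and Qwen3-TTS) and adapts to different hardware platforms, including optimizations for Apple Silicon. Every module can be swapped or customized independently, and the toolkit supports real-time streaming, WebSocket communication, and fully offline local operation. It suits locally deployed voice assistants, customer-service bots, and accessibility interfaces that need privacy protection, low-latency responses, and customizable interaction logic.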
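The cascaded VAD, STT, LLM, TTS design described above can be illustrated with a minimal sketch. This is not the project's code or API: `CascadedPipeline`, `energy_vad`, and the `stub_*` functions are hypothetical names, and the stubs stand in for real models such as Whisper or a TTS engine. It only shows how four independently swappable stages might be chained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# All names below are hypothetical and for illustration only.
Chunk = List[float]  # one frame (or utterance) of audio samples


def energy_vad(frames: Iterable[Chunk], threshold: float = 0.1) -> Iterator[Chunk]:
    """Toy VAD: group consecutive high-energy frames into one utterance."""
    utterance: Chunk = []
    for frame in frames:
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if energy >= threshold:
            utterance.extend(frame)      # speech: keep accumulating
        elif utterance:
            yield utterance              # silence after speech: emit utterance
            utterance = []
    if utterance:
        yield utterance                  # flush speech left at end of stream


def stub_stt(utterance: Chunk) -> str:
    """Placeholder for a speech-to-text model."""
    return f"<{len(utterance)} samples of speech>"


def stub_llm(text: str) -> str:
    """Placeholder for a language model reply."""
    return f"You said: {text}"


def stub_tts(text: str) -> Chunk:
    """Placeholder for a TTS model: one silent sample per character."""
    return [0.0] * len(text)


@dataclass
class CascadedPipeline:
    """Chains four stages; any stage can be replaced with another callable."""
    vad: Callable[[Iterable[Chunk]], Iterator[Chunk]]
    stt: Callable[[Chunk], str]
    llm: Callable[[str], str]
    tts: Callable[[str], Chunk]

    def run(self, frames: Iterable[Chunk]) -> Iterator[Chunk]:
        # Yield one synthesized reply per detected utterance.
        for utterance in self.vad(frames):
            yield self.tts(self.llm(self.stt(utterance)))


if __name__ == "__main__":
    pipeline = CascadedPipeline(energy_vad, stub_stt, stub_llm, stub_tts)
    frames = [[0.0] * 4, [1.0] * 4, [1.0] * 4, [0.0] * 4, [0.5] * 4]
    for reply in pipeline.run(frames):
        print(len(reply), "output samples")
```

Because each stage is just a callable with a fixed input and output type, swapping in a different model only means passing a different function to the constructor, which is one plausible way to get the "independently replaceable modules" property the summary describes.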

Python

Apache License 2.0

ai assistant language-model machine-learning python speech speech-synthesis speech-to-text speech-translation


Stars: 5.6k

Forks: 677


Star growth

Today: 0

Last 7 days: 0

Last 30 days: +661

Overall score: 74.49

Default branch: main
