LLaMA-Omni

ictnlp

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

AI Summary

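LLaMA-Omni is an end-to-end speech interaction model built on Llama-3.1-8B-Instruct. It takes speech input and generates text and speech output simultaneously at low latency (as low as 226 ms). Its core techniques include joint speech-language modeling, lightweight training (completed in under 3 days on 4 GPUs), and adaptation to multi-turn spoken dialogue. While maintaining high-quality language understanding and generation, the model delivers a real-time speech interaction experience approaching the GPT-4o level. It suits scenarios that need low-latency, multimodal responses, such as intelligent voice assistants, real-time meeting transcription and response, and accessible human-computer interaction.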

Python

Apache License 2.0

large-language-models multimodal-large-language-models speech-interaction speech-language-model speech-to-speech speech-to-text


Stars: 3.1k

Forks: 223


Star growth

Today: 0

Last 7 days: 0

Last 30 days: +1

Overall score: 29.05

Default branch: main
