sam-audio

facebookresearch

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

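AI Summary

SAM-Audio is a multimodal foundation model for audio source separation. It accepts prompts in the form of text descriptions, visual cues (such as video frames), or time spans, and separates the specified sound from mixed audio. Its core technology builds on the Perception-Encoder Audio-Visual (PE-AV) architecture, combining large-model inference with cross-modal alignment to provide end-to-end audio segmentation and separation. It is suited to tasks such as audio and video content editing, speech enhancement, acoustic scene analysis, accessibility assistance, and fine-grained audio understanding in research. Running it requires a CUDA GPU.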

Python

Other


Stars: 3.6k

Forks: 323


Star growth

Today: 0

Last 7 days: 0

Last 30 days: +16

Overall score: 61.13

Default branch: main
