[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80050":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":13,"forks30d":13,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":22,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":13,"starSnapshotCount":13,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},80050,"Hive","JusperLee\u002FHive","JusperLee",null,"Python",170,20,10,0,35,68,104,105,3.97,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:57","\u003Ch1 align=\"center\">A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation\u003C\u002Fh1>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assert\u002Flogo.png\" alt=\"Logo\" width=\"250\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cstrong>Kai Li\u003Csup>*\u003C\u002Fsup>, Jintao Cheng\u003Csup>*\u003C\u002Fsup>, Chang Zeng, Zijun Yan, Helin Wang, Zixiong Su, Bo Zheng, Xiaolin Hu\u003C\u002Fstrong>\u003Cbr>\n    \u003Cstrong>Tsinghua University, Shanda AI, Johns Hopkins University\u003C\u002Fstrong>\u003Cbr>\n    \u003Cstrong>\u003Csup>*\u003C\u002Fsup>Equal contribution\u003C\u002Fstrong>\u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.22599\">📜 arXiv 2026\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fcslikai.cn\u002FHive\u002F\">🎶 Demo\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJusperLee\u002FHive\">🤗 Metadata\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJusperLee\u002FHive-ALL\">🤗 Hive-ALL Audio\u003C\u002Fa> | \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FJusperLee\u002FHive\">🤗 Space\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fvisitor-badge.laobi.icu\u002Fbadge?page_id=JusperLee.Hive\" alt=\"Visitor count\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FJusperLee\u002FHive?style=social\" alt=\"GitHub stars\">\n  \u003Cimg alt=\"Static Badge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue.svg\" \u002F>\n\u003C\u002Fp>\n\n---\n\n## 💥 News\n\n- [2026-05-21] **Hive-ALL** is now also available on **[ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fdatasets\u002FJusperLee\u002FHive-All\u002Ffiles)** for users in China who prefer faster downloads via the ModelScope mirror. 🇨🇳🚀\n- [2026-05-19] We release **[Hive-ALL](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJusperLee\u002FHive-ALL)**, the pre-generated mixed audio dataset (9.17 TB, WebDataset format). There is no need to re-mix from metadata. 🚀\n- [2026-03-09] We release `infer_audiosep.py` for one-command AudioSep inference in Hive. The script automatically downloads config and checkpoints from [JusperLee\u002FAudioSep-hive](https:\u002F\u002Fhuggingface.co\u002FJusperLee\u002FAudioSep-hive). 🚀\n- [2026-03-09] We release `infer_flowsep.py` for one-command FlowSep inference in Hive, with automatic config and checkpoint download from [JusperLee\u002FFlowSep-hive](https:\u002F\u002Fhuggingface.co\u002FJusperLee\u002FFlowSep-hive). 🚀\n- [2026-03-09] We release `app.py`, a unified Gradio demo that supports both **AudioSep-hive** and **FlowSep-hive** in a single interface. 🚀\n- [2026-03-09] A community Hugging Face Space is available at [JusperLee\u002FHive](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FJusperLee\u002FHive) for a quick interactive demo. 
🚀\n- [2026-02-09] Thanks to [@faiteamartaliius](https:\u002F\u002Fhuggingface.co\u002Ffaiteamartaliius) for using this codebase to synthesize data and publicly sharing a third-party Hive-style dataset: [faiteamartaliius\u002FHive](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ffaiteamartaliius\u002FHive). 🙏\n\n\n## 📌 Table of Contents\n\n- [📄 Abstract](#-abstract)\n- [📂 Repository Structure](#-repository-structure)\n- [🍯 Hive Dataset](#-hive-dataset)\n- [⚙️ Data Collection Pipeline](#️-data-collection-pipeline)\n- [🎧 Inference](#-inference)\n- [🖥️ Gradio App](#️-gradio-app)\n- [📖 Citation](#-citation)\n- [⚖️ License](#️-license)\n- [🙏 Acknowledgments](#-acknowledgments)\n\n\n## 📄 Abstract\n\n\u003Cp align=\"justify\">\nQuery-based universal sound separation is fundamental to intelligent auditory systems, aiming to isolate specific sources from unconstrained mixtures. \nDespite recent advances, existing methods continue to suffer from residual interference in complex acoustic scenes. \nThis performance limitation stems largely from a data bottleneck: ubiquitous in-the-wild datasets contain weak labels and severe event co-occurrence. \nThese flaws induce models to learn spurious correlations between background noise and target categories instead of robust acoustic features. \nTo address this, we propose an automated pipeline that eliminates co-occurrence noise by mining high-purity single-event segments \nfrom unconstrained recordings and synthesizing mixtures via semantically consistent strategies. \nUtilizing this pipeline, we constructed \u003Ci>Hive\u003C\u002Fi>, a high-quality synthetic dataset comprising 2k hours of audio. 
\n\u003Ca style=\"color:blue;\">Experimental results demonstrate that, despite using only \u003Cb>~0.2%\u003C\u002Fb> of the data scale of million-hour baselines, \nmodels trained on Hive achieve competitive separation accuracy and perceptual quality.\u003C\u002Fa> \nMoreover, these models exhibit remarkable zero-shot generalization on out-of-distribution evaluation benchmarks such as MUSDB18-HQ and USS-Bench. \nThese findings highlight that \u003Ca style=\"background-color:LightYellow;color:red;\">prioritizing supervision purity enables significant data efficiency\u003C\u002Fa>, \noffering a new paradigm for training robust auditory foundation models with reduced computational costs.\n\u003C\u002Fp>\n\n\n## 📂 Repository Structure\n\n```\n.\n├── hive_dataset\u002F                           # Hive Dataset generation and curation\n│   ├── mix_from_metadata\u002F                  # Generate mixtures from metadata\n│   │   ├── mix_from_metadata.py\n│   │   └── dataset_paths.json\n│   ├── mix_curation\u002F                       # Data curation for mix audio\n│   │   ├── mix_data_curation.py\n│   │   └── ontology.json\n│   ├── README.md                           # Dataset documentation\n│   ├── requirements.txt\n│   └── LICENSE\n├── pipeline\u002F                               # Single-Event Data Collection Pipeline\n│   ├── code\u002F                               # Pipeline scripts\n│   │   ├── 01_audio_chunking.py\n│   │   ├── 02_filter_single_label.py\n│   │   ├── 03_filter_single_event_qwen.py\n│   │   ├── 04_audioset_label_audiotag.py\n│   │   ├── 05_leaf_label_qwen.py\n│   │   └── 06_superres_apollo.py\n│   ├── data\u002F                               # Pipeline data directories\n│   ├── ontology\u002F                           # AudioSet ontologies\n│   ├── icefall\u002F                            # AudioTag model repository\n│   ├── Apollo\u002F                             # Apollo model repository\n│   ├── requirements.txt                    # Pipeline 
dependencies\n│   └── README.md                           # Pipeline documentation\n├── LICENCE                                 # Apache 2.0 License\n└── README.md                              \n```\n\n\n## 🍯 Hive Dataset\n\n**Hive** is a high-quality synthetic dataset with 2,442 hours of raw audio and 19.6M mixtures for Universal Sound Separation.\n\n**Features:**\n- 283 sound categories from the AudioSet ontology\n- Semantically consistent mixing logic\n- 44.1 kHz sample rate\n\n**Two release formats:**\n- **[JusperLee\u002FHive](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJusperLee\u002FHive)**: metadata-only release (Parquet, ~1.24 GB). Use `mix_from_metadata.py` to regenerate mixtures locally from the original 12 source datasets.\n- **[JusperLee\u002FHive-ALL](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJusperLee\u002FHive-ALL)**: pre-generated mixed audio (285 GB, WebDataset `.tar` shards, 2-source mixtures), ready to stream for training without any re-mixing step.\n\nPlease refer to [`hive_dataset\u002F`](hive_dataset\u002F) for details.\n\n\n## ⚙️ Data Collection Pipeline\n\nAn automated 6-step pipeline for mining high-purity single-event audio from weakly labeled sources.\n\n**Pipeline Stages:**\n1. Audio Chunking - Split long audio into segments\n2. Single Label Filtering - Remove multi-label samples\n3. Single Event Filtering - Verify acoustic purity with Qwen3-Omni\n4. AudioSet Label Tagging - Assign ontology labels with AudioTag\n5. Leaf Label Classification - Refine to leaf nodes with Qwen3-Omni\n6. 
Audio Super-Resolution - Upsample to 44.1 kHz with Apollo\n\nPlease refer to [`pipeline\u002F`](pipeline\u002F) for details.\n\n\n## 🎧 Inference\n\nHive provides two inference scripts with automatic checkpoint\u002Fconfig download from Hugging Face:\n\n- `infer_audiosep.py` -> [JusperLee\u002FAudioSep-hive](https:\u002F\u002Fhuggingface.co\u002FJusperLee\u002FAudioSep-hive)\n- `infer_flowsep.py` -> [JusperLee\u002FFlowSep-hive](https:\u002F\u002Fhuggingface.co\u002FJusperLee\u002FFlowSep-hive)\n\n### 1) Install dependencies\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FJusperLee\u002FHive.git\ncd Hive\npip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio\n```\n\n### 2) AudioSep inference\n\n```bash\npython infer_audiosep.py \\\n  --audio_file \u002Fpath\u002Fto\u002Fmixture.wav \\\n  --text \"acoustic guitar\" \\\n  --output_file \u002Fpath\u002Fto\u002Faudiosep_output.wav\n```\n\n### 3) FlowSep inference\n\n```bash\npython infer_flowsep.py \\\n  --audio_file \u002Fpath\u002Fto\u002Fmixture.wav \\\n  --text \"acoustic guitar\" \\\n  --output_file \u002Fpath\u002Fto\u002Fflowsep_output.wav\n```\n\n## 🖥️ Gradio App\n\n`app.py` launches an interactive local demo with both models in one UI:\n\n- Model choices: `AudioSep-hive`, `FlowSep-hive`\n- Input: mixed audio + text query\n- Output: separated waveform\n\nRun:\n\n```bash\ncd Hive\npython app.py\n```\n\nThen open the local Gradio URL printed in the terminal.\n\n## 📖 Citation\n\nIf you use this code or the Hive Dataset, please cite:\n\n```bibtex\n@article{li2026semantically,\n  title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},\n  author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},\n  journal={arXiv preprint arXiv:2601.22599},\n  year={2026}\n}\n```\n\n\n## ⚖️ License\n\n### Project License\n\nThis project is licensed under the Apache License 2.0. 
See [LICENCE](LICENCE) for details.\n\n### Model Licenses\n\n- **Qwen3-Omni**: Apache 2.0\n- **AudioTag**: Apache 2.0\n- **Apollo**: Check the model repository for its specific license\n\n\n## 🙏 Acknowledgments\n\nThe Hive dataset is a collaborative achievement built upon the foundation of the open-source audio community. We extend our deepest gratitude to the researchers and organizations who curated the twelve foundational datasets. Their work provides the essential long-tailed acoustic space for advancing **Universal Sound Separation**.\n\n### 🏛️ Foundational Data Sources\n\nWe gratefully acknowledge the following core datasets, which provided the majority of our high-fidelity clips:\n\n- **BBC Sound Effects** (369,603 clips, 1,020.62h) - Professional-grade recordings with broadcast-level fidelity under Remix License\n- **AudioSet** (326,890 clips, 896.61h) - Large-scale benchmark from YouTube under CC BY (Google)\n- **VGGSound** (115,191 clips, 319.10h) - Real-world acoustic diversity under CC BY 4.0 (University of Oxford)\n- **FreeSound** (17,451 clips, 46.90h) - Rich crowdsourced soundscapes under CC0\u002FBY\u002FBY-NC (MTG-UPF)\n\n### 🎯 Specialized Domain Contributors\n\nOur sincere thanks go to the following datasets for providing the raw source audio that forms the specialized domains of the **Hive Dataset**:\n\n**Music & Speech:**\n- **MUSIC21** (32,701 clips, 90.28h) - Solo and ensemble instruments for harmonic structure modeling\n- **Voicebank-DEMAND** (12,376 clips, 9.94h) - Clean speech signals under CC BY 4.0\n- **FSD50K** (636 clips, 0.80h) - Finely annotated subset based on the AudioSet ontology\n\n**Environmental & Events:**\n- **ClothoV2** (14,759 clips, 38.19h) - Audio captioning dataset with rich temporal evolution\n- **AVE** (3,054 clips, 6.91h) - Audio-visual event localization under CC BY-NC-SA\n- **SoundBible** (2,501 clips, 5.78h) - Curated short clips under CC BY 4.0\n- **DCASE** (1,969 clips, 5.46h) - Acoustic scene and event detection challenges\n- **ESC50** 
(1,433 clips, 1.99h) - Environmental sound classification benchmark under CC BY-NC 3.0\n\n### ⚖️ License & Ethical Compliance\n\nAll source data were processed in strict accordance with their respective licenses (e.g., CC BY, CC0, Remix License). An automated data collection pipeline was employed to ensure that only semantically aligned, single-label, pure segments were extracted, respecting the original intent of the data contributors while enhancing their utility for sound separation tasks.\n\n**Important Note**: This repository releases only the **metadata** (JSON files containing mixing parameters and source references) for reproducibility. We do **not** redistribute the original audio files from the source datasets. Users must independently download and prepare the source datasets according to their respective licenses and terms of use.\n\n*We thank all original contributors for their invaluable service to the scientific community.*\n","Hive is a semantically consistent dataset for data-efficient query-based universal sound separation. The project provides a pre-generated mixed audio dataset (9.17 TB, WebDataset format) that can be used without re-mixing from metadata. Its core features include one-command AudioSep and FlowSep inference scripts and a unified Gradio demo that runs both models in a single interface. The project is written in Python, hosted on GitHub, and released under the Apache License 2.0. Hive is aimed at researchers and developers who need to efficiently process and separate complex audio scenes, particularly in areas such as speech recognition and music information retrieval.",2,"2026-06-11 03:59:02","CREATED_QUERY"]