[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76107":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":11,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":21,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},76107,"Khala","Khala-Music-AI\u002FKhala","Khala-Music-AI",null,"Python",269,22,9,6,0,11,66,33,4.09,false,"main",true,[],"2026-06-12 02:03:40","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\".\u002Fassets\u002Flogo.png\" width=\"280\" alt=\"Khala Logo\" \u002F>\n\n# High-Fidelity Song Generation With a Unified Acoustic-Token Pipeline\n\nEnglish | [中文](.\u002FREADME_zh.md)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Fkhala-music-ai.github.io\u002FKhala-demo\u002F\">\n  \u003Cimg alt=\"Demo\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%8E%A7%20Demo-Online-brightgreen\">\n\u003C\u002Fa>\n\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.01790\">\n  \u003Cimg alt=\"Paper\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%93%84%20Paper-arXiv-b31b1b\">\n\u003C\u002Fa>\n\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fliujiafeng\u002FKhala-MusicGeneration-v1.0\">\n  \u003Cimg alt=\"Model Weights\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Model-Hugging%20Face-ffc107\">\n\u003C\u002Fa>\n\u003Ca href=\".\u002FENVIRONMENT_SETUP.md\">\n  \u003Cimg alt=\"Environment Setup\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%9B%A0%20Environment-Setup-4c8eda\">\n\u003C\u002Fa>\n\u003Ca href=\".\u002Fbackend\u002FREADME_backend.md\">\n  \u003Cimg alt=\"Backend Docs\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A7%A0%20Backend-Docs-6f42c1\">\n\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n## ✨ What Is Khala?\n\nKhala is an open-source system for high-fidelity song generation, capable of generating complete songs from text descriptions and lyric conditions. Unlike approaches built around semantic tokens, diffusion models, or multi-stage audio generation stacks, Khala follows a unified acoustic-token route and generates both coarse musical structure and fine acoustic detail within the same discrete audio representation space.\n\nThe core characteristics of Khala include:\n\n- **Full-song generation**: designed for complete song generation rather than short clips or loop-style accompaniment.\n- **Text and lyric control**: supports natural-language prompts and lyrics to control style, mood, vocals, and content.\n- **Unified acoustic-token representation**: built on a 64-layer RVQ acoustic token hierarchy that represents audio as coarse-to-fine discrete acoustic tokens.\n- **Two-stage generation pipeline**: a backbone first generates coarse acoustic tokens, then a super-resolution model completes higher RVQ token layers, and finally a decoder reconstructs the waveform.\n- **Complete system implementation**: includes a frontend UI, a FastAPI backend dispatcher, a single-GPU inference worker, model loading, and the end-to-end audio generation path rather than just standalone inference scripts.\n\n## 📰 News\n\n- `⚠️ [2026-05-07]` We have identified a potential issue that may significantly affect inference quality. The problem is currently under investigation and may be related to numerical precision. 
Until this notice is removed, please treat current generation quality as unstable.\n\n### ✅ Updated\n\n- `[2026-05-16]` The online audio demo page is now available: [Khala Demo](https:\u002F\u002Fkhala-music-ai.github.io\u002FKhala-demo\u002F)\n- `[2026-05-11]` The backend launcher now starts in a safe single-GPU mode by default and also supports multi-GPU and runtime-mode overrides for deployment compatibility.\n- `[2026-05-05]` The arXiv paper is now available: [Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.01790)\n- `[2026-05-01]` The codebase, environment documentation, and Dockerfile have been cleaned up for release.\n\n### ⏳ TODOs\n\n- `[Coming Soon]` A full deployment guide for musicians and beginner users.\n- `[Coming Soon]` Discord community server.\n\n### 🖥️ Web UI\n#### Prompt Mode\n![Khala Frontend Demo 1](.\u002Fassets\u002Ffront_1.png)\n#### Tag Mode\n![Khala Frontend Demo 2](.\u002Fassets\u002Ffront_2.png)\n\n### 🎧 Audio Samples\n\nListen to generated samples on the online demo page: [Khala Demo](https:\u002F\u002Fkhala-music-ai.github.io\u002FKhala-demo\u002F)\n\n## ✅ Runtime Requirements\n\nThe current release is mainly intended for researchers and developers who are already familiar with GPU servers.\n\n- An NVIDIA GPU; one with 24GB or more of VRAM, such as an RTX 4090 or a higher-tier GPU, is recommended for the full inference pipeline.\n- Docker and the NVIDIA Container Toolkit.\n- A CUDA-compatible NVIDIA driver.\n- Python and Node.js are already included in the prebuilt image.\n- Model weights need to be downloaded into the `checkpoints\u002F` directory at the repository root.\n\n## 🚀 Quick Start\n\nThis section provides the shortest path to running the system and assumes you are already comfortable with basic Docker and CUDA workflows.\n\nIf you want to configure the environment step by step from a clean NGC container, please read:\n\n- 
[ENVIRONMENT_SETUP.md](.\u002FENVIRONMENT_SETUP.md)\n- [ENVIRONMENT_SETUP_zh.md](.\u002FENVIRONMENT_SETUP_zh.md)\n\nIf you want to understand the backend structure and runtime logic, please read:\n\n- [backend\u002FREADME_backend.md](.\u002Fbackend\u002FREADME_backend.md)\n- [backend\u002FREADME_backend_zh.md](.\u002Fbackend\u002FREADME_backend_zh.md)\n\n### 1. Prepare the runtime environment\nThe currently available prebuilt image is:\n```bash\ndocker pull ghcr.io\u002Fdavidliujiafeng\u002Fkhala-env:ngc25.02-node24\n\ndocker run --gpus all -it --rm \\\n  --name khala \\\n  -p 30869:30869 \\\n  -p 8889:8889 \\\n  ghcr.io\u002Fdavidliujiafeng\u002Fkhala-env:ngc25.02-node24\n```\n> Note: the command above uses `--rm`, so files created inside the container will be removed after the container exits. If you want a long-lived development container or want to keep downloaded model weights, use a mounted directory or remove `--rm`.\n\n### 2. Clone the repository\nAfter entering the container, run:\n```bash\ncd \u002Fworkspace\ngit clone https:\u002F\u002Fgithub.com\u002FKhala-Music-AI\u002FKhala.git\ncd Khala\n```\n\n### 3. Download the model checkpoints\n\nModel repository:\n\n- [Hugging Face: liujiafeng\u002FKhala-MusicGeneration-v1.0](https:\u002F\u002Fhuggingface.co\u002Fliujiafeng\u002FKhala-MusicGeneration-v1.0)\n\nFrom the repository root, run:\n\n```bash\nmkdir -p checkpoints\nhf download liujiafeng\u002FKhala-MusicGeneration-v1.0 --local-dir checkpoints\n```\n\nThis command downloads the model repository contents into the local `checkpoints\u002F` directory.\n\n### 4. Start the backend\n\n```bash\ncd \u002Fworkspace\u002FKhala\u002Fbackend\nbash run_backend.sh\n```\n\nThe default launcher now starts in a single-GPU safe mode. Advanced users can also select specific GPU ids and switch between `one_shot` and `keep_loaded` runtime modes from the same script; see [backend\u002FREADME_backend.md](.\u002Fbackend\u002FREADME_backend.md) for details.\n\n### 5. 
Start the frontend\n\nIn another terminal, run:\n\n```bash\ncd \u002Fworkspace\u002FKhala\u002Ffrontend\nnpm install\nnpm run dev\n```\n\n### 6. Open the web UI\n\nDefault URL:\n\n- [http:\u002F\u002F127.0.0.1:30869](http:\u002F\u002F127.0.0.1:30869)\n\n## 🧠 System Overview\n\nThe current system has three layers:\n\n- Frontend: accepts prompts, lyrics, and generation settings, and displays results.\n- API dispatcher: receives requests, creates jobs, queues them, and dispatches them to idle workers.\n- Inference worker: runs backbone, super-resolution, and decoder inference.\n\nThe request path is:\n\n```mermaid\nflowchart LR\n    A[\"Frontend UI\"] --> B[\"backend_api.py\"]\n    B --> C[\"backend_worker.py\"]\n    C --> D[\"Backbone\"]\n    D --> E[\"Super-resolution\"]\n    E --> F[\"Decoder\"]\n    F --> G[\"Generated Audio\"]\n    G --> B\n    B --> A\n```\n\n## 🔗 Project Resources\n\n- Demo page: [Khala Demo](https:\u002F\u002Fkhala-music-ai.github.io\u002FKhala-demo\u002F)\n- arXiv paper: [Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.01790)\n- Model weights: [Hugging Face: liujiafeng\u002FKhala-MusicGeneration-v1.0](https:\u002F\u002Fhuggingface.co\u002Fliujiafeng\u002FKhala-MusicGeneration-v1.0)\n- Environment setup: [ENVIRONMENT_SETUP.md](.\u002FENVIRONMENT_SETUP.md)\n- Backend docs: [backend\u002FREADME_backend.md](.\u002Fbackend\u002FREADME_backend.md)\n\n## 🗂 Repository Structure\n\n```text\nKhala\u002F\n├── backend\u002F\n├── frontend\u002F\n├── core\u002F\n├── models\u002F\n├── checkpoints\u002F\n├── assets\u002F\n├── Dockerfile\n├── requirements.txt\n├── ENVIRONMENT_SETUP.md\n└── ENVIRONMENT_SETUP_zh.md\n```\n\nMain directories:\n\n- `frontend\u002F`: frontend pages and the Vite project.\n- `backend\u002F`: backend API, worker, and launcher scripts.\n- `core\u002F`: project-specific core modules.\n- `models\u002F`: Megatron, decoder, and tokenizer-related code.\n- `checkpoints\u002F`: model checkpoint directory.\n- 
`assets\u002F`: images used by the README and demo materials.\n\n## 📚 Citation\n\nIf this project is helpful to your research or development work, you are welcome to cite our paper:\n\n- [Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.01790)\n\nThe final BibTeX information will be added later to both the paper page and the repository documentation.\n\n## 🙏 Acknowledgements\n\nThe current implementation builds on a number of excellent open-source projects and tools, including but not limited to:\n\n- NVIDIA NGC\n- Megatron \u002F Megatron Core\n- Hugging Face\n- FastAPI\n- Vite \u002F React\n\n## 📜 License\n\nThe model weights are currently intended to be released under `CC BY-NC 4.0` (Creative Commons Attribution-NonCommercial 4.0 International).\n\n## 💬 Contact\n\nFeel free to join the WeChat group for discussion, usage questions, and future updates:\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fwechat_group.jpg\" width=\"320\" alt=\"Khala WeChat Group QR Code\" \u002F>\n\u003C\u002Fdiv>\n","Khala is an open-source system for high-fidelity song generation that can generate complete songs conditioned on text descriptions and lyrics. Its core features include full-song generation, text and lyric control, and a unified acoustic-token representation. Khala uses a 64-layer RVQ acoustic-token hierarchy and a two-stage generation pipeline: it first generates coarse acoustic tokens, then a super-resolution model completes the higher RVQ token layers, and the result is finally decoded into a waveform. The project is suited to scenarios that call for high-quality music creation, such as professional music production and personal creative expression. In addition, Khala provides a complete system implementation, from the frontend UI to backend dispatching and single-GPU inference.",2,"2026-06-11 03:54:30","CREATED_QUERY"]