[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71956":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},71956,"ACE-Step-1.5","ace-step\u002FACE-Step-1.5","ace-step","The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.","https:\u002F\u002Facemusic.ai\u002F",null,"Python",10972,1328,83,116,0,76,201,733,228,119.37,"MIT License",false,"main",true,[27],"text2music","2026-06-12 04:01:02","\u003Ch1 align=\"center\">ACE-Step 1.5\u003C\u002Fh1>\n\u003Ch1 align=\"center\">Pushing the Boundaries of Open-Source Music Generation\u003C\u002Fh1>\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Facemusic.ai\">ACEMusic\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Face-step.github.io\u002Face-step-v1.5.github.io\u002F\">Project\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002FAce-Step1.5\">Hugging Face\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FACE-Step\u002FAce-Step1.5\">ModelScope\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FACE-Step\u002FAce-Step-v1.5\">Space Demo\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FPeWDxrkdj7\">Discord\u003C\u002Fa> |\n    \u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.00744\">Technical Report\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Face-step\u002Fawesome-ace-step\">Awesome ACE-Step\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\".\u002Fassets\u002Forganization_logos.png\" height=\"80\" alt=\"StepFun Logo\" style=\"vertical-align: middle;\">\n    &nbsp;&nbsp;\n    \u003Ca href=\"https:\u002F\u002Facemusic.ai\">\n        \u003Cimg src=\".\u002Fassets\u002Facemusic-logo.svg\" height=\"57\" alt=\"ACEMusic - Try ACE-Step Online\" style=\"vertical-align: middle; position: relative; top: 2px;\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 📰 News\n\n> 🎵 **Want a faster & more stable experience? Try [acemusic.ai](https:\u002F\u002Facemusic.ai) — 100% free!**\n\n- **[2026-04-02] 🎉 ACE-Step 1.5 XL (4B DiT) Released!** — We introduce the XL series with a 4B-parameter DiT decoder for higher audio quality. Three variants available: [xl-base](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-xl-base), [xl-sft](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-xl-sft), [xl-turbo](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-xl-turbo). Requires ≥12GB VRAM (with offload), ≥20GB recommended. All LM models fully compatible. See [Model Zoo](#-model-zoo) for details.\n\n## Table of Contents\n\n- [📰 News](#-news)\n- [✨ Features](#-features)\n- [⚡ Quick Start](#-quick-start)\n- [🚀 Launch Scripts](#-launch-scripts)\n- [📚 Documentation](#-documentation)\n- [📖 Tutorial](#-tutorial)\n- [🏗️ Architecture](#️-architecture)\n- [🦁 Model Zoo](#-model-zoo)\n- [🔬 Benchmark](#-benchmark)\n\n## 📝 Abstract\n🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. 
On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast—under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.\n\n🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). ⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️\n\n🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 
🎸\n\n\n## ✨ Features\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\".\u002Fassets\u002Fapplication_map.png\" width=\"100%\" alt=\"ACE-Step Framework\">\n\u003C\u002Fp>\n\n### ⚡ Performance\n- ✅ **Ultra-Fast Generation** — Under 2s per full song on A100, under 10s on RTX 3090 (0.5s to 10s on A100 depending on think mode & diffusion steps)\n- ✅ **Flexible Duration** — Supports 10 seconds to 10 minutes (600s) audio generation\n- ✅ **Batch Generation** — Generate up to 8 songs simultaneously\n\n### 🎵 Generation Quality\n- ✅ **Commercial-Grade Output** — Quality beyond most commercial music models (between Suno v4.5 and Suno v5)\n- ✅ **Rich Style Support** — 1000+ instruments and styles with fine-grained timbre description\n- ✅ **Multi-Language Lyrics** — Supports 50+ languages with lyrics prompt for structure & style control\n\n### 🎛️ Versatility & Control\n\n| Feature | Description |\n|---------|-------------|\n| ✅ Reference Audio Input | Use reference audio to guide generation style |\n| ✅ Cover Generation | Create covers from existing audio |\n| ✅ Repaint & Edit | Selective local audio editing and regeneration |\n| ✅ Track Separation | Separate audio into individual stems |\n| ✅ Multi-Track Generation | Add layers like Suno Studio's \"Add Layer\" feature |\n| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |\n| ✅ Metadata Control | Control duration, BPM, key\u002Fscale, time signature |\n| ✅ Simple Mode | Generate full songs from simple descriptions |\n| ✅ Query Rewriting | Auto LM expansion of tags and lyrics |\n| ✅ Audio Understanding | Extract BPM, key\u002Fscale, time signature & caption from audio |\n| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |\n| ✅ LoRA Training | One-click annotation & training in Gradio. 
8 songs, 1 hour on 3090 (12GB VRAM) |\n| ✅ Quality Scoring | Automatic quality assessment for generated audio |\n\n## 🔔 Staying ahead\nStar ACE-Step on GitHub and be instantly notified of new releases\n![](assets\u002Fstar.gif)\n\n## 🤝 Partners\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fwww.comfy.org\u002F\">\u003Cimg src=\"https:\u002F\u002Fregistry.comfy.org\u002F_next\u002Fstatic\u002Fmedia\u002Flogo_blue.9ac227d3.png\" alt=\"ComfyUI\" height=\"40\" style=\"margin: 5px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fzilliz.com\u002F\">\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F18416694\" alt=\"Zilliz\" height=\"40\" style=\"margin: 5px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fmilvus.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fmiro.medium.com\u002Fv2\u002Fresize:fit:2400\u002F1*-VEGyAgcIBD62XtZWavy8w.png\" alt=\"Milvus\" height=\"40\" style=\"margin: 5px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fzeabur.com\u002F\">\u003Cimg src=\"https:\u002F\u002Fzeabur.notion.site\u002Fimage\u002Fattachment%3A43bc244b-9a2d-4b96-9646-8392aa6fc862%3Alogo-dark_1.svg?table=block&id=318a221c-948e-8056-b3c0-f9c39ce543ba&spaceId=ba37aeb9-0937-401d-aa41-ce1d3b6ff778&userId=&cache=v2\" alt=\"Zeabur\" height=\"40\" width=\"40\" style=\"margin: 5px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fmajiks.studio\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FMajiks-Studio\u002Fmajiks-brand-kit\u002Fmain\u002Flogos\u002Fapp-icon\u002Fpng\u002Fapp-icon-128.png\" alt=\"Majik's Music Studio\" height=\"40\" width=\"40\" style=\"margin: 5px;\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n## ⚡ Quick Start\n\n> 🎵 **Don't want to install locally? 
Try [acemusic.ai](https:\u002F\u002Facemusic.ai), 100% free with no GPU required!**\n\n> **Requirements:** Python 3.11-3.12, CUDA GPU recommended (also supports MPS \u002F ROCm \u002F Intel XPU \u002F CPU)\n> \n> **Note:** ROCm on Windows requires Python 3.12 (AMD officially provides Python 3.12 wheels only)\n\n```bash\n# 1. Install uv\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh          # macOS \u002F Linux\n# powershell -ExecutionPolicy ByPass -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"  # Windows\n\n# 2. Clone & install\ngit clone https:\u002F\u002Fgithub.com\u002FACE-Step\u002FACE-Step-1.5.git\ncd ACE-Step-1.5\nuv sync\n\n# 3. Launch Gradio UI (models auto-download on first run)\nuv run acestep\n\n# Or launch REST API server\nuv run acestep-api\n```\n\nOpen http:\u002F\u002Flocalhost:7860 (Gradio) or http:\u002F\u002Flocalhost:8001 (API).\n\n> 📦 **Windows users:** A [portable package](https:\u002F\u002Ffiles.acemusic.ai\u002Facemusic\u002Fwin\u002FACE-Step-1.5.7z) with pre-installed dependencies is available. See [Installation Guide](.\u002Fdocs\u002Fen\u002FINSTALL.md#-windows-portable-package).\n\n> 📦 **macOS users:** A [portable package](https:\u002F\u002Ffiles.acemusic.ai\u002Facemusic\u002Fmac\u002FACE-Step-1.5.zip) with pre-installed dependencies is available. 
See [Installation Guide](.\u002Fdocs\u002Fen\u002FINSTALL.md#-macos-portable-package).\n\n> 📖 **Full installation guide** (AMD\u002FROCm, Intel GPU, CPU, environment variables, command-line options): [English](.\u002Fdocs\u002Fen\u002FINSTALL.md) | [中文](.\u002Fdocs\u002Fzh\u002FINSTALL.md) | [日本語](.\u002Fdocs\u002Fja\u002FINSTALL.md)\n\n### 💡 Which Model Should I Choose?\n\n| Your GPU VRAM | Recommended DiT | Recommended LM Model | Backend | Notes |\n|---------------|----------------|---------------------|---------|-------|\n| **≤6GB** | 2B turbo | None (DiT only) | — | LM disabled by default; INT8 quantization + full CPU offload |\n| **6-8GB** | 2B turbo | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |\n| **8-16GB** | 2B turbo\u002Fsft | `acestep-5Hz-lm-0.6B` \u002F `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |\n| **16-20GB** | 2B sft or XL turbo | `acestep-5Hz-lm-1.7B` | `vllm` | XL requires CPU offload below 20GB |\n| **20-24GB** | XL turbo\u002Fsft | `acestep-5Hz-lm-1.7B` | `vllm` | XL fits without offload; 4B LM available |\n| **≥24GB** | XL sft (or xl-base for extract\u002Flego\u002Fcomplete) | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |\n\n> **XL (4B) models** (`acestep-v15-xl-*`) offer higher audio quality with ~9GB VRAM for weights (vs ~4.7GB for 2B). They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible with XL.\n\nThe UI automatically selects the best configuration for your GPU. 
All settings (LM model, backend, offloading, quantization) are tier-aware and pre-configured.\n\n> 📖 GPU compatibility details: [English](.\u002Fdocs\u002Fen\u002FGPU_COMPATIBILITY.md) | [中文](.\u002Fdocs\u002Fzh\u002FGPU_COMPATIBILITY.md) | [日本語](.\u002Fdocs\u002Fja\u002FGPU_COMPATIBILITY.md) | [한국어](.\u002Fdocs\u002Fko\u002FGPU_COMPATIBILITY.md)\n\n## 🚀 Launch Scripts\n\nReady-to-use launch scripts for all platforms with auto environment detection, update checking, and dependency installation.\n\n| Platform | Scripts | Backend |\n|----------|---------|---------|\n| **Windows** | `start_gradio_ui.bat`, `start_api_server.bat` | CUDA |\n| **Windows (ROCm)** | `start_gradio_ui_rocm.bat`, `start_api_server_rocm.bat` | AMD ROCm |\n| **Linux** | `start_gradio_ui.sh`, `start_api_server.sh` | CUDA |\n| **macOS** | `start_gradio_ui_macos.sh`, `start_api_server_macos.sh` | MLX (Apple Silicon) |\n\n```bash\n# Windows\nstart_gradio_ui.bat\n\n# Linux\nchmod +x start_gradio_ui.sh && .\u002Fstart_gradio_ui.sh\n\n# macOS (Apple Silicon)\nchmod +x start_gradio_ui_macos.sh && .\u002Fstart_gradio_ui_macos.sh\n```\n\n### ⚙️ Customizing Launch Settings\n\n**Recommended:** Create a `.env` file to customize models, ports, and other settings. 
Your `.env` configuration will survive repository updates.\n\n```bash\n# Copy the example file\ncp .env.example .env\n\n# Edit with your preferred settings\n# Examples in .env:\nACESTEP_CONFIG_PATH=acestep-v15-turbo\nACESTEP_LM_MODEL_PATH=acestep-5Hz-lm-1.7B\nPORT=7860\nLANGUAGE=en\n```\n\n> 📖 **Script configuration & customization:** [English](.\u002Fdocs\u002Fen\u002FINSTALL.md#-launch-scripts) | [中文](.\u002Fdocs\u002Fzh\u002FINSTALL.md#-启动脚本) | [日本語](.\u002Fdocs\u002Fja\u002FINSTALL.md#-起動スクリプト)\n\n## 📚 Documentation\n\n### Usage Guides\n\n| Method | Description | Documentation |\n|--------|-------------|---------------|\n| 🖥️ **Gradio Web UI** | Interactive web interface for music generation | [Guide](.\u002Fdocs\u002Fen\u002FGRADIO_GUIDE.md) |\n| 🧭 **UI Support Baseline** | Supported UI boundary and future UI parity checklist | [Guide](.\u002Fdocs\u002Fen\u002FUI_SUPPORT.md) |\n| 🎛️ **VST3 Plugin** | Standalone VST3 plugin (C++\u002FGGML) for DAW integration | [acestep.vst3](https:\u002F\u002Fgithub.com\u002Face-step\u002Facestep.vst3) |\n| 🐍 **Python API** | Programmatic access for integration | [Guide](.\u002Fdocs\u002Fen\u002FINFERENCE.md) |\n| 🌐 **REST API** | HTTP-based async API for services | [Guide](.\u002Fdocs\u002Fen\u002FAPI.md) |\n| ⌨️ **CLI** | Interactive wizard and configuration | [Guide](.\u002Fdocs\u002Fen\u002FCLI.md) |\n\n### Setup & Configuration\n\n| Topic | Documentation |\n|-------|---------------|\n| 📦 Installation (all platforms) | [English](.\u002Fdocs\u002Fen\u002FINSTALL.md) \\| [中文](.\u002Fdocs\u002Fzh\u002FINSTALL.md) \\| [日本語](.\u002Fdocs\u002Fja\u002FINSTALL.md) |\n| 🎮 GPU Compatibility | [English](.\u002Fdocs\u002Fen\u002FGPU_COMPATIBILITY.md) \\| [中文](.\u002Fdocs\u002Fzh\u002FGPU_COMPATIBILITY.md) \\| [日本語](.\u002Fdocs\u002Fja\u002FGPU_COMPATIBILITY.md) |\n| 🔧 GPU Troubleshooting | [English](.\u002Fdocs\u002Fen\u002FGPU_TROUBLESHOOTING.md) |\n| 🔬 Benchmark & Profiling | [English](.\u002Fdocs\u002Fen\u002FBENCHMARK.md) \\| 
[中文](.\u002Fdocs\u002Fzh\u002FBENCHMARK.md) |\n\n### Multi-Language Docs\n\n| Language | API | Gradio | Inference | Tutorial | LoRA Training | Install | Benchmark |\n|----------|-----|--------|-----------|----------|---------------|---------|-----------|\n| 🇺🇸 English | [Link](.\u002Fdocs\u002Fen\u002FAPI.md) | [Link](.\u002Fdocs\u002Fen\u002FGRADIO_GUIDE.md) | [Link](.\u002Fdocs\u002Fen\u002FINFERENCE.md) | [Link](.\u002Fdocs\u002Fen\u002FTutorial.md) | [Link](.\u002Fdocs\u002Fen\u002FLoRA_Training_Tutorial.md) | [Link](.\u002Fdocs\u002Fen\u002FINSTALL.md) | [Link](.\u002Fdocs\u002Fen\u002FBENCHMARK.md) |\n| 🇨🇳 中文 | [Link](.\u002Fdocs\u002Fzh\u002FAPI.md) | [Link](.\u002Fdocs\u002Fzh\u002FGRADIO_GUIDE.md) | [Link](.\u002Fdocs\u002Fzh\u002FINFERENCE.md) | [Link](.\u002Fdocs\u002Fzh\u002FTutorial.md) | [Link](.\u002Fdocs\u002Fzh\u002FLoRA_Training_Tutorial.md) | [Link](.\u002Fdocs\u002Fzh\u002FINSTALL.md) | [Link](.\u002Fdocs\u002Fzh\u002FBENCHMARK.md) |\n| 🇯🇵 日本語 | [Link](.\u002Fdocs\u002Fja\u002FAPI.md) | [Link](.\u002Fdocs\u002Fja\u002FGRADIO_GUIDE.md) | [Link](.\u002Fdocs\u002Fja\u002FINFERENCE.md) | [Link](.\u002Fdocs\u002Fja\u002FTutorial.md) | [Link](.\u002Fdocs\u002Fja\u002FLoRA_Training_Tutorial.md) | [Link](.\u002Fdocs\u002Fja\u002FINSTALL.md) | — |\n| 🇰🇷 한국어 | [Link](.\u002Fdocs\u002Fko\u002FAPI.md) | [Link](.\u002Fdocs\u002Fko\u002FGRADIO_GUIDE.md) | [Link](.\u002Fdocs\u002Fko\u002FINFERENCE.md) | [Link](.\u002Fdocs\u002Fko\u002FTutorial.md) | [Link](.\u002Fdocs\u002Fko\u002FLoRA_Training_Tutorial.md) | — | — |\n\n## 📖 Tutorial\n\n**🎯 Must Read:** Comprehensive guide to ACE-Step 1.5's design philosophy and usage methods.\n\n| Language | Link |\n|----------|------|\n| 🇺🇸 English | [English Tutorial](.\u002Fdocs\u002Fen\u002FTutorial.md) |\n| 🇨🇳 中文 | [中文教程](.\u002Fdocs\u002Fzh\u002FTutorial.md) |\n| 🇯🇵 日本語 | [日本語チュートリアル](.\u002Fdocs\u002Fja\u002FTutorial.md) |\n\nThis tutorial covers: mental models and design philosophy, model architecture and selection, 
input control (text and audio), inference hyperparameters, random factors and optimization strategies.\n\n## 🔨 Train\n\n📖 **LoRA Training Tutorial** — step-by-step guide covering data preparation, annotation, preprocessing, and training:\n\n| Language | Link |\n|----------|------|\n| 🇺🇸 English | [LoRA Training Tutorial](.\u002Fdocs\u002Fen\u002FLoRA_Training_Tutorial.md) |\n| 🇨🇳 中文 | [LoRA 训练教程](.\u002Fdocs\u002Fzh\u002FLoRA_Training_Tutorial.md) |\n| 🇯🇵 日本語 | [LoRA トレーニングチュートリアル](.\u002Fdocs\u002Fja\u002FLoRA_Training_Tutorial.md) |\n| 🇰🇷 한국어 | [LoRA 학습 튜토리얼](.\u002Fdocs\u002Fko\u002FLoRA_Training_Tutorial.md) |\n\nSee also the **LoRA Training** tab in Gradio UI for one-click training, or [Gradio Guide - LoRA Training](.\u002Fdocs\u002Fen\u002FGRADIO_GUIDE.md#lora-training) for UI reference.\n\n🔧 **Advanced Training with [Side-Step](https:\u002F\u002Fgithub.com\u002Fkoda-dernet\u002FSide-Step)** — CLI-based training toolkit with corrected timestep sampling, LoKR adapters, VRAM optimization, gradient sensitivity analysis, and more. 
See the [Side-Step documentation](.\u002Fdocs\u002Fsidestep\u002FGetting%20Started.md).\n\n## 🏗️ Architecture\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\".\u002Fassets\u002FACE-Step_framework.png\" width=\"100%\" alt=\"ACE-Step Framework\">\n\u003C\u002Fp>\n\n## 🦁 Model Zoo\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\".\u002Fassets\u002Fmodel_zoo.png\" width=\"100%\" alt=\"Model Zoo\">\n\u003C\u002Fp>\n\n### DiT Models\n\n| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |\n|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|\n| `acestep-v15-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | [Link](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-base) |\n| `acestep-v15-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | [Link](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-sft) |\n| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002FAce-Step1.5) |\n\n### XL (4B) DiT Models\n\n> XL models use a larger 4B-parameter DiT decoder (~9GB bf16) for higher audio quality. They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). 
All LM models are fully compatible.\n\n| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |\n|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|\n| `acestep-v15-xl-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | High | High | Easy | [Link](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-xl-base) |\n| `acestep-v15-xl-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Easy | [Link](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-xl-sft) |\n| `acestep-v15-xl-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https:\u002F\u002Fhuggingface.co\u002FACE-Step\u002Facestep-v15-xl-turbo) |\n\n### LM Models\n\n| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |\n|----------|---------------|:------------:|:---:|:--:|:---------:|:-------------:|:-------------------:|:----------------------:|:-----------:|--------------|\n| `acestep-5Hz-lm-0.6B` | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |\n| `acestep-5Hz-lm-1.7B` | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |\n| `acestep-5Hz-lm-4B` | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | ✅ |\n\n## 🔬 Benchmark\n\nACE-Step 1.5 includes `profile_inference.py`, a profiling & benchmarking tool that measures LLM, DiT, and VAE timing across devices and configurations.\n\n```bash\npython profile_inference.py                        # Single-run profile\npython profile_inference.py --mode benchmark       # Configuration matrix\n```\n\n> 📖 **Full guide** (all modes, CLI options, output interpretation): 
[English](.\u002Fdocs\u002Fen\u002FBENCHMARK.md) | [中文](.\u002Fdocs\u002Fzh\u002FBENCHMARK.md)\n\n## 📜 License & Disclaimer\n\nThis project is licensed under the [MIT License](.\u002FLICENSE).\n\nACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While the model is designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.\n\n🔔 Important Notice  \nThe only official website for the ACE-Step project is our GitHub Pages site.  \nWe do not operate any other websites.  \n🚫 Fake domains include but are not limited to:\nac\\*\\*p.com, a\\*\\*p.org, a\\*\\*\\*c.org  \n⚠️ Please be cautious. 
Do not visit, trust, or make payments on any of those sites.\n\n## 🌐 Community & Ecosystem\n\nCheck out **[Awesome ACE-Step](https:\u002F\u002Fgithub.com\u002Face-step\u002Fawesome-ace-step)**, a curated list of community projects, alternative UIs, ComfyUI nodes, cloud deployments, training tools, and more built around ACE-Step.\n\n## 🙏 Acknowledgements\n\nThis project is co-led by ACE Studio and StepFun.\n\n\n## 📖 Citation\n\nIf you find this project useful for your research, please consider citing:\n\n```BibTeX\n@misc{gong2026acestep,\n\ttitle={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},\n\tauthor={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},\n\thowpublished={\\url{https:\u002F\u002Fgithub.com\u002Face-step\u002FACE-Step-1.5}},\n\tyear={2026},\n\tnote={GitHub repository}\n}\n```\n","ACE-Step 1.5 is a powerful local music generation model that outperforms most commercial alternatives and supports Mac, AMD, Intel, and CUDA devices. Its core capability is text-to-music generation, built on a novel hybrid architecture in which a language model acts as an omni-capable planner: it turns simple user queries into detailed song blueprints and uses Chain-of-Thought to synthesize metadata, lyrics, and other guidance for the Diffusion Transformer (DiT). The model runs on consumer hardware with less than 4GB of VRAM, and users can train a LoRA from a small number of samples to capture a personal style. ACE-Step 1.5 suits individual creators and small teams who need high-quality music creation at low cost, especially in music production workflows that favor rapid iteration and experimentation.",2,"2026-06-11 03:39:41","high_star"]