# Live Music Diffusion Models

[**Demo Page**](https://stephenbrade.github.io/lmdm-public/) | [**Paper**](https://arxiv.org/abs/2605.22717)

Training and inference code for Live Music Diffusion Models (LMDMs): streaming, autoregressive music diffusion models. The models generate audio block by block over a sliding context window, which enables live generation. Huge shout-out to the [Stable Audio](https://github.com/Stability-AI/stable-audio-tools) folks; this codebase draws heavy inspiration from theirs.

This is our public-facing code repo. For access to the development code used during the project, please reach out to znovack@ucsd.edu or brade@mit.edu.

## Install

```bash
pip install .
```

Requires PyTorch 2.5+ (for Flash / Flex Attention). Developed against Python 3.10.

## Models

Two attention regimes are supported, each available as a plain finetune or as an ARC-forcing model:

| Config | Attention | Type |
| --- | --- | --- |
| `saos_encdec.json` | enc-dec (bidirectional context) | finetune |
| `saos_block_causal.json` | block-causal (sliding-window causal) | finetune |
| `saos_arc_forcing_encdec.json` | enc-dec | ARC-forcing |
| `saos_arc_forcing_block_causal.json` | block-causal | ARC-forcing |

Configs live in `stable_audio_tools/configs/model_configs/txt2audio/`.

## Training

```bash
python train.py \
    --model-config stable_audio_tools/configs/model_configs/txt2audio/<config>.json \
    --dataset-config <your_dataset>.json \
    --pretrained-ckpt-path /path/to/base.ckpt \
    --save-dir ./checkpoints \
    --batch-size 40 --precision 16-mixed --name <run-name>
```

Training proceeds in two stages:

1. **Finetune:** use `saos_encdec.json` or `saos_block_causal.json`. This stage mirrors standard diffusion finetuning and has the same overall memory requirements. Initialize it from your favorite standard music diffusion model (e.g., SAO or SAO-Small).
2. **ARC-forcing:** use `saos_arc_forcing_encdec.json` or `saos_arc_forcing_block_causal.json`. ARC configs set `training.arc.self_forcing` and pull the teacher/discriminator from the base model; the attention regime is set by `training.inpainting.mask_kwargs.context_router_attention_pattern`. Initialize this stage from the LMDM you finetuned in stage 1. Note that memory requirements here grow with the rollout length, so plan accordingly.

See [train.sh](train.sh) for an end-to-end launch example. Training defaults are in [defaults.ini](defaults.ini).

## Inference

Streaming block-AR generation goes through [`generate_diffusion_cond_blockar`](stable_audio_tools/inference/generation.py), which denoises one `block_size` block at a time over a sliding context window, optionally reusing a KV cache for fast streaming. Set `context_router_attention_pattern` to match the model (`"enc-dec"` or `"block-causal"`) and pass `use_kv_cache=True` for streaming.

A runnable end-to-end example (loading a checkpoint, building conditioning, calling the function, and decoding) is in [notebooks/inference.ipynb](notebooks/inference.ipynb).

## Roadmap

- [ ] Sketch-control training
- [ ] More detailed accompaniment training support
- [ ] ONNX export pipeline
- [ ] Interface setup

## Citation

If you use this repo, please cite:

```bibtex
@article{novack2026lmdm,
  title         = {Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators},
  author        = {Novack, Zachary and Brade, Stephen and Kim, Haven and Flores Garc{\'i}a, Hugo and Shikarpur, Nithya and Talegaonkar, Chinmay and Kim, Suwan and Chen, Valerie K. and McAuley, Julian and Berg-Kirkpatrick, Taylor and Huang, Cheng-Zhi Anna},
  journal       = {arXiv preprint arXiv:2605.22717},
  year          = {2026},
  archivePrefix = {arXiv},
  eprint        = {2605.22717},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2605.22717}
}
```
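For readers new to the attention regimes in the Models table, here is a rough illustration of what "block-causal (sliding-window causal)" can mean. The pure-Python sketch below builds a boolean attention mask under a stated assumption: each position attends bidirectionally within its own block and causally to at most `context_blocks` immediately preceding blocks. `block_causal_mask` and `context_blocks` are hypothetical names that are not part of `stable_audio_tools`. The repository's actual mask, and how it is selected via `context_router_attention_pattern`, may differ.

```python
from typing import List


def block_causal_mask(num_blocks: int, block_size: int, context_blocks: int) -> List[List[bool]]:
    """Hypothetical helper (not from stable_audio_tools) for illustration only.

    Returns mask[q][k] == True when query position q may attend to key position k,
    assuming: full attention inside a block, plus causal attention to at most
    `context_blocks` preceding blocks (a sliding window measured in blocks).
    """
    if num_blocks < 1 or block_size < 1 or context_blocks < 0:
        raise ValueError("need num_blocks >= 1, block_size >= 1, context_blocks >= 0")
    seq_len = num_blocks * block_size
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        q_block = q // block_size
        for k in range(seq_len):
            k_block = k // block_size
            # Key block may not lie in the future and must fall inside the window.
            mask[q][k] = 0 <= q_block - k_block <= context_blocks
    return mask


if __name__ == "__main__":
    # 4 blocks of 2 positions; each block sees itself plus one previous block.
    m = block_causal_mask(num_blocks=4, block_size=2, context_blocks=1)
    for row in m:
        print("".join("x" if allowed else "." for allowed in row))
    # Expected rows: 'xx......' (x2), 'xxxx....' (x2), '..xxxx..' (x2), '....xxxx' (x2)
```

Setting `context_blocks=0` reduces this to a block-diagonal mask, and a window at least as large as `num_blocks - 1` gives plain block-wise causal attention. In PyTorch 2.5+, the same predicate could be written as a FlexAttention `mask_mod(b, h, q_idx, kv_idx)` and compiled with `torch.nn.attention.flex_attention.create_block_mask`. This README does not say whether LMDM builds its mask that way.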