[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80815":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":36,"readmeContent":37,"aiSummary":38,"trendingCount":15,"starSnapshotCount":15,"syncStatus":39,"lastSyncTime":40,"discoverSource":41},80815,"studiomi300","bladedevoff\u002Fstudiomi300","bladedevoff","Director Agent + vision critic + image, video, music & voice models - all on a single AMD Instinct MI300X.","https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flablab-ai-amd-developer-hackathon\u002Fstudiomi300",null,"Python",39,8,38,0,1,43.46,"MIT License",false,"main",true,[23,24,25,26,27,28,29,30,31,32,33,34,35],"ace-step","ai","amd","flux","kokoro","music-generation","qwen","rocm","studio","video","video-generation","video-pipeline","wan","2026-06-11 04:07:15","# studiomi300\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue.svg)](LICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11+-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n[![ROCm](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FROCm-7.2-red.svg)](https:\u002F\u002Frocm.docs.amd.com\u002F)\n[![AMD MI300X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FAMD-MI300X-orange.svg)](https:\u002F\u002Fwww.amd.com\u002Fen\u002Fproducts\u002Faccelerators\u002Finstinct\u002Fmi300\u002Fmi300x.html)\n[![Open Source](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fopen%20source-Apache%202.0%20%2F%20MIT-success.svg)](#license)\n\nOne prompt → 30s cinematic 
reel. End-to-end on a single AMD Instinct MI300X.\nBuilt solo for the AMD Developer Hackathon, May 2026.\n\n## architecture\n\n```mermaid\nflowchart LR\n    P([prompt]) --> D[Director Agent\u003Cbr\u002F>Qwen3.5-35B]\n    D --> CM[Character Masters\u003Cbr\u002F>FLUX.2 klein]\n    CM --> KF[Per-shot Keyframes\u003Cbr\u002F>FLUX.2 reference]\n    KF --> AN[Animation\u003Cbr\u002F>Wan2.2-I2V-A14B]\n    AN --> VC{Vision Critic\u003Cbr\u002F>Qwen3.5-35B}\n    VC -- score &lt; 7 --> AN\n    VC -- score ≥ 7 --> MX[ffmpeg mix]\n    D --> MU[Music\u003Cbr\u002F>ACE-Step v1]\n    D --> VO[Narration\u003Cbr\u002F>Kokoro-82M, 9 lang]\n    MU --> MX\n    VO --> MX\n    MX --> O([30s mp4])\n\n    classDef llm fill:#a78bfa,color:#020617,stroke:#7c3aed\n    classDef diff fill:#f472b6,color:#020617,stroke:#db2777\n    classDef vid fill:#fbbf24,color:#020617,stroke:#d97706\n    classDef io fill:#94a3b8,color:#020617,stroke:#475569\n    class D,VC llm\n    class CM,KF diff\n    class AN vid\n    class MU,VO,MX io\n```\n\nThe Director also doubles as the Vision Critic — same Qwen3.5-35B checkpoint, two roles, reloaded between phases. 192 GB HBM3 is what lets four different architectures share one card.\n\n## what it does\n\n```\npython generate.py --prompt \"a young woman walks through neon-lit Tokyo at night and meets two friends\" --out outputs\u002Fdemo --critic\n```\n\n~45 minutes later you get `outputs\u002Fdemo\u002Freel_final.mp4` - six 5-second shots,\ncharacter-consistent, with music and per-shot voice-over, mixed.\n\n## how it works\n\nEight stages, all on the same GPU:\n\n1. **Director** - `Qwen3.5-35B-A3B` via vLLM. Plans 6 shots, character portraits,\n   music brief, per-shot VO script, and the language to narrate in.\n2. **Masters** - `FLUX.2 [klein] 4B` text-to-image, one canonical frame per character.\n3. **Per-shot keyframes** - same klein, reference editing, conditioned on the master.\n4. 
**Animation** - `Wan2.2-I2V-A14B` with FBCache (lossless 2x) + selective\n   `torch.compile`. FLF2V mode on `cut: false` continuation arcs locks identity at\n   both ends.\n5. **Vision critic** - Qwen3.5 re-loads, scores 4 frames per clip with structured\n   labels (`STYLIZED_AI_LOOK`, `CHARACTER_DRIFT`, `CAMERA_IGNORED`, ...). Bumps\n   seed and re-renders if `overall \u003C 7`. Up to 3 attempts.\n6. **Music** - `ACE-Step v1` 3.5B, 30s instrumental from the brief.\n7. **Voice-over** - `Kokoro-82M`, 9 languages, one wav per shot, ffmpeg `adelay`s\n   them onto the music bed at clip-start offsets.\n8. **Mix** - `ffmpeg` concat + lanczos upscale + loudnorm.\n\nThe Director also doubles as the vision critic. Same checkpoint, two roles.\n\n## stack\n\n| Stage | Model | License |\n|---|---|---|\n| Planner \u002F critic | Qwen3.5-35B-A3B | Apache 2.0 |\n| Image | FLUX.2 [klein] 4B | Apache 2.0 |\n| Video | Wan2.2-I2V-A14B | Apache 2.0 |\n| Music | ACE-Step v1 3.5B | Apache 2.0 |\n| TTS | Kokoro-82M | Apache 2.0 |\n| Serving | vLLM 0.17 | Apache 2.0 |\n| Cache | ParaAttention FBCache | Apache 2.0 |\n| AMD kernels | AITER | MIT |\n\nOutputs are commercially usable. No NC weights anywhere.\n\n## models\n\nThe video stage goes through a registry. `models.yaml` ships two aliases out of the box:\n\n| Alias | Backend | Defaults |\n|---|---|---|\n| `wan2.2-full` (default) | Wan2.2-I2V-A14B + FBCache + selective compile | 832x480, 30\u002F24 steps hero\u002Fb-roll, dual-noise 3.5\u002F3.0 |\n| `wan2.2-hq` | same model, native final resolution | 1280x704, same steps \u002F cfg as full |\n\nPick at run time:\n\n```bash\npython generate.py --prompt \"...\" --out outputs\u002Ftest --video-model wan2.2-hq\n```\n\nCustom alias? Drop a YAML in `~\u002F.studiomi300\u002Fcustom\u002F` pointing at a Python file that implements `backends.base.VideoBackend`. Schema lives in `models.yaml` and `backends\u002Fbase.py`. Collisions with built-in aliases override the built-in (logged warning). 
Server reload picks up new YAMLs.\n\n## why a single MI300X\n\n192 GB HBM3 lets four very different architectures share one card sequentially.\nOn 24 GB consumer hardware you'd need 4-5 separate machines.\n\n| Phase | Peak VRAM |\n|---|---|\n| Director (Qwen3.5-35B BF16) | ~70 GB |\n| FLUX.2 klein 4B | ~8 GB |\n| Wan2.2-I2V-A14B | ~94 GB |\n| Critic (Qwen3.5 reload) | ~70 GB |\n| ACE-Step v1 | ~12 GB |\n| Kokoro-82M | \u003C1 GB |\n\nEach phase unloads cleanly via `gc.collect()` + `torch.cuda.empty_cache()`\nbefore the next one loads. Director runs in-process for planning then `del`s\nitself before Wan2.2 loads, otherwise OOM.\n\n## run it\n\nTested inside `rocm\u002Fvllm-dev:nightly_main_20260506` (vLLM 0.20.2rc1, torch 2.10,\nROCm 7.2 in-container) on AMD Developer Cloud.\n\n```bash\n# inside the rocm container\npip install -r requirements.txt --no-deps\nexport STUDIOMI_AITER_FP8=0\nexport VLLM_ROCM_USE_AITER=1\npython generate.py --prompt \"your reel idea\" --out outputs\u002Fmyreel --critic\n```\n\nROCm env is set in `generate.py` before any torch import - `PYTORCH_HIP_ALLOC_CONF=expandable_segments:True`,\n`TORCH_BLAS_PREFER_HIPBLASLT=1`, `MIOPEN_FIND_MODE=FAST`, `GPU_MAX_HW_QUEUES=2`,\n`HIP_FORCE_DEV_KERNARG=1`, `HSA_ENABLE_SDMA=0`. If you change those after import\nyou get a silent allocator-profile mismatch and lose ~3 GB.\n\n### multi-GPU\n\nStage routing via env vars (default `cuda:0` for everything):\n\n```bash\nSTUDIOMI_GPU_FLUX=cuda:1 STUDIOMI_GPU_WAN=cuda:0 python generate.py ...\n```\n\nTested only on single-MI300X. Hooks are there for 2+ card setups.\n\n### API server\n\n```bash\nSTUDIO_API_TOKEN=secret uvicorn server:app --host 0.0.0.0 --port 8000\n```\n\n`POST \u002Fjobs` with `{\"prompt\": \"...\", \"use_critic\": true}` returns `{\"job_id\"}`.\n`GET \u002Fjobs\u002F{id}\u002Fstream` is an SSE feed of every stage event (plan, masters,\nkeyframes, clip render, critic verdicts, music, VO, final). 
Granular artifact\nendpoints under `\u002Fjobs\u002F{id}\u002F{plan,master,keyframe,clip,music,vo,video}`.\n\n## what's optimised\n\n| Knob | Speedup | Note |\n|---|---|---|\n| ParaAttention FBCache (threshold 0.05) | 2.00x | lossless |\n| `torch.compile(transformer_2, mode=\"default\")` | 1.20x | 2.35 min one-time warmup |\n| ROCm env flags | 1.10x | hipBLASLt, expandable_segments, MIOpen FAST mode |\n| flow_shift=5 hero \u002F 8 b-roll | quality | upstream wan_i2v_A14B.py default; I used 12 first and got plastic skin |\n| FLUX.2 klein 4B vs FLUX.1-schnell | ~15x | sub-second keyframes |\n\nEnd-to-end: 25.9 min → 10.4 min per 720p clip on a single MI300X.\n\n## what's not used (and why)\n\n- **AITER FP8 on Wan2.2** - `gemm_a8w8_CK` segfaults on the cross-attn shape\n  (M=512, K=4096, N=5120) on ROCm 7.0; closed standalone on 7.2 but still crashes\n  inside the full pipeline graph (matches `ROCm\u002Faiter#2187`). Code stays behind\n  `STUDIOMI_AITER_FP8=1` env flag; production ships BF16.\n- **MagCache** - diffusers 0.38 calibration-step counter doesn't fire on Wan2.2's\n  dual-transformer schedule.\n- **cache-dit + TaylorSeer** - slower than baseline FBCache on ROCm.\n- **Wan2.2-Lightning I2V LoRA** - V1 only (Aug 2025), V2.0 phased-DMD never landed\n  for I2V; quality drop on hero shots vs full-step.\n- **AITER FA3** - JIT compile for 81×1280×704 attention never finishes.\n- **`torch.compile(mode=\"max-autotune\", fullgraph=True)`** - Dynamo error on\n  Wan2.2 (diffusers#12728).\n- **`channels_last`** - Wan2.2 transformer is rank-5; channels_last is rank-4 only.\n\n## files\n\n```\ngenerate.py        cli entry, runs the pipeline\ndirector.py        director agent + vision critic (one Qwen3.5-35B serves both)\nutils.py           pipeline core: masters, keyframes, render_clips, music, vo, mix\naiter_linear.py    fp8 nn.Linear drop-in for wan2.2 transformer (off by default)\nevents.py          stage event emit (jsonl + EVENT:: stdout marker)\nserver.py          
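fastapi wrapper with sse + per-artifact endpoints\napp.py             gradio app: showcase + stub generate\nincidents.md       running journal of failures, root causes, fixes\nbenchmarks\u002F        wan2.2 speedup table on mi300x\nspace\u002F             slim showcase-only gradio app for huggingface space\n```\n\n`incidents.md` is honest. Headless violinist and AITER segfault are in there.\n\n## license\n\nMIT for project code. All upstream models are Apache 2.0 \u002F MIT - see the stack table.\n","The studiomi300 project generates a 30-second cinematic reel from a single prompt, with the entire process running on a single AMD Instinct MI300X accelerator. Its core functionality uses Qwen3.5-35B as both the director agent and the vision critic, automatically producing high-quality video content through text-to-image generation, animation, music composition, and speech synthesis. Technically, it relies on 192 GB of HBM3 memory so that models with different architectures can run on the same GPU, and it uses several advanced AI models, including FLUX.2, Wan2.2-I2V-A14B, ACE-Step v1, and Kokoro-82M, to handle each stage. The project suits scenarios that call for quickly creating short videos with music and narration, such as advertising production, creative showcases, or personal video logs.",2,"2026-06-11 04:02:27","CREATED_QUERY"]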