# SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

<p align="center">
  <a href="https://rajabi2001.github.io/sega/"><img src="https://img.shields.io/badge/Project_Page-5C6BC0?style=flat-square" alt="Project Page"></a>
  &nbsp;
  <a href="https://arxiv.org/abs/2605.22668"><img src="https://img.shields.io/badge/Paper-arXiv-B31B1B?style=flat-square&logo=arxiv&logoColor=white" alt="Paper"></a>
</p>

<p align="center">
  <img src="assets/teaser.png" alt="SEGA teaser" width="900">
</p>

Official inference code for **SEGA**, a training-free method that, at each denoising step, dynamically rescales attention across RoPE components based on the spatial-frequency content of the latent. SEGA improves high-resolution synthesis without retraining, new weights, or architecture changes. Implementations are provided for **FLUX** ([`flux_sega/`](flux_sega/)) and **Qwen-Image** ([`qwen_sega/`](qwen_sega/)).

## Installation

```bash
git clone https://github.com/rajabi2001/sega.git
cd sega

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
```

Model weights are downloaded from Hugging Face on the first run.

## Usage

**FLUX.1:**

```bash
cd flux_sega
python run_flux.py --prompt "Your prompt here." --height 4096 --width 4096
```

**Qwen-Image:**

```bash
cd qwen_sega
python run_qwen.py --prompt "Your prompt here." --height 4096 --width 4096
```

Outputs are saved to `outputs/` inside the respective subdirectory (`flux_sega/outputs/` or `qwen_sega/outputs/`).

## Multi-GPU inference

Generating ultra-high-resolution images can exceed the memory of a single GPU. Both `run_flux.py` and `run_qwen.py` accept a `--multi_gpu` flag that distributes the transformer blocks across **all visible CUDA devices**. CLIP and the VAE stay on `cuda:0`, and for Qwen-Image the text encoder is offloaded to the CPU. The flag only takes effect when at least **2 GPUs** are visible.

As a rule of thumb, pass `--multi_gpu` (with two or more GPUs visible) in the following cases, provided the available GPU does not have enough VRAM for a single-device run:

- **Qwen-Image** at **4096×4096 or higher**
- **FLUX** at **6144×6144 or higher**

If a single GPU has enough memory, omit `--multi_gpu` and run on one device. If you run out of memory (OOM), add `--multi_gpu` and make sure `CUDA_VISIBLE_DEVICES` exposes two or more GPUs.

**Example: FLUX at 6144×6144**

```bash
cd flux_sega
CUDA_VISIBLE_DEVICES=0,1 python run_flux.py \
    --prompt "Your prompt here." \
    --height 6144 --width 6144 \
    --multi_gpu
```

**Example: Qwen-Image at 4096×4096**

```bash
cd qwen_sega
CUDA_VISIBLE_DEVICES=0,1 python run_qwen.py \
    --prompt "Your prompt here." \
    --height 4096 --width 4096 \
    --multi_gpu
```

## Citation

```bibtex
@article{rajabi2026sega,
  title={SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers},
  author={Rajabi, Javad and Shaban, Kimia and Roohi, Koorosh and Lindell, David B and Taati, Babak},
  journal={arXiv preprint arXiv:2605.22668},
  year={2026}
}
```

## Acknowledgments

This repository adapts the inference layout and scripts from [DyPE](https://github.com/guyyariv/DyPE).
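The summary at the top of this README describes SEGA as reacting to the latent's spatial-frequency content. As a rough, hypothetical illustration of what such a quantity can look like, the NumPy sketch below measures the share of a `(channels, height, width)` latent's spectral energy that lies above a radial frequency cutoff. This is **not** SEGA's formula or code: the function name `high_frequency_energy_fraction` and the default `cutoff` of 0.25 cycles per latent pixel are illustrative assumptions.

```python
import numpy as np


def high_frequency_energy_fraction(latent: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above `cutoff` (cycles per latent pixel).

    Hypothetical illustration only. `latent` is a (channels, height, width)
    array; energy is pooled over all channels.
    """
    if latent.ndim != 3:
        raise ValueError("expected a (channels, height, width) array")
    _, h, w = latent.shape
    # 2-D FFT over the two spatial axes, one spectrum per channel.
    energy = np.abs(np.fft.fft2(latent, axes=(-2, -1))) ** 2
    # Radial spatial frequency of every FFT bin, in cycles per pixel (0 to ~0.707).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = energy.sum()
    if total == 0:
        return 0.0
    # Boolean (h, w) mask selects the high-frequency bins in every channel.
    return float(energy[:, radius > cutoff].sum() / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smooth = np.ones((16, 64, 64))  # constant latent: all energy at zero frequency
    noise = rng.standard_normal((16, 64, 64))  # white noise: energy spread over all bins
    print(high_frequency_energy_fraction(smooth), high_frequency_energy_fraction(noise))
```

A constant latent scores zero, while white noise, whose energy is spread evenly over the spectrum, scores high. How SEGA actually maps the latent's spectrum to per-RoPE-component attention scaling is defined in the paper, not in this sketch.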
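To see why memory grows so quickly with resolution, it helps to count the image tokens the transformer attends over. The sketch below assumes the usual FLUX.1 and Qwen-Image latent geometry: an 8× VAE downsampling followed by 2×2 patch packing, so each token covers a 16×16-pixel patch. Treat these factors as assumptions rather than values read from this repository. Text tokens are ignored, and the helper names are illustrative.

```python
def image_tokens(height: int, width: int, vae_factor: int = 8, patch: int = 2) -> int:
    """Image tokens seen by the transformer, assuming VAE downsampling + patch packing."""
    step = vae_factor * patch  # pixels per token along each axis (16 by default)
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    return (height // step) * (width // step)


def attention_growth(height: int, width: int, base: int = 1024) -> float:
    """Factor by which query-key pairs per attention head grow relative to base x base."""
    return (image_tokens(height, width) / image_tokens(base, base)) ** 2


if __name__ == "__main__":
    for side in (1024, 4096, 6144):
        n = image_tokens(side, side)
        growth = attention_growth(side, side)
        print(f"{side}x{side}: {n:,} image tokens, {growth:,.0f}x the query-key pairs of 1024x1024")
```

Under these assumptions, a 4096×4096 image has 65,536 image tokens, 16× the 4,096 tokens of a 1024×1024 image and 256× as many query-key pairs per head; at 6144×6144 the factors are 36× and 1,296×. Memory-efficient attention kernels avoid storing the full score matrix, but activations still scale with the token count, which helps explain why the largest runs may need more than one GPU.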
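The `--multi_gpu` flag distributes the transformer blocks across all visible GPUs. The exact split used by `run_flux.py` and `run_qwen.py` is not documented here, so the following is only a hypothetical sketch of one common scheme: a contiguous, roughly even assignment of blocks to devices. The helper name `assign_blocks` is illustrative, and the single-device fallback for fewer than two GPUs mirrors the README's note that the flag has no effect in that case.

```python
def assign_blocks(num_blocks: int, num_devices: int) -> list[str]:
    """Map each transformer block index to a CUDA device string (hypothetical scheme)."""
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if num_devices < 2:
        # With fewer than two GPUs, keep every block on the first device.
        return ["cuda:0"] * num_blocks
    base, extra = divmod(num_blocks, num_devices)
    placement: list[str] = []
    for device in range(num_devices):
        # The first `extra` devices take one additional block each.
        count = base + (1 if device < extra else 0)
        placement.extend([f"cuda:{device}"] * count)
    return placement


if __name__ == "__main__":
    # FLUX.1 has 19 double-stream and 38 single-stream blocks, 57 in total.
    placement = assign_blocks(57, 2)
    for device in sorted(set(placement)):
        print(device, placement.count(device), "blocks")
```

With two GPUs, this places 29 of FLUX.1's 57 blocks on `cuda:0` and 28 on `cuda:1`. In a PyTorch forward pass, the hidden states would then be moved with `.to(device)` whenever consecutive blocks live on different GPUs. A contiguous split keeps those transfers few, at the cost of the GPUs working one after another rather than in parallel.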
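The rule of thumb in the Multi-GPU inference section can be written as a small launcher helper. This is a hypothetical sketch, not part of the repository: the names `should_use_multi_gpu` and `build_command` are illustrative, only the `--prompt`, `--height`, `--width`, and `--multi_gpu` flags come from this README, and comparing "4096×4096 or higher" as a total pixel count is an assumption for non-square sizes.

```python
from typing import Optional

# Side lengths (pixels) from the README's rule of thumb; compared as total pixel counts.
MULTI_GPU_THRESHOLD = {"qwen": 4096, "flux": 6144}
SCRIPTS = {"qwen": "run_qwen.py", "flux": "run_flux.py"}


def should_use_multi_gpu(model: str, height: int, width: int,
                         num_visible_gpus: int,
                         fits_on_one_gpu: Optional[bool] = None) -> bool:
    """Decide whether to pass --multi_gpu.

    fits_on_one_gpu: True/False if known (e.g. from a previous run or an OOM),
    or None to fall back to the README's resolution thresholds.
    """
    if num_visible_gpus < 2:
        return False  # the flag has no effect with fewer than two visible GPUs
    if fits_on_one_gpu is not None:
        return not fits_on_one_gpu
    side = MULTI_GPU_THRESHOLD[model]
    return height * width >= side * side


def build_command(model: str, prompt: str, height: int, width: int,
                  multi_gpu: bool) -> list[str]:
    """Assemble the argv for run_flux.py or run_qwen.py."""
    cmd = ["python", SCRIPTS[model], "--prompt", prompt,
           "--height", str(height), "--width", str(width)]
    if multi_gpu:
        cmd.append("--multi_gpu")
    return cmd


if __name__ == "__main__":
    use = should_use_multi_gpu("flux", 6144, 6144, num_visible_gpus=2)
    print(build_command("flux", "Your prompt here.", 6144, 6144, use))
```

The resulting list could be passed to `subprocess.run(cmd, cwd="flux_sega", env=...)`, with `CUDA_VISIBLE_DEVICES` set in `env`, to reproduce the FLUX 6144×6144 example above.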