[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1046":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":15,"starSnapshotCount":15,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},1046,"HY-SOAR","Tencent-Hunyuan\u002FHY-SOAR","Tencent-Hunyuan","HY-SOAR:Self-Correction for Optimal Alignment and Refinement in Diffusion Models","",null,"Python",629,64,44,0,7,8,148,21,9.44,"Other",false,"main",true,[26,27],"aigc","diffusion","2026-06-12 02:00:22","\u003Cdiv align=\"center\">\n\n# HY-SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.12617 target=\"_blank\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReport-b5212f.svg?logo=arxiv height=22px>\u003C\u002Fa>\n  \u003Ca href=https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHY-SOAR target=\"_blank\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode-2ecc71?logo=github&logoColor=black height=22px>\u003C\u002Fa>\n  \u003Ca href=https:\u002F\u002Fhy-soar.github.io\u002F target=\"_blank\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHY.SOAR-1abc9c?logo=homeassistant&logoColor=white height=22px>\u003C\u002Fa>\n  \u003Ca href=https:\u002F\u002Fx.com\u002FTencentHunyuan target=\"_blank\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHunyuan-black.svg?logo=x 
height=22px>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fsoar1.png\" width=\"60%\">\n\u003C\u002Fp>\n\n**Beyond SFT and RL: Self-Correction during Generation without Reward Models, Preference Labels, or Negative Samples.**\n\n## 🔥 News\n\n- **April 2026**: 🎉 HY-SOAR open source - Training and evaluation code publicly available.\n\n## 🗂️ Contents\n\n- [🔥 News](#-news)\n- [📖 Introduction](#-introduction)\n- [✨ Key Features](#-key-features)\n- [🖼 Showcases](#-showcases)\n- [📑 Open-Source Plan](#-open-source-plan)\n- [🛠 Environment Setup](#-environment-setup)\n- [🎯 Reward Preparation](#-reward-preparation)\n- [🚀 Usage](#-usage)\n- [📊 Evaluation](#-evaluation)\n- [🧾 Data Format](#-data-format)\n- [📚 Citation](#-citation)\n- [🙏 Acknowledgement](#-acknowledgement)\n\n---\n\n## 📖 Introduction\n\n**HY-SOAR** (Self-Correction for Optimal Alignment and Refinement) is a reward-free post-training method for rectified-flow diffusion models. It targets exposure bias in the denoising trajectory: standard SFT trains the denoiser on ideal forward-noising states from real data, while inference conditions on states produced by the model's own earlier predictions. Once an early denoising step drifts, later steps must recover from states that were not directly optimized, so errors can compound across the trajectory.\n\nInstead of waiting for a terminal reward after a full rollout, SOAR teaches the model to correct its own trajectory errors at the timestep where they occur. Given a clean latent $z_0$, noise endpoint $z_1$, and condition $c$, SOAR:\n1. Samples an on-trajectory noisy state and performs one stop-gradient CFG rollout step with the current model\n2. Re-noises the resulting off-trajectory state toward the same noise endpoint $z_1$ to create auxiliary states\n3. 
Supervises the denoiser with the analytical correction target $v_{\\mathrm{corr}} = (z_{\\sigma_{t'}} - z_0) \u002F \\sigma_{t'}$\n\nThis gives SOAR an on-policy, dense, and reward-free training signal. The base objective subsumes standard SFT, while the auxiliary correction loss trains on nearby model-induced states, making SOAR a stronger first post-training stage that remains compatible with subsequent reward-based alignment.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fsoar.png\" alt=\"HY-SOAR Teaser\"\u002F>\n\u003C\u002Fp>\n\n## ✨ Key Features\n\n* 🧭 **Exposure-Bias Correction:** SOAR directly addresses the mismatch between ground-truth training states and model-induced inference states, the source of many compounding denoising failures.\n\n* 🔁 **On-Policy Off-Trajectory Supervision:** Off-trajectory states are produced by the current model's own rollout, so the training distribution co-evolves with the model instead of staying fixed to the SFT data trajectory.\n\n* 🎯 **Reward-Free Dense Objective:** SOAR requires no reward model, preference labels, or negative samples. 
It provides per-timestep correction supervision and avoids terminal-reward credit assignment.\n\n* 📐 **Geometric Correction Target:** Re-noising uses the same noise endpoint as the base flow-matching pair, keeping auxiliary states near the original transport ray and yielding a concrete correction velocity anchored to $z_0$.\n\n* 🔧 **Compatible Post-Training Stage:** The SOAR loss extends the standard flow-matching objective, so it can replace SFT as a stronger first post-training stage while remaining compatible with later RL alignment.\n\n## 🖼 Showcases\n\n**Showcase 1: Aesthetic Reward Optimization**\n\nComparison of SOAR vs Flow-GRPO vs SFT across training steps, optimizing for aesthetic quality on diverse prompts (historical scenes, fantasy art, character portraits).\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fshowcase_aesthetic.png\" width=\"95%\">\n\u003C\u002Fdiv>\n\n**Showcase 2: CLIPScore Reward Optimization**\n\nComparison on design and poster generation prompts, optimizing for text-image alignment (CLIPScore). 
SOAR demonstrates stronger text rendering and compositional fidelity.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fshowcase_clipscore.png\" width=\"95%\">\n\u003C\u002Fdiv>\n\n**Showcase 3: WebUI \u002F Design Generation**\n\nSOAR results on web UI and graphic design generation, showing accurate layout, typography, and visual hierarchy.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fshowcase_webui.png\" width=\"95%\">\n\u003C\u002Fdiv>\n\n## 📑 Open-Source Plan\n\n- HY-SOAR\n  - [x] Training code\n  - [x] Evaluation code\n\n## 🛠 Environment Setup\n\nOur implementation is based on the [DiffusionNFT](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FDiffusionNFT) and [Flow-GRPO](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Fflow_grpo) codebases, with most environments aligned.\n\nClone this repository and install packages by:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHY-SOAR.git\ncd HY-SOAR\n\nconda create -n hy-soar python=3.10.16\nconda activate hy-soar\npip install torch==2.6.0 torchvision==0.21.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu126\npip install -e .\nexport PYTHONPATH=$PWD\u002Fsora:$PYTHONPATH\n```\n\n**Base Model:** The training script expects [stabilityai\u002Fstable-diffusion-3.5-medium](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-3.5-medium) as the pretrained model. 
You need to accept the model license on Hugging Face and authenticate via `huggingface-cli login` before training.\n\n## 🎯 Reward Preparation\n\nSupported reward models include [GenEval](https:\u002F\u002Fgithub.com\u002Fdjghosh13\u002Fgeneval), [OCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR), [PickScore](https:\u002F\u002Fgithub.com\u002Fyuvalkirstain\u002FPickScore), [ClipScore](https:\u002F\u002Fgithub.com\u002Fopenai\u002FCLIP), [HPSv2.1](https:\u002F\u002Fgithub.com\u002Ftgxs002\u002FHPSv2), [Aesthetic](https:\u002F\u002Fgithub.com\u002Fchristophschuhmann\u002Fimproved-aesthetic-predictor), and [ImageReward](https:\u002F\u002Fgithub.com\u002Fzai-org\u002FImageReward). Compared with Flow-GRPO, we add support for `HPSv2.1` and run `GenEval` locally rather than through a remote server.\n\n### 📦 Checkpoints Downloading\n\n```bash\nmkdir reward_ckpts\ncd reward_ckpts\n# Aesthetic\nwget https:\u002F\u002Fgithub.com\u002Fchristophschuhmann\u002Fimproved-aesthetic-predictor\u002Fraw\u002Frefs\u002Fheads\u002Fmain\u002Fsac+logos+ava1-l14-linearMSE.pth\n# GenEval\nwget https:\u002F\u002Fdownload.openmmlab.com\u002Fmmdetection\u002Fv2.0\u002Fmask2former\u002Fmask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco\u002Fmask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco_20220504_001756-743b7d99.pth\n# ClipScore\nwget https:\u002F\u002Fhuggingface.co\u002Flaion\u002FCLIP-ViT-H-14-laion2B-s32B-b79K\u002Fresolve\u002Fmain\u002Fopen_clip_pytorch_model.bin\n# HPSv2.1\nwget https:\u002F\u002Fhuggingface.co\u002Fxswu\u002FHPSv2\u002Fresolve\u002Fmain\u002FHPS_v2.1_compressed.pt\ncd ..\n```\n\n### 🧪 Reward Environments\n\n```bash\n# GenEval\npip install -U openmim\nmim install mmengine\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmcv.git\ncd mmcv; git checkout 1.x\nMMCV_WITH_OPS=1 FORCE_CUDA=1 pip install -e . -v\ncd ..\n\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection.git\ncd mmdetection; git checkout 2.x\npip install -e . 
-v\ncd ..\n\npip install open-clip-torch clip-benchmark\n\n# OCR\npip install paddlepaddle-gpu==2.6.2\npip install paddleocr==2.9.1\npip install python-Levenshtein\n\n# HPSv2.1\npip install hpsv2x==1.2.0\n\n# ImageReward\npip install image-reward\npip install git+https:\u002F\u002Fgithub.com\u002Fopenai\u002FCLIP.git\n```\n\n## 🚀 Usage\n\n### 🔥 Training\n\nThe default SOAR v4 training setup uses a single-node 8-GPU run on high-aesthetic data (`score >= 6.8`). The per-GPU batch size is 4, so the global batch size is 32. The default rollout configuration is ODE-only (`--num_rollout_paths 1`) with 6 auxiliary points and 40 sampling steps.\n\n```bash\nexport ACCELERATE_USE_DEEPSPEED=true\nexport ACCELERATE_DEEPSPEED_CONFIG_FILE=\u002Fpath\u002Fto\u002Fds_zero2_config.json\nexport ACCELERATE_DEEPSPEED_ZERO_STAGE=2\n\ntorchrun \\\n    --nnodes=1 \\\n    --node_rank=0 \\\n    --nproc_per_node=8 \\\n    --master_addr=localhost \\\n    --master_port=29522 \\\n    -m sora.train_soar_sd3_5m \\\n    --pretrained_model_name_or_path \u002Fpath\u002Fto\u002Fstable-diffusion-3.5-medium \\\n    --jsonl_path \u002Fpath\u002Fto\u002Fhigh_aesthetic.jsonl \\\n    --image_dir \u002Fpath\u002Fto\u002Fimages \\\n    --output_dir .\u002Foutput\u002Fsd3.5m_soar_high_aesthetic \\\n    --resolution 512 \\\n    --train_batch_size 4 \\\n    --gradient_accumulation_steps 1 \\\n    --max_train_steps 5000 \\\n    --checkpointing_steps 1000 \\\n    --seed 42 \\\n    --learning_rate 2e-5 \\\n    --lr_scheduler constant \\\n    --lr_warmup_steps 0 \\\n    --adam_weight_decay 1e-2 \\\n    --max_grad_norm 1.0 \\\n    --weighting_scheme logit_normal \\\n    --mixed_precision bf16 \\\n    --dataloader_num_workers 16 \\\n    --gradient_checkpointing \\\n    --allow_tf32 \\\n    --report_to tensorboard \\\n    --num_rollout_paths 1 \\\n    --trajectory_length 6 \\\n    --num_sampling_steps 40 \\\n    --sde_rollout_type flow_sde \\\n    --sde_noise_scale 0.5 \\\n    --lambda_aux 1.0 \\\n    
--cfg_scale_sampling 4.5\n```\n\nDefault training parameters:\n\n| Parameter | Default | Description |\n| --- | --- | --- |\n| GPUs | 8 | Single-node training |\n| Global batch size | 32 | `--train_batch_size 4` x 8 GPUs x `--gradient_accumulation_steps 1` |\n| `--max_train_steps` | 5000 | Total optimization steps |\n| `--checkpointing_steps` | 1000 | Checkpoint interval |\n| `--seed` | 42 | Training seed |\n| `--learning_rate` | 2e-5 | AdamW learning rate |\n| `--lr_scheduler` | constant | Learning-rate schedule |\n| `--lr_warmup_steps` | 0 | Warmup steps |\n| `--adam_weight_decay` | 1e-2 | AdamW weight decay |\n| `--max_grad_norm` | 1.0 | Gradient clipping norm |\n| `--weighting_scheme` | logit_normal | Timestep weighting scheme |\n| `--mixed_precision` | bf16 | Mixed precision mode |\n| `--dataloader_num_workers` | 16 | Data loader workers |\n| `--lambda_aux` | 1.0 | Weight for auxiliary SOAR loss |\n| `--num_rollout_paths` | 1 | Number of rollout paths (1=ODE only) |\n| `--trajectory_length` | 6 | Auxiliary points per path |\n| `--sde_rollout_type` | flow_sde | Stochastic rollout mode: `cps`, `simple`, `sde`, `flow_sde` |\n| `--sde_noise_scale` | 0.5 | Noise scale for stochastic rollout |\n| `--cfg_scale_sampling` | 4.5 | CFG scale for rollout velocity |\n| `--num_sampling_steps` | 40 | Total sampling steps (determines step size) |\n\n### 📊 Running Evaluation\n\nGenerate images and save results with distributed inference:\n\n```bash\ntorchrun --nproc_per_node=8 -m sora.evaluation \\\n    --checkpoint_path .\u002Foutput\u002Fsoar_sd3_5m\u002Fcheckpoint-5000 \\\n    --model_type sd3 \\\n    --dataset geneval \\\n    --guidance_scale 4.5 \\\n    --resolution 512 \\\n    --mixed_precision bf16 \\\n    --save_images \\\n    --output_dir .\u002Feval_output\n```\n\nThe `--dataset` flag supports `geneval`, `ocr`, `pickscore`, and `drawbench`.\n\n## 📊 Evaluation\n\n### 📋 Main Results on DrawBench\n\nFollowing 
[Flow-GRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.05470), we evaluate image quality and human preference scores on **DrawBench** prompts. Task-specific metrics (GenEval, OCR) are evaluated on their respective test sets. All models are trained at 512×512 with cfg=4.5.\n\n| Model | #Iter | GenEval | OCR | PickScore | ClipScore | HPSv2.1 | Aesthetic | ImgRwd |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| SD-XL (1024²) | – | 0.55 | 0.14 | 22.42 | 0.287 | 0.280 | 5.60 | 0.76 |\n| SD3.5-L (1024²) | – | 0.71 | 0.68 | 22.91 | 0.289 | 0.288 | 5.50 | 0.96 |\n| FLUX.1-Dev | – | 0.66 | 0.59 | 22.84 | 0.295 | 0.274 | 5.71 | 0.96 |\n| SD3.5-M | – | 0.63 | 0.59 | 22.34 | 0.285 | 0.279 | 5.36 | 0.85 |\n| + SFT | 10k | 0.70 | 0.64 | 22.71 | 0.295 | 0.284 | 5.35 | 1.04 |\n| **+ SOAR (Ours)** | **10k** | **0.78** | **0.67** | **22.86** | **0.295** | **0.289** | **5.46** | **1.09** |\n\nOn DrawBench and the corresponding GenEval\u002FOCR test sets from [Flow-GRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.05470), SOAR improves on the SFT baseline for SD3.5-Medium, raising the GenEval score from 0.70 to 0.78 (+11% relative) and OCR accuracy from 0.64 to 0.67, without using any reward model during training. The untuned SD3.5-Medium base model scores 0.63 and 0.59 on the same two metrics.\n\n### 📈 Reward-Specific Training Dynamics\n\nIn head-to-head comparisons on DrawBench Aesthetic Score and ClipScore, SOAR's final scores not only surpass SFT but also outperform Flow-GRPO, which explicitly uses these metrics as its reward signal (Aesthetic: 5.94 vs. SFT 5.74 \u002F Flow-GRPO 5.87; ClipScore: 0.300 vs. 
SFT 0.297 \u002F Flow-GRPO 0.296).\n\n\u003Cp align=\"center\">\n  \u003Ca href=\".\u002Fassets\u002Freward_curves_seaborn_preview.pdf\">\n    \u003Cimg src=\".\u002Fassets\u002Freward_curves_seaborn_preview.png\" width=\"85%\" alt=\"Reward curves for Aesthetic Score and ClipScore\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 🧾 Data Format\n\nThe training script expects a JSONL file where each line contains:\n\n```json\n{\"md5\": \"image_hash\", \"caption_en\": \"A photo of ...\", \"bw\": 512, \"bh\": 512}\n```\n\n- `md5`: Image filename (without extension). Images are stored as `{md5}.jpg` in `--image_dir`.\n- `caption_en`: English text prompt.\n- `bw`, `bh`: Bucket width and height (the resolution to resize\u002Fcrop the image to).\n\n## 📚 Citation\n\nIf you find HY-SOAR useful in your research, please cite our work:\n\n```bibtex\n@article{hy-soar,\n  title={SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models},\n  author={Qin, You and Wang, Linqing and Fei, Hao and Zimmermann, Roger and Bo, Liefeng and Lu, Qinglin and Wang, Chunyu},\n  journal={arXiv preprint arXiv:2604.12617},\n  year={2026},\n  eprint={2604.12617},\n  archivePrefix={arXiv},\n  primaryClass={cs.LG}\n}\n```\n\n## 🙏 Acknowledgement\n\nWe thank the [DiffusionNFT](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FDiffusionNFT) repository, the [Flow-GRPO](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Fflow_grpo) project, and Hugging Face [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) for their open-source codebases.\n","HY-SOAR is a reward-free post-training method for diffusion models that aims to correct trajectory errors during generation. Its core capability is self-correction at generation time, without relying on reward models, preference labels, or negative samples. By introducing a dense, reward-free training signal, HY-SOAR supervises the denoiser and corrects state errors produced by the model's own predictions. The method is particularly suited to image generation tasks that demand high precision and stability, such as high-quality image synthesis and editing.",2,"2026-06-11 02:41:18","CREATED_QUERY"]