[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75713":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":13,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":22,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":13,"lastSyncTime":27,"discoverSource":28},75713,"Flow-OPD","CostaliyA\u002FFlow-OPD","CostaliyA","Official Repo of \"Flow-OPD: On-Policy Distillation for Flow Matching Models\"","https:\u002F\u002Fcostaliya.github.io\u002FFlow-OPD\u002F",null,"Python",227,2,7,0,5,114,53.93,"MIT License",false,"main",true,[],"2026-06-11 04:06:25","# Flow-OPD: On-Policy Distillation for Flow Matching Models\n\n\n\n\u003Cdiv align=\"center\">\n\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🌐_Project_WebPage-green)](https:\u002F\u002Fcostaliya.github.io\u002FFlow-OPD\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📄_Paper-arXiv:2605.08063-red)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.08063)\n[![Code](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🚀_Code-GitHub-blue)](https:\u002F\u002Fgithub.com\u002FCostaliyA\u002FFlow-OPD)\n[![Model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_Model-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002FCostaliyA\u002FFlow-OPD)\n\n> **Flow-OPD** integrates On-Policy Distillation into the Flow Matching pipeline, replacing sparse scalar rewards with dense, trajectory-level, multi-teacher vector field supervision. 
Evaluated on SD-3.5-Medium, Flow-OPD achieves **+18pt average improvement** over vanilla GRPO and surpasses individual teacher models on OCR and DeQA.\n\n\u003C\u002Fdiv>\n\n---\n## 🚀 Quick Start\n### 1. Environment Setup\nClone this repository, create the Conda environment, and install the package.\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FCostaliyA\u002FFlow-OPD.git\ncd Flow-OPD\nconda create -n flow_grpo python=3.10.16\nconda activate flow_grpo\npip install -e .\n```\n\n### 2. Model Download\nTo avoid redundant downloads and wasted storage during multi-GPU training, please download the required models in advance.\n\n**Models**\n* **SD3.5**: `stabilityai\u002Fstable-diffusion-3.5-medium`\n* **Flux**: `black-forest-labs\u002FFLUX.1-dev`\n* **GenEval Teacher**: `jieliu\u002FSD3.5M-FlowGRPO-GenEval`\n* **OCR Teacher**: `jieliu\u002FSD3.5M-FlowGRPO-Text`\n* **PickScore Teacher**: `jieliu\u002FSD3.5M-FlowGRPO-PickScore`\n\n**Reward Models**\n* **PickScore**:\n  * `laion\u002FCLIP-ViT-H-14-laion2B-s32B-b79K`\n  * `yuvalkirstain\u002FPickScore_v1`\n* **CLIPScore**: `openai\u002Fclip-vit-large-patch14`\n* **Aesthetic Score**: `openai\u002Fclip-vit-large-patch14`\n\n\n### 3. Reward Preparation\nThe steps above only install the current repository. Since each reward model may rely on different dependency versions, combining them in one Conda environment can cause version conflicts. To avoid this, we adopt a remote server setup inspired by ddpo-pytorch. 
You only need to install the specific reward model you plan to use.\n\n#### GenEval\nPlease create a new Conda virtual environment and install the corresponding dependencies according to the instructions in [reward-server](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Freward-server).\n\n#### OCR\nPlease install PaddleOCR:\n```bash\npip install paddlepaddle-gpu==2.6.2\npip install paddleocr==2.9.1\npip install python-Levenshtein\n```\nThen pre-download the OCR model by running the following in a Python shell:\n```python\nfrom paddleocr import PaddleOCR\n# Instantiating PaddleOCR downloads the model weights if they are not cached yet\nocr = PaddleOCR(use_angle_cls=False, lang=\"en\", use_gpu=False, show_log=False)\n```\n\n#### PickScore\nPickScore requires no additional installation. Note that the original [Pick-a-Pic v1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fyuvalkirstain\u002Fpickapic_v1) dataset, which corresponds to `dataset\u002Fpickscore` in this repository, contains some NSFW prompts. We strongly recommend using [pickapic\\_v1\\_no\\_images\\_training\\_sfw](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FCarperAI\u002Fpickapic_v1_no_images_training_sfw), the SFW version of the Pick-a-Pic dataset, which corresponds to `dataset\u002Fpickscore_sfw` in this repository.\n\n#### DeQA\nPlease create a new Conda virtual environment and install the corresponding dependencies according to the instructions in [reward-server](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Freward-server).\n\n#### UnifiedReward\nSince `sglang` may conflict with the dependencies of other environments, we recommend creating a separate Conda environment.\n```bash\nconda create -n sglang python=3.10.16\nconda activate sglang\npip install \"sglang[all]\"\n```\nWe use sglang to deploy the reward service. 
After installing sglang, please run the following command to launch UnifiedReward:\n```bash\npython -m sglang.launch_server --model-path CodeGoat24\u002FUnifiedReward-7b-v1.5 --api-key flowgrpo --port 17140 --chat-template chatml-llava --enable-p2p-check --mem-fraction-static 0.85\n```\n#### ImageReward\nPlease install ImageReward:\n```bash\npip install image-reward\npip install git+https:\u002F\u002Fgithub.com\u002Fopenai\u002FCLIP.git\n```\n#### QwenVL Score\nPlease create a new Conda virtual environment, install vLLM, and run the provided script:\n```bash\npip install vllm\nbash scripts\u002Fsingle_node\u002Frun_qwen_model.sh\n```\nThen update `base_url` on line 130 of `rewards.py`.\n\n### 4. Dataset Preparation\n\n> **Note:** All training and evaluation prompts are located in the `dataset\u002F` folder. Training prompts follow the format used in [flow-grpo](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Fflow_grpo), and evaluation prompts follow [T2I-CompBench](https:\u002F\u002Fgithub.com\u002FKarine-Huang\u002FT2I-CompBench).\n\n### 5. 
Start Training\n#### 5.0 Cold Start (optional)\nBefore training, you can merge multiple expert LoRAs into a single cold-start LoRA to accelerate convergence:\n```bash\nbash scripts\u002Fsingle_node\u002Fmerge.sh\n```\nAfter merging, set the merged LoRA path in the training config:\n```python\nconfig.train.lora_path = \"path\u002Fto\u002Fmerged\u002Flora\"\n```\n#### 5.1 GRPO-Mix\nFirst, deploy the GenEval and DeQA reward services on other nodes. Then start training on every node:\n```bash\n# Master node\nbash scripts\u002Fmulti_node\u002Fsd3_mix.sh 0\n# Other nodes\nbash scripts\u002Fmulti_node\u002Fsd3_mix.sh 1\nbash scripts\u002Fmulti_node\u002Fsd3_mix.sh 2\nbash scripts\u002Fmulti_node\u002Fsd3_mix.sh 3\n```\n\n#### 5.2 Flow-OPD\n```bash\n# Single-teacher OPD (local single-node, single GPU or multi-GPU)\nbash scripts\u002Fsingle_node\u002Fsd3_opd_example.sh\n\n# Multi-teacher OPD (local single-node, multi-GPU)\nbash scripts\u002Fsingle_node\u002Fsd3_opd_mix_local.sh\n```\n- **Single-teacher**: Uses a single `kl_ref_lora_path` as the reference for the OPD KL reward.\n- **Multi-teacher**: Uses the `alternate` training mode with a per-dataset `kl_ref_lora_path`, so each dataset (e.g., OCR, GenEval) uses its own teacher LoRA. The `mix_opd_8gpu` config currently assumes 8 GPUs. To train with fewer GPUs, reduce `num_processes` in the shell script and adjust the batch sizes in `config\u002Fgrpo.py:mix_opd_8gpu`.\n\n## 📊 Evaluation\n\nThis section describes how to evaluate your trained LoRA model on **T2I-CompBench**, based on the evaluation pipeline from [STAGE](https:\u002F\u002Fgithub.com\u002Fkrennic999\u002FSTAGE).\n\n### 1. 
Generate Images\n\nFirst, run `run_eval.sh` to generate images for all T2I-CompBench categories:\n\n```bash\nbash scripts\u002Fsingle_node\u002Frun_eval.sh\n```\n\nModify `run_eval.sh` to set your LoRA path and output directory:\n\n```bash\ntorchrun --nproc_per_node=8 scripts\u002Feval_t2icompbench.py \\\n    --lora \"path\u002Fto\u002Fyour\u002Flora\" \\\n    --benchmark t2i_compbench \\\n    --output_dir .\u002Feval_results\u002Fcompbench_images\n```\n\nImages will be saved under `{output_dir}\u002F{category}\u002Fsamples\u002F`.\n\n### 2. Install T2I-CompBench\n\nClone the T2I-CompBench repository and install its dependencies:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FKarine-Huang\u002FT2I-CompBench.git\ncd T2I-CompBench\n# Follow the installation instructions in their repository\n```\n\n### 3. Score Images\n\nSet `T2I_COMP_CODE_ROOT` in `cal_t2i_compbench_value.sh` to point to the cloned T2I-CompBench folder:\n\n```bash\nT2I_COMP_CODE_ROOT=\"\u002Fpath\u002Fto\u002FT2I-CompBench\"\n```\n\nThen run the scoring script:\n\n```bash\nbash cal_t2i_compbench_value.sh\n```\n\nResults for each category will be saved as txt files under the corresponding annotation directories.\n\n---\n\n## 🎯 Key Results\n\n| Model | GenEval | OCR Acc. | DeQA | PickScore | Average |\n|---|---|---|---|---|---|\n| SD-3.5-M (base) | 0.63 | 0.59 | 4.07 | 21.64 | 0.72 |\n| GRPO-Mix (best baseline) | 0.73 | 0.83 | 4.33 | 21.84 | 0.82 |\n| **Flow-OPD (Merge Init)** | **0.92** | **0.94** | **4.35** | **23.08** | **0.90** |\n\n- ✨ **+18pt** average improvement over base model\n- 🚀 **+8pt** improvement over GRPO-Mix (best baseline)\n- 📊 **0.92** GenEval score (base: 0.63)\n- 📝 **0.94** OCR accuracy (base: 0.59)\n\n---\n\n## 🔬 Method Overview\n\nFlow-OPD decouples expertise acquisition from model unification through a two-stage process:\n\n1. **🧊 Cold Start Initialization** — SFT or Model Merging to initialize the student model\n2. 
**👨‍🏫 Multi-Teacher On-Policy Distillation** — Dense vector field supervision from multiple teachers\n\nThe key innovations include:\n\n- **⚡ On-Policy Sampling (SDE)**: Stochastic exploration via SDE for diverse trajectory sampling\n- **🔀 Multi-Teacher Dense Labeling**: Each teacher (GenEval, OCR, DeQA, PickScore) acts as a Generative Reward Model returning a full vector field\n- **🎨 MAR (Manifold Anchor Regularization)**: KL regularization from a frozen aesthetic teacher prevents aesthetic degradation\n\n---\n\n## 📋 Todo List\nThe code is being gradually open-sourced, optimized, and refactored. Please feel free to contact me if you have any questions.\n\n### 🔄 In Progress\n\n- [ ] Release full training code\n\n### ✅ Completed\n\n- [x] Release model weights ([HuggingFace](https:\u002F\u002Fhuggingface.co\u002FCostaliyA\u002FFlow-OPD))\n- [x] Release paper ([arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.08063))\n\n---\n\n## 🎨 Qualitative Results\n\n### Overview\n\n![Teaser](assets\u002Fteaser.png)\n\n### Comparison\n\n![Comparison](assets\u002Fcompare.png)\n\n### More Results (1\u002F3)\n\n![More Results 1](assets\u002Fmore1.png)\n\n### More Results (2\u002F3)\n\n![More Results 2](assets\u002Fmore2.png)\n\n### More Results (3\u002F3)\n\n![More Results 3](assets\u002Fmore3.png)\n\n---\n\n## 📚 Citation\n\n```bibtex\n@article{fang2026flow,\n  title={Flow-OPD: On-Policy Distillation for Flow Matching Models},\n  author={Fang, Zhen and Huang, Wenxuan and Zeng, Yu and Zhao, Yiming and Chen, Shuang and Feng, Kaituo and Lin, Yunlong and Chen, Lin and Chen, Zehui and Cao, Shaosheng and others},\n  journal={arXiv preprint arXiv:2605.08063},\n  year={2026}\n}\n```\n\n---\n\n## 🙏 Acknowledgements\n\nThis repo is based on [flow-grpo](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Fflow_grpo). We also build upon [STAGE](https:\u002F\u002Fgithub.com\u002Fkrennic999\u002FSTAGE) for T2I-CompBench evaluation. 
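We thank the authors for their valuable contributions to the AIGC community.\n","Flow-OPD is a project that integrates on-policy distillation into flow matching models. It replaces sparse scalar rewards with dense, trajectory-level, multi-teacher vector field supervision, thereby improving the flow matching process. The project is written in Python and, on SD-3.5-Medium, achieves an average improvement of 18 points over vanilla GRPO, while also surpassing individual teacher models on OCR and DeQA. Flow-OPD suits scenarios that require high accuracy and efficient training, such as improving generative adversarial networks (GANs), text generation in natural language processing, and image recognition.","2026-06-11 03:53:08","CREATED_QUERY"]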