[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80730":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":12,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":14,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},80730,"Flash-GRPO","Shredded-Pork\u002FFlash-GRPO","Shredded-Pork","[ICML 2026] Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization",null,"Python",49,4,1,3,0,6,2.1,false,"main",true,[],"2026-06-12 02:04:06","\u003Cdiv align=\"center\" style=\"font-family: charter;\">\n\n\u003Ch1>🦕 Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization\u003C\u002Fh1>\n\n\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15980\" target=\"_blank\">\n    \u003Cimg alt=\"arXiv\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Flash--GRPO-red?logo=arxiv\" height=\"20\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fshredded-pork.github.io\u002FFlash-GRPO.github.io\u002F\" target=\"_blank\">\n    \u003Cimg alt=\"Website\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F💻_Project-Flash--GRPO-blue.svg\" height=\"20\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n**Flash-GRPO** is a single-step training framework that outperforms full trajectory training in alignment quality under low computational budgets while substantially improving training efficiency.\n\n\u003Cdiv style=\"text-align: center;\">\n    \u003Cimg src=\"asset\u002Fteaser.png\" 
alt=\"LOGO\">\n\u003C\u002Fdiv>\n\n\u003Cdiv style=\"text-align: center;\">\n    \u003Cimg src=\"asset\u002Fmethod.png\" alt=\"LOGO\">\n\u003C\u002Fdiv>\n\n## 🗺️ Roadmap for Flash-GRPO\n> Flash-GRPO is a single-step training framework that outperforms full trajectory training in alignment quality under low computational budgets while substantially improving training efficiency. Flash-GRPO addresses two critical challenges: iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, decoupling policy performance from timestep difficulty; temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on 1.3B to 14B parameter models validate Flash-GRPO’s effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.\n> \n> Ideas and contributions are welcome. Stay tuned!\n\n## 🆕 News\n\n> We have presented a single-step training framework, **Flash-GRPO**.\n- **[2026-05-11]** We have released the code for our paper, and we will release an 8-GPU version of Flash-GRPO (it achieves the same performance and needs only ~40 hours). 
🔥🔥🔥\n- **[2026-05-28]** We have released an 8-GPU (~40 hours) version of Flash-GRPO. The reward curves are shown below:\n\u003Cp align=\"center\">\n  \u003Cimg src=\"asset\u002Ftrain.jpg\" width=\"48%\" \u002F>\n  \u003Cimg src=\"asset\u002Feval.jpg\" width=\"48%\" \u002F>\n\u003C\u002Fp>\n\n\n## 📕 Training & Evaluation\n### Preparation\nDownload the reward model [HPSv3](https:\u002F\u002Fgithub.com\u002FMizzenAI\u002FHPSv3) and the base model [Wan2.1-1.3B](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002FWan2.1-T2V-1.3B-Diffusers).\n\n### Training\n#### Reward server\n```bash\ncd flow_grpo\u002Freward-server\ngunicorn \"app_hpsv3:create_app()\"\n```\n#### Wan2.1-1.3B\n```bash\n# Flash-GRPO, 96 GPUs\nbash scripts\u002Fmulti_node\u002Ftrain_wan2_1_flash.sh\n```\n#### Wan2.1-1.3B-1node\n```bash\n# Flash-GRPO, 8 GPUs\nbash scripts\u002Fmulti_node\u002Ftrain_wan2_1_flash_1node.sh\n```\n\n## 📊 Experimental Performance\n\u003Cimg src=\"asset\u002Fexp1.png\" alt=\"Performance\" width=\"800\"\u002F>\n\n## 📺 Visualization\n\u003Cimg src=\"asset\u002Fvis.png\" alt=\"Visualization\" width=\"1024\"\u002F> \n\n- For more details, please read our paper.\n\n# Acknowledgements\n[Flow-GRPO](https:\u002F\u002Fgithub.com\u002Fyifan123\u002Fflow_grpo): The first method integrating online reinforcement learning (RL) into flow matching models.\n","Flash-GRPO is a one-step policy optimization framework for video diffusion models that aims to improve alignment quality and training efficiency. By introducing iso-temporal grouping and temporal gradient rectification, the project addresses the variance introduced by timesteps and achieves better results than full trajectory training at low computational cost. Flash-GRPO suits scenarios that require efficient training of large-scale video generation models, especially when compute resources are limited. Experiments show that the method performs well on models from 1.3B to 14B parameters, significantly accelerating training while maintaining good stability and state-of-the-art alignment quality.",2,"2026-06-11 04:01:48","CREATED_QUERY"]