[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80889":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":22,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":33,"discoverSource":34},80889,"AutoRubric-as-Reward","OpenEnvision\u002FAutoRubric-as-Reward","OpenEnvision","Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria","https:\u002F\u002Fopenenvision.github.io\u002FAutoRubric-as-Reward\u002F",null,"Python",36,4,34,1,0,2,2.1,"Apache License 2.0",false,"main",true,[24,25,26,27,28,29],"imageediting","mllm","reward","rl","rubric","t2i","2026-06-12 02:04:08","\u003Ch1 align=\"center\">Auto-Rubric as Reward\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.08354v1\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-ARR-B31B1B?logo=arxiv&logoColor=white\" alt=\"arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FOpenEnvisionLab\u002FAuto-Rubric-as-Reward\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-ARR-FFD21E?logo=huggingface&logoColor=white\" alt=\"Hugging Face\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopenenvision.github.io\u002FAutoRubric-as-Reward\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-2F6FED?logo=googlechrome&logoColor=white\" alt=\"Project Website\">\u003C\u002Fa>\n  \u003Ca 
href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  The official implementation for \u003Cstrong>Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#what-this-repo-does\">Overview\u003C\u002Fa> |\n  \u003Ca href=\"#quick-start\">Quick Start\u003C\u002Fa> |\n  \u003Ca href=\"#auto-rubric-documentation\">Auto-Rubric Docs\u003C\u002Fa> |\n  \u003Ca href=\"#training\">Training\u003C\u002Fa> |\n  \u003Ca href=\"#acknowledgements\">Acknowledgements\u003C\u002Fa>\n\u003C\u002Fp>\n\n## What This Repo Does\n\nAuto-Rubric provides a compact implementation of Auto-Rubric as Reward for visual generation. It turns a small set of labeled visual preference examples into explicit, inspectable rubric text, then uses a frozen VLM judge conditioned on those rubrics to produce pairwise rewards for RPO.\n\n```text\nlabeled visual pairs\n  -> auto-generate rubrics\n  -> verify and revise criteria\n  -> structure\u002Freuse rubric text\n  -> VLM judge returns ranks\n  -> RPO receives pairwise rewards\n```\n\nThis release focuses on:\n\n\n| Area          | Included                                                                        |\n| ------------- | ------------------------------------------------------------------------------- |\n| Auto-Rubric   | Generation, verification, revision, categorization, grading, reward conversion. |\n| Text-to-image | FLUX.1-dev LoRA RPO with pairwise ARR rewards.                                  |\n| Image editing | Qwen-Image-Edit LoRA RPO with source-image-aware pairwise ARR rewards.          |\n| VLM judging   | OpenAI-compatible local or hosted vision endpoints.                             
|\n\n\nLarge checkpoints, processed embeddings, and training outputs are intentionally not committed.\n\n## Key Features\n\n- **Explicit reward criteria**: The \"reward model\" is readable rubric text rather than a hidden scalar model.\n- **Verifiable generation loop**: Candidate rubrics are checked against labeled examples and revised when they fail.\n- **Pairwise visual rewards**: Rank 1 receives `1.0`; rank 2 receives `-0.1` for RPO.\n- **T2I and edit support**: Prompt-only FLUX and source-image-aware Qwen-Image-Edit paths are both wired.\n- **Reusable rubric files**: Generate rubrics once, inspect them, and reuse the same file for deterministic training launches.\n- **OpenAI-compatible VLMs**: Use local Qwen3-VL through vLLM or hosted GPT\u002FGemini-compatible endpoints.\n\n## Repository Map\n\n\n| Path                               | Purpose                                                                             |\n| ---------------------------------- | ----------------------------------------------------------------------------------- |\n| `judger.py`                        | CLI and Python entry point for rubric generation, evaluation, and reward tensors.   |\n| `rubric_pipeline\u002F`                 | Auto-Rubric prompts, VLM graders, model client, categorization, and utilities.      |\n| `fastvideo\u002Ftrain_rpo_flux.py`      | FLUX RPO training with ARR rewards.                                                 |\n| `fastvideo\u002Ftrain_rpo_qwen_edit.py` | Qwen-Image-Edit RPO training with ARR rewards.                                      |\n| `scripts\u002Fpreprocess\u002F`              | Embedding preprocessing for FLUX and Qwen-Image-Edit.                               |\n| `scripts\u002Ffinetune\u002F`                | 8-GPU launcher examples with paper-aligned RPO defaults.                            |\n| `docs\u002Fauto_rubric\u002F`                | Detailed Auto-Rubric guide: VLM choice, rubric design, reuse, workflows, debugging. 
|\n\n\n## Quick Start\n\nCreate the environment:\n\n```bash\ncd \u002Fpath\u002Fto\u002FAutoRubric-as-Reward\nconda create -n autorubric-as-reward python=3.10 -y\nconda activate autorubric-as-reward\nbash env_setup.sh\n```\n\nIf you already installed a different CUDA\u002FPyTorch stack, install the repo dependencies directly:\n\n```bash\npip install -r requirements.txt\npip install -e .\n```\n\nCreate the expected data folder:\n\n```bash\nmkdir -p data rubric_pipeline\u002Frubrics\n```\n\nDownload base models as needed:\n\n\n| Model           | Local path            | Link                                                                                                       |\n| --------------- | --------------------- | ---------------------------------------------------------------------------------------------------------- |\n| FLUX.1-dev      | `data\u002Fflux`           | [https:\u002F\u002Fhuggingface.co\u002Fblack-forest-labs\u002FFLUX.1-dev](https:\u002F\u002Fhuggingface.co\u002Fblack-forest-labs\u002FFLUX.1-dev) |\n| Qwen-Image-Edit | `data\u002Fqwenimage_edit` | [https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit)                 |\n| Qwen3-VL judge  | local or HF cache     | [https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct)       |\n\n\n## Start A VLM Judge\n\nAuto-Rubric talks to an OpenAI-compatible vision API.\n\nLocal Qwen3-VL:\n\n```bash\nMODEL_PATH=Qwen\u002FQwen3-VL-8B-Instruct TP_SIZE=1 PORT=8000 \\\n  bash rubric_pipeline\u002Fvllm_serve.sh\n\nexport OPENAI_API_KEY=EMPTY\n```\n\nHosted endpoint examples:\n\n```yaml\nmodel_name: \"gpt-5\"\nbase_url: \"https:\u002F\u002Fapi.openai.com\u002Fv1\"\napi_key: \"${OPENAI_API_KEY}\"\n```\n\n```yaml\nmodel_name: \"gemini-3.1-pro\"\nbase_url: \"https:\u002F\u002Fgenerativelanguage.googleapis.com\u002Fv1beta\u002Fopenai\u002F\"\napi_key: 
\"${GEMINI_API_KEY}\"\n```\n\nMore guidance: [VLM Selection](docs\u002Fauto_rubric\u002Fvlm_selection.md).\n\n## Generate And Test Rubrics\n\nText-to-image:\n\n```bash\npython judger.py \\\n  --config_path rubric_pipeline\u002Fconfig\u002Fqwen3vl_8B_instruct_t2i.yaml \\\n  --seed_dataset examples\u002Fseed_t2i_pairwise.json \\\n  --test_dataset examples\u002Ftest_t2i_pairwise.json \\\n  --base_url http:\u002F\u002Flocalhost:8000\u002Fv1 \\\n  --concurrency_limit 4\n```\n\nImage editing:\n\n```bash\npython judger.py \\\n  --config_path rubric_pipeline\u002Fconfig\u002Fqwen3vl_8B_instruct_edit.yaml \\\n  --seed_dataset examples\u002Fseed_edit_pairwise.json \\\n  --test_dataset examples\u002Ftest_edit_pairwise.json \\\n  --base_url http:\u002F\u002Flocalhost:8000\u002Fv1 \\\n  --concurrency_limit 4\n```\n\nFor long runs, save the generated rubric text to `rubric_pipeline\u002Frubrics\u002F*.txt` and load it through `rubrics_file`. See [Rubric Reuse](docs\u002Fauto_rubric\u002Frubric_reuse.md).\n\n## Training\n\nFLUX:\n\n```bash\nbash scripts\u002Fpreprocess\u002Fpreprocess_flux_rl_embeddings.sh\nbash scripts\u002Ffinetune\u002Ffinetune_flux_rpo_8gpus.sh\n```\n\nQwen-Image-Edit:\n\n```bash\nbash scripts\u002Fpreprocess\u002Fpreprocess_qwen_image_edit_rl_embeddings.sh\nbash scripts\u002Ffinetune\u002Ffinetune_qwen_image_edit_rpo_8gpus.sh\n```\n\nThe launchers use:\n\n\n| Task            | LR     | Steps | Clip  | KL     | LoRA    |\n| --------------- | ------ | ----- | ----- | ------ | ------- |\n| FLUX T2I        | `5e-5` | `8`   | `0.2` | `0.01` | rank 16 |\n| Qwen-Image-Edit | `1e-5` | `10`  | `0.2` | `0.02` | rank 32 |\n\n\nPairwise RPO expects `--use_arr`, `--num_generations 2`, and `--use_group`.\n\n## Auto-Rubric Documentation\n\n\n| Guide                                                  | Covers                                                             |\n| ------------------------------------------------------ | 
------------------------------------------------------------------ |\n| [Overview](docs\u002Fauto_rubric\u002Foverview.md)               | Method summary and paper-to-code map.                              |\n| [VLM Selection](docs\u002Fauto_rubric\u002Fvlm_selection.md)     | Local vs hosted judges, JSON reliability, latency, cost.           |\n| [Rubric Design](docs\u002Fauto_rubric\u002Frubric_design.md)     | Seed pair selection, task descriptions, good\u002Fbad rubric patterns.  |\n| [Rubric Reuse](docs\u002Fauto_rubric\u002Frubric_reuse.md)       | Saved rubric files, versioning, validation before training.        |\n| [Workflows](docs\u002Fauto_rubric\u002Fworkflows.md)             | End-to-end commands for generation, testing, saving, and training. |\n| [Troubleshooting](docs\u002Fauto_rubric\u002Ftroubleshooting.md) | Common failures and fixes.                                         |\n| [Data Formats](docs\u002Fdata_formats.md)                   | JSON\u002FJSONL layouts accepted by the vision utilities.               |\n| [Training Guide](docs\u002Ftraining.md)                     | Preprocess and RPO launch details.                                 
|\n\n\n## Acknowledgements\n\nThis code builds on and is inspired by:\n\n- [DanceGRPO](https:\u002F\u002Fgithub.com\u002FXueZeyue\u002FDanceGRPO)\n- [FastVideo](https:\u002F\u002Fgithub.com\u002Fhao-ai-lab\u002FFastVideo)\n- [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)\n- [OpenJudge](https:\u002F\u002Fgithub.com\u002Fagentscope-ai\u002FOpenJudge)\n\n## Citation\n\n```bibtex\n@misc{tian2026autorubricrewardimplicitpreferences,\n      title={Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria}, \n      author={Juanxi Tian and Fengyuan Liu and Jiaming Han and Yilei Jiang and Yongliang Wu and Yesheng Liu and Haodong Li and Furong Xu and Wanhua Li},\n      year={2026},\n      eprint={2605.08354},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.08354}, \n}\n```\n","The Auto-Rubric as Reward project turns a small set of labeled visual preference examples into explicit, inspectable rubric text, then uses a frozen vision-language model (VLM) judge conditioned on those rubrics to produce pairwise rewards. Its core functions include automatically converting implicit preferences into explicit generative criteria, verifying and revising those criteria, and using them to supply rewards for RPO training. Technical features include automatically generated, reusable rubric files, support for text-to-image generation and image editing tasks, and compatibility with OpenAI-style local or hosted vision endpoints. The project suits scenarios that require turning human preferences into machine-readable generation rules, such as high-quality image generation and editing.","2026-06-11 04:02:41","CREATED_QUERY"]