[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82047":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},82047,"Qwen-Image-Bench","QwenLM\u002FQwen-Image-Bench","QwenLM",null,"Python",84,6,33,1,0,14,42,12,2.54,"Apache License 2.0",false,"main",[],"2026-06-12 02:04:22","# Qwen-Image-Bench\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2605.28091\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b31b1b?logo=arxiv\" alt=\"Paper\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQwen\u002FQwen-Image-Bench\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-HuggingFace-ffd21e?logo=huggingface\" alt=\"Dataset\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Bench\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJudge_Model-HuggingFace-ffd21e?logo=huggingface\" alt=\"Model\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Fdatasets\u002FQwen\u002FQwen-Image-Bench\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-ModelScope-624aff?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIyIiBoZWlnaHQ9IjIyMiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJNNzEuNTU2IDcyLjg4OGgzOC42Njd2MzguNjY3SDcxLjU1NnpNMTExLjU1NiAxMTIuODg4aDM4LjY2N3YzOC42NjdoLTM4LjY2N3pNNzEuNTU2IDExMi44ODhoMzguNjY3djM4LjY2N0g3MS41NTZ6IiBmaWxsPSIjNjI0QUZGIi8+PC9zdmc+\" alt=\"ModelScope\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nAn evaluation toolkit for text-to-image (T2I) generation models. It uses a fine-tuned **Q-Judger** (Qwen3.6-27B) to score generated images across **5 hierarchical dimensions** (Quality, Aesthetics, Alignment, Real-world Fidelity, Creative Generation) covering **56 fine-grained facets**.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fshow_case.png\" alt=\"Qwen-Image-Bench dimension framework and representative model outputs\">\n\u003C\u002Fp>\n\n## Key Features\n\n- **Evaluate any T2I model** — run the judge model on your own generated images and get structured, multi-dimensional scores\n- **Compute scores from pre-generated responses** — reproduce the leaderboard from the released benchmark dataset\n- **Powered by ms-swift** — uses the same inference setup that produced the benchmark responses\n\n## Quick Start\n\n```bash\n# 1. Clone the repo\ngit clone https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen-Image-Bench.git\ncd Qwen-Image-Bench\n\n# 2. Install dependencies\nuv venv myenv --python 3.11 && source myenv\u002Fbin\u002Factivate\n# Install PyTorch first: https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F\nuv pip install -r requirements.txt\n\n# 3. 
Run judge on your images\npython judge.py \\\n  --input your_data.jsonl \\\n  --model Qwen\u002FQwen-Image-Bench\n```\n\nYour input file should be a CSV\u002FJSON\u002FJSONL with three columns:\n\n| Column | Type | Description |\n|--------|------|-------------|\n| `ID` | int | Prompt identifier (1–1000), must match [benchmark metadata](metadata\u002Fbench_metadata.json) |\n| `prompt` | str | The text prompt used to generate the image |\n| `image_path` | str | Path to the generated image file |\n\n\n## Installation\n\n\n### Step-by-step\n\n**1. Create and activate a virtual environment:**\n\n```bash\nuv venv myenv --python 3.11\nsource myenv\u002Fbin\u002Factivate\n```\n\n**2. Install PyTorch** (select the command matching your CUDA version):\n\nSee the official guide: [https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)\n\n**3. Install Python dependencies:**\n\n```bash\nuv pip install -r requirements.txt\n```\n\nThis installs all required dependencies including ms-swift.\n\n\n## Usage\n\n### Evaluate Your Own T2I Model (`judge.py`)\n\n#### Run Judge Inference\n\n```bash\npython judge.py \\\n  --input your_data.jsonl \\\n  --model Qwen\u002FQwen-Image-Bench\n```\n\n#### CLI Options\n\n| Argument | Default | Description |\n|----------|---------|-------------|\n| `--input` | *(required)* | Input CSV\u002FJSON\u002FJSONL with `ID`, `prompt`, `image_path` |\n| `--model` | *(required)* | HuggingFace model ID or local path |\n| `--hf-bench-repo` | — | HF dataset repo for bench metadata |\n| `--local-metadata` | — | Local metadata file path (overrides default) |\n| `--max-batch-size` | 24 | ms-swift `PtEngine` max_batch_size |\n| `--max-new-tokens` | 4096 | Max generation tokens |\n\n#### Output Files\n\nAfter running `judge.py`, three files are written next to your input:\n\n| File | Contents |\n|------|----------|\n| `\u003Cinput>_judged.{jsonl,csv}` | Per-row results: original fields + 
`judge_model_output` (combined raw scores JSON) + `\u003Cdim>_judge_output` (raw judge text per L1 dimension) |\n| `\u003Cinput>_bench_scores.json` | Aggregated scores: `level1`, `level2`, `total` |\n| `\u003Cinput>_bench_scores.xlsx` | Same scores in Excel: `Level-1 Summary` sheet + one sheet per L1 dimension with L2 detail |\n\n### Compute Scores from Pre-generated Responses (`compute_scores.py`)\n\n\n```bash\n# From local file\npython compute_scores.py --input qwen_image_bench_hf_v0518.jsonl\n\n# Or download from HuggingFace\npython compute_scores.py --hf-repo Qwen\u002FQwen-Image-Bench\n```\n\nOutput: `scores_result.xlsx` + `scores_detail.json`\n\n\n## Top-5 Models\n\n| Model | Quality | Aesthetics | Alignment | Real-world Fidelity | Creative Generation | **Overall** |\n|-------|:-------:|:----------:|:---------:|:-------------------:|:-------------------:|:-----------:|\n| GPT Image 2 | **58.65** | **67.53** | **65.85** | **57.38** | **75.23** | **64.69** |\n| Nano Banana 2.0 | 54.77 | 61.08 | 62.40 | 54.28 | 67.05 | 59.82 |\n| GPT Image 1.5 | 55.14 | 60.88 | 61.72 | 53.95 | 66.35 | 59.65 |\n| Nano Banana Pro | 55.67 | 60.26 | 61.25 | 54.07 | 66.23 | 59.45 |\n| Qwen Image 2.0 Pro | 54.39 | 58.67 | 59.28 | 51.83 | 64.94 | 57.84 |\n\nFull results for all 18 models are available in the paper.\n\n\n## Inference Parameters\n\nThe judge model uses fixed inference parameters for reproducibility:\n\n| Parameter | Value |\n|-----------|-------|\n| `seed` | 42 |\n| `temperature` | 0 |\n| `top_k` | 1 |\n| `top_p` | 1.0 |\n| `repetition_penalty` | 1.05 |\n| `max_new_tokens` | 4096 |\n| `enable_thinking` | True |\n| `max_batch_size` | 24 |\n\n\n## Project Structure\n\n```\n.\n├── judge.py                 # Run judge model inference on new images\n├── compute_scores.py        # Compute scores from pre-generated responses\n├── score_utils.py           # Score extraction, mapping, correction, aggregation\n├── checklists.py            # Evaluation prompts and dimension 
definitions\n├── backends\u002F\n│   └── ms_swift_backend.py  # ms-swift inference engine\n├── metadata\u002F\n│   └── bench_metadata.json  # ID → dims_en metadata for judge inference\n├── requirements.txt\n└── assets\u002F                  # Figures for documentation\n```\n\n\n## Evaluation Framework\n\nThe benchmark uses a **3-level hierarchical scoring system** with 5 L1 dimensions, 23 L2 sub-capabilities, and 56 L3 facets:\n\n| L1 Dimension | L2 Sub-capabilities |\n|--------------|---------------------|\n| **Quality** | Realism, Detail, Resolution |\n| **Aesthetics** | Composition, Color Harmony, Lighting, Anatomical Portraiture, Emotional Expression, Style Control |\n| **Alignment** | Attributes, Actions, Layout, Relations, Scene |\n| **Real-world Fidelity** | Fairness, Safety & Compliance, World Knowledge |\n| **Creative Generation** | Imagination, Feature Matching, Logical Resolution, Text Rendering, Design Applications, Visual Storytelling |\n\n**Scoring**: Each L3 facet is rated 0 (Fail → 0), 1 (Pass → 60), or 2 (Excel → 100), with N\u002FA excluded. 
Scores aggregate bottom-up: L3 → L2 → L1 → Overall.\n\nFor the complete dimension hierarchy and detailed analysis, see the [benchmark dataset card](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQwen\u002FQwen-Image-Bench).\n\n\n## Citation\n\nIf you find this benchmark useful, please cite our paper:\n\n```bibtex\n@misc{li2026qwenimagebenchgenerationcreationtexttoimage,\n      title={Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation}, \n      author={Niantong Li and Guangzheng Hu and Weixu Qiao and Ying Ba and Qichen Hong and Shijun Shen and Jinlin Wang and Fan Zhou and Jianye Kang and Xin Shang and Ziyi He and Wei Wang and Dalin Li and Jiahao Li and Jie Zhang and Kaiyuan Gao and Kun Yan and Lihan Jiang and Ningyuan Tang and Shengming Yin and Tianhe Wu and Xiao Xu and Xiaoyue Chen and Yuxiang Chen and Yan Shu and Yanran Zhang and Yilei Chen and Yixian Xu and Zekai Zhang and Zhendong Wang and Zihao Liu and Zikai Zhou and Hongzhu Shi and Yi Wang and Bing Zhao and Hu Wei and Lin Qu and Chenfei Wu},\n      year={2026},\n      eprint={2605.28091},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.28091}, \n}\n```\n\n## License\n\nThis project is licensed under the [Apache License 2.0](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0).\n","Qwen-Image-Bench is a toolkit for evaluating text-to-image generation models. It uses a fine-tuned Q-Judger (based on Qwen3.6-27B) to score generated images across five hierarchical dimensions (Quality, Aesthetics, Alignment, Real-world Fidelity, and Creative Generation), covering 56 fine-grained facets. The toolkit supports evaluating any T2I model and can compute scores from pre-generated responses to reproduce the benchmark results, with inference powered by ms-swift. It suits research and development work that needs comprehensive, structured evaluation of text-to-image generation models.",2,"2026-06-11 04:07:35","CREATED_QUERY"]