[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80690":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":12,"forks30d":12,"starsTrendScore":17,"compositeScore":12,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":12,"starSnapshotCount":12,"syncStatus":14,"lastSyncTime":25,"discoverSource":26},80690,"GenEvolve","MeiGen-AI\u002FGenEvolve","MeiGen-AI","Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation",null,"Python",63,0,46,2,8,17,9,"Other",false,"main",[],"2026-06-12 02:04:05","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"assets\u002Flogo_genevolve.png\" alt=\"GenEvolve\" width=\"160\">\n\n\u003Ch1>GenEvolve\u003C\u002Fh1>\n\n\u003Cp>\u003Cstrong>\u003Cem>Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation\u003C\u002Fem>\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.21605\">\n    \u003Cimg alt=\"Paper\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📄_Paper-arXiv:2605.21605-b31b1b\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fephemeral182.github.io\u002FGenEvolve\u002F\">\n    \u003Cimg alt=\"Project Page\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🌐_Project-Page-1f6feb\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FMeiGen-AI\u002FGenEvolve\">\n    \u003Cimg alt=\"Weights\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_HuggingFace-GenEvolve-FFD21E\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMeiGen-AI\u002FGenEvolve-Data-Bench\">\n    \u003Cimg alt=\"Dataset\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_Dataset-GenEvolve--Data-FFD21E\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMeiGen-AI\u002FGenEvolve\">\n    \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F💾_GitHub-Code-181717\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\n## 👥 Authors\n\n> [**Sixiang Chen**](https:\u002F\u002Fephemeral182.github.io\u002F)\u003Csup>1,2\u003C\u002Fsup>, [**Zhaohu Xing**](https:\u002F\u002Fge-xing.github.io\u002F)\u003Csup>1\u003C\u002Fsup>, [**Tian Ye**](https:\u002F\u002Fowen718.github.io\u002F)\u003Csup>1\u003C\u002Fsup>, [**Xinyu Geng**](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=rYB-IBwAAAAJ&hl=zh-CN)\u003Csup>3\u003C\u002Fsup>, [**Yunlong Lin**](https:\u002F\u002Flyl1015.github.io\u002F), [**Jianyu Lai**](https:\u002F\u002Falexlai2860.github.io\u002F)\u003Csup>1,2\u003C\u002Fsup>, [**Xuanhua He**](https:\u002F\u002Fxuanhuahe.github.io\u002F)\u003Csup>3\u003C\u002Fsup>, [**Fuxiang Zhai**](https:\u002F\u002Ffuxiangzhai.github.io\u002F)\u003Csup>1\u003C\u002Fsup>, [**Jialin Gao**](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=sj4FqEgAAAAJ&hl=zh-CN)\u003Csup>4,‡\u003C\u002Fsup>, [**Lei Zhu**](https:\u002F\u002Fsites.google.com\u002Fsite\u002Findexlzhu\u002Fhome)\u003Csup>1,3,†\u003C\u002Fsup>\n>\n> \u003Csup>1\u003C\u002Fsup>The Hong Kong University of Science and Technology (Guangzhou)\n>\n> \u003Csup>2\u003C\u002Fsup>Meituan\n>\n> \u003Csup>3\u003C\u002Fsup>The Hong Kong University of Science and Technology\n>\n> \u003Csup>4\u003C\u002Fsup>National University of Singapore\n\n**Project Leader:** Junfeng Luo (Meituan)\n\n\n---\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"assets\u002Fteaser.jpg\" alt=\"GenEvolve teaser\" width=\"100%\">\n\n\u003Cp>\u003Cem>The same trained agent policy paired with two 
reference-conditioned generators ⟶\u003Cbr>\n\u003Cstrong>Qwen-Image-Edit (open)\u003C\u002Fstrong> &nbsp;·&nbsp; \u003Cstrong>Nano Banana Pro (strong)\u003C\u002Fstrong>\u003C\u002Fem>\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n## 🌟 What is GenEvolve?\n\nGenEvolve formulates open-ended image generation as a **tool-orchestrated visual trajectory**. The agent gathers external textual evidence, retrieves visual references, performs **internal knowledge activation** through callable generation skills, and synthesizes a **prompt-reference program** $z = (g, R)$ that any reference-conditioned generator can render.\n\nThe released `GenEvolve` policy is based on Qwen3-VL-8B and is designed to be **generator-transferable**: the same agent output can be rendered by the open Qwen-Image-Edit backend or by a stronger proprietary renderer such as Nano Banana Pro.\n\n## 🎁 What's released\n\n| Component | Where |\n|---|---|\n| 🧠 Trained agent policy `GenEvolve` (Qwen3-VL-8B-based) | 🤗 [`MeiGen-AI\u002FGenEvolve`](https:\u002F\u002Fhuggingface.co\u002FMeiGen-AI\u002FGenEvolve) |\n| ⚡ Standalone inference runtime (`GenEvolveAgent`, OpenAI-compatible) | this repo |\n| 🛠️ Three tools (`search`, `image_search`, `query_knowledge`) | this repo |\n| 📚 The eight skill markdown files used at training time | this repo |\n| 🎨 Reference-conditioned generator wrappers (Qwen-Image-Edit + Nano Banana Pro) | this repo |\n| 📦 SFT trajectories (9,000 records) | 🤗 [`MeiGen-AI\u002FGenEvolve-Data-Bench`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMeiGen-AI\u002FGenEvolve-Data-Bench) \u002F `GenEvolve-Data-SFT\u002F` |\n| 🎯 Self-evolution prompts + GT images (3,175 records) | 🤗 [`MeiGen-AI\u002FGenEvolve-Data-Bench`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMeiGen-AI\u002FGenEvolve-Data-Bench) \u002F `GenEvolve-Data-RL\u002F` |\n| 📊 Held-out evaluation benchmark (594 prompts + GT images) | 🤗 
[`MeiGen-AI\u002FGenEvolve-Data-Bench`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMeiGen-AI\u002FGenEvolve-Data-Bench) \u002F `GenEvolve-Bench\u002F` |\n\n## 📋 Requirements\n\nGenEvolve has a main runtime environment for policy serving, agent rollouts, tool execution, and benchmark inference. This is not the only process used in a full image-generation pipeline: for reproducible Qwen rendering, run Qwen-Image-Edit as a separate FastAPI\u002Fdiffusers service and call it from GenEvolve through `--service-url`.\n\n### Main GenEvolve runtime - `genevolve`\n\nUse this environment for the released agent code path: serving `GenEvolve`, running the agent, calling tools, using the Nano client, and calling a Qwen service endpoint. Install it once using the Quickstart commands below.\n\n| Component | Version | Notes |\n|---|---|---|\n| Python | 3.11 | |\n| CUDA stack | CUDA 12.x | Our logs used PyTorch CUDA 12.8 wheels. |\n| `torch` \u002F `torchvision` | `2.8.0` \u002F `0.23.0` | |\n| `transformers` | `4.57.1` | |\n| `vllm` | `0.11.0` | |\n| `ray` | `2.54.1` | |\n| `flash-attn` | `2.8.3` | |\n\nThis environment does not install or launch external services such as Qwen-Image-Edit, Serper, or the Google Generative Language API. Those are configured separately.\n\n### External services\n\n| Service | Variable or flag | Used for |\n|---|---|---|\n| [serper.dev](https:\u002F\u002Fserper.dev) | `SERPER_API_KEY` | required for `search` and `image_search` |\n| [Google Generative Language API](https:\u002F\u002Fai.google.dev\u002Fapi) | `GOOGLE_API_KEY` or `GEMINI_API_KEY` | only for `--backend nano-banana-pro` |\n| Qwen-Image-Edit FastAPI service | `--service-url` | only for `--backend qwen-image-edit-service` |\n\n### Qwen-Image-Edit service environment\n\nFor Qwen rendering, use a separate service environment instead of mixing the diffusion stack into the vLLM server. 
A typical working stack is Python 3.11, PyTorch\u002Ftorchvision `2.6.0`\u002F`0.21.0` with CUDA 12.4 wheels, `diffusers>=0.38`, `transformers>=4.57`, `accelerate`, `fastapi`, `uvicorn`, `pillow`, and `requests`.\n\n```bash\nconda create -n qwenimage python=3.11 -y\nconda activate qwenimage\npip install torch==2.6.0 torchvision==0.21.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip install \"diffusers>=0.38\" \"transformers>=4.57\" accelerate fastapi uvicorn pillow requests\n```\n\nStart any Qwen-Image-Edit FastAPI service compatible with `POST \u002Fgenerate`; a common deployment is one Qwen pipeline per visible GPU, with one HTTP endpoint such as `http:\u002F\u002Fhost:8001`. GenEvolve sends requests with `--backend qwen-image-edit-service --service-url http:\u002F\u002Fhost:8001`.\n\n## 🚀 Quickstart\n\n### 1. Install the main GenEvolve runtime\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FMeiGen-AI\u002FGenEvolve.git\ncd GenEvolve\n\nconda create -n genevolve python=3.11 -y\nconda activate genevolve\npip install -U pip setuptools wheel packaging psutil ninja\npip install torch==2.8.0 torchvision==0.23.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\npip install --no-build-isolation -r requirements.txt\npip install -e .\n```\n\nThis installs only the main GenEvolve runtime: vLLM serving, the agent tools, and lightweight generator clients\u002Fwrappers. It does not install or start the separate Qwen-Image-Edit service; set up that service from the Qwen environment section above when using `--backend qwen-image-edit-service`.\n\n### 2. Serve the released checkpoint\n\nPut the Hugging Face checkpoint directory in `MODEL_PATH`. 
The serving scripts support both tensor parallelism (`TP`) and data parallel replicas (`DP`).\n\n- `TP` shards one model replica across multiple GPUs.\n- `DP` launches multiple model replicas to improve throughput for many concurrent prompts.\n- Total GPU usage is `TP × DP`.\n- Use a larger `DP` when `scripts\u002Frun_agent.py --parallel` is large and each request fits on one GPU.\n- Use a larger `TP` when one model replica needs more memory or longer context than one GPU can provide.\n\n```bash\n# Single GPU \u002F single replica.\nMODEL_PATH=MeiGen-AI\u002FGenEvolve PORT=8000 TP=1 DP=1 bash scripts\u002Fserve_vllm.sh\n\n# Higher throughput on one 8-GPU node: 8 replicas, one GPU per replica.\nMODEL_PATH=MeiGen-AI\u002FGenEvolve PORT=8000 TP=1 DP=8 bash scripts\u002Fserve_vllm.sh\n\n# If one replica needs more memory: 4 replicas, two GPUs per replica.\nMODEL_PATH=MeiGen-AI\u002FGenEvolve PORT=8000 TP=2 DP=4 bash scripts\u002Fserve_vllm.sh\n```\n\nFor example, `TP=8 DP=1` is one model replica sharded over 8 GPUs. It is not 8 independent services. For throughput on one 8-GPU node, prefer `TP=1 DP=8` if the model fits on one GPU; use `TP=2 DP=4` or `TP=4 DP=2` when each replica needs multiple GPUs.\n\n### 3. 
Run an end-to-end example\n\n```bash\nexport SERPER_API_KEY=\u003Cyour_key>             # required for search and image_search\nexport GOOGLE_API_KEY=\u003Cyour_key>             # or GEMINI_API_KEY; only for Nano Banana Pro\n\npython examples\u002Fquickstart.py \\\n    --backend nano-banana-pro \\\n    --base-url http:\u002F\u002Flocalhost:8000\u002Fv1 \\\n    --model GenEvolve \\\n    --prompt \"A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \\\"PARIS\\\" rendered in bold serif type.\" \\\n    --output paris.png\n```\n\nFor the open-generator path, use `--backend qwen-image-edit-service` with one or more Qwen-Image-Edit service endpoints:\n\n```bash\npython examples\u002Fquickstart.py \\\n    --backend qwen-image-edit-service \\\n    --service-url http:\u002F\u002Fyour-qwen-service:8001 \\\n    --base-url http:\u002F\u002Flocalhost:8000\u002Fv1 \\\n    --model GenEvolve \\\n    --output paris_qwen.png\n```\n\n`--backend qwen-image-edit` is kept only as a local diffusers debug path when the Qwen-Image-Edit dependencies are installed in the active environment.\n\n### 4. 
Batch pipeline\n\nThe agent rollout and the heavy image rendering are split into two stages so they can run on different machines.\n\n```bash\n# Stage 1: agent rollouts -> results.json.\npython scripts\u002Frun_agent.py \\\n    --input examples\u002Fexample_prompts.jsonl \\\n    --output-dir runs\u002Fexample \\\n    --base-url http:\u002F\u002Flocalhost:8000\u002Fv1 \\\n    --model GenEvolve \\\n    --parallel 4\n\n# Stage 2a: render through one or more Qwen-Image-Edit services.\n# Repeating --service-url enables round-robin dispatch; --parallel sends\n# concurrent requests so multiple service workers can be used.\npython scripts\u002Fgenerate_images.py \\\n    --input runs\u002Fexample\u002Fresults.json \\\n    --output-dir runs\u002Fexample_qwen_service \\\n    --backend qwen-image-edit-service \\\n    --service-url http:\u002F\u002Fyour-qwen-service-1:8001 \\\n    --service-url http:\u002F\u002Fyour-qwen-service-2:8001 \\\n    --parallel 8\n\n# Stage 2b: render with Nano Banana Pro.\npython scripts\u002Fgenerate_images.py \\\n    --input runs\u002Fexample\u002Fresults.json \\\n    --output-dir runs\u002Fexample_nano \\\n    --backend nano-banana-pro \\\n    --parallel 4\n```\n\nCurrent script support:\n\n| Stage | Script | Scaling knobs |\n|---|---|---|\n| Agent model serving | `scripts\u002Fserve_vllm.sh` | `TP`, `DP`, `PORT`, `MAX_MODEL_LEN`, `MODEL_PATH` |\n| Agent rollouts | `scripts\u002Frun_agent.py` | `--parallel`, `--base-url`, `--model` |\n| Remote Qwen rendering | `scripts\u002Fgenerate_images.py --backend qwen-image-edit-service` | repeat `--service-url` and set `--parallel` |\n| Local Qwen debug rendering | `scripts\u002Fgenerate_images.py --backend qwen-image-edit` | single local process; requires a Qwen-compatible diffusers environment |\n| Nano rendering | `scripts\u002Fgenerate_images.py --backend nano-banana-pro` | `--parallel`, subject to API quota\u002Frate limits |\n\n### 5. 
Benchmark scoring\n\nTo reproduce benchmark metrics, download the public dataset and pass the\nbenchmark JSONL directly to the agent runner. The public benchmark uses\n`question` as the prompt field; `scripts\u002Frun_agent.py` accepts both `question`\nand `prompt`, preserves extra fields such as `gt_image`, `eval_type`,\n`category`, and `difficulty`, and the rendering script copies them into its\noutput `results.json`.\n\nThe scorer in `scripts\u002Fevaluate_images.py` is the paper-compatible Gemini judge:\nit uses the same rubric prompt, the same image order (Image 1 = generated,\nImage 2 = GT), the same OpenAI-compatible multimodal chat-completions call, and\nthe same score normalization and weighted overall formula used for the reported\nbenchmark numbers. No service endpoint or API key is hard-coded.\n\nPublic benchmark row format:\n\n```jsonl\n{\"id\": \"0\", \"question\": \"A detailed image-generation request...\", \"gt_image\": \"images\u002Fcase_00000.jpg\", \"eval_type\": \"Knowledge-Anchored\", \"category\": \"architecture_landmark\", \"difficulty\": \"hard\"}\n```\n\nRun the same two-stage pipeline, then score the rendered images with Gemini:\n\n```bash\nhuggingface-cli download MeiGen-AI\u002FGenEvolve-Data-Bench \\\n    --repo-type dataset \\\n    --local-dir .\u002FGenEvolve-Data-Bench\n\n# Stage 1: agent rollouts.\npython scripts\u002Frun_agent.py \\\n    --input .\u002FGenEvolve-Data-Bench\u002FGenEvolve-Bench\u002Ftest.jsonl \\\n    --output-dir runs\u002Fbench_agent \\\n    --base-url http:\u002F\u002Flocalhost:8000\u002Fv1 \\\n    --model GenEvolve \\\n    --parallel 16\n\n# Stage 2: render images, for example through Qwen-Image-Edit services.\npython scripts\u002Fgenerate_images.py \\\n    --input runs\u002Fbench_agent\u002Fresults.json \\\n    --output-dir runs\u002Fbench_qwen \\\n    --backend qwen-image-edit-service \\\n    --service-url http:\u002F\u002Fyour-qwen-service:8001 \\\n    --parallel 16\n\n# Stage 3: Gemini judge.\n# Use an 
OpenAI-compatible Gemini chat-completions endpoint.\nexport OPENAI_API_KEY=\u003Cyour_eval_api_key>\nexport OPENAI_API_BASE=\u003Cyour_openai_compatible_base_url>\npython scripts\u002Fevaluate_images.py \\\n    --results runs\u002Fbench_qwen\u002Fresults.json \\\n    --gt-root .\u002FGenEvolve-Data-Bench\u002FGenEvolve-Bench \\\n    --model gemini-3.1-pro-preview \\\n    --max-workers 16 \\\n    --rpm 60 \\\n    --resume\n```\n\n`scripts\u002Fevaluate_images.py` writes:\n\n| File | Contents |\n|---|---|\n| `results_eval.json` | per-sample judge output and rationale |\n| `summary.json` | aggregate metrics |\n| `summary.csv` | the same metrics in table form |\n\n`results_eval.json` also appends benchmark split summaries such as\n`eval_type:Knowledge-Anchored`, `eval_type:Quality-Anchored`, and\n`overall_avg`.\n\nThe reported metrics are `faithfulness`, `visual_correctness`,\n`text_accuracy`, `aesthetics`, and the weighted `overall` score:\n\n```text\noverall = 0.1 * faithfulness\n        + 0.4 * visual_correctness\n        + 0.4 * text_accuracy\n        + 0.1 * aesthetics\n```\n\n`overall_missing_zero` keeps the full denominator and treats missing or failed\ncases as zero. The summary also reports metrics by `eval_type`, `category`,\nand `difficulty` when those fields are present.\n\n## 🧩 Optional Python Usage\n\nIf you only want to run the provided scripts, you can skip this section. 
This is for users who want to call the agent and renderer directly from their own Python pipeline instead of going through `scripts\u002Frun_agent.py` and `scripts\u002Fgenerate_images.py`.\n\n```python\nfrom genevolve import GenEvolveAgent\nfrom genevolve.generator import QwenImageEditServiceGenerator  # or NanoBananaProGenerator\n\nagent = GenEvolveAgent(\n    model=\"GenEvolve\",\n    base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv1\",\n    api_key=\"EMPTY\",\n)\nresult = agent.run(\"A cyberpunk version of the Sydney Opera House at sunset.\")\n\n# z = (gen_prompt, reference_images)\nprint(result.gen_prompt)\nfor r in result.reference_images:\n    print(r[\"img_id\"], r[\"local_path\"], r[\"note\"])\n\nbackend = QwenImageEditServiceGenerator([\"http:\u002F\u002Fyour-qwen-service:8001\"])\nimage = backend.generate(\n    result.gen_prompt,\n    [r[\"local_path\"] for r in result.reference_images if r.get(\"local_path\")],\n)\nimage.save(\"opera.png\")\n```\n\n## 🧠 Method overview\n\n\u003Cp align=\"center\">\u003Cimg src=\"assets\u002Foverview.png\" alt=\"GenEvolve method overview\" width=\"92%\">\u003C\u002Fp>\n\nFor a user request $x$, the agent samples a multi-turn trajectory\n\n$$\\tau = (a_1, o_1, \\ldots, a_T, o_T, z), \\qquad z = (g, R),$$\n\nwhere each $a_t$ is one of the three actions below and $o_t$ is the corresponding observation. 
The downstream generator renders $\\hat{y} = G(g, R)$.\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\u003Cth>Tool\u003C\u002Fth>\u003Cth>Role\u003C\u002Fth>\u003Cth>Output\u003C\u002Fth>\u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>\u003Ccode>search(queries)\u003C\u002Fcode>\u003C\u002Ftd>\n      \u003Ctd>External textual evidence - entities, dates, facts.\u003C\u002Ftd>\n      \u003Ctd>Markdown digest.\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Ccode>image_search(query)\u003C\u002Fcode>\u003C\u002Ftd>\n      \u003Ctd>Visual references; each result gets a unique \u003Ccode>IMG_###\u003C\u002Fcode> id.\u003C\u002Ftd>\n      \u003Ctd>Image list with local paths.\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Ccode>query_knowledge(skill_name)\u003C\u002Fcode>\u003C\u002Ftd>\n      \u003Ctd>\u003Cstrong>Internal knowledge activation\u003C\u002Fstrong> - invokes one of the eight callable generation skills.\u003C\u002Ftd>\n      \u003Ctd>Skill instructions in Markdown.\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nThe final answer is a JSON object, the **prompt-reference program**:\n\n```json\n{\n  \"gen_prompt\": \"... a targeted instruction that refers to references by ordinal phrases ('the first reference image', 'the second reference image') ...\",\n  \"reference_images\": [\n    {\"img_id\": \"IMG_001\", \"note\": \"what to copy from this reference\"}\n  ]\n}\n```\n\n## 📦 Data\n\nWe release the training data and benchmark in one Hugging Face dataset repository: [`MeiGen-AI\u002FGenEvolve-Data-Bench`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMeiGen-AI\u002FGenEvolve-Data-Bench). 
The total trajectory data is too large for GitHub but installs in one line via 🤗 `datasets` \u002F `huggingface-cli`.\n\n| Dataset | Records | Size | Purpose |\n|---|---|---|---|\n| `GenEvolve-Data-SFT\u002F` | 9,000 records | ~7.4 GB | Multi-turn tool-orchestrated trajectories used for the SFT cold start. Each record: `messages` (chat-format ReAct trajectory ending in `\u003Canswer>{gen_prompt, reference_images}`) + `images` (reference jpegs). |\n| `GenEvolve-Data-RL\u002F` | 3,175 records | ~680 MB | Open-ended user requests paired with curated GT images. Used for GRPO + Visual Experience Distillation, where multiple agent rollouts per prompt are scored against the GT. |\n| `GenEvolve-Bench\u002F` | 594 prompts | ~120 MB | Held-out evaluation benchmark. Contains both **Knowledge-Anchored** (335) and **Quality-Anchored** (259) tracks plus per-prompt category, difficulty, and skill metadata. |\n\n### Quick load\n\n```bash\npip install -U huggingface_hub datasets\n\nhuggingface-cli download MeiGen-AI\u002FGenEvolve-Data-Bench \\\n    --repo-type dataset \\\n    --local-dir .\u002FGenEvolve-Data-Bench\n```\n\n```python\nfrom datasets import load_dataset\n\nrepo_id = \"MeiGen-AI\u002FGenEvolve-Data-Bench\"\n\nbench = load_dataset(repo_id, \"bench\", split=\"test\")\nprint(bench[0][\"question\"], bench[0][\"gt_image\"])\n\nrl = load_dataset(repo_id, \"rl\", split=\"train\")\nsft = load_dataset(repo_id, \"sft\", split=\"train\")\nprint(sft[0][\"messages\"])\nprint(sft[0][\"images\"])\n```\n\nAll paths inside the datasets are relative, for example `images\u002Fcase_00512.jpg` or `images\u002Ftraj_00213\u002FIMG_001.jpg`; resolve them against the dataset directory you downloaded to. 
Per-dataset usage notes live on each dataset's Hub page.\n\nThe full training scripts are not included in this repository, but the released SFT\u002FRL datasets, model weights, tools, and runtime let you reproduce the path from a user request to a rendered image.\n\n## 🖼️ Visual results\n\n\u003Cp align=\"center\">\u003Cimg src=\"assets\u002Fvisual_comparison.png\" alt=\"Qualitative comparison\" width=\"100%\">\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Csub>The same \u003Ccode>GenEvolve\u003C\u002Fcode> policy paired with two different reference-conditioned generators. \u003Cspan style=\"color:#D97706\">Orange\u003C\u002Fspan> marks external\u002Funcommon knowledge, \u003Cspan style=\"color:#2563EB\">blue\u003C\u002Fspan> marks internal generation-knowledge requirements.\u003C\u002Fsub>\u003C\u002Fp>\n\n### 🎨 Extended gallery - paired with Nano Banana Pro\n\n\u003Cp align=\"center\">\u003Cimg src=\"assets\u002Fgallery_nano.jpg\" alt=\"GenEvolve + Nano Banana Pro gallery\" width=\"100%\">\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Csub>Additional qualitative results of \u003Ccode>GenEvolve\u003C\u002Fcode> with Nano Banana Pro as the downstream renderer. The agent autonomously orchestrates search, reference selection, and skill activation across diverse open-ended categories: spatial layout, text rendering, quantity counting, attribute binding, anatomy\u002Fpose, creative transfer, material physics, and aesthetic drawing.\u003C\u002Fsub>\u003C\u002Fp>\n\n### 🎨 Extended gallery - paired with Qwen-Image-Edit (open)\n\n\u003Cp align=\"center\">\u003Cimg src=\"assets\u002Fgallery_qwen.jpg\" alt=\"GenEvolve + Qwen-Image-Edit gallery\" width=\"100%\">\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Csub>The same trained agent policy paired with the open-source Qwen-Image-Edit-2511 renderer. 
Consistent quality across both generators demonstrates that \u003Ccode>GenEvolve\u003C\u002Fcode> learns generator-transferable tool orchestration rather than overfitting to one specific renderer.\u003C\u002Fsub>\u003C\u002Fp>\n\n## ⚙️ Configuration\n\n| Variable | Purpose | Default |\n|---|---|---|\n| `OPENAI_BASE_URL` | OpenAI-compatible chat-completions endpoint | `http:\u002F\u002Flocalhost:8000\u002Fv1` |\n| `OPENAI_API_KEY` | API key for the inference server or the OpenAI-compatible evaluator endpoint | `EMPTY` for local inference |\n| `OPENAI_API_BASE` | OpenAI-compatible Gemini judge endpoint used by `scripts\u002Fevaluate_images.py` | provider-specific |\n| `SERPER_API_KEY` | [serper.dev](https:\u002F\u002Fserper.dev) key for text and image search | required |\n| `SERPER_BASE_URL` | Override for Serper-compatible gateways | `https:\u002F\u002Fgoogle.serper.dev` |\n| `IMAGE_DOWNLOAD_DIR` | Local cache for `image_search` downloads | `\u002Ftmp\u002Fgenevolve_images` |\n| `GOOGLE_API_KEY` \u002F `GEMINI_API_KEY` | Google Generative Language API key | required for Nano backend |\n\n## 🧯 Troubleshooting\n\n| Symptom | Check |\n|---|---|\n| `search` \u002F `image_search` returns authentication errors | Set `SERPER_API_KEY` or configure `SERPER_BASE_URL` for your internal Serper-compatible gateway. |\n| Agent cannot connect to the model | Confirm the vLLM server is running and `OPENAI_BASE_URL` or `--base-url` ends with `\u002Fv1`. |\n| Qwen local renderer fails at import time | Use a separate Qwen-Image-Edit service environment and call it with `qwen-image-edit-service`; avoid mixing incompatible `xformers` \u002F `flash-attn` combinations into the renderer env. |\n| Qwen renderer says it needs a reference image | Qwen-Image-Edit is reference-conditioned; rerun the agent or use Nano Banana Pro for no-reference prompts. 
|\n| `evaluate_images.py` cannot find GT images | Keep `gt_image` in each input record and pass `--gt-root` pointing to the downloaded benchmark directory. |\n| `flash-attn` build fails | Install a PyTorch\u002FCUDA wheel first, then run `pip install flash-attn==2.8.3 --no-build-isolation`. |\n| Batch rendering resumes after interruption | `scripts\u002Fgenerate_images.py` writes `results.json` incrementally under the output directory. |\n\n## 🗂️ Repository layout\n\n```\ngenevolve\u002F\n├── genevolve\u002F\n│   ├── agent.py               # GenEvolveAgent: ReAct loop on top of an OpenAI-compatible server\n│   ├── system_prompt.py       # system prompt used by the released agent\n│   ├── knowledge_tool.py      # query_knowledge: eight callable generation skills\n│   ├── tools\u002Fweb_search.py    # search + image_search (Serper-compatible)\n│   ├── generator.py           # Qwen-Image-Edit + Nano Banana Pro backends\n│   └── knowledge\u002Fskills\u002F      # skill markdown files\n├── scripts\u002F\n│   ├── serve_vllm.sh          # serve the checkpoint with vLLM\n│   ├── run_agent.py           # batch agent rollouts -> results.json\n│   ├── generate_images.py     # render images from results.json\n│   └── evaluate_images.py     # Gemini judge scoring and metric summary\n├── examples\u002F\n│   ├── quickstart.py          # single-prompt end-to-end example\n│   └── example_prompts.jsonl\n├── assets\u002F                    # README figures\n├── requirements.txt\n├── setup.py\n└── README.md\n```\n\n## 🙏 Acknowledgements\n\nWe thank the authors and maintainers of **[Gen-Searcher](https:\u002F\u002Fgithub.com\u002FRUCAIBox\u002FGen-Searcher)**, **Qwen3-VL**, **Qwen-Image-Edit**, **vLLM**, Serper.dev, and the Google Generative Language API.\n\n## 📝 Citation\n\n```bibtex\n@misc{chen2026genevolveselfevolvingimagegeneration,\n      title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation}, \n      author={Sixiang Chen and 
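Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},\n      year={2026},\n      eprint={2605.21605},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.21605}, \n}\n```\n\n## 📜 License\n\nCode is released under the [Apache 2.0](LICENSE) license. Released model weights inherit the upstream license of `Qwen3-VL-8B-Instruct`. Search results returned by Serper.dev and images rendered by Nano Banana Pro \u002F Qwen-Image-Edit are governed by the respective upstream service terms.\n","GenEvolve is a project for self-evolving image generation agents built on tool-orchestrated visual experience distillation. It draws on external textual evidence and visual references, combined with an internal knowledge activation mechanism, to produce a prompt-reference program that any reference-conditioned generator can render. The project is built on the Qwen3-VL-8B model and is generator-transferable: the same agent output can be rendered by the open Qwen-Image-Edit backend or by a stronger proprietary renderer such as Nano Banana Pro. The technique is particularly suited to scenarios that need highly customized, iteratively refined image generation, such as art creation, advertising design, and virtual reality content production.","2026-06-11 04:01:39","CREATED_QUERY"]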