[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72551":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},72551,"Step1X-Edit","stepfun-ai\u002FStep1X-Edit","stepfun-ai","A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gemini 2 Flash.","https:\u002F\u002Fstep1x-edit.github.io",null,"Python",2224,102,18,31,0,4,9,24,12,28.04,"Apache License 2.0",false,"main",[26,27,28],"image-editing","reasoning","visual-reasoning","2026-06-12 02:03:04","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\"  height=100>\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fstep1x-edit.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Project%20Page&message=Web&color=green\">\u003C\u002Fa> &ensp;\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17761\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Step1X-Edit&message=Arxiv&color=red\">\u003C\u002Fa> &ensp;\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.22625\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=ReasonEdit&message=Arxiv&color=red\">\u003C\u002Fa> &ensp;\n  \u003Ca href=\"assets\u002FWeChat.jpg\">\n  \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=WeChat&message=Add%20Me&color=green&logo=wechat&logoColor=white\">\n  \u003C\u002Fa>\n  \n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Model&message=HuggingFace&color=yellow\">\u003C\u002Fa> &ensp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fstepfun-ai\u002FGEdit-Bench\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=GEdit-Bench&message=HuggingFace&color=yellow\">\u003C\u002Fa> &ensp;\n  [![Run on Replicate](https:\u002F\u002Freplicate.com\u002Fzsxkib\u002Fstep1x-edit\u002Fbadge)](https:\u002F\u002Freplicate.com\u002Fzsxkib\u002Fstep1x-edit) &ensp;\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fj3qzuAyn\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Discord%20Channel&message=Discord&color=purple\">\u003C\u002Fa> &ensp;\n\u003C\u002Fdiv>\n\n\n## 🔥🔥🔥 News!!\n* Apr 29, 2026: 🎉 Step Image Edit 2 is now live — a lightweight model designed for ultra-fast response and high-quality output, delivering a real-time interactive creation experience. It can complete image generation and editing tasks within 2 seconds. 
Feel free to try it out and share your feedback ✨✨✨\n\n  Try it here (StepFun Open Platform): [https:\u002F\u002Fplatform.stepfun.com\u002Fdocs\u002Fzh\u002Fguides\u002Fmodels\u002Fstep-image-edit-2](https:\u002F\u002Fplatform.stepfun.com\u002Fdocs\u002Fzh\u002Fguides\u002Fmodels\u002Fstep-image-edit-2)\n\n  API documentation: [https:\u002F\u002Fplatform.stepfun.com\u002Fdocs\u002Fzh\u002Fstep-plan\u002Fintegrations\u002Fimage-api](https:\u002F\u002Fplatform.stepfun.com\u002Fdocs\u002Fzh\u002Fstep-plan\u002Fintegrations\u002Fimage-api)\n\n* Dec 29, 2025: 🎉 [RegionE](https:\u002F\u002Fgithub.com\u002FPeyton-Chen\u002FRegionE) delivers a 2.5× speedup for Step1X-Edit inference with no accuracy degradation, achieved with just five lines of code.\n* Nov 26, 2025: 👋 We release [Step1X-Edit-v1p2](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit-v1p2) (referred to as **ReasonEdit-S** in the paper), a native reasoning edit model with better performance on KRIS-Bench and GEdit-Bench. 
Technical report can be found [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.22625).\n  \u003Ctable>\n  \u003Cthead>\n  \u003Ctr>\n    \u003Cth rowspan=\"2\">Models\u003C\u002Fth>\n    \u003Cth colspan=\"3\"> \u003Cdiv align=\"center\">GEdit-Bench\u003C\u002Fdiv> \u003C\u002Fth>\n    \u003Cth colspan=\"4\"> \u003Cdiv align=\"center\">Kris-Bench\u003C\u002Fdiv> \u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth>G_SC⬆️\u003C\u002Fth> \u003Cth>G_PQ⬆️ \u003C\u002Fth> \u003Cth>G_O⬆️\u003C\u002Fth> \u003Cth>FK⬆️\u003C\u002Fth> \u003Cth>CK⬆️\u003C\u002Fth> \u003Cth>PK⬆️ \u003C\u002Fth> \u003Cth>Overall⬆️\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n  \u003Ctr>  \n    \u003Ctd>Flux-Kontext-dev \u003C\u002Ftd> \u003Ctd>7.16\u003C\u002Ftd> \u003Ctd>7.37\u003C\u002Ftd> \u003Ctd>6.51\u003C\u002Ftd> \u003Ctd>53.28\u003C\u002Ftd> \u003Ctd>50.36\u003C\u002Ftd> \u003Ctd>42.53\u003C\u002Ftd> \u003Ctd>49.54\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>   \n    \u003Ctd>Qwen-Image-Edit-2509 \u003C\u002Ftd> \u003Ctd>8.00\u003C\u002Ftd> \u003Ctd>7.86\u003C\u002Ftd> \u003Ctd>7.56\u003C\u002Ftd> \u003Ctd>61.47\u003C\u002Ftd> \u003Ctd>56.79\u003C\u002Ftd> \u003Ctd>47.07\u003C\u002Ftd> \u003Ctd>56.15\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>Step1X-Edit v1.1 \u003C\u002Ftd> \u003Ctd>7.66\u003C\u002Ftd> \u003Ctd>7.35\u003C\u002Ftd> \u003Ctd>6.97\u003C\u002Ftd> \u003Ctd>53.05\u003C\u002Ftd> \u003Ctd>54.34\u003C\u002Ftd> \u003Ctd>44.66\u003C\u002Ftd> \u003Ctd>51.59\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>Step1x-edit-v1p2-preview \u003C\u002Ftd> \u003Ctd>8.14\u003C\u002Ftd> \u003Ctd>7.55\u003C\u002Ftd> \u003Ctd>7.42\u003C\u002Ftd> \u003Ctd>60.49\u003C\u002Ftd> \u003Ctd>58.81\u003C\u002Ftd> \u003Ctd>41.77\u003C\u002Ftd> \u003Ctd>52.51\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>Step1x-edit-v1p2 (base) \u003C\u002Ftd> \u003Ctd>7.77\u003C\u002Ftd> \u003Ctd>7.65\u003C\u002Ftd> 
\u003Ctd>7.24\u003C\u002Ftd> \u003Ctd>58.23\u003C\u002Ftd> \u003Ctd>60.55\u003C\u002Ftd> \u003Ctd>46.21\u003C\u002Ftd> \u003Ctd>56.33\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>Step1x-edit-v1p2 (thinking) \u003C\u002Ftd> \u003Ctd>8.02\u003C\u002Ftd> \u003Ctd>7.64\u003C\u002Ftd> \u003Ctd>7.36\u003C\u002Ftd> \u003Ctd>59.79\u003C\u002Ftd> \u003Ctd>62.76\u003C\u002Ftd> \u003Ctd>49.78\u003C\u002Ftd> \u003Ctd>58.64\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>Step1x-edit-v1p2 (thinking + reflection) \u003C\u002Ftd> \u003Ctd>8.18\u003C\u002Ftd> \u003Ctd>7.85\u003C\u002Ftd> \u003Ctd>7.58\u003C\u002Ftd> \u003Ctd>62.44\u003C\u002Ftd> \u003Ctd>65.72\u003C\u002Ftd> \u003Ctd>50.42\u003C\u002Ftd> \u003Ctd>60.93\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003C\u002Ftable>\n\n* Sep 08, 2025: 👋 We release [step1x-edit-v1p2-preview](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit-v1p2-preview), a new version of Step1X-Edit with reasoning edit ability and better performance (report to be released soon), featuring:\n  - Native Reasoning Edit Model: Combines instruction reasoning with reflective correction to handle complex edits more accurately. Performance on KRIS-Bench:\n    |    Models    |   Factual Knowledge ⬆️   |  Conceptual Knowledge ⬆️ | Procedural Knowledge ⬆️   |  Overall ⬆️ | \n    |:------------:|:------------:|:------------:| :------------:|:------------:| \n    | Step1X-Edit v1.1  | 53.05 |  54.34 | 44.66 | 51.59 |   \n    | Step1x-edit-v1p2-preview  | 60.49 | 58.81 | 41.77 | 52.51 | \n    | Step1x-edit-v1p2-preview (thinking)  | 62.24 | 62.25 | 44.43 | 55.21| \n    | Step1x-edit-v1p2-preview (thinking + reflection) | 62.94 |  61.82 |  44.08 |  55.64 | \n  - Improved image editing quality and better instruction-following performance. 
Performance on GEdit-Bench:\n    |     Models    |     G_SC ⬆️   |  G_PQ ⬆️ | G_O ⬆️   |  Q_SC ⬆️ | Q_PQ ⬆️   |  Q_O ⬆️ |\n    |:------------:|:------------:|:------------:| :------------:|:------------:| :------------:|:------------:|\n    | Step1X-Edit (v1.0)  |    7.13   | 7.00 |   6.44   | 7.39 |    7.28   | 7.07 | \n    | Step1X-Edit (v1.1)  |    7.66   | 7.35 |   6.97   | 7.65 |    7.41   | 7.35 | \n    | Step1x-edit-v1p2-preview  |    8.14   | 7.55 |   7.42   | 7.90 |   7.34   | 7.40   |\n* Jul 09, 2025: 👋 We’ve updated the step1x-edit model and released it as [step1x-edit-v1p1](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit) (for the diffusers version, see [here](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit-v1p1-diffusers)), featuring:\n  - Added support for text-to-image (T2I) generation tasks\n  - Improved image editing quality and better instruction-following performance.\n  Quantitative evaluation on GEdit-Bench-EN (Full set). G_SC, G_PQ, and G_O refer to the metrics evaluated by GPT-4.1, while Q_SC, Q_PQ, and Q_O refer to the metrics evaluated by Qwen2.5-VL-72B. To facilitate reproducibility, we have released the [intermediate results](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FShiyu95\u002Fgedit_results) of our model evaluations.\n    |     Models    |     G_SC ⬆️   |  G_PQ ⬆️ | G_O ⬆️   |  Q_SC ⬆️ | Q_PQ ⬆️   |  Q_O ⬆️ |\n    |:------------:|:------------:|:------------:| :------------:|:------------:| :------------:|:------------:|\n    | Step1X-Edit (v1.0)  |    7.13   | 7.00 |   6.44   | 7.39 |    7.28   | 7.07 | \n    | Step1X-Edit (v1.1)  |    7.66   | 7.35 |   6.97   | 7.65 |    7.41   | 7.35 | \n* Jun 17, 2025: 👋 Support for TeaCache and parallel inference has been added.\n* May 22, 2025: 👋 Step1X-Edit now supports Lora finetuning on a single 24GB GPU! A hand-fixing Lora for anime characters has also been released. 
[Download Lora](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit)\n* Apr 30, 2025: 🎉 Step1X-Edit ComfyUI Plugin is available now, thanks for the community contribution! [quank123wip\u002FComfyUI-Step1X-Edit](https:\u002F\u002Fgithub.com\u002Fquank123wip\u002FComfyUI-Step1X-Edit) & [raykindle\u002FComfyUI_Step1X-Edit](https:\u002F\u002Fgithub.com\u002Fraykindle\u002FComfyUI_Step1X-Edit).\n* Apr 27, 2025: 🎉 With community support, we update the inference code and model weights of Step1X-Edit-FP8. [meimeilook\u002FStep1X-Edit-FP8](https:\u002F\u002Fhuggingface.co\u002Fmeimeilook\u002FStep1X-Edit-FP8) & [rkfg\u002FStep1X-Edit-FP8](https:\u002F\u002Fhuggingface.co\u002Frkfg\u002FStep1X-Edit-FP8).\n* Apr 26, 2025: 🎉 Step1X-Edit is now live — you can try editing images directly in the online demo! [Online Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstepfun-ai\u002FStep1X-Edit)\n* Apr 25, 2025: 👋 We release the evaluation code and benchmark data of Step1X-Edit. [Download GEdit-Bench](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fstepfun-ai\u002FGEdit-Bench)\n* Apr 25, 2025: 👋 We release the inference code and model weights of Step1X-Edit. [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002Fstepfun-ai\u002FStep1X-Edit) & [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit) models.\n* Apr 25, 2025: 👋 We have made our technical report available as open source. [Read](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17761)\n\n\u003C!-- ## Image Edit Demos -->\n\n\n\u003C!-- ## 📑 Open-source Plan\n- [x] Inference & Checkpoints\n- [x] Online demo (Gradio)\n- [x] Fine-tuning scripts\n- [x] Multi-gpus Sequence Parallel inference\n- [x] FP8 Quantified weight\n- [x] ComfyUI\n- [x] Diffusers -->\n\n\n\n## 📖 Introduction\nWe introduce a state-of-the-art image editing model, **Step1X-Edit**, which aims to provide comparable performance against the closed-source models like GPT-4o and Gemini2 Flash. 
\nMore specifically, we adopt a multimodal LLM to process the reference image and the user's editing instruction. A latent embedding is extracted and integrated with a diffusion image decoder to obtain the target image. To train the model, we build a data generation pipeline to produce a high-quality dataset. \nFor evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench demonstrate that Step1X-Edit outperforms existing open-source baselines by a substantial margin and approaches the performance of leading proprietary models, thereby making significant contributions to the field of image editing. \nFor more details, please refer to our [technical report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17761).\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"720\" alt=\"demo\" src=\"assets\u002Fimage_edit_demo.gif\">\n\u003Cp>\u003Cb>Step1X-Edit:\u003C\u002Fb> a unified image editing model that performs impressively on various genuine user instructions. \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\n## ⚡️ Quick Start\n1. Make sure you have `transformers==4.55.0` installed (the version we tested).\n2. 
Install the `diffusers` package locally, according to the model version you want to use.\n\n\n### Step1X-Edit-v1p2 (v1.2)\nInstall the `diffusers` package with the following commands:\n```bash\ngit clone -b step1xedit_v1p2 https:\u002F\u002Fgithub.com\u002FPeyton-Chen\u002Fdiffusers.git\ncd diffusers\npip install -e .\n\npip install RegionE # optional, for faster inference\n```\nHere is an example of using the `Step1X-Edit-v1p2` model to edit images:\n```python\nimport torch\nfrom diffusers import Step1XEditPipelineV1P2\nfrom diffusers.utils import load_image\nfrom RegionE import RegionEHelper\n\npipe = Step1XEditPipelineV1P2.from_pretrained(\"stepfun-ai\u002FStep1X-Edit-v1p2\", torch_dtype=torch.bfloat16)\npipe.to(\"cuda\")\n\n# Wrap the pipeline with RegionEHelper (optional acceleration)\nregionehelper = RegionEHelper(pipe)\nregionehelper.set_params()   # use the default hyperparameters\nregionehelper.enable()\n\nprint(\"=== processing image ===\")\nimage = load_image(\"examples\u002F0000.jpg\").convert(\"RGB\")\nprompt = \"add a ruby pendant on the girl's neck.\"\nenable_thinking_mode=True\nenable_reflection_mode=True\npipe_output = pipe(\n    image=image,\n    prompt=prompt,\n    num_inference_steps=50,\n    true_cfg_scale=6,\n    generator=torch.Generator().manual_seed(42),\n    enable_thinking_mode=enable_thinking_mode,\n    enable_reflection_mode=enable_reflection_mode,\n)\nif enable_thinking_mode:\n    print(\"Reformat Prompt:\", pipe_output.reformat_prompt)\nfor image_idx in range(len(pipe_output.images)):\n    pipe_output.images[image_idx].save(f\"0001-{image_idx}.jpg\", lossless=True)\n    if enable_reflection_mode:\n        print(pipe_output.think_info[image_idx])\n        print(pipe_output.best_info[image_idx])\npipe_output.final_images[0].save(f\"0001-final.jpg\", lossless=True)\n\nregionehelper.disable()\n```\nThe results look like:\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"1080\" alt=\"results\" src=\"assets\u002Fv1p2_vis.jpeg\">\n\u003C\u002Fdiv>\n\n### Step1X-Edit-v1p2-preview 
(v1.2-preview)\nInstall the `diffusers` package with the following commands:\n```bash\ngit clone -b dev\u002FMergeV1-2 https:\u002F\u002Fgithub.com\u002FPeyton-Chen\u002Fdiffusers.git\ncd diffusers\npip install -e .\n```\n\nHere is an example of using the `Step1X-Edit-v1p2-preview` model to edit images:\n\n```python\nimport torch\nfrom diffusers import Step1XEditPipelineV1P2\nfrom diffusers.utils import load_image\npipe = Step1XEditPipelineV1P2.from_pretrained(\"stepfun-ai\u002FStep1X-Edit-v1p2-preview\", torch_dtype=torch.bfloat16)\npipe.to(\"cuda\")\nprint(\"=== processing image ===\")\nimage = load_image(\"examples\u002F0000.jpg\").convert(\"RGB\")\nprompt = \"add a ruby pendant on the girl's neck.\"\nenable_thinking_mode=True\nenable_reflection_mode=True\npipe_output = pipe(\n    image=image,\n    prompt=prompt,\n    num_inference_steps=28,\n    true_cfg_scale=4,\n    generator=torch.Generator().manual_seed(42),\n    enable_thinking_mode=enable_thinking_mode,\n    enable_reflection_mode=enable_reflection_mode,\n)\nif enable_thinking_mode:\n    print(\"Reformat Prompt:\", pipe_output.reformat_prompt)\nfor image_idx in range(len(pipe_output.images)):\n    pipe_output.images[image_idx].save(f\"0001-{image_idx}.jpg\", lossless=True)\n    if enable_reflection_mode:\n        print(pipe_output.think_info[image_idx])\n```\n\n\n### Step1X-Edit-v1p1 (v1.1)\nInstall the `diffusers` package with the following commands:\n```bash\ngit clone -b step1xedit https:\u002F\u002Fgithub.com\u002FPeyton-Chen\u002Fdiffusers.git\ncd diffusers\npip install -e .\n```\n\nHere is an example of using the `Step1X-Edit-v1p1` model to edit images:\n```python\nimport torch\nfrom diffusers import Step1XEditPipeline\nfrom diffusers.utils import load_image\n\n\npipe = Step1XEditPipeline.from_pretrained(\"stepfun-ai\u002FStep1X-Edit-v1p1-diffusers\", torch_dtype=torch.bfloat16)\npipe.to(\"cuda\")\n\nprint(\"=== processing image ===\")\nimage = 
load_image(\"examples\u002F0000.jpg\").convert(\"RGB\")\n# The prompt below is in Chinese: \"Put a pendant with a ruby around this girl's neck.\"\nprompt = \"给这个女生的脖子上戴一个带有红宝石的吊坠。\"\nimage = pipe(\n    image=image,\n    prompt=prompt,\n    num_inference_steps=28,\n    size_level=1024,\n    guidance_scale=6.0,\n    generator=torch.Generator().manual_seed(42),\n).images[0]\nimage.save(\"0000.jpg\")\n```\n\nThe results will look like:\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"1080\" alt=\"results\" src=\"assets\u002Fresults_show.png\">\n\u003C\u002Fdiv>\n\n\n## 🌟 Advanced Usage\nWe use the original [Step1X-Edit](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit) model as an example to demonstrate some advanced uses of the model. Other versions of the model may have different inference processes.\n\n### A1. Requirements\nWe test our model using torch==2.3.1 and torch==2.5.1 with cuda-12.1.\nInstall requirements:\n  \n```bash\npip install -r requirements.txt\n```\n\nInstall [`flash-attn`](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention). We provide a script to help find the pre-built wheel suitable for your system. \n    \n```bash\npython scripts\u002Fget_flash_attn.py\n```\n\nThe script will generate a wheel name like `flash_attn-2.7.2.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl`, which can be found on [the release page of flash-attn](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases).\n\nThen you can download the corresponding pre-built wheel and install it following the instructions in [`flash-attn`](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention).\n\n\n\n### A2. Reduce GPU Memory Usage\nYou can use the following script to edit images with reduced GPU memory usage.\n\n```\nbash scripts\u002Frun_examples.sh\n```\nThe default script runs the inference code with non-quantized weights. 
If you want to reduce GPU memory usage, you can 1) set the `--quantized` flag in the script, which quantizes the weights to fp8, or 2) set the `--offload` flag in the script to offload some modules to the CPU.\n\nThe following table shows the peak GPU memory usage and speed of the Step1X-Edit model (batch size = 1, with cfg) under different configurations:\n\n|     Model    |     Peak GPU Memory (512 \u002F 786 \u002F 1024)  | Time for 28 steps with flash-attn (512 \u002F 786 \u002F 1024) |\n|:------------:|:------------:|:------------:|\n| Step1X-Edit   |                42.5GB \u002F 46.5GB \u002F 49.8GB  | 5s \u002F 11s \u002F 22s |\n| Step1X-Edit (FP8)   |             31GB \u002F 31.5GB \u002F 34GB     | 6.8s \u002F 13.5s \u002F 25s | \n| Step1X-Edit (offload)   |       25.9GB \u002F 27.3GB \u002F 29.1GB | 49.6s \u002F 54.1s \u002F 63.2s |\n| Step1X-Edit (FP8 + offload)   |   18GB \u002F 18GB \u002F 18GB | 35s \u002F 40s \u002F 51s |\n\n* The model was tested on one H800 GPU.\n* We recommend using GPUs with 80GB of memory for better generation quality and efficiency.\n\n\n### A3. 
Multi-GPU inference\nFor multi-GPU inference, you can use the following script:\n```\nbash scripts\u002Frun_examples_parallel.sh\n```\nYou can change the number of GPUs (`GPU`), the configuration of xDiT (`--ulysses_degree` or `--ring_degree` or `--cfg_degree`), and whether to enable TeaCache acceleration (`--teacache`) in the script.\nThe table below presents the speedup of several efficient methods on the Step1X-Edit model.\n\n|     Model    |     Peak GPU Memory   |  28 steps |\n|:------------:|:------------:|:------------:|\n| Step1X-Edit + TeaCache     |    49.6GB   | 16.78s | \n| Step1X-Edit + xDiT (GPU=2) |    50.2GB   | 12.81s |\n| Step1X-Edit + xDiT (GPU=4) |    52.9GB   | 8.17s |\n| Step1X-Edit + TeaCache + xDiT (GPU=2)  |  50.7GB    | 8.94s |\n| Step1X-Edit + TeaCache + xDiT (GPU=4)  |  54.2GB |  5.82s |\n\n* The model was tested on H800 series GPUs with a resolution of 1024.\n* TeaCache's default threshold of 0.2 provides a good balance between efficiency and performance.\n* xDiT employs both CFG Parallelism and Ring Attention when using 4 GPUs, but only utilizes CFG Parallelism when operating with 2 GPUs.\n\nThis default script runs the inference code on example inputs. The results will look like:\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"1080\" alt=\"results\" src=\"assets\u002Fefficient_teasar.png\">\n\u003C\u002Fdiv>\n\n\n\u003C!-- ### 2.4 Gradio Scripts\n\nChange the `model_path` in `gradio_app.py` to the local path of Step1X-Edit. Then run\n\n```bash\npython gradio_app.py\n```\n\nThen the gradio demo will run on `localhost:32800`. -->\n\n\n\n\n\n### A4. 
Finetuning\n#### Lora training script\n\nHere is the GPU memory cost during training with a Lora rank of 64 and a batch size of 1:\n\n|     Precision of DiT    |     bf16 (512 \u002F 786 \u002F 1024)  | fp8 (512 \u002F 786 \u002F 1024) |\n|:------------:|:------------:|:------------:|\n| GPU Memory   |                29.7GB \u002F 31.6GB \u002F 33.8GB  | 19.8GB \u002F 21.3GB \u002F 23.6GB |\n\nThe script `.\u002Fscripts\u002Ffinetuning.sh` shows how to fine-tune the Step1X-Edit model. With our default strategy, it is possible to fine-tune Step1X-Edit at 1024 resolution on a single 24GB GPU. Our fine-tuning script is adapted from [kohya-ss\u002Fsd-scripts](https:\u002F\u002Fgithub.com\u002Fkohya-ss\u002Fsd-scripts).\n\n```bash\nbash .\u002Fscripts\u002Ffinetuning.sh\n```\n\nThe custom dataset is configured in `.\u002Flibrary\u002Fdata_configs\u002Fstep1x_edit.toml`. There, `metadata_file` contains all the training samples, including the absolute paths of the source images, the absolute paths of the target images, and the instructions.\n\nThe `metadata_file` should be a JSON file containing a dict as follows:\n\n```\n{\n  \u003Ctarget image path, str>: {\n    \"ref_image_path\": \u003Csource image path, str>,\n    \"caption\": \u003Cthe editing instruction, str>\n  },\n  ...\n}\n```\n\n#### Inference with Lora\nTo run inference with Lora, simply add `--lora \u003Cpath to your lora weights>` when using `inference.py`. 
For example:\n\n```bash\npython inference.py --input_dir .\u002Fexamples \\\n    --model_path \u002Fdata\u002Fwork_dir\u002Fstep1x-edit\u002F \\\n    --json_path .\u002Fexamples\u002Fprompt_cn.json \\\n    --output_dir .\u002Foutput_cn \\\n    --seed 1234 --size_level 1024 \\\n    --lora 20250521_001-lora256-alpha128-fix-hand-per-epoch\u002Fstep1x-edit_test.safetensors\n```\n\nHere is an example using our [pretrained Lora weights](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FStep1X-Edit\u002Ftree\u002Fmain\u002Flora), which are designed for fixing corrupted hands of anime characters.\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"1080\" alt=\"results\" src=\"assets\u002Flora_teaser.png\">\n\u003C\u002Fdiv>\n\nTo reproduce the cases above, you can run the following script:\n```bash\nbash scripts\u002Frun_examples_fix_hand.sh\n```\n\n\n## 📊 Benchmark\nWe release [GEdit-Bench](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fstepfun-ai\u002FGEdit-Bench), a new benchmark grounded in real-world usage. It is carefully curated to reflect actual user editing needs and a wide range of editing scenarios, enabling more authentic and comprehensive evaluation of image editing models.\nThe evaluation process and related code can be found in [GEdit-Bench\u002FEVAL.md](GEdit-Bench\u002FEVAL.md). 
Partial results of the benchmark are shown below:\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"1080\" alt=\"results\" src=\"assets\u002Feval_res_en.png\">\n\u003C\u002Fdiv>\n\n\n## 🧩 Community Contributions\n\nIf you develop with or use Step1X-Edit in your projects, please let us know 🎉.\n\n- A detailed introductory blog post on Step1X-Edit: [Step1X-Edit执行流程 (Step1X-Edit Execution Flow)](https:\u002F\u002Fliwenju0.com\u002Fposts\u002FStep1X-Edit%E6%89%A7%E8%A1%8C%E6%B5%81%E7%A8%8B-%E4%B8%80.html) by [liwenju0](https:\u002F\u002Fliwenju0.com\u002Fabout.html)\n- FP8 model weights: [meimeilook\u002FStep1X-Edit-FP8](https:\u002F\u002Fhuggingface.co\u002Fmeimeilook\u002FStep1X-Edit-FP8) by [meimeilook](https:\u002F\u002Fhuggingface.co\u002Fmeimeilook);  [rkfg\u002FStep1X-Edit-FP8](https:\u002F\u002Fhuggingface.co\u002Frkfg\u002FStep1X-Edit-FP8) by [rkfg](https:\u002F\u002Fhuggingface.co\u002Frkfg)\n- Step1X-Edit ComfyUI Plugin: [quank123wip\u002FComfyUI-Step1X-Edit](https:\u002F\u002Fgithub.com\u002Fquank123wip\u002FComfyUI-Step1X-Edit) by [quank123wip](https:\u002F\u002Fgithub.com\u002Fquank123wip); [raykindle\u002FComfyUI_Step1X-Edit](https:\u002F\u002Fgithub.com\u002Fraykindle\u002FComfyUI_Step1X-Edit) by [raykindle](https:\u002F\u002Fgithub.com\u002Fraykindle)\n- Training scripts: [hobart07\u002FStep1X-Edit_train](https:\u002F\u002Fgithub.com\u002Fhobart07\u002FStep1X-Edit_train) by [hobart07](https:\u002F\u002Fgithub.com\u002Fhobart07)\n\n## 📚 Citation\nIf you find the Step1X-Edit series helpful for your research or applications, please consider ⭐ starring the repository and citing our papers.\n```\n@article{yin2025reasonedit,\n  title={ReasonEdit: Towards Reasoning-Enhanced Image Editing Models}, \n  author={Fukun Yin and Shiyu Liu and Yucheng Han and Zhibo Wang and Peng Xing and Rui Wang and Wei Cheng and Yingming Wang and Aojie Li and Zixin Yin and Pengtao Chen and Xiangyu Zhang and Daxin Jiang and Xianfang Zeng and Gang Yu},\n  journal={arXiv preprint arXiv:2511.22625},\n  year={2025}\n}\n\n@article{wu2025kris,\n  title={KRIS-Bench: 
Benchmarking Next-Level Intelligent Image Editing Models},\n  author={Wu, Yongliang and Li, Zonghui and Hu, Xinting and Ye, Xinyu and Zeng, Xianfang and Yu, Gang and Zhu, Wenbo and Schiele, Bernt and Yang, Ming-Hsuan and Yang, Xu},\n  journal={arXiv preprint arXiv:2505.16707},\n  year={2025}\n}\n\n@article{liu2025step1x-edit,\n  title={Step1X-Edit: A Practical Framework for General Image Editing}, \n  author={Shiyu Liu and Yucheng Han and Peng Xing and Fukun Yin and Rui Wang and Wei Cheng and Jiaqi Liao and Yingming Wang and Honghao Fu and Chunrui Han and Guopeng Li and Yuang Peng and Quan Sun and Jingwei Wu and Yan Cai and Zheng Ge and Ranchen Ming and Lei Xia and Xianfang Zeng and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Gang Yu and Daxin Jiang},\n  journal={arXiv preprint arXiv:2504.17761},\n  year={2025}\n}\n\n```\n\n## Acknowledgement\nWe would like to express our sincere thanks to the contributors of [Kohya](https:\u002F\u002Fgithub.com\u002Fkohya-ss\u002Fsd-scripts\u002Ftree\u002Fsd3), [SD3](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-3-medium), [FLUX](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux), [Qwen](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen2.5), [xDiT](https:\u002F\u002Fgithub.com\u002Fxdit-project\u002FxDiT), [TeaCache](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FTeaCache), [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) and [HuggingFace](https:\u002F\u002Fhuggingface.co) teams, for their open research and exploration.\n\n\n## Disclaimer\nThe results produced by this image editing model are entirely determined by user input and actions. The development team and this open-source project are not responsible for any outcomes or consequences arising from its use.\n\n## LICENSE\nStep1X-Edit is licensed under the Apache License 2.0. 
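You can find the license files in the respective GitHub and HuggingFace repositories.\n","Step1X-Edit is an open-source image editing model that aims to match the performance of closed-source models such as GPT-4o and Gemini 2 Flash. The project is developed in Python, has advanced visual reasoning capabilities, handles high-quality image generation and editing tasks, and performs well on multiple benchmarks. With the RegionE technique, Step1X-Edit also achieves a significant inference speedup while maintaining high accuracy. The model suits real-time interactive image creation scenarios that need fast responses and high-quality output, such as online design tools, content generation platforms, and image processing tasks in research.",2,"2026-06-11 03:42:31","high_star"]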