[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83131":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":15,"stars30d":15,"stars90d":13,"forks30d":13,"starsTrendScore":16,"compositeScore":17,"rankGlobal":8,"rankLanguage":8,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":13,"starSnapshotCount":13,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},83131,"Crafter","HaozheZhao\u002FCrafter","HaozheZhao",null,"Python",105,8,1,0,6,40,22,75.86,"MIT License",false,"main",true,[],"2026-06-12 04:01:40","\u003Cdiv align=\"center\">\n\n# Crafter\n\n**A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs**\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-coming%20soon-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white)](#)\n[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-CraftBench-orange?style=for-the-badge)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBleachNick\u002FCraftBench)\n\nHaozhe Zhao, Shuzheng Si, Zhenhailong Wang, Zheng Wang, Liang Chen,\nXiaotong Li, Zhixiang Liang, Maosong Sun, Minjia Zhang\n\n\u003C\u002Fdiv>\n\n---\n\nScientific figures are structured compositions of discrete semantic components,\nso the localized errors image generators make on such layouts call not for a\nstronger backbone but for a *harness* around it. 
We instantiate this idea in\ntwo complementary systems that share one design:\n\n- **Crafter** — a multi-agent harness for figure **generation** that\n  generalizes across figure types (academic figures, posters, infographics)\n  and input conditions (text-to-image, mask completion, key-element\n  composition, sketch refinement) without architectural changes.\n- **CraftEditor** — applies the same harness pattern to convert raster\n  outputs into **coordinate-faithful editable SVGs**.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fcrafter_architecture.png\" width=\"92%\">\n  \u003Cbr>\u003Csub>\u003Cb>Figure 1.\u003C\u002Fb> The Crafter generation harness.\u003C\u002Fsub>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Feditable_output_pipeline.png\" width=\"92%\">\n  \u003Cbr>\u003Csub>\u003Cb>Figure 2.\u003C\u002Fb> The CraftEditor raster-to-SVG pipeline.\u003C\u002Fsub>\n\u003C\u002Fp>\n\nWe also release **CraftBench** — 279 samples spanning three figure types and\nfour input conditions, each with a human-drawn target.\n\n## 🛠️ Setup\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FHaozheZhao\u002FCrafter.git\ncd Crafter\npip install -e .\nexport OPENROUTER_API_KEY=\"sk-or-...\"\n```\n\nAll chat \u002F VLM \u002F image calls go through a single OpenAI-compatible endpoint\n(OpenRouter). The role mapping lives in\n[`configs\u002Fdefault.yaml`](configs\u002Fdefault.yaml).\n\nCraftEditor additionally needs a text-prompted SAM3 grounding server. Start\none on any machine with a CUDA-capable GPU:\n\n```bash\n# 1. Install the official SAM3 package\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsam3 && cd sam3\npip install -e . && pip install timm ftfy iopath portalocker flask\n\n# 2. Run a small Flask wrapper that exposes \u002Fhealth, \u002Fsegment_text, \u002Fsegment_points\npython sam3_server.py --port 8765 --host 0.0.0.0\n\n# 3. 
Point Crafter at the server\nexport SAM3_SERVER_URL=\"http:\u002F\u002F\u003Chost>:8765\"\n```\n\nCraftEditor requires the SAM3 server. If you do not run one, use only the\ngeneration half (the commands below).\n\n## 🚀 Quick start\n\nThe bundled [`examples\u002F`](examples\u002F) folder has inputs for three end-to-end\nruns. Cases #1 and #3 share a SceneSelect figure (CraftEditor's top-scoring\ncase in Figure 3); case #2 is the NC-TTT poster inpaint case from Figure 3.\n\nAll three commands use the same task templates the CraftBench evaluation\nscript feeds the model, so end-to-end behaviour matches benchmark runs:\n\n```bash\n# 1. Text-to-image — generate the method figure from text only.\npython demo.py --paper examples\u002Fsample_paper.txt \\\n               --instruction-file examples\u002Fsample_instruction_t2i.txt \\\n               --out examples_out\u002Ffigure.png\n\n# 2. Mask completion (inpaint) — fill the blanked-out 'Methodology' column of the poster.\npython demo.py --paper examples\u002Fsample_inpaint_paper.txt \\\n               --instruction-file examples\u002Fsample_instruction_inpaint.txt \\\n               --reference examples\u002Fsample_inpaint_input.png \\\n               --out examples_out\u002Ffigure_inpainted.png\n\n# 3. Convert a raster figure into an editable SVG.\npython convert.py --img examples\u002Fsample_figure.png --out-dir examples_out\u002Feditable\u002F\n```\n\n## ✏️ Generation\n\n```bash\ncrafter generate --caption \"Figure 1: Overall workflow of our method.\" \\\n                 --paper-text-file paper.txt --out figure.png\n```\n\nAdd `--reference sketch.png` to condition on a sketch, partial figure, or icon\ncollage.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Use a paper PDF instead of a plain-text extract \u003Ci>(beta)\u003C\u002Fi>.\u003C\u002Fb>\u003C\u002Fsummary>\n\n`demo.py` also accepts a PDF as the `--paper` argument; text is extracted via\n``pypdf``. 
LaTeX-rendered PDFs work cleanly; scanned PDFs and dense two-column\nlayouts may need manual text extraction first. We recommend the plain-text path\nabove for reproducible runs.\n\n```bash\npython demo.py --paper paper.pdf --instruction \"...\" --out figure.png\n```\n\n\u003C\u002Fdetails>\n\n## 🎨 Editable conversion (CraftEditor)\n\nDefault pipeline: \u003Cb>extraction (gpt-image-2)\u003C\u002Fb> → \u003Cb>grounding (SAM3)\u003C\u002Fb> →\n\u003Cb>composition\u003C\u002Fb>.\n\n```bash\n# the bundled figure CraftEditor scores highest on in Figure 3.\npython convert.py --img examples\u002Fsample_figure.png --out-dir examples_out\u002Feditable\u002F\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Feditor_qualitative.png\" width=\"92%\">\n  \u003Cbr>\u003Csub>\u003Cb>Figure 3.\u003C\u002Fb> CraftEditor (rightmost column) versus Edit-Banana and AutoFigure-Edit on five representative cases. \u003Ccode>examples\u002Fsample_figure.png\u003C\u002Fcode> is the input raster of the top row (academic \u002F t2i, the highest-scoring case).\u003C\u002Fsub>\n\u003C\u002Fp>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Skip the gpt-image-2 extraction phase (SAM-only).\u003C\u002Fb>\u003C\u002Fsummary>\n\n`--sam-only` passes the raster straight to SAM3 grounding, bypassing\ngpt-image-2 icon extraction. Trades quality for speed and skips one external\nprovider dependency.\n\n```bash\npython convert.py --img figure.png --out-dir editable\u002F --sam-only\n```\n\n\u003C\u002Fdetails>\n\n## 📊 CraftBench\n\n**CraftBench** — 279 samples spanning three figure types and four input\nconditions, each with a human-drawn target. The dataset lives on the\n[HuggingFace Hub](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBleachNick\u002FCraftBench)\nand is downloaded automatically by both `inference.py` and `run_eval`. 
The\n[`craftbench\u002F`](craftbench\u002F) folder in this repo bundles three illustrative\nsamples (one per task) plus the evaluation scripts.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fcraftbench_examples.png\" width=\"92%\">\n  \u003Cbr>\u003Csub>\u003Cb>Figure 4.\u003C\u002Fb> Sample tasks from CraftBench.\u003C\u002Fsub>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fcraftbench_distribution.png\" width=\"78%\">\n  \u003Cbr>\u003Csub>\u003Cb>Figure 5.\u003C\u002Fb> CraftBench distribution by figure type and input condition.\u003C\u002Fsub>\n\u003C\u002Fp>\n\n### Run inference + evaluation\n\n```bash\n# 1. Generate Crafter outputs over the bench (writes \u003Cid>.png per sample).\npython inference.py --bench craftbench --out runs\u002Fcrafter_cb\n\n# 2. Score against the human-drawn targets (referenced VLM judge via OpenRouter).\npython -m craftbench.evaluation.run_eval --runs runs\u002Fcrafter_cb --out cb.json\n```\n\n`run_eval` reports an overall win-rate and a per-task breakdown.\n\n## ⚙️ Configuration\n\nThree model slots in [`configs\u002Fdefault.yaml`](configs\u002Fdefault.yaml):\n\n| Slot | Default |\n| :--- | :--- |\n| `llm` | `anthropic\u002Fclaude-opus-4.6` |\n| `vlm` | `google\u002Fgemini-3.1-pro-preview` |\n| `generator` | `google\u002Fgemini-3-pro-image-preview` (Nano Banana Pro) |\n\n`OPENROUTER_API_KEY` is the only required secret; the YAML never holds keys.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Use \u003Ccode>gpt-image-2\u003C\u002Fcode> instead of Nano Banana Pro.\u003C\u002Fb>\u003C\u002Fsummary>\n\n`gpt-image-2` produces sharper text and supports arbitrary pixel resolutions,\nbut on OpenRouter it is rate-limited and clamped to a small enum\n(`aspect_ratio` ∈ `{1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9}`,\n`image_size` ∈ `{1K, 2K}`).\n\nWe recommend deploying `gpt-image-2` on your own **Azure OpenAI** resource and\nexporting the four standard variables — when all four are set, 
gpt-image-2\ncalls bypass OpenRouter and go straight to Azure (everything else keeps\nusing OpenRouter):\n\n```bash\nexport AZURE_OPENAI_ENDPOINT=\"https:\u002F\u002F\u003Cyour-resource>.openai.azure.com\"\nexport AZURE_OPENAI_API_KEY=\"\u003Cyour-key>\"\nexport AZURE_OPENAI_DEPLOYMENT=\"\u003Cyour-deployment-name>\"\n# optional: override the api version (default: 2025-04-01-preview)\n# export AZURE_OPENAI_API_VERSION=\"2025-04-01-preview\"\n# optional: force an exact pixel size (overrides the aspect map)\n# export CRAFTER_AZURE_IMAGE_SIZE=\"1024x512\"\n```\n\nThen point the `generator` slot at `gpt-image-2` in\n`configs\u002Fdefault.yaml`:\n\n```yaml\ngenerator: openai\u002Fgpt-5.4-image-2\n```\n\nIf you do not have Azure, you can still use OpenRouter for `gpt-image-2` by\nswapping the `generator` slot to `openai\u002Fgpt-5.4-image-2` — outputs will be\nclamped to the enum above and may rate-limit under load.\n\n\u003C\u002Fdetails>\n\n## 📁 Repo layout\n\n```\nCrafter\u002F\n├── crafter\u002F{generation, editor, shared}\u002F    # the package\n├── craftbench\u002F                              # 279-sample bench + self-contained eval\n├── configs\u002Fdefault.yaml                     # 3-slot model config\n├── demo.py · convert.py · inference.py      # entry-point scripts\n├── examples\u002F                                # sample paper PDF + sketch ref\n├── assets\u002F                                  # paper figures\n└── README · pyproject · requirements · LICENSE\n```\n\n## 📑 Citation\n\n```bibtex\n@article{zhao_crafter,\n  title  = {Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs},\n  author = {Zhao, Haozhe and Si, Shuzheng and Wang, Zhenhailong and Wang, Zheng\n            and Chen, Liang and Li, Xiaotong and Liang, Zhixiang and Sun, Maosong\n            and Zhang, Minjia},\n}\n```\n\n## 📄 License\n\nMIT — see [LICENSE](LICENSE).\n","Crafter 
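is a multi-agent tool for scientific figure generation that produces editable scientific figures from diverse input conditions. Its core capability is general-purpose generation across figure types (academic figures, posters, infographics) and input conditions (text-to-image, mask completion, key-element composition, sketch refinement) without architectural changes. In addition, the companion CraftEditor system converts raster outputs into coordinate-faithful editable SVGs. The project is written in Python and interacts with external services through the OpenRouter API, and it suits researchers or designers who need flexible, high-quality scientific figure generation and editing.",2,"2026-06-11 04:10:14","CREATED_QUERY"]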