[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74239":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":46,"readmeContent":47,"aiSummary":48,"trendingCount":16,"starSnapshotCount":16,"syncStatus":49,"lastSyncTime":50,"discoverSource":51},74239,"paperbanana","llmsresearch\u002Fpaperbanana","llmsresearch","Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.","",null,"Python",1971,291,13,15,0,172,186,563,516,20.4,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,5,40,41,42,43,44,45],"academic-diagrams","academic-research","agentic-ai","arxiv","diagram-generation","gemini","google-gemini","llm","llms","mcp","mcp-server","multiagent","neurips","python-ai-research-tools","research-automation","research-tools","scientific-visualization","text-to-image","vlm","2026-06-12 02:03:24","\u003C!-- mcp-name: io.github.llmsresearch\u002Fpaperbanana -->\n\u003Ctable align=\"center\" width=\"100%\" style=\"border: none; border-collapse: collapse;\">\n  \u003Ctr>\n    \u003Ctd width=\"220\" align=\"left\" valign=\"middle\" style=\"border: none;\">\n      \u003Cimg src=\"https:\u002F\u002Fdwzhu-pku.github.io\u002FPaperBanana\u002Fstatic\u002Fimages\u002Flogo.jpg\" alt=\"PaperBanana Logo\" width=\"180\"\u002F>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"left\" valign=\"middle\" style=\"border: none;\">\n      \u003Ch1>PaperBanana\u003C\u002Fh1>\n  
    \u003Cp>\u003Cstrong>Automated Academic Illustration for AI Scientists\u003C\u002Fstrong>\u003C\u002Fp>\n      \u003Cp>\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fllmsresearch\u002Fpaperbanana\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fllmsresearch\u002Fpaperbanana\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\" alt=\"CI\"\u002F>\u003C\u002Fa>\n        \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fpaperbanana\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fpaperbanana?label=PyPI%20downloads&logo=pypi&logoColor=white\" alt=\"PyPI Downloads\"\u002F>\u003C\u002Fa>\n        \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fllmsresearch\u002Fpaperbanana\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-HuggingFace-yellow?logo=huggingface&logoColor=white\" alt=\"Demo\"\u002F>\u003C\u002Fa>\n        \u003Cbr\u002F>\n        \u003Ca href=\"https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10%2B-blue?logo=python&logoColor=white\" alt=\"Python 3.10+\"\u002F>\u003C\u002Fa>\n        \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.23265\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2601.23265-b31b1b?logo=arxiv&logoColor=white\" alt=\"arXiv\"\u002F>\u003C\u002Fa>\n        \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green?logo=opensourceinitiative&logoColor=white\" alt=\"License: MIT\"\u002F>\u003C\u002Fa>\n        \u003Cbr\u002F>\n        \u003Ca href=\"https:\u002F\u002Fpydantic.dev\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPydantic-v2-e92063?logo=pydantic&logoColor=white\" alt=\"Pydantic v2\"\u002F>\u003C\u002Fa>\n        \u003Ca href=\"https:\u002F\u002Ftyper.tiangolo.com\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCLI-Typer-009688?logo=gnubash&logoColor=white\" alt=\"Typer\"\u002F>\u003C\u002Fa>\n        \u003Ca href=\"https:\u002F\u002Fai.google.dev\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGemini-Free%20Tier-4285F4?logo=google&logoColor=white\" alt=\"Gemini Free Tier\"\u002F>\u003C\u002Fa>\n      \u003C\u002Fp>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n> **Disclaimer**: This is an **unofficial, community-driven open-source implementation** of the paper\n> *\"PaperBanana: Automating Academic Illustration for AI Scientists\"* by Dawei Zhu, Rui Meng, Yale Song,\n> Xiyu Wei, Sujian Li, Tomas Pfister, and Jinsung Yoon ([arXiv:2601.23265](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.23265)).\n> This project is **not affiliated with or endorsed by** the original authors or Google Research.\n> The implementation is based on the publicly available paper and may differ from the original system.\n\nAn agentic framework for generating publication-quality academic diagrams and statistical plots from text descriptions. 
Supports OpenAI (GPT-5.2 + GPT-Image-1.5), Azure OpenAI \u002F Foundry, and Google Gemini providers.\n\n- Two-phase multi-agent pipeline with iterative refinement\n- Multiple VLM and image generation providers (OpenAI, Azure, Gemini)\n- Input optimization layer for better generation quality\n- Auto-refine mode and run continuation with user feedback\n- CLI, Python API, and MCP server for IDE integration\n- **Batch generation** from a manifest file (YAML\u002FJSON) for multiple diagrams in one run\n- **Batch plots** — `paperbanana plot-batch` runs many statistical plots from one manifest (CSV\u002FJSON per item)\n- **PDF inputs** for methodology context (optional `paperbanana[pdf]` \u002F PyMuPDF), with per-page selection\n- **PaperBanana Studio** — local Gradio web UI (`paperbanana studio`) for diagrams, plots, evaluation, batch, and run browser\n- Claude Code skills for `\u002Fgenerate-diagram`, `\u002Fgenerate-plot`, and `\u002Fevaluate-diagram`\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fimg\u002Fhero_image.png\" alt=\"PaperBanana takes paper as input and provide diagram as output\" style=\"max-width: 960px; width: 100%; height: auto;\"\u002F>\n\u003C\u002Fp>\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.10+\n- An OpenAI API key ([platform.openai.com](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys)) or Azure OpenAI \u002F Foundry endpoint\n- Or a Google Gemini API key (free, [Google AI Studio](https:\u002F\u002Fmakersuite.google.com\u002Fapp\u002Fapikey))\n\n### Step 1: Install\n\n```bash\npip install paperbanana\n```\n\nOr install from source for development:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fllmsresearch\u002Fpaperbanana.git\ncd paperbanana\npip install -e \".[dev,openai,google]\"\n```\n\n### Step 2: Get Your API Key\n\n```bash\ncp .env.example .env\n# Edit .env and add your API key:\n#   OPENAI_API_KEY=your-key-here\n#   GOOGLE_API_KEY=your-key-here\n#\n# For Azure OpenAI \u002F Foundry:\n#   
OPENAI_BASE_URL=https:\u002F\u002F\u003Cresource>.openai.azure.com\u002Fopenai\u002Fv1\n#\n# Optional Gemini overrides:\n#   GOOGLE_BASE_URL=https:\u002F\u002Fyour-gemini-proxy.example.com\n#   GOOGLE_VLM_MODEL=gemini-2.0-flash\n#   GOOGLE_IMAGE_MODEL=gemini-3-pro-image-preview\n```\n\nOr use the setup wizard for Gemini:\n\n```bash\npaperbanana setup\n```\n\n### Step 3: Generate a Diagram\n\n```bash\npaperbanana generate \\\n  --input examples\u002Fsample_inputs\u002Ftransformer_method.txt \\\n  --caption \"Overview of our encoder-decoder architecture with sparse routing\"\n```\n\nWith input optimization and auto-refine:\n\n```bash\npaperbanana generate \\\n  --input my_method.txt \\\n  --caption \"Overview of our encoder-decoder framework\" \\\n  --optimize --auto\n```\n\nOutput is saved to `outputs\u002Frun_\u003Ctimestamp>\u002Ffinal_output.png` along with all intermediate iterations and metadata.\n\n### PaperBanana Studio (local web UI)\n\nInstall the optional Gradio dependency, then start the app:\n\n```bash\npip install 'paperbanana[studio]'\npaperbanana studio\n```\n\nOpen the URL shown in the terminal (default `http:\u002F\u002F127.0.0.1:7860\u002F`). The Studio exposes the same workflows as the CLI: methodology diagrams, statistical plots, comparative evaluation, continuing a prior run, batch manifests (methodology or **plot** batch via the Batch tab), and a simple browser for `run_*` \u002F `batch_*` output folders. Use `--host`, `--port`, `--config`, and `--output-dir` as needed.\n\n---\n\n## How It Works\n\nPaperBanana implements a multi-agent pipeline with up to 7 specialized agents:\n\n**Phase 0 -- Input Optimization (optional, `--optimize`):**\n\n0. **Input Optimizer** runs two parallel VLM calls:\n   - **Context Enricher** structures raw methodology text into diagram-ready format (components, flows, groupings, I\u002FO)\n   - **Caption Sharpener** transforms vague captions into precise visual specifications\n\n**Phase 1 -- Linear Planning:**\n\n1. 
**Retriever** selects the most relevant reference examples from a curated set of 13 methodology diagrams spanning agent\u002Freasoning, vision\u002Fperception, generative\u002Flearning, and science\u002Fapplications domains\n2. **Planner** generates a detailed textual description of the target diagram via in-context learning from the retrieved examples\n3. **Stylist** refines the description for visual aesthetics using NeurIPS-style guidelines (color palette, layout, typography)\n\n**Phase 2 -- Iterative Refinement:**\n\n4. **Visualizer** renders the description into an image\n5. **Critic** evaluates the generated image against the source context and provides a revised description addressing any issues\n6. Steps 4-5 repeat for a fixed number of iterations (default 3), or until the critic is satisfied (`--auto`)\n\n## Providers\n\nPaperBanana supports multiple VLM and image generation providers:\n\n| Component | Provider | Model | Notes |\n|-----------|----------|-------|-------|\n| VLM (planning, critique) | OpenAI | `gpt-5.2` | Default |\n| Image Generation | OpenAI | `gpt-image-1.5` | Default |\n| VLM | Google Gemini | `gemini-2.0-flash` | Free tier |\n| Image Generation | Google Gemini | `gemini-3-pro-image-preview` | Free tier |\n| VLM \u002F Image | OpenRouter | Any supported model | Flexible routing |\n\nAzure OpenAI \u002F Foundry endpoints are auto-detected — set `OPENAI_BASE_URL` to your endpoint.\nGemini-compatible gateways are also supported — set `GOOGLE_BASE_URL` when needed.\n\n---\n\n## CLI Reference\n\n### `paperbanana generate` -- Methodology Diagrams\n\n```bash\n# Basic generation\npaperbanana generate \\\n  --input method.txt \\\n  --caption \"Overview of our framework\"\n\n# With input optimization and auto-refine\npaperbanana generate \\\n  --input method.txt \\\n  --caption \"Overview of our framework\" \\\n  --optimize --auto\n\n# Continue the latest run with user feedback\npaperbanana generate --continue \\\n  --feedback \"Make arrows 
thicker and colors more distinct\"\n\n# Continue a specific run\npaperbanana generate --continue-run run_20260218_125448_e7b876 \\\n  --iterations 3\n\n# PDF as input (install PyMuPDF: pip install 'paperbanana[pdf]')\npaperbanana generate \\\n  --input paper.pdf \\\n  --caption \"Overview of our method\" \\\n  --pdf-pages \"3-8\"\n```\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--input` | `-i` | Path to methodology text file or PDF (required for new runs) |\n| `--caption` | `-c` | Figure caption \u002F communicative intent (required for new runs) |\n| `--output` | `-o` | Output image path (default: auto-generated in `outputs\u002F`) |\n| `--iterations` | `-n` | Number of Visualizer-Critic refinement rounds (default: 3) |\n| `--auto` | | Loop until critic is satisfied (with `--max-iterations` safety cap) |\n| `--max-iterations` | | Safety cap for `--auto` mode (default: 30) |\n| `--optimize` | | Preprocess inputs with parallel context enrichment and caption sharpening |\n| `--continue` | | Continue from the latest run in `outputs\u002F` |\n| `--continue-run` | | Continue from a specific run ID |\n| `--feedback` | | User feedback for the critic when continuing a run |\n| `--pdf-pages` | | PDF input only: 1-based pages (e.g. 
`1-5`, `2,4,6-8`; default: all) |\n| `--vlm-provider` | | VLM provider name (default: `openai`) |\n| `--vlm-model` | | VLM model name (default: `gpt-5.2`) |\n| `--image-provider` | | Image gen provider (default: `openai_imagen`) |\n| `--image-model` | | Image gen model (default: `gpt-image-1.5`) |\n| `--format` | `-f` | Output format: `png`, `jpeg`, or `webp` (default: `png`) |\n| `--config` | | Path to YAML config file (see `configs\u002Fconfig.yaml`) |\n| `--verbose` | `-v` | Show detailed agent progress and timing |\n| `--progress-json` | | Emit JSON progress events to stdout during generation |\n\n### `paperbanana plot` -- Statistical Plots\n\n```bash\npaperbanana plot \\\n  --data results.csv \\\n  --intent \"Bar chart comparing model accuracy across benchmarks\"\n```\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--data` | `-d` | Path to data file, CSV or JSON (required) |\n| `--intent` | | Communicative intent for the plot (required) |\n| `--output` | `-o` | Output image path |\n| `--iterations` | `-n` | Refinement iterations (default: 3) |\n\n### `paperbanana batch` -- Batch Generation\n\nGenerate multiple methodology diagrams from a single manifest file (YAML or JSON). 
Each item runs the full pipeline; outputs are written under `outputs\u002Fbatch_\u003Cid>\u002Frun_\u003Cid>\u002F` and a `batch_report.json` summarizes all runs.\n\n```bash\npaperbanana batch --manifest examples\u002Fbatch_manifest.yaml --optimize\n```\n\nManifest format (YAML or JSON with an `items` list):\n\n```yaml\nitems:\n  - input: path\u002Fto\u002Fmethod1.txt\n    caption: \"Overview of our encoder-decoder\"\n    id: fig1\n  - input: method2.txt\n    caption: \"Training pipeline\"\n    id: fig2\n  - input: paper.pdf\n    caption: \"System overview\"\n    id: fig3\n    pdf_pages: \"4-9\" # optional; PDF inputs only\n```\n\nPaths in the manifest are resolved relative to the manifest file's directory.\n\n**Composite figures:** Add an optional `composite` section to automatically stitch all generated panels into a single labeled figure after the batch completes:\n\n```yaml\ncomposite:\n  layout: \"1x3\"          # rows x cols, or \"auto\"\n  labels: auto            # (a), (b), (c)... or explicit list, or null\n  spacing: 20             # pixels between panels\n  label_position: bottom  # top or bottom\n  output: \"composite.png\"\n\nitems:\n  - input: method_encoder.txt\n    caption: \"Encoder architecture\"\n    id: panel_a\n  # ...\n```\n\nThe composite image is saved alongside the individual panels in the batch output directory. See `examples\u002Fcomposite_batch_manifest.yaml` for a complete example.\n\n**Generate a human-readable report** from an existing batch run (Markdown or HTML):\n\n```bash\npaperbanana batch-report --batch-dir outputs\u002Fbatch_20250109_123456_abc --format markdown\n# or by batch ID (under default output dir)\npaperbanana batch-report --batch-id batch_20250109_123456_abc --format html --output report.html\n```\n\nDiagram batch reports include `batch_kind: methodology`; plot batches use `batch_kind: statistical_plot`. 
Human-readable reports (`paperbanana batch-report`) show the batch kind when present.\n\n**Sweep reports** produced by `paperbanana sweep` can be rendered the same way:\n\n```bash\npaperbanana sweep-report --sweep-dir outputs\u002Fsweep_20250109_123456_abc --format html\n# or by sweep ID\npaperbanana sweep-report --sweep-id sweep_20250109_123456_abc --format markdown\n```\n\nRendered sweep reports include a summary, a top-5 ranked table, the full variants table (with per-variant provider\u002Fmodel, iterations, critic-suggestion count, proxy score, and output path), and the `quality_proxy_score` note. Dry-run reports render a simplified \"Planned Variants\" section.\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--manifest` | `-m` | Path to manifest file (required) |\n| `--output-dir` | `-o` | Parent directory for batch run (default: outputs) |\n| `--config` | | Path to config YAML |\n| `--iterations` | `-n` | Refinement iterations per item |\n| `--optimize` | | Preprocess inputs for each item |\n| `--auto` | | Loop until critic satisfied per item |\n| `--format` | `-f` | Output image format (png, jpeg, webp) |\n| `--auto-download-data` | | Download expanded reference set if needed |\n\n### `paperbanana plot-batch` -- Batch Statistical Plots\n\nGenerate multiple plots from a manifest (YAML or JSON). Each item specifies a **data** file (CSV or JSON) and an **intent** string, mirroring `paperbanana plot`. 
Outputs live under `outputs\u002Fbatch_\u003Cid>\u002Frun_\u003Cid>\u002F` with the same `batch_report.json` and `paperbanana batch-report` workflow as diagram batches.\n\n```bash\npaperbanana plot-batch --manifest examples\u002Fplot_batch_manifest.yaml --optimize\n```\n\nManifest format (`items` list):\n\n```yaml\nitems:\n  - data: path\u002Fto\u002Fresults.csv\n    intent: \"Bar chart comparing accuracy across models\"\n    id: fig_acc\n  - data: other.json\n    intent: \"Scatter plot with trend line\"\n    aspect_ratio: \"16:9\"   # optional per item; CLI --aspect-ratio is the default when omitted\n```\n\nPaths are resolved relative to the manifest file’s directory.\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--manifest` | `-m` | Path to manifest (required) |\n| `--output-dir` | `-o` | Parent directory for `batch_*` (default: outputs) |\n| `--config` | | Path to config YAML |\n| `--vlm-provider` | | VLM provider (default: gemini) |\n| `--vlm-model` | | VLM model override |\n| `--image-provider` | | Image gen provider |\n| `--image-model` | | Image gen model |\n| `--iterations` | `-n` | Refinement iterations per item |\n| `--auto` | | Loop until critic satisfied per item |\n| `--max-iterations` | | Safety cap for `--auto` |\n| `--optimize` | | Input optimization per item |\n| `--format` | `-f` | png, jpeg, or webp |\n| `--save-prompts` \u002F `--no-save-prompts` | | Persist prompts (default: on, same as `plot`) |\n| `--venue` | | Venue style (neurips, icml, acl, ieee, custom) |\n| `--aspect-ratio` | `-ar` | Default aspect ratio when not set in the manifest |\n| `--verbose` | `-v` | Verbose logging |\n\n### `paperbanana orchestrate` -- Full-Paper Figure Package\n\nGenerate a publication-focused figure bundle from a full paper source, with optional data-driven plots. 
The command:\n- parses the paper (`.txt`, `.md`, or `.pdf`)\n- plans multiple methodology figures from section structure\n- optionally discovers CSV\u002FJSON files to plan statistical plots\n- runs generation for all planned items\n- writes a package folder containing `figure_package.json`, `figures\u002F`, `figures.tex`, and `captions.md`\n\n```bash\npaperbanana orchestrate \\\n  --paper paper.pdf \\\n  --data-dir .\u002Fresults \\\n  --max-method-figures 4 \\\n  --max-plot-figures 3 \\\n  --optimize\n```\n\nUse `--dry-run` to only plan and inspect `orchestration_plan.json` without API calls.\nUse `--resume-orchestrate \u003Cid-or-path>` to continue an interrupted orchestration from checkpoint state.\n\n| Flag | Description |\n|------|-------------|\n| `--paper` \u002F `-p` | Paper source path (`.txt`, `.md`, or `.pdf`) |\n| `--resume-orchestrate` | Resume an existing orchestration by ID or directory |\n| `--retry-failed` | When resuming, include previously failed tasks |\n| `--max-retries` | Extra retries per task after first failure |\n| `--data-dir` | Optional directory containing CSV\u002FJSON files for plot planning |\n| `--output-dir` \u002F `-o` | Parent output directory (creates `orchestrate_*`) |\n| `--max-method-figures` | Max methodology figures to plan\u002Fgenerate |\n| `--max-plot-figures` | Max plot figures to plan\u002Fgenerate |\n| `--pdf-pages` | PDF-only page selection (e.g. 
`1-5`, `2,4,6-8`) |\n| `--optimize` | Enable input optimization for generated items |\n| `--iterations` \u002F `-n` | Refinement iterations per generated item |\n| `--auto` + `--max-iterations` | Critic-driven auto-refine mode with safety cap |\n| `--concurrency` | Parallel figure generation workers |\n| `--format` \u002F `-f` | Output format (`png`, `jpeg`, `webp`) |\n| `--dry-run` | Plan package only; no generation calls |\n\n### `paperbanana composite` -- Compose Multi-Panel Figures\n\nStitch multiple images into a single labeled figure with `(a)`, `(b)`, `(c)` sub-panel labels:\n\n```bash\npaperbanana composite \\\n  panel_a.png panel_b.png panel_c.png \\\n  --layout 1x3 \\\n  --output figure2.png\n```\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `IMAGES` | | Positional: paths to images to compose |\n| `--layout` | `-l` | Grid layout: `RxC` (e.g. `1x3`, `2x2`) or `auto` (default: auto) |\n| `--labels` | | Comma-separated labels, or `none` to disable (default: auto `(a),(b),...`) |\n| `--spacing` | `-s` | Pixel spacing between panels (default: 20) |\n| `--label-position` | | `top` or `bottom` (default: bottom) |\n| `--label-font-size` | | Font size for labels (default: 32) |\n| `--output` | `-o` | Output path (default: composite_output.png) |\n\nThis command works on any existing images — no API calls needed. 
It is also triggered automatically when a batch manifest includes a `composite` section (see `paperbanana batch` above).\n\n### `paperbanana evaluate` -- Quality Assessment\n\nComparative evaluation of a generated diagram against a human reference using VLM-as-a-Judge:\n\n```bash\npaperbanana evaluate \\\n  --generated diagram.png \\\n  --reference human_diagram.png \\\n  --context method.txt \\\n  --caption \"Overview of our framework\"\n```\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--generated` | `-g` | Path to generated image (required) |\n| `--reference` | `-r` | Path to human reference image (required) |\n| `--context` | | Path to source context text file or PDF (required) |\n| `--caption` | `-c` | Figure caption (required) |\n| `--pdf-pages` | | PDF context only: 1-based page selection (default: all) |\n\nScores on 4 dimensions (hierarchical aggregation per the paper):\n- **Primary**: Faithfulness, Readability\n- **Secondary**: Conciseness, Aesthetics\n\n### `paperbanana studio` -- Local web UI\n\nRequires `pip install 'paperbanana[studio]'` (Gradio).\n\n```bash\npaperbanana studio\npaperbanana studio --port 8080 --output-dir .\u002Fmy_outputs\n```\n\n| Flag | Description |\n|------|-------------|\n| `--host` | Bind address (default `127.0.0.1`) |\n| `--port` | Port (default `7860`) |\n| `--share` | Create a temporary public Gradio link (do not use with sensitive data) |\n| `--config` | Path to YAML config |\n| `--output-dir` \u002F `-o` | Default output directory for runs |\n| `--root-path` | URL subpath when behind a reverse proxy |\n\n### `paperbanana setup` -- First-Time Configuration\n\n```bash\npaperbanana setup\n```\n\nInteractive wizard that first asks whether to use the official Gemini API.\nIf you choose official API, it follows the default AI Studio key flow; if not, it asks for a custom Gemini-compatible URL and API key.\n\n---\n\n## Python API\n\n```python\nimport asyncio\nfrom paperbanana import PaperBananaPipeline, 
GenerationInput, DiagramType\nfrom paperbanana.core.config import Settings\n\nsettings = Settings(\n    vlm_provider=\"openai\",\n    vlm_model=\"gpt-5.2\",\n    image_provider=\"openai_imagen\",\n    image_model=\"gpt-image-1.5\",\n    optimize_inputs=True,   # Enable input optimization\n    auto_refine=True,       # Loop until critic is satisfied\n)\n\npipeline = PaperBananaPipeline(settings=settings)\n\nresult = asyncio.run(pipeline.generate(\n    GenerationInput(\n        source_context=\"Our framework consists of...\",\n        communicative_intent=\"Overview of the proposed method.\",\n        diagram_type=DiagramType.METHODOLOGY,\n    )\n))\n\nprint(f\"Output: {result.image_path}\")\n```\n\n**Progress callbacks:** `generate()` and `continue_run()` accept an optional `progress_callback` argument. The pipeline invokes it with `PipelineProgressEvent` objects (stage, message, seconds, iteration, extra) at each step (optimizer, retriever, planner, stylist, visualizer, critic), so you can show progress in UIs or log timing without patching agents.\n\nTo continue a previous run:\n\n```python\nfrom paperbanana.core.resume import load_resume_state\n\nstate = load_resume_state(\"outputs\", \"run_20260218_125448_e7b876\")\nresult = asyncio.run(pipeline.continue_run(\n    resume_state=state,\n    additional_iterations=3,\n    user_feedback=\"Make the encoder block more prominent\",\n))\n```\n\nSee `examples\u002Fgenerate_diagram.py` and `examples\u002Fgenerate_plot.py` for complete working examples.\n\n---\n\n## MCP Server\n\nPaperBanana includes an MCP server for use with Claude Code, Cursor, or any MCP-compatible client. 
Add the following config to use it via `uvx` without a local clone:\n\n```json\n{\n  \"mcpServers\": {\n    \"paperbanana\": {\n      \"command\": \"uvx\",\n      \"args\": [\"--from\", \"paperbanana[mcp]\", \"paperbanana-mcp\"],\n      \"env\": { \"GOOGLE_API_KEY\": \"your-google-api-key\" }\n    }\n  }\n}\n```\n\nFour MCP tools are exposed: `generate_diagram`, `generate_plot`, `evaluate_diagram`, and `evaluate_plot`.\n\nThe repo also ships with 3 Claude Code skills:\n- `\u002Fgenerate-diagram \u003Cfile> [caption]` - generate a methodology diagram from a text file\n- `\u002Fgenerate-plot \u003Cdata-file> [intent]` - generate a statistical plot from CSV\u002FJSON data\n- `\u002Fevaluate-diagram \u003Cgenerated> \u003Creference>` - evaluate a diagram against a human reference\n\nSee [`mcp_server\u002FREADME.md`](mcp_server\u002FREADME.md) for full setup details (Claude Code, Cursor, local development).\n\n---\n\n## Configuration\n\nDefault settings are in `configs\u002Fconfig.yaml`. Override via CLI flags or a custom YAML:\n\n```bash\npaperbanana generate \\\n  --input method.txt \\\n  --caption \"Overview\" \\\n  --config my_config.yaml\n```\n\nKey settings:\n\n```yaml\nvlm:\n  provider: openai           # openai, gemini, or openrouter\n  model: gpt-5.2\n\nimage:\n  provider: openai_imagen    # openai_imagen, google_imagen, or openrouter_imagen\n  model: gpt-image-1.5\n\npipeline:\n  num_retrieval_examples: 10\n  refinement_iterations: 3\n  # auto_refine: true        # Loop until critic is satisfied\n  # max_iterations: 30       # Safety cap for auto_refine mode\n  # optimize_inputs: true    # Preprocess inputs for better generation\n  output_resolution: \"2k\"\n\nreference:\n  path: data\u002Freference_sets\n\noutput:\n  dir: outputs\n  save_iterations: true\n  save_metadata: true\n```\n\nEnvironment variables (`.env`):\n\n```bash\n# OpenAI (default)\nOPENAI_API_KEY=your-key\nOPENAI_BASE_URL=https:\u002F\u002Fapi.openai.com\u002Fv1    # or Azure 
endpoint\nOPENAI_VLM_MODEL=gpt-5.2                      # override model\nOPENAI_IMAGE_MODEL=gpt-image-1.5              # override model\n\n# Google Gemini (alternative, free)\nGOOGLE_API_KEY=your-key\nGOOGLE_BASE_URL=                            # optional custom Gemini-compatible endpoint\nGOOGLE_VLM_MODEL=gemini-2.0-flash          # override Gemini VLM model\nGOOGLE_IMAGE_MODEL=gemini-3-pro-image-preview  # override Gemini image model\n```\n\n---\n\n## Project Structure\n\n```\npaperbanana\u002F\n├── paperbanana\u002F\n│   ├── core\u002F          # Pipeline orchestration, types, config, resume, utilities\n│   ├── agents\u002F        # Optimizer, Retriever, Planner, Stylist, Visualizer, Critic\n│   ├── providers\u002F     # VLM and image gen provider implementations\n│   │   ├── vlm\u002F       # OpenAI, Gemini, OpenRouter VLM providers\n│   │   └── image_gen\u002F # OpenAI, Gemini, OpenRouter image gen providers\n│   ├── reference\u002F     # Reference set management (13 curated examples)\n│   ├── guidelines\u002F    # Style guidelines loader\n│   └── evaluation\u002F    # VLM-as-Judge evaluation system\n├── configs\u002F           # YAML configuration files\n├── prompts\u002F           # Prompt templates for all agents + evaluation\n│   ├── diagram\u002F       # context_enricher, caption_sharpener, retriever, planner, stylist, visualizer, critic\n│   ├── plot\u002F          # plot-specific prompt variants\n│   └── evaluation\u002F    # faithfulness, conciseness, readability, aesthetics\n├── data\u002F\n│   ├── reference_sets\u002F  # 13 verified methodology diagrams\n│   └── guidelines\u002F      # NeurIPS-style aesthetic guidelines\n├── examples\u002F          # Working example scripts + sample inputs\n├── scripts\u002F           # Data curation and build scripts\n├── tests\u002F             # Test suite\n├── mcp_server\u002F        # MCP server for IDE integration\n└── .claude\u002Fskills\u002F    # Claude Code skills (generate-diagram, generate-plot, 
evaluate-diagram)\n```\n\n## Development\n\n```bash\n# Install with dev dependencies\npip install -e \".[dev,openai,google]\"\n\n# Run tests\npytest tests\u002F -v\n\n# Lint\nruff check paperbanana\u002F mcp_server\u002F tests\u002F scripts\u002F\n\n# Format\nruff format paperbanana\u002F mcp_server\u002F tests\u002F scripts\u002F\n```\n\n## Citation\n\nThis is an **unofficial** implementation. If you use this work, please cite the **original paper**:\n\n```bibtex\n@article{zhu2026paperbanana,\n  title={PaperBanana: Automating Academic Illustration for AI Scientists},\n  author={Zhu, Dawei and Meng, Rui and Song, Yale and Wei, Xiyu\n          and Li, Sujian and Pfister, Tomas and Yoon, Jinsung},\n  journal={arXiv preprint arXiv:2601.23265},\n  year={2026}\n}\n```\n\n**Original paper**: [https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.23265](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.23265)\n\n## Disclaimer\n\nThis project is an independent open-source reimplementation based on the publicly available paper.\nIt is not affiliated with, endorsed by, or connected to the original authors, Google Research, or\nPeking University in any way. The implementation may differ from the original system described in the paper.\nUse at your own discretion.\n\n## License\n\nMIT\n","PaperBanana is an open-source project for automatically generating academic charts, diagrams, and research visuals. It extends the Google Research tool of the same name and adds features such as slide generation. The project is written in Python and uses a multi-agent pipeline design that generates high-quality academic diagrams and statistical plots from text descriptions. It integrates several image generation providers, including OpenAI (GPT-5.2 + GPT-Image-1.5), Azure OpenAI\u002FFoundry, and Google Gemini, to meet the needs of different users. PaperBanana also has an input optimization layer that improves the quality of generated content. The tool is well suited to researchers, educators, and anyone who needs to quickly create professional-grade academic or technical visualizations.",2,"2026-06-11 03:49:37","high_star"]