[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1138":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":27,"discoverSource":28},1138,"Camyla","yifangao112\u002FCamyla","yifangao112","Scaling Autonomous Research in Medical Image Segmentation",null,"Python",351,41,13,1,0,2,15,6,51.37,"Apache License 2.0",false,"main",[],"2026-06-12 04:00:07","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" width=\"420\" alt=\"Camyla\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  [ English | \u003Ca href=\"README_zh.md\">中文\u003C\u002Fa> ]\n\u003C\u002Fp>\n\n## Large-scale, fully autonomous medical image segmentation research — dataset in, paper out.\n\n---\n\n## 📄 Paper\n\n### Camyla: Scaling Autonomous Research in Medical Image Segmentation\n\nYifan Gao¹², Haoyue Li¹, Feng Yuan¹, Xin Gao¹\\*, Weiran Huang²³\\*, Xiaosong Wang²⁴\\*\n\n\u003Csup>¹ USTC · ² Shanghai Innovation Institute · ³ SJTU · ⁴ Shanghai AI Lab · \\* Corresponding authors\u003C\u002Fsup>\n\n📑 **[Read the paper (PDF)](assets\u002Fcamyla.pdf)** · **[alphaXiv](https:\u002F\u002Fwww.alphaxiv.org\u002Fabs\u002F2604.10696)** · **[arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.10696)**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fhero.png\" width=\"760\" alt=\"Camyla hero\">\n\u003C\u002Fp>\n\n**Headline results** (28 days, **zero human intervention**):\n\n- **40** complete manuscripts 
written end-to-end\n- **Cost-efficient**: only **$20–30 per paper** in LLM API spend\n- **Beats the strongest per-dataset baseline** (chosen from 14 established architectures including nnU-Net) on **24 of 31** datasets under identical training budgets\n- **CamylaBench**: contamination-free benchmark of 31 datasets, built exclusively from 2025 publications\n- **Stronger long-horizon orchestration on the experiment stage.** During experiment execution, driven by cost-efficient backends (**GLM-4.7** + **MiniMax-M2.5**), Camyla outperforms **AI Scientist**, **autoresearch Claude Code (Opus 4.6)**, and **autoresearch Codex (GPT-5.4-xhigh)** on execution success, completion rate, and fidelity to the original proposal\n\n**Manuscript quality — double-blind evaluation.** Camyla-generated manuscripts were mixed with real 2025 publications and judged without reviewers knowing which was AI-written. Four independent panels — **5 senior reviewers**, **10 junior reviewers**, **5 different AI models**, and the **Stanford Agentic Reviewer** — all place Camyla's output between the T1 and T2 tier of contemporary medical-imaging journals. The T1 anchor is IEEE TMI \u002F Medical Image Analysis; the T2 band is JCR Q1 medical-imaging journals:\n\n| Tier | Journals | Papers | Representative venues |\n|------|:-------:|:------:|-----------------------|\n| **T1** (top-tier) | 2 | 10 | IEEE Transactions on Medical Imaging; Medical Image Analysis |\n| **T2** (JCR Q1) | 7 | 35 | IEEE Journal of Biomedical and Health Informatics; Artificial Intelligence in Medicine; et al. |\n| **T3** | 9 | 45 | International Journal of Computer Assisted Radiology and Surgery; Biomedical Physics & Engineering Express; et al. 
|\n| **Total** | 18 | 90 | |\n\n---\n\n## 📰 News\n\n- **2026-04-13** — Initial public release of the code, the paper, and **CamylaBench** (31 pre-formatted datasets, [📦 Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F11BrGWWZw8yr2o2AkVV9o5e7de-T9nngz?usp=sharing)). We are still cleaning the full **CamylaTrace-232K** trajectory dataset (per-run logs + intermediate artifacts); release will follow shortly.\n\n---\n\nCamyla is an automated research pipeline that takes a medical image segmentation task,\nsearches the literature for relevant ideas, proposes research hypotheses, runs\nend-to-end deep-learning experiments (Baseline → Creative Research → Ablation), and\nwrites up the results as a publication-ready paper.\n\nIt combines:\n\n- **Quality-Weighted Branch Exploration (QWBE)** over experiment configurations\n- **OpenHands**-driven code generation and iterative debugging\n- Multi-source literature search (ArXiv, OpenAlex, PubMed, Semantic Scholar)\n- A paper agent that drafts, compiles, and cites LaTeX papers (Elsevier format supported)\n- A flexible LLM routing layer (`llm_endpoints` + `llm_roles`) so you can mix providers\n  (OpenRouter, GLM, MiniMax, …) without touching code\n\n> **Status.** Research prototype, preparing for open-source release. APIs may change.\n\n---\n\n## 📚 Sample generated papers\n\nA 10-paper subset of the manuscripts Camyla produced end-to-end — no hand-editing,\nLaTeX compiled as-is by the pipeline. 
Each PDF lives under\n[`assets\u002Fpaper_pdf\u002F`](assets\u002Fpaper_pdf\u002F).\n\n| # | Title | Modality \u002F task |\n|:-:|-------|-----------------|\n| 1 | [Cross-Directional Feature Lattice for Brain Tumor Segmentation](assets\u002Fpaper_pdf\u002F01.pdf) | MRI · brain tumor |\n| 2 | [Scale-Frequency Adaptive Fusion for Multiple Sclerosis Lesion Segmentation](assets\u002Fpaper_pdf\u002F02.pdf) | MRI · MS lesions |\n| 3 | [Hierarchical Context Gating for Neonatal Brain Lesion Segmentation](assets\u002Fpaper_pdf\u002F03.pdf) | MRI · neonatal HIE |\n| 4 | [Cross-Scale Mutual Refinement for Bronchoalveolar Lavage Fluid Cell Segmentation](assets\u002Fpaper_pdf\u002F04.pdf) | Microscopy · BALF cells |\n| 5 | [Symmetry-Aware Cascaded Attention for Panoramic Tooth Segmentation](assets\u002Fpaper_pdf\u002F05.pdf) | Dental X-ray · tooth |\n| 6 | [Specular-Residual Decoupled Encoding for Surgical Scene Segmentation](assets\u002Fpaper_pdf\u002F06.pdf) | Laparoscopy · surgical scene |\n| 7 | [Adaptive Scale-Aware Feature Integration for Liver Lesion Segmentation](assets\u002Fpaper_pdf\u002F07.pdf) | CT · liver lesion |\n| 8 | [Boundary-Hierarchical Decomposition for Fetal Brain Tissue Segmentation](assets\u002Fpaper_pdf\u002F08.pdf) | MRI · fetal brain |\n| 9 | [Vessel-Guided Boundary Residual Networks for Dermatological Vessel Segmentation](assets\u002Fpaper_pdf\u002F09.pdf) | OCTA · dermatological vessel |\n| 10 | [Hierarchical Resolution-Retentive Feature Encoding for Brain Metastasis Segmentation](assets\u002Fpaper_pdf\u002F10.pdf) | MRI · brain metastasis |\n\n---\n\n## Quick start\n\n### 1. 
Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FCamyla.git camyla\ncd camyla\npython -m venv .venv && source .venv\u002Fbin\u002Factivate   # or: conda create -n camyla python=3.10\npip install -r requirements.txt\n\n# Sister packages (segmentation framework + raw-data conversion agent)\npip install git+https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FCamylaNet.git\npip install git+https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FnnPrep.git\n```\n\n> A Python 3.10+ environment is expected. GPU is required for actually running\n> the segmentation experiments (baseline uses CamylaNet \u002F nnU-Net v2).\n\n### 2. Configure\n\nSet CamylaNet's data-path environment variables (same convention as CamylaNet's\nown README — the baseline pipeline reads and writes under these paths):\n\n```bash\nexport camylanet_raw=\"\u002Fpath\u002Fto\u002Fcamylanet_raw\"\nexport camylanet_preprocessed=\"\u002Fpath\u002Fto\u002Fcamylanet_preprocessed\"\nexport camylanet_results=\"\u002Fpath\u002Fto\u002Fcamylanet_results\"   # baseline artifacts land here\n```\n\nThen copy the example Camyla config and edit it:\n\n```bash\ncp config_example.yaml config.yaml\n```\n\nAt minimum, set:\n\n- At least one entry under `llm_endpoints` with a valid `api_key` (or export the\n  matching `api_key_env`, e.g. `OPENROUTER_API_KEY`)\n- `default_endpoint` — the endpoint name most roles will use\n\nSee [Configuration](#configuration) below for the full layout, role-routing\nsemantics, and N-way model competition. The shipped\n[config_example.yaml](config_example.yaml) already has a complete multi-provider\nsetup you can copy from.\n\n### 3. Prepare the dataset\n\nCamyla operates on a dataset that already lives in nnU-Net v2 layout under\n`$camylanet_raw\u002FDataset{ID}_{Abbr}\u002F`. 
You have three options:\n\n- **Use CamylaBench (recommended).** The 31 datasets used in the paper are\n  pre-formatted and ready to drop in — download from\n  [📦 Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F11BrGWWZw8yr2o2AkVV9o5e7de-T9nngz?usp=sharing)\n  and extract each `Dataset{ID}_{Abbr}\u002F` folder under `$camylanet_raw\u002F`.\n  The ready-made idea descriptions in [`ideas\u002F`](ideas\u002F) match these IDs.\n- **Bring your own, already in nnU-Net v2 format.** Just point `camylanet_raw` at it.\n- **Bring your own raw data in some arbitrary layout.** Use\n  [nnPrep](https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FnnPrep) (an LLM agent that converts\n  arbitrary medical segmentation datasets into the nnU-Net v2 format) — or\n  write your own conversion script.\n\n### 4. Run\n\nPick an idea from [`ideas\u002F`](ideas\u002F) (31 ready-to-use dataset descriptions) and launch:\n\n```bash\npython launch_camyla.py \\\n    --config config.yaml \\\n    --load_ideas ideas\u002F900.json \\\n    --idea_idx 0\n```\n\nOn first run for a given dataset, Camyla automatically runs the baseline pipeline\n(trainer screening + full training for everything that passes) under\n`$camylanet_results`. 
Subsequent runs reuse the artifacts and skip this step.\n\nThe pipeline writes everything to `experiments\u002F\u003Cdate>_\u003Cidea>_attempt_\u003Cid>\u002F`:\n\n```\nexperiments\u002F2026-04-12_liver_segmentation_attempt_0\u002F\n├── idea.json \u002F idea.md           # task spec\n├── config.yaml                   # resolved config used for this run\n├── logs\u002F0-run\u002F\n│   ├── experiment_report.md      # human-readable summary\n│   └── experiment_results\u002F       # metrics, checkpoints, plots\n├── research_proposals\u002F           # auto-generated proposals\n└── paper\u002F                        # LaTeX + compiled PDF (if writeup enabled)\n```\n\nCommon flags:\n\n| Flag | Purpose |\n|------|---------|\n| `--resume_from_checkpoint PATH` | Resume from a previous `checkpoint.pkl` |\n| `--skip_writeup` \u002F `--skip_review` | Run experiments only, skip paper generation |\n| `--debug-baseline` | Fake the baseline metrics so Stage 2 runs immediately (dev only) |\n| `--verbose` | DEBUG-level logging |\n\n---\n\n## How it works\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fcamyla-overview.png\" width=\"820\" alt=\"Camyla system overview\">\n\u003C\u002Fp>\n\n- **Phase 1-3 (idea generation).** Searches 1-4 literature sources, extracts open\n  research challenges, and generates multiple proposals scored by an assessment LLM.\n- **Stage 1-3 (experiment).** QWBE expands a tree of code variants. Stage 2 can run\n  **N-way model competition** by listing multiple `experiment.code.candidates`.\n- **Paper Agent.** Takes the final experiment results and produces a cited paper.\n\n---\n\n## Configuration\n\nEvery LLM call Camyla makes resolves through a two-layer system:\n\n1. **`llm_endpoints`** — your named LLM connections (one per provider\u002Fbackend).\n2. 
**`llm_roles`** — every internal component (\"role\") picks an endpoint and\n   optionally overrides its model, temperature, or `max_tokens`.\n\nAnything you don't configure at the role level falls back to `default_endpoint`.\n\n### 1. `llm_endpoints` — connections\n\nEach entry is an OpenAI-compatible endpoint. The shape is fixed:\n\n```yaml\nllm_endpoints:\n  my_openrouter:\n    api_key: \"\"                              # inline key (leave empty → use env)\n    api_key_env: OPENROUTER_API_KEY          # env var name to read when api_key is empty\n    base_url: \"https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1\"\n    model: \"deepseek\u002Fdeepseek-v3.2\"          # default model for this endpoint\n    temperature: 0.5                         # default temperature\n```\n\n**Key resolution order:** non-empty `api_key` > environment variable named by\n`api_key_env` > empty (error at first call).\n\n**Naming is free-form.** `my_openrouter`, `cheap_backend`, `gpt4` — all valid.\nRoles reference endpoints by name, so changing models across the whole pipeline\nis just swapping one string. Common backends we've tested:\n\n| Backend          | `base_url`                                     | Notes |\n|------------------|-----------------------------------------------|-------|\n| OpenRouter       | `https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1`                | one key, 300+ models |\n| DashScope (Qwen) | `https:\u002F\u002Fcoding.dashscope.aliyuncs.com\u002Fv1`    | cheap Qwen\u002FGLM routing |\n| MiniMax          | `https:\u002F\u002Fapi.minimaxi.com\u002Fv1`                 | M-series models |\n| OpenAI           | `https:\u002F\u002Fapi.openai.com\u002Fv1`                   | native |\n| Local vLLM \u002F Ollama | `http:\u002F\u002Flocalhost:8000\u002Fv1`                  | any OpenAI-compatible server |\n\n### 2. `default_endpoint`\n\nThe global fallback. 
Every role without an explicit `endpoint` uses this one.\n\n```yaml\ndefault_endpoint: my_openrouter\n```\n\n### 3. `llm_roles` — per-role overrides\n\nA role is one logical LLM use inside the pipeline (feedback, paper writer, idea\ngenerator, etc.). You only specify the fields you want to override:\n\n```yaml\nllm_roles:\n  # Tree-search roles\n  feedback:           { temperature: 0.9, max_tokens: 8192 }\n  log_summary:        { temperature: 1.0 }\n\n  # Idea-generation roles — swap to a cheaper model for these\n  literature_backbone:  { model: google\u002Fgemini-3-flash-preview }\n  challenge_extraction: { temperature: 0.3 }\n\n  # Paper agent sub-agents: `_default` applies to the whole group unless\n  # an individual sub-agent overrides it.\n  paper_agent:\n    _default:             { temperature: 0.6 }\n    BibtexAgent:          { model: z-ai\u002Fglm-4.7, temperature: 0.3 }\n    IdeaGenerationAgent:  { model: google\u002Fgemini-3-flash-preview, temperature: 0.8 }\n\n  # Paper writing roles can be routed to a different endpoint entirely.\n  paper_writing:\n    latex_editor:    { endpoint: my_dashscope, temperature: 0.7 }\n    image_generator: { endpoint: my_openrouter,\n                       model: google\u002Fgemini-3.1-flash-image-preview,\n                       aspect_ratio: \"16:9\", image_size: \"2K\" }\n```\n\nOverride precedence for any given role: role fields > its endpoint's defaults.\nYou can point a role at a completely different endpoint with `endpoint: \u003Cname>`.\n\n### 4. N-way model competition (Stage 2)\n\nUnder `experiment.code.candidates`, list the **endpoint names** that should compete\nas code authors for Stage 2. Camyla will run one branch per candidate and keep\nthe strongest.\n\n```yaml\nexperiment:\n  code:\n    candidates: [my_dashscope, my_minimax]\n    max_tokens: 16384\n```\n\nSet it to a single-element list to disable competition.\n\n### 5. 
Non-LLM API keys\n\nLiterature search keys live under `api_keys` (same inline\u002Fenv fallback pattern):\n\n```yaml\napi_keys:\n  s2:   { value: \"\", env: S2_API_KEY }    # Semantic Scholar\n  ncbi: { value: \"\", env: NCBI_API_KEY }  # PubMed\n```\n\nCamyla works without these, but rate limits will be tighter.\n\n### 6. Tuning the run\n\nThe rest of the config controls **what** Camyla does, not **which LLMs** it uses:\n\n- `idea_generation.*` — how many papers to search, how to score proposals,\n  how many generator personalities to ensemble\n- `experiment.stages.*` — per-stage iteration budgets (Stage 1 = baseline\n  replication, Stage 2 = creative research, Stage 3 = ablation)\n- `experiment.openhands.*` — OpenHands coder settings (python path, iteration\n  cap, condenser)\n- `experiment.search.*` — QWBE hyperparameters (UCB constant, debug probability,\n  draft count)\n\nAll of these have sensible defaults in [config_example.yaml](config_example.yaml);\nyou usually only touch `stages.*_max_iters` when you want a faster \u002F cheaper run.\n\n---\n\n## Repository layout\n\n```\ncamyla\u002F\n├── LICENSE\n├── README.md                        # this file\n├── config_example.yaml              # documented config template\n├── requirements.txt\n├── launch_camyla.py                 # main entry point\n├── ideas\u002F                           # 31 ready-made idea JSONs\n└── camyla\u002F                          # core package\n    ├── model_config.py              # LLM config loader (get_endpoint\u002Fget_role)\n    ├── baseline\u002F                    # screening + full training before QWBE\n    ├── infrastructure\u002Fliterature\u002F   # arxiv \u002F openalex \u002F pubmed \u002F multi-source\n    ├── paper_agent\u002F                 # LaTeX writer + plotters + bibtex agent\n    ├── tools\u002F                       # OpenAlex \u002F Semantic Scholar tools\n    ├── treesearch\u002F                  # QWBE core, parallel agents, OpenHands coder\n    └── 
utils\u002F\n```\n\n---\n\n## Related repositories\n\n- [CamylaNet](https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FCamylaNet) — segmentation framework\n  built on nnU-Net v2, shipping a curated set of CNN \u002F Transformer \u002F state-space\n  backbones. Camyla's baseline stage runs CamylaNet trainers.\n- [nnPrep](https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FnnPrep) — LLM agent that converts\n  arbitrary medical segmentation datasets into the nnU-Net v2 format consumed\n  by CamylaNet (and therefore by Camyla).\n\n---\n\n## Limitations & notes\n\n- The baseline stage currently uses CamylaNet trainers. Swapping in another\n  baseline framework requires a corresponding skill under `skills\u002Fframeworks\u002F`.\n- OpenHands runs code in your local Python env — make sure you point\n  `experiment.openhands.python_path` at an env with the needed packages.\n- Long runs: a full idea (proposals → 3 stages → paper) typically takes\n  several hours to a day on a single A100-class GPU.\n\n---\n\n## Citation\n\nIf you find this project useful in academic work, please cite:\n\n```bibtex\n@misc{gao2026camyla,\n  title         = {Camyla: Scaling Autonomous Research in Medical Image Segmentation},\n  author        = {Gao, Yifan and Li, Haoyue and Yuan, Feng and Gao, Xin and Huang, Weiran and Wang, Xiaosong},\n  year          = {2026},\n  eprint        = {2604.10696},\n  archivePrefix = {arXiv},\n  primaryClass  = {cs.AI}\n}\n```\n\n---\n\n## Acknowledgements\n\nCamyla builds on ideas and code from two upstream projects:\n\n- [**AI Scientist** (Sakana AI)](https:\u002F\u002Fgithub.com\u002FSakanaAI\u002FAI-Scientist) — pioneered the autonomous research-agent paradigm that inspired Camyla's overall pipeline design.\n- [**nnU-Net**](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002FnnUNet) — the self-configuring segmentation framework that Camyla's baseline stage (via [CamylaNet](https:\u002F\u002Fgithub.com\u002Fyifangao112\u002FCamylaNet)) is built on.\n\n---\n\n## 
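License\n\nReleased under the **Apache License, Version 2.0** — see [LICENSE](LICENSE).\n","Camyla is a project for large-scale, fully automated medical image segmentation research. It automatically generates complete research papers from an input dataset, with no human intervention at any stage. The project uses advanced language models (such as GLM-4.7 and MiniMax-M2.5) to make experiment execution cost-efficient, performs well on multiple public datasets, and outperforms many existing segmentation algorithms. The quality of Camyla-generated papers was assessed through double-blind review and reached the standard of top-tier medical imaging journals. The project is particularly suited to settings that need efficient, low-cost medical image segmentation research, such as medical institutions, research institutions, and developers interested in automated research workflows.","2026-06-11 02:41:52","CREATED_QUERY"]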