[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74306":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":37,"readmeContent":38,"aiSummary":39,"trendingCount":16,"starSnapshotCount":16,"syncStatus":40,"lastSyncTime":41,"discoverSource":42},74306,"AutoR","AutoX-AI-Labs\u002FAutoR","AutoX-AI-Labs","AI handles execution, humans own the direction, and every run becomes an inspectable research artifact on disk.","",null,"Python",848,22,6,4,0,3,7,9,8.09,false,"main",true,[25,26,27,28,29,30,31,32,33,34,35,36],"agent","ai","ai-scientist","auto-research","claude","claude-code","cli","harness","llm","openai","paper","science","2026-06-12 02:03:25","\u003Ch1 align=\"center\">AutoR: A Human-Centered Research OS\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>AI handles execution. 
Humans own the direction.\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  A terminal-first research harness, with a local browser Studio, that turns long, messy research work into reproducible, artifact-backed runs.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-blue\" alt=\"Python 3.10+\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWorkflow-Intake%20%2B%208%20Stages-black\" alt=\"Intake plus 8 stages\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FInterface-Terminal--first-green\" alt=\"Terminal-first\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuman-Approval%20Required-orange\" alt=\"Human approval required\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FExecution-Agent%20Harness-purple\" alt=\"Agent harness\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArtifacts-Reproducible%20Research%20Runs-red\" alt=\"Reproducible research runs\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FHavenIntelligence\u002FAutoR\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHavenIntelligence\u002FAutoR?style=social\" alt=\"GitHub stars\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#-overview\">Overview\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-news\">News\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-showcase\">Showcase\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-quick-start\">Quick Start\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-how-it-works\">How It Works\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-run-layout\">Run Layout\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-architecture\">Architecture\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#-roadmap\">Roadmap\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Start Here:\u003C\u002Fstrong>\n  
\u003Ca href=\"docs\u002Ftutorial_en.md\">English Guide\u003C\u002Fa>\n  ·\n  \u003Ca href=\"docs\u002Ftutorial_zh.md\">中文教程\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fexamples\u002Fexample_fig6_two_layer.png\" alt=\"AutoR example figure\" width=\"92%\" \u002F>\n\u003C\u002Fp>\n\n---\n\n> AutoR is not a chat demo, not a generic agent framework, and not a markdown-only research toy.\n>\n> It is a structured research harness over a coding agent execution layer:\n> **AI handles execution, humans own the direction, and every run becomes an inspectable research artifact on disk.**\n\n> New users should start with the step-by-step guides:\n> [English Guide](docs\u002Ftutorial_en.md) or [中文教程](docs\u002Ftutorial_zh.md).\n\n## 📖 Overview\n\nMost autoresearch systems optimize for autonomy.\n\nAutoR takes a different position: research is too important to hand over as a blind end-to-end loop. The goal is not to remove humans from research. The goal is to give them a stronger execution system.\n\n### ✨ At a Glance\n\n| Dimension | AutoR |\n| --- | --- |\n| Execution model | A coding agent as the execution layer, AutoR as the research control loop |\n| Control model | Human approval by default, with an optional strict reviewer-agent gate for unattended runs |\n| Research unit | A reproducible run under `runs\u002F\u003Crun_id>\u002F` |\n| Workflow shape | 9-stage workflow: optional intake plus eight formal research stages |\n| Quality bar | Artifact-backed outputs, not markdown-only summaries |\n| Recovery | Resume, redo-stage, rollback-stage, stage-local continuation |\n\n### 🔦 Highlights\n\n| Layer | Highlight | What AutoR actually does |\n| --- | --- | --- |\n| Big idea | **Human-centered research execution** | AutoR is not an autonomous scientist. AI handles execution; humans retain approval and direction at every stage boundary. 
|\n| Big idea | **Research loop over agent loop** | The system manages stage progression, validation, repair, recovery, and human checkpoints above the lower-level agent execution loop. |\n| Big idea | **Every run is a reproducible research artifact** | Each run leaves behind prompts, logs, approved summaries, code, data, figures, writing sources, and packaged outputs under `runs\u002F\u003Crun_id>\u002F`. |\n| Big idea | **Verifiable outputs, not paper-shaped theater** | The workflow is judged by inspectable artifacts and human approval, not by whether a generated document merely looks polished. |\n| Useful feature | **Structured literature organization** | Survey notes, bibliographies, related-work tables, and reading artifacts stay under `workspace\u002Fliterature\u002F` instead of disappearing into chat history. |\n| Useful feature | **Automated experiment manifests** | Machine-readable experiment and result files make runs inspectable, comparable, and reusable downstream. |\n| Useful feature | **Citation verification and writing checks** | Writing expects citation verification, build logs, and self-review artifacts before Stage 07 is considered complete. |\n| Useful feature | **Artifact indexing across stages** | `artifact_index.json` and related manifests help later stages find data, results, and figures without guessing from filenames. |\n| Useful feature | **Resume, redo, and rollback controls** | Long research runs can continue in place, retry a stage, or roll downstream state back without starting over. |\n| Useful feature | **Venue-aware packaging** | AutoR can package manuscript sources, PDFs, review materials, and release-ready artifacts instead of stopping at markdown summaries. 
|\n\nIn practice, that means AutoR is useful not only because of the high-level framing, but also because it handles real research chores: literature organization, experiment manifests, citation verification, artifact indexing, manuscript packaging, and recoverable long-running workflows.\n\n### ✅ What AutoR Guarantees\n\n- By default, human approval is required before the workflow advances.\n- An optional reviewer agent can simulate that gate for unattended runs, but the human-centered default remains manual review.\n- Approved summaries become the only cross-stage memory.\n- Every run is isolated, resumable, and auditable.\n- Later stages must produce real artifacts, not only prose.\n- A coding agent is the execution layer; AutoR is the research control loop above it.\n\n### 🤔 Why AutoR?\n\nMany systems aim to generate research outputs that *look* ready.\n\nAutoR takes a harder path:\n\n- it requires real experiments\n- it enforces artifact validation\n- it keeps humans in control\n\nSo the question is not:\n\n> Does it look ready?\n\nIt is:\n\n> Can you verify every part of it?\n\n## 📰 News\n\nLatest mainline updates:\n\n- **2026-05-10**: Refined the terminal-first run experience. Stage 00 now uses a dedicated clarification flow: the first intake pass asks the user questions one by one with selectable options, custom answers, and skip; the revised intake brief then uses a compact refine \u002F approve \u002F abort menu instead of showing the normal suggestion template. The terminal UI also keeps colored frames on wrapped body rows, handles long lines and wide characters more reliably, and the Codex backend now uses the current `--sandbox workspace-write` execution flag instead of the deprecated Codex CLI `--full-auto` flag.\n- **2026-04-20**: Added an optional `--full-auto` approval mode. 
The execution loop is unchanged, but the manual approval gate can now be replaced by a strict simulated reviewer agent backed by Claude or Codex, with reviewer settings persisted in `run_config.json`.\n- **2026-04-19**: Merged **AutoR Studio** into main: a local browser workspace for the same run-based workflow, with live stage monitoring, human review, restart-safe recovery, paper preview, version history, and a Notebook view. The browser UI shares the same run directories and artifact model as the terminal workflow and is currently Claude-backed.\n- **2026-04-18**: Fixed a stage-summary recovery bug so local normalization now restores the required `Decision Ledger` section and validates draft outputs against the correct `.tmp.md` path. Added stage recovery controls that let operators `\u002Fskip` the current stage, `\u002Fback \u003Cstage>` to an earlier stage, or choose skip \u002F roll back directly after retry exhaustion.\n- **2026-04-15**: Added minimal `--operator codex` support alongside Claude, persisted the selected execution backend in `run_config.json`, and improved terminal rendering for backend JSON streams.\n- **2026-04-13**: Added literature evidence ledgers and citation verification outputs, introduced typed hypothesis manifests, hardened experiment manifest parsing, and added regression coverage for research diagram injection.\n- **2026-04-10**: Added a decision ledger for human approvals and refined the public showcase gallery so research artifacts are presented more clearly.\n- **2026-04-08**: Documented optional `--research-diagram` dependencies and tightened the README positioning around human-centered, artifact-backed research execution.\n\n## 🌟 Showcase\n\nAutoR already has a full example run used throughout the repository: `runs\u002F20260330_101222`.\n\n### 🧪 Example Run Snapshot\n\n| What the run produced | What it demonstrates |\n| --- | --- |\n| [example_paper.pdf](assets\u002Fexamples\u002Fexample_paper.pdf) | A compiled manuscript 
artifact within a broader research package |\n| Executable research code | The run is not just a writing pipeline |\n| Machine-readable datasets and result files | Claims are backed by inspectable experiment outputs |\n| Real figures used in the research package | The run produces publication-style visuals, not placeholders |\n| Review and dissemination materials | The workflow continues past writing into release readiness |\n\nHighlighted outcomes from that run:\n\n- `AGSNv2` reached **36.21 ± 1.08** on Actor.\n- The system produced a full research package with real figures, writing sources, and auditable artifacts.\n- The final run preserved the full human-in-the-loop approval trail.\n\n### 🖥️ Terminal Experience\n\nAutoR is designed for terminal-first execution, but the interaction layer is not limited to raw logs and plain prompts. The current UI supports banner-style startup, colored stage panels, parsed backend event streams, display-width-aware markdown wrapping, keyboard-selectable menus, and a Stage 00 clarification flow suitable for demos and recordings.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fterminal.png\" alt=\"AutoR terminal UI\" width=\"92%\" \u002F>\n\u003C\u002Fp>\n\n### 📈 Example Figures\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cstrong>Accuracy Comparison\u003C\u002Fstrong>\u003Cbr \u002F>\n      \u003Cimg src=\"assets\u002Fexamples\u002Fexample_fig1_accuracy.png\" alt=\"Example accuracy figure\" width=\"300\" \u002F>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cstrong>Ablation + Actor Results\u003C\u002Fstrong>\u003Cbr \u002F>\n      \u003Cimg src=\"assets\u002Fexamples\u002Fexample_fig4_ablation_actor.png\" alt=\"Example ablation figure\" width=\"300\" \u002F>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\" valign=\"top\" colspan=\"2\">\n      \u003Cstrong>Two-Layer Narrative Figure\u003C\u002Fstrong>\u003Cbr 
\u002F>\n      \u003Cimg src=\"assets\u002Fexamples\u002Fexample_fig6_two_layer.png\" alt=\"Two-layer narrative figure\" width=\"620\" \u002F>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 🧾 Research Output Gallery\n\nThe manuscript pages below are only the visible surface of larger AutoR runs. To keep the showcase compact and comparable, this gallery uses a consistent 4 × 2 layout: four artifact-backed research outputs, two representative pages from each, and a short note on what each run is demonstrating.\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd valign=\"top\" width=\"23%\">\n      \u003Cstrong>Output 1\u003C\u002Fstrong>\u003Cbr \u002F>\n      A complete end-to-end AutoR run. The pair below shows the opening manuscript page and a later evidence-heavy page where algorithm, tables, and quantitative results appear together.\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fexamples\u002Fexample_paper_page1.png\" alt=\"Output 1 page 1\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Page 1\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fexamples\u002Fexample_paper_page5.png\" alt=\"Output 1 evidence page\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Evidence Page\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\" width=\"23%\">\n      \u003Cstrong>Output 2\u003C\u002Fstrong>\u003Cbr \u002F>\n      \u003Cem>Do More Experts Help?\u003C\u002Fem> A parameter-matched MoE-LoRA study. 
The selected pages show the framing page and a chart-heavy evidence page.\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fpaper_gallery\u002Fother_run_1_page1.png\" alt=\"Output 2 page 1\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Page 1\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fpaper_gallery\u002Fother_run_1_results.png\" alt=\"Output 2 evidence page\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Evidence Page\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\" width=\"23%\">\n      \u003Cstrong>Output 3\u003C\u002Fstrong>\u003Cbr \u002F>\n      \u003Cem>Attention Sink Onset in Tiny Transformers\u003C\u002Fem> A controlled factorial study. The chosen pages show the opening page and a later structured overview page with visual decomposition.\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fpaper_gallery\u002Fother_run_2_page1.png\" alt=\"Output 3 page 1\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Page 1\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fpaper_gallery\u002Fother_run_2_overview.png\" alt=\"Output 3 overview page\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Overview Page\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\" width=\"23%\">\n      \u003Cstrong>Output 4\u003C\u002Fstrong>\u003Cbr \u002F>\n      \u003Cem>HSOD: Harmonic Spectral Operator Decomposition\u003C\u002Fem> A stability-focused time-series study. 
The pair below shows the framing page and a later page with dense training-dynamics plots.\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fpaper_gallery\u002Fother_run_3_page1.png\" alt=\"Output 4 page 1\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Page 1\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" valign=\"top\">\n      \u003Cimg src=\"assets\u002Fpaper_gallery\u002Fother_run_3_results.png\" alt=\"Output 4 analysis page\" width=\"220\" \u002F>\u003Cbr \u002F>\n      \u003Cstrong>Analysis Page\u003C\u002Fstrong>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 🧑‍🔬 Human-in-the-Loop in Practice\n\nThe example run is interesting not because the AI was left alone, but because the human intervened at critical moments:\n\n- **Stage 02** narrowed the project to a single core claim.\n- **Stage 04** pushed the system to download real datasets and run actual pre-checks.\n- **Stage 05** forced experimentation to continue until real benchmark results were obtained.\n- **Stage 06** redirected the story away from leaderboard-only framing toward mechanism-driven analysis.\n\nThat is the intended shape of AutoR:\nAI handles execution load; humans steer the research when direction actually matters.\n\n## 🚀 Quick Start\n\n### 🧰 Prerequisites\n\n- Python 3.10+\n- Claude CLI or Codex CLI available on `PATH` for real runs\n- Local TeX tools are helpful for Stage 07, but not required for smoke tests\n- For `--research-diagram` (Gemini-generated method illustration inserted into the LaTeX paper):\n  - `pip install google-genai` (the `google.genai` SDK is **not** a default dependency; if it is missing the diagram step prints `Diagram generation failed: No module named 'google'` and the rest of the run continues unaffected)\n  - A Gemini API key exposed via `GOOGLE_API_KEY` or `GEMINI_API_KEY`, or a local `configs\u002Fdiagram_config.yaml` (see 
`configs\u002Fdiagram_config.template.yaml`)\n\n### ⌨️ Common Commands\n\n| Goal | Command |\n| --- | --- |\n| Start a new run | `python main.py` |\n| Start with an explicit goal | `python main.py --goal \"Your research goal here\"` |\n| Start with preloaded resources | `python main.py --goal \"Your research goal here\" --resources paper.pdf refs.bib data.csv` |\n| Run a local smoke test without a real agent backend | `python main.py --fake-operator --goal \"Smoke test\"` |\n| Run with the automated reviewer gate | `python main.py --full-auto --goal \"Your research goal here\"` |\n| Choose the execution backend | `python main.py --operator claude` or `python main.py --operator codex` |\n| Choose the reviewer backend separately | `python main.py --full-auto --review-operator claude --review-model opus` |\n| Choose a Claude model | `python main.py --operator claude --model sonnet` or `python main.py --operator claude --model opus` |\n| Start with Codex | `python main.py --operator codex --model default --goal \"Your research goal here\"` |\n| Choose a writing venue profile | `python main.py --venue neurips_2025` or `python main.py --venue nature` or `python main.py --venue jmlr` |\n| Resume the latest run | `python main.py --resume-run latest` |\n| Redo a stage inside the same run | `python main.py --resume-run 20260329_210252 --redo-stage 03` |\n| Roll back to a stage inside the same run | `python main.py --resume-run 20260329_210252 --rollback-stage 03` |\n\nIf `--venue` is omitted, AutoR defaults to `neurips_2025`.\n\n`--full-auto` does not change the stage pipeline. It only replaces the manual approval menu with a strict reviewer agent. 
This is useful for unattended sweeps, overnight runs, and dry-run automation, but the default human-reviewed mode is still the recommended path for serious research work.\n\nValid stage identifiers include `03`, `3`, and `03_study_design`.\n\n### Studio (browser UI)\n\nAutoR Studio is a local web UI that drives the same real Claude-backed pipeline through a browser instead of a terminal. Human-in-the-loop approval, feedback, stage re-runs, live session traces, and the compiled paper all live in one page.\n\n```bash\n# Start the Studio server (default: http:\u002F\u002F127.0.0.1:8000)\npython studio.py\n\n# Then open the UI in your browser:\n#   http:\u002F\u002F127.0.0.1:8000\u002Fstudio\u002F\n```\n\nOptions:\n\n```bash\npython studio.py --port 8765                    # custom port\npython studio.py --host 0.0.0.0 --port 8000     # bind externally\npython studio.py --runs-dir \u002Fpath\u002Fto\u002Fruns       # override runs directory\n```\n\nWhat you can do in the Studio:\n\n- **Create a project** from the hub — fill in the title + thesis, click **Create Project**, and a real Claude-backed run starts immediately\n- **Watch stages run live** on the Overview page — stage strip, pulsing current stage, live session trace streaming real Claude tool calls from `logs_raw.jsonl`\n- **Review & Approve** — the Review page shows a \"You are reviewing\" hero card with a TL;DR extracted from the stage markdown, a Files Produced pill list, and an `✅ Approve → Advance to \u003Cnext stage>` button\n- **Send Feedback & Re-run** — feedback is woven into the **first attempt's prompt** of the next run (not wasted on an intermediate Claude call). 
Works on `human_review` AND `failed` stages\n- **Resume across restarts** — if you stop the server and come back, clicking Approve\u002FFeedback lazy-resumes the existing on-disk run without re-running stages that already have a draft\n- **Paper preview** — the Paper tab renders the compiled PDF, the LaTeX sources, and the build log\n- **Versions page** — the full checkpoint\u002Fattempt timeline for every stage\n\nThe Studio requires the **Claude CLI** (`claude` on `PATH`) since every run is a real Claude-driven pipeline. If `claude` isn't installed, the server fails fast at startup with a clear error.\n\n## ⚙️ How It Works\n\nAutoR uses a 9-stage research workflow: one optional intake stage plus eight formal research stages.\n\n0. `00_intake` (optional)\n1. `01_literature_survey`\n2. `02_hypothesis_generation`\n3. `03_study_design`\n4. `04_implementation`\n5. `05_experimentation`\n6. `06_analysis`\n7. `07_writing`\n8. `08_dissemination`\n\n### The 9 Stages\n\n| Stage | Role | What the human should check |\n| --- | --- | --- |\n| `00_intake` | Align the research goal, resources, constraints, target venue, and success criteria before formal work begins. | Answer the clarification questions, add missing constraints, and make sure the project is narrow enough to execute. |\n| `01_literature_survey` | Build the related-work base, collect evidence, organize papers, and identify the real gap. | Reject shallow paper lists; require task framing, benchmarks, baselines, differences, and structured literature files. |\n| `02_hypothesis_generation` | Convert the broad direction into testable hypotheses and provisional paper claims. | Push for one main claim plus measurable secondary hypotheses instead of an unfocused idea list. |\n| `03_study_design` | Turn the hypothesis into an executable experimental plan. | Check datasets, metrics, baselines, ablations, budgets, failure criteria, and machine-readable data artifacts. 
|\n| `04_implementation` | Build the runnable code, configs, data preparation, and sanity checks. | Do not approve skeletons; require executable scripts, reproducible commands, and logs or checks showing the path runs. |\n| `05_experimentation` | Run the planned experiments and write machine-readable results. | Distinguish smoke tests from real experiments; require baselines, repeats, result files, and failure records. |\n| `06_analysis` | Interpret the results, create figures, analyze failures, and refine the evidence story. | Require real plots, ablations, error analysis, and explanations rather than metric narration. |\n| `07_writing` | Produce venue-aware manuscript sources, bibliography, compiled PDF, and writing checks. | Verify that every major claim is backed by artifacts, experiments, or citations. |\n| `08_dissemination` | Package the run for review, release, reproduction, or external presentation. | Confirm that readiness notes, review materials, manifests, and outward-facing deliverables exist. 
|\n\n```mermaid\nflowchart TD\n    A[Start or resume run] --> G0{Skip intake?}\n    G0 -- Yes --> S1[01 Literature Survey]\n    G0 -- No --> I0[00 Intake]\n    I0 --> H0{Human approval}\n    H0 -- Refine --> I0\n    H0 -- Approve --> S1[01 Literature Survey]\n    H0 -- Abort --> X[Abort]\n\n    S1 --> H1{Human approval}\n    H1 -- Refine --> S1\n    H1 -- Approve --> S2[02 Hypothesis Generation]\n    H1 -- Abort --> X[Abort]\n\n    S2 --> H2{Human approval}\n    H2 -- Refine --> S2\n    H2 -- Approve --> S3[03 Study Design]\n    H2 -- Abort --> X\n\n    S3 --> H3{Human approval}\n    H3 -- Refine --> S3\n    H3 -- Approve --> S4[04 Implementation]\n    H3 -- Abort --> X\n\n    S4 --> H4{Human approval}\n    H4 -- Refine --> S4\n    H4 -- Approve --> S5[05 Experimentation]\n    H4 -- Abort --> X\n\n    S5 --> H5{Human approval}\n    H5 -- Refine --> S5\n    H5 -- Approve --> S6[06 Analysis]\n    H5 -- Abort --> X\n\n    S6 --> H6{Human approval}\n    H6 -- Refine --> S6\n    H6 -- Approve --> S7[07 Writing]\n    H6 -- Abort --> X\n\n    S7 --> H7{Human approval}\n    H7 -- Refine --> S7\n    H7 -- Approve --> S8[08 Dissemination]\n    H7 -- Abort --> X\n\n    S8 --> H8{Human approval}\n    H8 -- Refine --> S8\n    H8 -- Approve --> Z[Run complete]\n    H8 -- Abort --> X\n```\n\n### Stage Attempt Loop\n\n```mermaid\nflowchart TD\n    A[Build prompt from template + goal + memory + optional feedback] --> B[Start or resume stage session]\n    B --> C[Backend agent writes draft stage summary]\n    C --> D[Validate markdown and required artifacts]\n    D --> E{Valid?}\n    E -- No --> F[Repair, normalize, or rerun current stage]\n    F --> A\n    E -- Yes --> G[Present validated draft for human review]\n    G --> H{Human choice}\n    H -- 1 or 2 or 3 --> I[Continue current stage conversation with AI refinement]\n    I --> A\n    H -- 4 --> J[Continue current stage conversation with custom feedback]\n    J --> A\n    H -- 5 --> K[Promote approved summary and append to 
 memory.md]\n    K --> L[Continue to next stage]\n    H -- 6 --> X[Abort]\n```\n\n### Approval semantics\n\n- Stage 00 has a dedicated manual intake flow. On the first pass, AutoR asks the clarification questions one by one with selectable options, custom answers, and skip. On the revised pass, the user sees a compact intake brief and chooses refine, approve, or abort.\n- Stages 01-08 use the standard six-action review menu: `1 \u002F 2 \u002F 3` continue with an AI refinement suggestion, `4` continues with custom feedback, `5` approves, and `6` aborts.\n\nThe stage loop is controlled by AutoR, not by the backend agent (Claude or Codex).\n\n## ✅ Validation Bar\n\nAutoR does not consider a run successful just because it generated a plausible markdown summary.\n\n| Stage | Required non-toy output |\n| --- | --- |\n| Stage 03+ | Machine-readable data under `workspace\u002Fdata\u002F` |\n| Stage 05+ | Machine-readable results under `workspace\u002Fresults\u002F` |\n| Stage 06+ | Real figure files under `workspace\u002Ffigures\u002F` |\n| Stage 07+ | Manuscript sources plus a compiled PDF |\n| Stage 08+ | Review and readiness assets under `workspace\u002Freviews\u002F` |\n\nRequired stage summary shape:\n\n```md\n# Stage X: \u003Cname>\n\n## Objective\n## Previously Approved Stage Summaries\n## What I Did\n## Key Results\n## Files Produced\n## Suggestions for Refinement\n## Your Options\n```\n\nAdditional rules:\n\n- exactly 3 numbered refinement suggestions\n- the fixed 6 user options\n- no `[In progress]`, `[Pending]`, `[TODO]`, `[TBD]`, or similar placeholders\n- concrete file paths in `Files Produced`\n\nIf a run only leaves behind markdown notes, it has not met AutoR's quality bar.\n\n## 📂 Run Layout\n\nEvery run lives entirely inside its own directory.\n\n```text\nruns\u002F\u003Crun_id>\u002F\n├── user_input.txt\n├── memory.md\n├── run_config.json\n├── run_manifest.json\n├── artifact_index.json\n├── intake_context.json\n├── logs.txt\n├── logs_raw.jsonl\n├── prompt_cache\u002F\n├── 
operator_state\u002F\n├── handoff\u002F\n├── stages\u002F\n└── workspace\u002F\n    ├── literature\u002F\n    ├── code\u002F\n    ├── data\u002F\n    ├── results\u002F\n    ├── writing\u002F\n    ├── figures\u002F\n    ├── artifacts\u002F\n    ├── notes\u002F\n    └── reviews\u002F\n```\n\n### Workspace Directory Semantics\n\n- `literature\u002F`: reading notes, survey tables, benchmark notes\n- `code\u002F`: runnable code, scripts, configs, implementations\n- `data\u002F`: machine-readable data and manifests\n- `results\u002F`: machine-readable experiment outputs\n- `writing\u002F`: LaTeX sources, sections, bibliography, tables\n- `figures\u002F`: real plots and paper figures\n- `artifacts\u002F`: compiled PDFs and packaged deliverables\n- `notes\u002F`: temporary or supporting research notes\n- `reviews\u002F`: readiness, critique, and dissemination materials\n\n## 🧠 Execution Model\n\nFor each stage attempt, AutoR assembles a prompt from:\n\n1. the stage template from [src\u002Fprompts\u002F](src\u002Fprompts)\n2. the required stage summary contract\n3. execution-discipline constraints\n4. `user_input.txt`\n5. approved `memory.md`\n6. `intake_context.json`, `artifact_index.json`, and, when available, `experiment_manifest.json`\n7. optional refinement feedback\n8. 
for continuation attempts, the current draft\u002Ffinal stage files and workspace context\n\nThe assembled prompt is written to `runs\u002F\u003Crun_id>\u002Fprompt_cache\u002F`, per-stage session IDs are stored in `runs\u002F\u003Crun_id>\u002Foperator_state\u002F`, and the selected CLI backend is invoked in live streaming mode.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Exact Claude CLI pattern\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nFirst attempt for a stage:\n\n```bash\nclaude --model \u003Cmodel> \\\n  --permission-mode bypassPermissions \\\n  --dangerously-skip-permissions \\\n  --session-id \u003Cstage_session_id> \\\n  -p @runs\u002F\u003Crun_id>\u002Fprompt_cache\u002F\u003Cstage>_attempt_\u003Cnn>.prompt.md \\\n  --output-format stream-json \\\n  --verbose\n```\n\nContinuation attempt for the same stage:\n\n```bash\nclaude --model \u003Cmodel> \\\n  --permission-mode bypassPermissions \\\n  --dangerously-skip-permissions \\\n  --resume \u003Cstage_session_id> \\\n  -p @runs\u002F\u003Crun_id>\u002Fprompt_cache\u002F\u003Cstage>_attempt_\u003Cnn>.prompt.md \\\n  --output-format stream-json \\\n  --verbose\n```\n\n\u003C\u002Fdetails>\n\nImportant behavior:\n\n- refinement attempts reuse the same stage conversation whenever possible\n- streamed agent output is shown live in the terminal\n- raw stream-json output is captured in `logs_raw.jsonl`\n- if resume fails, AutoR can fall back to a fresh session\n- if stage markdown is incomplete, AutoR can repair or normalize it locally\n\n## 🏗️ Architecture\n\nThe main code lives in:\n\n- [main.py](main.py)\n- [src\u002Fmanager.py](src\u002Fmanager.py)\n- [src\u002Foperator.py](src\u002Foperator.py)\n- [src\u002Fintake.py](src\u002Fintake.py)\n- [src\u002Fmanifest.py](src\u002Fmanifest.py)\n- [src\u002Fartifact_index.py](src\u002Fartifact_index.py)\n- [src\u002Fexperiment_manifest.py](src\u002Fexperiment_manifest.py)\n- [src\u002Futils.py](src\u002Futils.py)\n- 
[src\u002Fwriting_manifest.py](src\u002Fwriting_manifest.py)\n- [src\u002Fplatform\u002Ffoundry.py](src\u002Fplatform\u002Ffoundry.py)\n- [src\u002Fprompts\u002F](src\u002Fprompts)\n\n```mermaid\nflowchart LR\n    A[main.py] --> B[src\u002Fmanager.py]\n    B --> C[src\u002Foperator.py]\n    B --> D[src\u002Fintake.py]\n    B --> E[src\u002Fmanifest.py]\n    B --> F[src\u002Fartifact_index.py]\n    B --> G[src\u002Fexperiment_manifest.py]\n    B --> H[src\u002Futils.py]\n    B --> I[src\u002Fwriting_manifest.py]\n    B --> J[src\u002Fplatform\u002Ffoundry.py]\n    B --> K[src\u002Fprompts\u002F*]\n    C --> H\n```\n\nFile boundaries:\n\n- [main.py](main.py): CLI entry point. Starts a new run, resumes an existing run, collects resources, and exposes redo\u002Frollback controls.\n- [src\u002Fmanager.py](src\u002Fmanager.py): Owns intake plus the 8-stage loop, approval flow, repair flow, resume\u002Fredo\u002Frollback logic, and stage-level continuation policy.\n- [src\u002Foperator.py](src\u002Foperator.py): The shared CLI operator flow used by Claude today and reused by Codex support for stage session state, live streaming, and resume fallback.\n- [src\u002Foperator_codex.py](src\u002Foperator_codex.py): Codex CLI adapter over the same stage contract, including JSON event streaming and stage-local session continuation.\n- [src\u002Fintake.py](src\u002Fintake.py): Resource ingestion, intake context persistence, and prompt formatting for preloaded materials.\n- [src\u002Fmanifest.py](src\u002Fmanifest.py): Lightweight run lifecycle state, stage status tracking, and rollback\u002Fstale invalidation.\n- [src\u002Fartifact_index.py](src\u002Fartifact_index.py): Run-wide artifact indexing over data, results, and figures.\n- [src\u002Fexperiment_manifest.py](src\u002Fexperiment_manifest.py): Standardized experiment bundle summary used by later stages.\n- [src\u002Futils.py](src\u002Futils.py): Stage metadata, prompt assembly, run paths, markdown validation, artifact 
validation, and handoff helpers.\n- [src\u002Fprompts\u002F](src\u002Fprompts): Per-stage prompt templates.\n\n## 🗂️ Run State\n\nEach run contains `user_input.txt`, `memory.md`, `run_manifest.json`, `artifact_index.json`, `prompt_cache\u002F`, `operator_state\u002F`, `stages\u002F`, `workspace\u002F`, `logs.txt`, and `logs_raw.jsonl`. The substantive research payload lives in `workspace\u002F`.\n\n```mermaid\nflowchart TD\n    A[workspace\u002F] --> B[literature\u002F]\n    A --> C[code\u002F]\n    A --> D[data\u002F]\n    A --> E[results\u002F]\n    A --> F[writing\u002F]\n    A --> G[figures\u002F]\n    A --> H[artifacts\u002F]\n    A --> I[notes\u002F]\n    A --> J[reviews\u002F]\n```\n\nWorkspace directories:\n\n- `literature\u002F`: papers, benchmark notes, survey tables, reading artifacts.\n- `code\u002F`: runnable pipeline code, scripts, configs, and method implementations.\n- `data\u002F`: machine-readable datasets, manifests, processed splits, caches, and loaders.\n- `results\u002F`: machine-readable metrics, predictions, ablations, tables, and evaluation outputs.\n  AutoR also standardizes `results\u002Fexperiment_manifest.json` as a machine-readable summary over result, code, and note artifacts for downstream analysis.\n- `writing\u002F`: manuscript sources, LaTeX, section drafts, tables, and bibliography.\n- `figures\u002F`: plots, diagrams, charts, and paper figures.\n- `artifacts\u002F`: compiled PDFs and packaged deliverables.\n- `notes\u002F`: temporary notes and setup material.\n- `reviews\u002F`: critique notes, threat-to-validity notes, and readiness reviews.\n\nOther run state:\n\n- `memory.md`: approved cross-stage memory only.\n- `run_manifest.json`: machine-readable run and stage lifecycle state.\n- `artifact_index.json`: machine-readable index over `workspace\u002Fdata`, `workspace\u002Fresults`, and `workspace\u002Ffigures`.\n- `prompt_cache\u002F`: exact prompts used for stage attempts and repairs.\n- `operator_state\u002F`: per-stage 
backend session IDs.\n- `stages\u002F`: draft and promoted stage summaries.\n- `logs.txt` and `logs_raw.jsonl`: workflow logs and raw backend stream output.\n\n## ✅ Validation\n\nAutoR validates both the stage markdown and the stage artifacts.\n\nRequired stage markdown shape:\n\n```md\n# Stage X: \u003Cname>\n\n## Objective\n## Previously Approved Stage Summaries\n## What I Did\n## Key Results\n## Files Produced\n## Suggestions for Refinement\n## Your Options\n```\n\nAdditional markdown requirements:\n\n- Exactly 3 numbered refinement suggestions.\n- The fixed 6 user options.\n- No unfinished placeholders such as `[In progress]`, `[Pending]`, `[TODO]`, or `[TBD]`.\n- Concrete file paths in `Files Produced`.\n\nArtifact requirements by stage:\n\n- Stage 03+: machine-readable data under `workspace\u002Fdata\u002F`\n- Stage 05+: machine-readable results under `workspace\u002Fresults\u002F`\n- Stage 05+: `workspace\u002Fresults\u002Fexperiment_manifest.json` must exist and remain structurally valid\n- Stage 06+: figure files under `workspace\u002Ffigures\u002F`\n- Stage 07+: venue-aware conference or journal-style LaTeX sources plus a compiled PDF under `workspace\u002Fwriting\u002F` or `workspace\u002Fartifacts\u002F`\n- Stage 08+: review and readiness artifacts under `workspace\u002Freviews\u002F`\n\nA run with only markdown notes does not pass validation.\n\n## 📌 Scope\n\n### Included in the current mainline\n\n- optional intake stage and resource ingestion\n- 9-stage workflow: optional intake plus eight formal research stages\n- mandatory human approval after every stage\n- Claude Code or Codex as the execution layer\n- Stage 00 clarification Q&A plus a compact intake approval flow\n- stage-local continuation within the same backend session\n- prompt caching via `@file`\n- live streaming terminal output with keyboard-selectable menus\n- repair passes and local fallback normalization\n- run manifest, rollback, and stale tracking\n- artifact index and experiment 
manifest\n- stage handoff context\n- manuscript\u002Frelease package generation after approval\n- artifact-aware validation\n- resume, `--redo-stage`, and `--rollback-stage`\n- lightweight venue profiles for Stage 07 writing\n\n### Intentionally out of scope\n\n- generic multi-agent orchestration\n- database-backed runtime state\n- concurrent stage execution\n- heavyweight platform abstractions\n- dashboard-first productization\n\n## 🛣️ Roadmap\n\nThe most valuable next steps are the ones that make AutoR more like a real research workflow, not more like a demo framework.\n\n| Next step | Why it matters |\n| --- | --- |\n| **Deeper cross-stage rollback and invalidation** | Make downstream stale-state handling stronger and more explicit after earlier-stage changes. |\n| **Stronger machine-readable run state** | Extend the current run manifest into a better source of truth for stage status, stale dependencies, and artifact pointers. |\n| **Continuation handoff compression** | Make long stage refinement more stable without bloating context. |\n| **Stronger automated tests** | Cover repair flow, resume fallback, artifact validation, and approval-loop correctness more deeply. |\n| **Richer artifact indexing** | Extend metadata around `data\u002F`, `results\u002F`, `figures\u002F`, and `writing\u002F` without turning AutoR into a heavy platform. |\n| **Frontend run browser** | Add a lightweight UI for browsing runs, stages, logs, and artifacts directly from the run directory. |\n\nImplemented milestones:\n\n- ~~Stage-local continuation sessions.~~ Keep one Claude conversation per stage, reuse it for `1\u002F2\u002F3\u002F4` refinement, and fall back to a fresh session only when resume fails. This is now implemented in the operator and manager flow.\n- ~~Artifact-level validation for non-toy outputs.~~ Enforce machine-readable data, result files, figures, LaTeX sources, PDF output, and review artifacts at the right stages. 
This is now part of the workflow validation path.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Expanded roadmap notes\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n- Cross-stage rollback and invalidation. When a later stage reveals that an earlier design decision is wrong, the workflow should be able to jump back to an earlier stage and mark downstream stages as stale. This is the biggest current control-flow gap.\n- Machine-readable run manifest. Add a single source of truth such as `run_manifest.json` to track stage status, approval state, stale dependencies, session IDs, and key artifact pointers. This should make both automation and future UI work much cleaner.\n- Continuation handoff compression. Add a short machine-generated stage handoff file that summarizes what is already correct, what is missing, and which files matter most. This should reduce context growth and make continuation more stable over long runs.\n- ~~Result schema and artifact indexing.~~ Standardize `workspace\u002Fdata\u002F`, `workspace\u002Fresults\u002F`, and `workspace\u002Ffigures\u002F` around explicit schemas and generate an artifact index automatically. The workflow now writes `artifact_index.json`, carries basic inferred or declared schema metadata, and feeds the index into later-stage prompt context and the writing manifest.\n- Writing pipeline hardening. Turn Stage 07 into a reliable manuscript production pipeline with stable conference and journal-style writing structures, bibliography handling, table and figure inclusion, and reproducible PDF compilation. The goal is a submission-grade research package, not just writing notes.\n- Review and dissemination package. Expand Stage 08 so it produces readiness checklists, threats-to-validity notes, artifact manifests, release notes, and external-facing research bundles. The final stage should feel like packaging a verifiable research release, not just wrapping up text.\n- Frontend run dashboard. 
Build a lightweight UI that can browse runs, stage status, summaries, logs, artifacts, and validation failures. It should read from the run directory and manifest rather than introducing a database first.\n- README and open-source assets. Keep refining the README and add `assets\u002F` images such as workflow diagrams, UI screenshots, and artifact examples. This is important for open-source clarity, onboarding, and project presentation.\n\n\u003C\u002Fdetails>\n\n## 🌍 Community\n\nJoin the project community channels:\n\n| Discord | WeChat | WhatsApp |\n| --- | --- | --- |\n| \u003Cimg src=\"assets\u002Fdiscord.jpg\" alt=\"Discord QR\" width=\"180\" \u002F> | \u003Cimg src=\"assets\u002Fwechat.jpg\" alt=\"WeChat QR\" width=\"180\" \u002F> | \u003Cimg src=\"assets\u002Fwhatsapp.jpg\" alt=\"WhatsApp QR\" width=\"180\" \u002F> |\n","AutoR is a human-centered research operating system that lets AI execute research tasks while humans keep control of the research direction. Its core features include a terminal-first interface built on Python 3.10+, support for a 9-stage workflow, and a guarantee that every run produces inspectable research artifacts. AutoR uses AI agents as the execution layer and pairs them with a strict human approval mechanism to safeguard research quality and accuracy. It also provides strong recovery features such as redoing or rolling back specific stages. The project is especially suited to research settings that demand high reproducibility and transparency, helping researchers manage complex, time-consuming research processes more efficiently.",2,"2026-06-11 03:49:54","high_star"]