[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74773":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":28,"discoverSource":29},74773,"pi-autoresearch","davebcn87\u002Fpi-autoresearch","davebcn87","Autonomous experiment loop extension for pi",null,"TypeScript",6975,412,24,2,0,27,111,416,81,38.85,"MIT License",false,"main",[],"2026-06-12 02:03:28","\u003Cdiv align=\"center\">\n\u003Cimg  height=\"120\" alt=\"result\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc66cbd02-4491-4833-a63a-142cfd7530c1\" \u002F>\n\n# pi-autoresearch\n### Autonomous experiment loop for pi\n**[Install](#install)** · **[Usage](#usage)** · **[How it works](#how-it-works)**\n\n\u003C\u002Fdiv>\n\n*Try an idea, measure it, keep what works, discard what doesn't, repeat forever.*\n\nAn extension for **[pi](https:\u002F\u002Fpi.dev\u002F)** — an AI coding agent that runs in your terminal. pi-autoresearch gives pi the tools and workflow to run autonomous optimization loops: try an idea, benchmark it, keep improvements, revert regressions, repeat.\n\nInspired by [karpathy\u002Fautoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch). 
Works for any optimization target: test speed, bundle size, LLM training, build times, Lighthouse scores.\n\n---\n\n\u003Cimg width=\"1736\" height=\"518\" alt=\"pi-autoresearch\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5078aa31-3530-406a-85fc-bdeff98679a6\" \u002F>\n\n---\n\n## Quick start\n\n```bash\npi install npm:pi-autoresearch\n```\n\n## What's included\n\n| | |\n|---|---|\n| **Extension** | Tools + live widget + `\u002Fautoresearch` dashboard |\n| **Skill** | Gathers what to optimize, writes session files, starts the loop |\n\n### Extension tools\n\n| Tool | Description |\n|------|-------------|\n| `init_experiment` | One-time session config — name, metric, unit, direction |\n| `run_experiment` | Runs any command, times wall-clock duration, captures output |\n| `log_experiment` | Records result, auto-commits, updates widget and dashboard |\n\n### `\u002Fautoresearch` command\n\n| Subcommand | Description |\n|------------|-------------|\n| `\u002Fautoresearch \u003Ctext>` | Enter autoresearch mode. If `autoresearch.md` exists, resumes the loop with `\u003Ctext>` as context. Otherwise, sets up a new session. |\n| `\u002Fautoresearch off` | Leave autoresearch mode. Stops auto-resume and clears runtime state but keeps `autoresearch.jsonl` intact. |\n| `\u002Fautoresearch clear` | Delete `autoresearch.jsonl`, reset all state, and turn autoresearch mode off. Use this for a clean start. |\n| `\u002Fautoresearch export` | Open a live dashboard in your browser. Auto-updates as experiments run. 
|\n\n**Examples:**\n\n```\n\u002Fautoresearch optimize unit test runtime, monitor correctness\n\u002Fautoresearch model training, run 5 minutes of train.py and note the loss ratio as optimization target\n\u002Fautoresearch export\n\u002Fautoresearch off\n\u002Fautoresearch clear\n```\n\n### Keyboard shortcuts\n\n| Shortcut     | Description |\n|--------------|-------------|\n| `Ctrl+Shift+T` | Toggle dashboard expand\u002Fcollapse (inline widget ↔ full results table above the editor) |\n| `Ctrl+Shift+F` | Open fullscreen scrollable dashboard overlay. Navigate with `↑`\u002F`↓`\u002F`j`\u002F`k`, `PageUp`\u002F`PageDown`\u002F`u`\u002F`d`, `g`\u002F`G` for top\u002Fbottom, `Escape` or `q` to close. |\n\nTo avoid conflicts with other pi extensions, override or disable these shortcuts in\n`\u003Cagent-dir>\u002Fextensions\u002Fpi-autoresearch.json`. `\u003Cagent-dir>` is the active pi profile\nconfig directory (usually `~\u002F.pi\u002Fagent`, or `PI_CODING_AGENT_DIR` when set):\n\n```json\n{\n  \"shortcuts\": {\n    \"toggleDashboard\": \"ctrl+shift+y\",\n    \"fullscreenDashboard\": null\n  }\n}\n```\n\nUse `null` to skip registering a shortcut. Omitted shortcuts keep their defaults.\n\n### UI\n\n- **Status widget** — always visible above the editor: `🔬 autoresearch 12 runs 8 kept │ ★ total_µs: 15,200 (-12.3%) │ conf: 2.1×`\n- **Confidence score** — after 3+ runs, shows how the best improvement compares to the session noise floor. ≥2.0× (green) = likely real, 1.0–2.0× (yellow) = above noise but marginal, \u003C1.0× (red) = within noise.\n- **Expanded dashboard** — `Ctrl+Shift+T` expands the widget into a full results table with columns for commit, metric, status, and description.\n- **Fullscreen overlay** — `Ctrl+Shift+F` opens a scrollable full-terminal dashboard. 
Shows a live spinner with elapsed time for running experiments.\n\n### Skills\n\n**`autoresearch-create`** asks a few questions (or infers from context) about your goal, command, metric, and files in scope — then writes two files, `autoresearch.md` and `autoresearch.sh`, and starts the loop immediately. The session files are described in the table at the end of this section.\n\n**`autoresearch-finalize`** turns a noisy autoresearch branch into clean, independent branches — one per logical change, each starting from the merge-base. Groups must not share files, so each branch can be reviewed and merged independently.\n\n**`autoresearch-hooks`** *(optional)* helps author `autoresearch.hooks\u002Fbefore.sh` and `autoresearch.hooks\u002Fafter.sh` for a session. It ships with ten reference scripts in [`skills\u002Fautoresearch-hooks\u002Fexamples\u002F`](skills\u002Fautoresearch-hooks\u002Fexamples\u002F) (external search, learnings journal, native notifications, anti-thrash, idea rotation, and more) — the skill handles the contract, you pick the inspiration. The core autoresearch loop has no hook awareness.\n\n| File | Purpose |\n|------|---------|\n| `autoresearch.md` | Session document — objective, metrics, files in scope, what's been tried. A fresh agent can resume from this alone. |\n| `autoresearch.sh` | Benchmark script — pre-checks, runs the workload, outputs `METRIC name=number` lines. |\n| `autoresearch.checks.sh` | *(optional)* Backpressure checks — tests, types, lint. Runs after each passing benchmark. Failures block `keep`. |\n| `autoresearch.hooks\u002F` | *(optional)* Executable scripts (`before.sh`, `after.sh`) that fire around iterations. Stdout is delivered to the agent as a steer message. 
|\n\n---\n\n## Install\n\n```bash\npi install npm:pi-autoresearch\n```\n\n\u003Cdetails>\n\u003Csummary>Manual install\u003C\u002Fsummary>\n\n```bash\ncp -r extensions\u002Fpi-autoresearch ~\u002F.pi\u002Fagent\u002Fextensions\u002F\ncp -r skills\u002Fautoresearch-create ~\u002F.pi\u002Fagent\u002Fskills\u002F\n```\n\nThen `\u002Freload` in pi.\n\n\u003C\u002Fdetails>\n\n---\n\n## Usage\n\n### 1. Start autoresearch\n\n```\n\u002Fskill:autoresearch-create\n```\n\nThe agent asks about your goal, command, metric, and files in scope — or infers them from context. It then creates a branch, writes `autoresearch.md` and `autoresearch.sh`, runs the baseline, and starts looping immediately.\n\n### 2. The loop\n\nThe agent runs autonomously: edit → commit → `run_experiment` → `log_experiment` → keep or revert → repeat. It never stops unless interrupted.\n\nEvery result is appended to `autoresearch.jsonl` in your project — one line per run. This means:\n\n- **Survives restarts** — the agent can resume a session by reading the file\n- **Survives context resets** — `autoresearch.md` captures what's been tried so a fresh agent has full context\n- **Human readable** — open it anytime to see the full history\n- **Branch-aware** — each branch has its own session\n\n### 3. Finalize into reviewable branches\n\n```\n\u002Fskill:autoresearch-finalize\n```\n\nThe agent reads `autoresearch.jsonl`, groups kept experiments into logical changesets, proposes the grouping for your approval, then creates independent branches from the merge-base. Each commit includes metric improvements in the message. Groups must not share files, so branches can be reviewed and merged independently.\n\n### 4. 
Monitor progress\n\n- **Widget** — always visible above the editor\n- **`Ctrl+Shift+T`** — expand\u002Fcollapse the full results table inline (config key: `shortcuts.toggleDashboard`)\n- **`Ctrl+Shift+F`** — fullscreen scrollable dashboard overlay (config key: `shortcuts.fullscreenDashboard`)\n- **`\u002Fautoresearch export`** — open a live browser dashboard with chart and share card\n- **`Escape`** — interrupt anytime and ask for a summary\n\n---\n\n## Example domains\n\n| Domain | Metric | Command |\n|--------|--------|---------|\n| Test speed | seconds ↓ | `pnpm test` |\n| Bundle size | KB ↓ | `pnpm build && du -sb dist` |\n| LLM training | val_bpb ↓ | `uv run train.py` |\n| Build speed | seconds ↓ | `pnpm build` |\n| Lighthouse | perf score ↑ | `lighthouse http:\u002F\u002Flocalhost:3000 --output=json` |\n\n---\n\n## How it works\n\nThe **extension** is domain-agnostic infrastructure. The **skill** encodes domain knowledge. This separation means one extension serves unlimited domains.\n\n```\n┌──────────────────────┐     ┌──────────────────────────┐\n│  Extension (global)  │     │  Skill (per-domain)       │\n│                      │     │                           │\n│  run_experiment      │◄────│  command: pnpm test       │\n│  log_experiment      │     │  metric: seconds (lower)  │\n│  widget + dashboard  │     │  scope: vitest configs    │\n│                      │     │  ideas: pool, parallel…   │\n└──────────────────────┘     └──────────────────────────┘\n```\n\nTwo files keep the session alive across restarts and context resets:\n\n```\nautoresearch.jsonl   — append-only log of every run (metric, status, commit, description)\nautoresearch.md      — living document: objective, what's been tried, dead ends, key wins\n```\n\nA fresh agent with no memory can read these two files and continue exactly where the previous session left off.\n\n---\n\n## Configuration (optional)\n\nCreate `autoresearch.config.json` in your pi session directory to customize 
behavior:\n\n```json\n{\n  \"workingDir\": \"\u002Fpath\u002Fto\u002Fproject\",\n  \"maxIterations\": 50\n}\n```\n\n| Field | Type | Description |\n|-------|------|-------------|\n| `workingDir` | string | Override the directory for all autoresearch operations — file I\u002FO, command execution, and git. Supports absolute or relative paths (resolved against the pi session cwd). The config file itself always stays in the session cwd. Fails if the directory doesn't exist. |\n| `maxIterations` | number | Maximum experiments before auto-stopping. The agent is told to stop and won't run more experiments until a new segment is initialized. |\n\n### Long-running loops and context\n\nThe loop is designed to run unattended across context limits. When pi's [auto-compaction](https:\u002F\u002Fgithub.com\u002Fbadlogic\u002Fpi-mono\u002Fblob\u002Fmain\u002Fpackages\u002Fcoding-agent\u002Fdocs\u002Fcompaction.md) summarizes the older portion of the conversation, autoresearch detects the resulting idle and re-prompts the agent to re-read `autoresearch.md`, the tail of `autoresearch.jsonl`, `autoresearch.ideas.md`, and `git log` before continuing. All progress is persisted in those files, so the post-summary turn rehydrates from the source of truth instead of relying on whatever survived compaction. No tuning required — if pi's auto-compaction is enabled (the default), this just works.\n\n---\n\n## Confidence scoring\n\nAfter 3+ experiments in a session, pi-autoresearch computes a **confidence score** — how the best improvement compares to the session's noise floor. This helps distinguish real gains from benchmark jitter, especially on noisy signals like ML training, Lighthouse scores, or flaky benchmarks.\n\n**How it works:**\n\n- Uses [Median Absolute Deviation (MAD)](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FMedian_absolute_deviation) of all metric values in the current segment as a robust noise estimator.\n- Confidence = `|best_improvement| \u002F MAD`. 
A score of 2.0× means the best improvement is twice the noise floor.\n- Shown in the widget, expanded dashboard, and `log_experiment` output.\n- Persisted to `autoresearch.jsonl` on each result for post-hoc analysis.\n- **Advisory only** — never auto-discards. The agent is guided to re-run experiments when confidence is low, but the final keep\u002Fdiscard decision stays with the agent.\n\n| Confidence | Color | Meaning |\n|-----------|-------|---------|\n| ≥ 2.0× | 🟢 green | Improvement is likely real |\n| 1.0–2.0× | 🟡 yellow | Above noise but marginal |\n| \u003C 1.0× | 🔴 red | Within noise — consider re-running to confirm |\n\n---\n\n## Backpressure checks (optional)\n\nCreate `autoresearch.checks.sh` to run correctness checks (tests, types, lint) after every passing benchmark. This ensures optimizations don't break things.\n\n```bash\n#!\u002Fbin\u002Fbash\nset -euo pipefail\npnpm test --run\npnpm typecheck\n```\n\n**How it works:**\n\n- If the file doesn't exist, everything behaves exactly as before — no changes to the loop.\n- If it exists, it runs automatically after every benchmark that exits 0.\n- Checks execution time does **not** affect the primary metric.\n- If checks fail, the experiment is logged as `checks_failed` (same behavior as a crash — no commit, revert changes).\n- The `checks_failed` status is shown separately in the dashboard so you can distinguish correctness failures from benchmark crashes.\n- Checks have a separate timeout (default 300s, configurable via `checks_timeout_seconds` in `run_experiment`).\n\n---\n\n## Hooks (optional)\n\nDrop executable scripts in `autoresearch.hooks\u002F` to run code at iteration boundaries. Hooks are **transparent to the agent** — the agent calls tools and sees results; hooks run alongside without any agent-facing surface.\n\n- `autoresearch.hooks\u002Fbefore.sh` — fires before every iteration (at `\u002Fautoresearch` activation and at the end of every `log_experiment`, after `after.sh`). 
Use for prospective work: fetch research, prime context for the next attempt.\n- `autoresearch.hooks\u002Fafter.sh` — fires at the end of every `log_experiment`. Use for retrospective work: annotate learnings, send notifications.\n\n**Contract:**\n\n- Must be executable (`chmod +x`). Preserved on revert like all `autoresearch.*` artefacts.\n- **Stdin** — a JSON object on a single line. Shape depends on the stage (see below). Extract fields with `jq`.\n- **Stdout** is delivered to the agent as a steer message (capped at 8 KB). Empty stdout = silent.\n- Non-zero exit or >30s timeout surfaces an error steer to the agent.\n- Each fire appends a `{\"type\":\"hook\",…}` entry to `autoresearch.jsonl` for observability.\n\n**`before.sh` stdin** (on fresh activation `last_run` is `null`):\n\n```json\n{\n  \"event\": \"before\",\n  \"cwd\": \"\u002Fpath\u002Fto\u002Fworkdir\",\n  \"next_run\": 6,\n  \"last_run\": {\n    \"run\": 5, \"status\": \"discard\", \"metric\": 42.1,\n    \"description\": \"…\",\n    \"asi\": { \"hypothesis\": \"…\", \"next_focus\": \"…\" }\n  },\n  \"session\": {\n    \"metric_name\": \"total_ms\", \"metric_unit\": \"ms\", \"direction\": \"lower\",\n    \"baseline_metric\": 40.7, \"best_metric\": 33.5,\n    \"run_count\": 5, \"goal\": \"optimize sort speed\"\n  }\n}\n```\n\n**`after.sh` stdin:**\n\n```json\n{\n  \"event\": \"after\",\n  \"cwd\": \"\u002Fpath\u002Fto\u002Fworkdir\",\n  \"run_entry\": {\n    \"run\": 6, \"status\": \"discard\", \"metric\": 38.9,\n    \"description\": \"…\",\n    \"asi\": { \"hypothesis\": \"…\", \"learned\": \"…\" }\n  },\n  \"session\": { \"metric_name\": \"total_ms\", \"direction\": \"lower\", \"baseline_metric\": 40.7, \"best_metric\": 33.5, \"run_count\": 6, \"goal\": \"…\" }\n}\n```\n\n**Agent signal.** The agent writes `description` and `asi.*` fields in its `log_experiment` calls for its own future-self reasoning. 
The hook opportunistically mines whichever fields the agent naturally uses — `asi.hypothesis`, `asi.next_focus`, `description`, etc. There is no dedicated \"hook input\" field; the agent is unaware the hook exists.\n\n**Examples.** Reference scripts for both stages live at [`skills\u002Fautoresearch-hooks\u002Fexamples\u002F`](skills\u002Fautoresearch-hooks\u002Fexamples\u002F) — external search, qmd document search, persistent learnings, native notifications, git tagging, anti-thrash, idea rotator, hypothesis reflection, context rotation. Copy one to your session's `autoresearch.hooks\u002F` directory, adapt, `chmod +x`.\n\n---\n\n## Prerequisites\n\n1. **Install pi** — follow the instructions at [pi.dev](https:\u002F\u002Fpi.dev\u002F)\n2. **An API key** for your preferred LLM provider (configured in pi)\n\n## Controlling costs\n\nAutoresearch loops run autonomously and can burn through tokens. Two ways to cap spend:\n\n- **API key limits** — most providers let you set per-key or monthly budgets. Check your provider's dashboard.\n- **`maxIterations`** — cap experiments per session in `autoresearch.config.json`:\n   ```json\n   {\n     \"maxIterations\": 30\n   }\n   ```\n\n## License\n\nMIT\n","pi-autoresearch is an autonomous experiment loop extension designed for pi (an AI coding agent that runs in the terminal). It lets users pursue continuous improvement by defining an optimization target, running experiments, and automatically adjusting based on the results. The project is written in TypeScript and provides a complete toolset and workflow, including initializing the experiment configuration, executing commands, and recording results. It also comes with a live dashboard and keyboard shortcuts to improve the user experience. The extension suits many scenarios, such as test speed optimization, LLM training, and build time reduction; any process that needs automated iterative optimization can use this tool to improve efficiency.","2026-06-11 03:50:45","high_star"]