[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2966":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":13,"forks30d":13,"starsTrendScore":15,"compositeScore":17,"rankGlobal":8,"rankLanguage":8,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":8,"pushedAt":8,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":13,"starSnapshotCount":13,"syncStatus":14,"lastSyncTime":25,"discoverSource":26},2966,"the-verifier-agent","disler\u002Fthe-verifier-agent","disler",null,"TypeScript",143,42,1,0,2,6,24,4.9,"MIT License",false,"main",[],"2026-06-12 02:00:45","# Pi Verifier Agent\n\n> A two-agent system with a custom pi agent harness that treats verification as a first-class problem, not an additional human-in-the-loop workflow.\n>\n> Watch the full breakdown: https:\u002F\u002Fyoutu.be\u002FEnXKysJNz_8\n\n![Pi Verifier Agent — two-agent observer architecture: engineer prompts the BUILDER, BUILDER writes session.jsonl, VERIFIER reads the session JSONL and prompts corrections back to BUILDER, all under a defense-in-depth bash and persona policy](.\u002Fimages\u002Farch.jpg)\n\n## How it works\n\nA two-agent observer system for the [Pi Coding Agent](https:\u002F\u002Fpi.dev): a normal interactive **Builder** runs in your terminal; a sibling **Verifier** Pi runs in its own tmux window with input disabled. After every builder turn, the verifier independently re-runs the work using deterministic read-only tools and prompts the builder back with concrete corrective feedback when verification fails. 
It closes the review-constraint feedback loop so you can stop hand-checking every \"✅ done.\"\n\nThe pattern is a **top-down observer**: the builder doesn't know the verifier exists. The verifier connects over a unix domain socket, listens for the builder's lifecycle ticks (`start` \u002F `stop` \u002F `error`), and pulls the slice it needs from the builder's session JSONL on disk. When verification fails, the verifier calls its `verifier_prompt` tool — the only thing it can do that touches the builder — and the builder injects the message via `pi.sendUserMessage(deliverAs:\"followUp\")` and runs another turn. The loop repeats up to three times, then escalates to the human.\n\n## Quick start\n\n### Agentic Installation\n\nOpen Claude Code (or any coding agent you like) in this repo:\n\n```bash\n\u002Finstall              # one-time — open Claude Code in this repo and run it\n```\n\nThen run:\n\n```bash\njust v                # boot the generic verifier\n```\n\n### Manual Installation\n\nNo Claude Code? See [Manual install (no agentic coding tool)](#manual-install-no-agentic-coding-tool) below.\n\n### What you'll see on Startup\n\n![Two-agent runtime — builder Pi on the left, verifier Pi on the right, communicating over a unix domain socket while the verifier observes the builder's session JSONL](.\u002Fimages\u002Fverifier.png)\n\n1. The builder Pi opens in your current terminal. The default footer is hidden; instead, the **input box's borders carry the live status**:\n   - Top-right border: `● verifier connected` (or `◌ spawning`, `✗ disconnected`, `⚠ error`)\n   - Bottom-left border: active model id (e.g. `claude-sonnet-4-6`)\n   - Bottom-right border: context-window utilization (e.g. `12%`)\n2. **A new OS-level terminal window opens automatically.** If you're already inside `$TMUX`, you get a sibling tmux window instead. 
If your `$TERM_PROGRAM` is recognized (Ghostty \u002F iTerm \u002F Apple_Terminal \u002F WezTerm), the launcher targets that emulator directly; otherwise it falls back to a Terminal.app window via `osascript do script` so engineers always see a real window — never a headless tmux session.\n3. The verifier window comes pre-tuned: **mouse scroll enabled**, **bottom green tmux status bar hidden**, **10k-line scrollback**, **OSC52 clipboard** (mouse-drag → Cmd+V works natively). Tmux defaults are session-scoped — your other tmux sessions are untouched.\n4. The verifier's input row is replaced by a colored full-width status bar — input is locked. The bar shows `VERIFIER · \u003Cphase> · \u003CCONFIDENCE>` and updates live (e.g. `VERIFIER · ✓ verified · PERFECT`). The bar's **background color reflects the confidence grade**: green for PERFECT\u002FVERIFIED, orange for PARTIAL\u002FFEEDBACK, red for FAILED, purple while idle\u002Fverifying.\n5. Type a normal builder prompt. After the builder finishes its turn, the verifier auto-runs verification.\n6. If verification fails, the verifier calls `verifier_prompt` with concrete corrective feedback. The builder receives it as a follow-up user message (via `pi.sendUserMessage(deliverAs:\"followUp\")`) and runs another turn fixing the issue.\n7. Loop repeats up to `max_loops: 3` (configurable per persona). On the 4th attempt, the builder surfaces an \"escalating to human\" message instead of auto-injecting another correction.\n8. On `Ctrl+D` of the builder, the verifier window closes, the unix socket is unlinked, and the breadcrumb at `.pi\u002Fstate\u002Fverifier-\u003Csid>.sock.ref` is cleaned up. (`just clean` also force-tears-down anything stale from a prior crash.)\n\n## What this unlocks\n\n### Spend tokens to save time\n\nEngineers spend roughly half their day reviewing agent output. Every \"I created the table,\" \"I added the foreign key,\" \"I applied the migration\" gets re-checked by hand. 
That review work is the binding constraint on agentic engineering throughput. The verifier moves it onto a second agent.\n\nIf you optimize for tokens, this looks wasteful — you're spending 2–5× more compute. If you optimize for *time*, it's the highest-leverage trade you can make. Tokens are cheap. Your time is not. Spend tokens to generate value and time.\n\n### Break the review constraint\n\nAgentic engineering has two binding constraints: **how much you can plan**, and **how much you can review**. Most engineers stall on review. The verifier collapses that side of the constraint into a parallel agent whose entire job is re-checking, deterministically, with read-only tools.\n\nThe verifier's job is **decomposition**: break every claim into the smallest atomic unit that can be independently proven or disproven, then verify each against actual state. A single `PASS` that hides three unverified sub-claims is worse than three explicit `FAIL`s.\n\n### Templated engineering as a habit\n\nThe verifier is structurally **un-promptable**. Its input bar is locked; you cannot drop one-off instructions into it. The only thing that drives it is `verify_on_stop.md` rendered against the builder's `stop` event.\n\nThat sounds annoying. It's the point.\n\nYou can't fix bugs by typing at the verifier — you fix them by editing the persona, the script, or the prompt template. Improvements solve the entire problem class, not the one instance you happened to hit. Every gap turns into reusable engineering. There's no falling back to vibe-coding the fix.\n\n### Trust + Scale, with a positive feedback loop\n\nThe verifier compounds. Every `## Report` block lists what it **could not verify** — missing oracles, no fixture, no harness, ambiguous claim. That gap becomes the next thing you template into the persona or build a domain script around. 
The verifier teaches you what your verifier is missing.\n\nThis is how you build the system that builds the system.\n\n### Multi-agent orchestration > a smarter single model\n\nEvery model benchmark you see runs a single model in isolation. That's not how the highest-leverage engineers operate. They stack intelligence. They orchestrate models. **GPT 5.5 *and* Opus 4.7**, not *or*. The verifier is the simplest concrete instance of multi-agent orchestration: two specialized agents — one builds, one verifies — coordinated through a tight architectural seam (a unix socket and a session JSONL).\n\n### Defense-in-depth on the bash tool\n\nThe bash tool is the most dangerous tool you give an agent. The verifier persona declares its tool surface as `read, grep, find, ls, bash, verifier_prompt` — no `write`, no `edit` — and the persona body restricts bash to read-only commands. Domain-specific personas (sql, python, image-gen) can pin bash to a single allowlisted script: that's the highest level of control you can give an agent. Anything outside the script is blocked.\n\n## Architecture\n\n```\n   ┌──────────────────────────────┐         ┌──────────────────────────────┐\n   │  Your Terminal               │         │  New OS Window  (or sibling  │\n   │  (Ghostty \u002F iTerm \u002F ...)     
│         │   tmux window if in $TMUX)   │\n   │                              │         │                              │\n   │  ┌────────────────────────┐  │         │  ┌────────────────────────┐  │\n   │  │  pi  (BUILDER)         │  │         │  │  pi  (VERIFIER --child)│  │\n   │  │  verifiable.ts         │◄─┼─unix────┼─►│  verifier.ts           │  │\n   │  │  + socket server       │  │ socket  │  │  + status-bar editor   │  │\n   │  │  + lifecycle forwarder │  │ JSONL   │  │  + input lock          │  │\n   │  └─────────┬──────────────┘  │         │  │  + verifier_prompt tool│  │\n   │            │                 │         │  └─────────┬──────────────┘  │\n   └────────────┼─────────────────┘         └────────────┼─────────────────┘\n                │                                        │\n                │ writes session JSONL                   │ reads\n                ▼                                        ▼\n        ~\u002F.pi\u002Fagent\u002Fsessions\u002F\u003Csid>.jsonl  ──────────────►   (builder transcript)\n```\n\nOne Pi binary, two roles. The builder owns a unix domain socket at `\u002Ftmp\u002Fpi-verifier\u002F\u003CsessionId>.sock` (short path so we sidestep macOS's 104-byte `sun_path` limit, `chmod 0700` so only the owning UID can connect — that is the authentication). The builder pushes lifecycle ticks only — never transcript content. 
The verifier pulls the substantive content it needs from the builder's session JSONL on disk and runs verification with read-only tools.\n\n### Direction matrix\n\n```\nverifier ──► builder           builder ──► verifier\n─────────────────────          ──────────────────────\nhello                          hello_ack            ← handshake\nprompt   (correction text)     prompt_ack           ← receipt confirmation\nreport   (rendered inline)     event                ← lifecycle channel\n\nBidirectional: ping \u002F pong (10s liveness), bye (clean teardown).\n```\n\nAll envelopes are TypeScript discriminated unions on `type`, JSONL-framed (one JSON object per line, terminated by `\\n` — split on `\\n` only, never via Node's `readline` which would split on `U+2028` \u002F `U+2029` embedded in JSON strings).\n\n### The CONFIDENCE ladder\n\nThe verifier emits a `CONFIDENCE:` line under `STATUS:` on every Report. The grade encodes both completeness AND outcome:\n\n| Level | Meaning | Bar color |\n|---|---|---|\n| `PERFECT` | Every claim verified, zero gaps, no feedback | 🟢 green |\n| `VERIFIED` | All checked claims passed, minor non-blocking gaps | 🟢 green |\n| `PARTIAL` | No failures, but significant unverifiable gaps | 🟠 orange |\n| `FEEDBACK` | At least one claim failed, `verifier_prompt` called (system working as designed) | 🟠 orange |\n| `FAILED` | Couldn't verify at all — escalating to human | 🔴 red |\n\n## Manual install (no agentic coding tool)\n\n### Prerequisites\n\n- **Node 20+** and `npm`\n- **tmux** — `brew install tmux` on macOS, `apt install tmux` on Debian\u002FUbuntu\n- **Pi Coding Agent** (`pi` on your PATH), authenticated against an LLM provider\n- **just** (recommended) — `brew install just`\n- macOS or Linux. 
Windows-native is untested — use WSL.\n\n### Setup\n\n```bash\ngit clone \u003Cyour-fork-or-this-repo>\ncd the-verifier-agent\ncd apps\u002Fverifier && npm install && cd ..\u002F..\n```\n\n### `.env`\n\nBoth the builder and the verifier load `.env` from your **current working directory** on `session_start`. Drop your provider API keys (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `DEEPSEEK_API_KEY`, …) into a project-local `.env` and **both agents see them automatically** — no shell-config gymnastics. Existing `process.env` always wins; `.env` only fills gaps.\n\n### Recipes\n\n```bash\njust                  # list all recipes\njust verifier         # builder + auto-spawn verifier\njust clean            # kill stale verifier-* tmux sessions, sockets, breadcrumbs\njust prime            # prime context in an interactive Claude Code session\n```\n\n## Known limitations\n\n- **One verifier per builder** (server-side enforced — a duplicate connection gets `bye {reason: \"duplicate connection\"}`).\n- **Late-attach across processes is not supported.** Use `\u002Fverify` from the same builder Pi to spawn its own verifier.\n- **Persona selection** via `--verifier-agent \u003Cname>`. Defaults to the generic `verifier`. 
Drop a sibling persona file in `.pi\u002Fverifier\u002Fagents\u002F` and select it without editing source.\n- **The verifier's persona body is rendered into `--system-prompt` as a full overwrite.** Pi's default system prompt is replaced, not appended to — by design.\n- **Read-only is by tool surface, not by sandbox.** Don't load untrusted personas.\n- **Windows-native is untested.** Use WSL.\n\n## Master Agentic Coding\n\n> Prepare for the future of software engineering\n\nLearn tactical agentic coding patterns with [Tactical Agentic Coding](https:\u002F\u002Fagenticengineer.com\u002Ftactical-agentic-coding?y=verifier) — the course teaches you to build systems that build the system: own your agent harness, control the core four (context, model, prompt, tools), lock down bash, and orchestrate specialized agents that outperform any single model alone.\n\nFollow the [IndyDevDan YouTube channel](https:\u002F\u002Fwww.youtube.com\u002F@indydevdan) to keep your agentic coding advantage compounding.\n\n## License\n\nMIT — see [LICENSE](.\u002FLICENSE).\n","Pi Verifier Agent is a two-agent system that treats code verification as a first-class problem rather than relying on a manual human workflow. Its core functionality is an automated build-and-verify process built on a custom pi agent harness: a Builder agent carries out tasks and writes a session.jsonl file, while a Verifier agent independently re-runs the work with deterministic read-only tools and gives the Builder concrete corrective feedback when verification fails. The system uses a top-down observer pattern, which ensures effective communication between the builder and the verifier while reducing the need for manual checking. It suits development scenarios that need efficient, automated code review and verification, and in continuous integration\u002Fcontinuous deployment (CI\u002FCD) environments in particular it can significantly improve software quality and development efficiency.","2026-06-11 02:51:59","CREATED_QUERY"]