[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82417":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},82417,"Webwright","microsoft\u002FWebwright","microsoft","A simple SWE style browser agent framework that achieves SOTA results on long horizon web tasks. ","",null,"Python",5296,326,17,7,0,167,709,828,501,113.54,"MIT License",false,"main",true,[],"2026-06-12 04:01:38","# Webwright\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fwebwright_logo.svg\" alt=\"Webwright logo\" width=\"320\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cb>Turn Your Coding Models into State-of-the-Art Browser Agents\u003C\u002Fb>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-%E2%89%A53.10-blue?logo=python&logoColor=white\" alt=\"Python\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fplaywright-chromium-green\" alt=\"Playwright\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbackends-OpenAI%20%7C%20Anthropic%20%7C%20OpenRouter-orange\" alt=\"Backends\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ffootprint-%E2%89%A4~1.5k%20LoC-brightgreen\" alt=\"Footprint\">\n\u003C\u002Fp>\n\n- 📝 **Blog:** [Webwright: A Terminal Is All You Need For Web 
Agents](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Farticles\u002Fwebwright-a-terminal-is-all-you-need-for-web-agents\u002F)\n- 🌐 **Project Page:** [microsoft.github.io\u002FWebwright](https:\u002F\u002Fmicrosoft.github.io\u002FWebwright\u002F)\n\nWebwright gives an LLM a terminal from which it can launch multiple browser sessions to inspect pages and complete a web task. It captures and inspects page screenshots\u002Fstates only when needed. Each web task must be completed end-to-end within a single re-runnable Python script, so your web agent's browsing history is one code file. No multi-agent system, no graph engine, no plugin layer, no hidden orchestration — just a terminal, a browser, and a model.\n\nAlready have a favorite agent and wondering how to make Claude Code, Codex, Hermes, or OpenClaw more capable at browser tasks? Consider adding the [Webwright plugin\u002Fskills](#-use-as-a-plugin)!\n\n---\n\n## 📰 News\n\n- **2026-05-11** — Added Task2UI mode: Webwright completes the task and renders the results as an HTML-based web app that you can easily view and reuse.\n- **2026-05-06** — Codex and Claude Code plugin manifests added; install via `\u002Fplugin install webwright@webwright`. OpenClaw and Hermes Agent integrations shipped; the same `skills\u002Fwebwright\u002F` folder now loads across Claude Code, Codex, OpenClaw, and Hermes.\n- **2026-05-04** — Initial public release: ~1.5k LoC, OpenAI \u002F Anthropic \u002F OpenRouter backends, Playwright environment.\n\n---\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>💡 Motivation: Beyond Step-by-Step Web Interaction in a Stateful Browser\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nMost web agents today treat the browser session itself as the workspace: at each step the model receives the current page state and predicts a single next operation — a click, a type, a DOM selector, or a short tool call. 
Whatever the format, the agent is locked into predicting one web action at a time inside a predefined interaction loop. That harness was useful when LLMs were weaker. As models get stronger at writing and debugging code, the same harness becomes a bottleneck.\n\nWebwright takes a different stance: **separate the agent from the browser**, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session — it's the **code and logs in the local workspace**.\n\n- 🧱 **Robust, reusable interaction with web environments** — instead of fragile pixel-level actions, a coding agent with a terminal queries elements, waits for conditions, and handles dynamic behaviors like lazy loading or re-rendering. The resulting scripts can be rerun, adapted, and shared across tasks rather than rediscovered from scratch.\n- ⚡ **Efficient composition of complex workflows** — multi-step interactions like selecting a date or filling a form become a compact program. Loops, functions, and abstractions let the agent generalize across similar tasks (e.g. different dates) without re-predicting the same low-level sequences. Fewer interaction rounds, faster execution, less error accumulation on long horizons.\n- 🧪 **Workspace-as-state, not browser-as-state** — the agent can write exploratory scripts, spawn fresh browser sessions, and decide for itself when to capture screenshots and inspect failures, much like a human engineer iterating on an RPA script.\n- 🪄 **Surprisingly effective despite being minimal** — this stripped-down setup turns out to handle complex and especially long-horizon web tasks well (see [Performance](#-performance)).\n\n\u003C\u002Fdetails>\n\n---\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🌟 Why Webwright\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nMost web agent frameworks bury the actual agent loop under layers of abstractions. 
Webwright takes the opposite stance:\n\n- 🪶 **Lightweight by design** — core agent loop in a single ~450-line file, Playwright environment in ~570 lines, CLI in ~150 lines.\n- 🧩 **Pluggable model backends** — OpenAI, Anthropic, and OpenRouter, each ~150–200 lines.\n- 🔍 **Zero hidden frameworks** — just `httpx`, `pydantic`, `playwright`, and `typer`.\n- 🔁 **Flat prompt → observe → execute script loop** — readable end-to-end, easy to debug, easy to fork.\n- 🧪 **Run-artifact first** — every run writes trajectories and screenshots to disk for inspection.\n\nIf you want a minimal, easy-to-debug starting point for browser-using agents instead of another heavyweight platform, this is it.\n\n\u003C\u002Fdetails>\n\n---\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>🆚 How Webwright Differs From Other Browser-Agent Repos\u003C\u002Fstrong>\u003C\u002Fsummary>\n\nHow they differ at the architectural level:\n\n|                     | **Stagehand (Browserbase)**                                  | **agent-browser (Vercel)**                                                | **browser-use**                                       | **Webwright**                                                       |\n| ------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------------------------- |\n| **Paradigm**        | Hybrid: code + NL primitives (`act` \u002F `extract` \u002F `agent`)   | CLI tool that *another* agent (Claude Code, Codex, etc.) 
calls            | Autonomous LLM agent loop over DOM\u002FAX snapshots       | **Coding agent with a terminal**; browser is just an environment it spawns |\n| **Action space**    | Playwright code, or NL → LLM-translated Playwright           | Discrete subcommands (`open`, `click @e2`, `snapshot`, `eval`)            | Indexed click\u002Ftype actions selected by the LLM        | **Free-form Python (writes Playwright scripts itself)**                       |\n| **What is \"state\"?**| The browser session                                          | The browser session (held by daemon across CLI calls)                     | The browser session                                   | **The local workspace — code, screenshots, logs.** Browser is disposable. |\n| **Loop shape**      | Imperative; `agent()` does multi-step when needed            | One CLI invocation per micro-step                                         | observe → predict next action → execute → repeat      | write code → execute → inspect screenshots → repair (code-as-action)      |\n\u003C\u002Fdetails>\n\n\n---\n\n## 🎥 Demo\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4ed94cd5-11be-4daa-b2d7-1260a803baca\n\n---\n\n## 📊 Performance\n\nState-of-the-art on two real-website benchmarks with a 100-step budget — see the [blog post](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Farticles\u002Fwebwright-a-terminal-is-all-you-need-for-web-agents\u002F) for full details.\n\n- 🏆 **Online-Mind2Web (300 tasks):** **86.7%** with GPT-5.4 — highest among open-source harnesses in the AutoEval category. Claude Opus 4.7 reaches **84.7%** and is stronger on the hard split (**80.5%** vs. 76.6% for GPT-5.4 at N=100).\n- 🚀 **Odysseys (200 long-horizon tasks):** **60.1%** with GPT-5.4 (avg. 
76.1 steps) — **+15.6 points** over the prior SOTA (Opus 4.6 at 44.5%, using a vision-based approach and a persistent browser) and **+26.6 points** over base GPT-5.4 (33.5%, using xy-coordinate prediction and a persistent browser).\n- 🧠 **Code-as-action beats coordinate prediction:** Webwright substantially outperforms a reproduced GPT-5.4 screenshot+xy-coordinate baseline across all difficulty splits.\n- 🧰 **Small models + reusable tools:** generated scripts can be packaged as parameterized CLI tools — even **Qwen-3.5-9B** completes tasks well on Online-Mind2Web sites with 5+ tools available.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fodysseys_eval_step100.png\" alt=\"Odysseys long-horizon eval @ 100 steps\" width=\"49%\">\n  \u003Cimg src=\"assets\u002Fom2w_autoeval_step100.png\" alt=\"Online-Mind2Web AutoEval @ 100 steps\" width=\"49%\">\n\u003C\u002Fp>\n\n---\n\n## 🗺️ Project Map\n\n```\nwebwright\u002F\n├── pyproject.toml           # package: webwright\n├── src\u002Fwebwright\u002F\n│   ├── run\u002Fcli.py           # CLI entrypoint (`webwright`)\n│   ├── agents\u002Fdefault.py    # core agent loop\n│   ├── environments\u002F        # Playwright browser workspace\n│   ├── tools\u002F               # image_qa, self_reflection\n│   ├── models\u002F              # openai_model, anthropic_model, base\n│   ├── config\u002F              # base.yaml, model_openai.yaml, model_claude.yaml\n│   └── utils\u002F\n├── assets\u002F\n│   └── task_showcase\u002F       # tiny Flask dashboard for repeatable runs\n│       ├── app.py\n│       ├── templates\u002F       # dashboard.html, task.html\n│       └── tasks\u002F\u003Cshort_id>\u002F # task.json + report.json per task\n├── tests\u002F\n└── outputs\u002F                 # run artifacts (trajectories, screenshots)\n```\n\n---\n\n## 📰 Task Showcase (repeatable runs as a dashboard)\n\nA tiny Flask app under [`assets\u002Ftask_showcase\u002F`](assets\u002Ftask_showcase\u002FREADME.md) consolidates\nWebwright runs for 
**repeatable** odyssey tasks (deals, inventory, listings,\njob boards, weather, etc.) into a single dashboard. Each task ships only two\nfiles — `task.json` (metadata) and `report.json` (curated, structured output:\nsources + result sections like tables, lists, summaries) — and the templates\nrender them generically, so adding a new task is just dropping a new folder\nin `assets\u002Ftask_showcase\u002Ftasks\u002F`.\n\n```bash\npip install flask\npython assets\u002Ftask_showcase\u002Fapp.py    # http:\u002F\u002F127.0.0.1:5005\n```\n\nTo have Webwright produce a renderer-ready task folder at runtime, stack the\nTask Showcase overlay:\n\n```bash\npython -m webwright.run.cli \\\n    -c base.yaml -c model_openai.yaml -c task_showcase.yaml \\\n    -t \"\u003Crepeatable web task>\" \\\n    --task-id my_repeatable_task \\\n    -o outputs\u002Fdefault\n```\n\n> **Note:** `report.json` is only generated when `-c task_showcase.yaml` is\n> included. A plain `base.yaml` run produces `trajectory.json` and debug\n> artifacts but no `report.json`.\n\nThe run writes `task_showcase\u002Ftasks\u002F\u003Cshort_id>\u002Ftask.json` and `report.json`\ninside the output workspace. Render those generated files without copying them\nback into the repo:\n\n```bash\npython assets\u002Ftask_showcase\u002Fapp.py \\\n    --tasks-dir outputs\u002Fdefault\u002F\u003Crun>\u002Ftask_showcase\u002Ftasks\n```\n\n---\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.10+\n- Chromium installed through Playwright\n- An API key for your chosen backend (OpenAI, Anthropic, or OpenRouter)\n\n### Install\n\n```bash\npip install -e .\nplaywright install chromium\n```\n\n### Run\n\nExport credentials for the configured backend (for example, `OPENAI_API_KEY`\nwith `model_openai.yaml` or `ANTHROPIC_API_KEY` with `model_claude.yaml`). The\n`image_qa` and `self_reflection` tools use the same configured model by default,\nso an Anthropic run does not require an OpenAI key. 
Then:\n\n```bash\npython -m webwright.run.cli \\\n    -c base.yaml -c model_openai.yaml \\\n    -t \"Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20\" \\\n    --start-url https:\u002F\u002Fwww.google.com\u002Fflights \\\n    --task-id demo_openai \\\n    -o outputs\u002Fdefault\n```\n\n### 🚩 Flags\n\n| Flag | Description |\n|------|-------------|\n| `-c` | Config file(s) from `src\u002Fwebwright\u002Fconfig\u002F` (stackable). |\n| `-t` | Task instruction. |\n| `--start-url` | Initial page. |\n| `--task-id` | Output subfolder name. |\n| `-o` | Output directory. |\n\n---\n\n## 🔌 Use as a Plugin\n\nWebwright ships plugin manifests for both [Claude Code](https:\u002F\u002Fdocs.claude.com\u002Fen\u002Fdocs\u002Fclaude-code\u002Fplugins) ([`.claude-plugin\u002Fplugin.json`](.claude-plugin\u002Fplugin.json)) and [OpenAI Codex](https:\u002F\u002Fdevelopers.openai.com\u002Fcodex\u002Fplugins) ([`.codex-plugin\u002Fplugin.json`](.codex-plugin\u002Fplugin.json)), with the shared skill at [`skills\u002Fwebwright\u002F`](skills\u002Fwebwright\u002F) and slash commands at [`skills\u002Fwebwright\u002Fcommands\u002F`](skills\u002Fwebwright\u002Fcommands\u002F). The host agent drives the Webwright loop natively — no extra LLM API key or cost beyond your host subscription. Hosts that read PNG screenshots natively skip the `image_qa` \u002F `self_reflection` tools.\n\nCommon runtime deps (install once after either path):\n\n```bash\npip install -e .\nplaywright install chromium\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Claude Code\u003C\u002Fb>\u003C\u002Fsummary>\n\n### Install\n\nInstall through the bundled marketplace inside Claude Code:\n\n```text\n# 1. Add this repo as a Claude Code plugin marketplace\n\u002Fplugin marketplace add microsoft\u002FWebwright\n\n# 2. Install the plugin from that marketplace\n\u002Fplugin install webwright@webwright\n```\n\nPrefer a local checkout? 
Point the marketplace command at the cloned repo instead:\n\n```text\n\u002Fplugin marketplace add \u002Fabsolute\u002Fpath\u002Fto\u002FWebwright\n\u002Fplugin install webwright@webwright\n```\n\n### Use\n\n**Start a new Claude Code session** after installing — plugins are loaded at session start and won't appear until you restart.\n\nYou can either ask Claude Code in plain English (the skill auto-activates from its description), or use one of the slash commands:\n\n```\n\u002Fwebwright:run search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20\n\u002Fwebwright:craft search a ticket on Google Flights from LAX to SFO depart June 7 return June 14\n```\n\n- `\u002Fwebwright:run` (or any plain prompt) produces a **one-shot** `final_script.py` for the literal task values.\n- `\u002Fwebwright:craft` produces a **reusable CLI tool**: `final_script.py` becomes one parameterized function with a Google-style `Args:` docstring and an `argparse` wrapper whose flags default to the concrete task values, so you can rerun it later with different arguments — e.g. `python final_script.py --origin JFK --destination LAX --depart-date 2026-07-01`.\n\nIn both modes Claude Code scaffolds a workspace with `plan.md`, runs instrumented Playwright scripts under `final_runs\u002Frun_\u003Cid>\u002F`, and visually self-verifies each critical point against the saved screenshots.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>OpenAI Codex\u003C\u002Fb>\u003C\u002Fsummary>\n\n### Install\n\nCodex reads Claude-style marketplaces, so the same repo works as a Codex plugin marketplace. From the Codex CLI:\n\n```bash\n# 1. Add this repo as a Codex plugin marketplace\ncodex plugin marketplace add microsoft\u002FWebwright\n\n# 2. 
Open the plugin browser and install Webwright\ncodex\n\u002Fplugins\n```\n\nPrefer a local checkout?\n\n```bash\ncodex plugin marketplace add \u002Fabsolute\u002Fpath\u002Fto\u002FWebwright\n```\n\nThen restart Codex so the new marketplace and plugin are picked up.\n\n### Use\n\nIn a new Codex thread, either ask in plain English (the skill auto-activates from its description) or invoke the bundled skill explicitly with `@webwright`:\n\n```\n@webwright search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20\n```\n\nCodex scaffolds a workspace with `plan.md`, runs instrumented Playwright scripts under `final_runs\u002Frun_\u003Cid>\u002F`, and visually self-verifies each critical point against the saved screenshots.\n\nTo turn the plugin off without uninstalling, set its entry in `~\u002F.codex\u002Fconfig.toml` to `enabled = false` and restart Codex.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>🦞 OpenClaw\u003C\u002Fb>\u003C\u002Fsummary>\n\n### Install\n\nInstall directly from a local checkout (path, archive, npm spec, git repo, or `clawhub:` spec all work):\n\n```bash\nopenclaw plugins install \u002Fabsolute\u002Fpath\u002Fto\u002FWebwright\nopenclaw gateway restart   # reload so the plugin and skill are picked up\n```\n\nVerify:\n\n```bash\nopenclaw plugins list | grep webwright\nopenclaw skills  list | grep webwright   # should show \"✓ ready\"\n```\n\n### Use\n\nThe `webwright` skill is now available to any OpenClaw agent surface (CLI, Telegram, etc.) — invoke it by asking the agent in natural language, or via the slash commands shipped under [`skills\u002Fwebwright\u002Fcommands\u002F`](skills\u002Fwebwright\u002Fcommands\u002F), e.g. 
`\u002Fwebwright run \u003Ctask>`.\n\nTo uninstall: `openclaw plugins uninstall webwright`.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Hermes Agent\u003C\u002Fb>\u003C\u002Fsummary>\n\n### Install\n\n[Hermes Agent](https:\u002F\u002Fgithub.com\u002FNousResearch\u002Fhermes-agent) is a [skills-compatible client](https:\u002F\u002Fagentskills.io), so the same `skills\u002Fwebwright\u002F` folder loads as a Hermes skill. Symlink it into your Hermes user-skills directory:\n\n```bash\nmkdir -p ~\u002F.hermes\u002Fskills\nln -sfn \u002Fabsolute\u002Fpath\u002Fto\u002FWebwright\u002Fskills\u002Fwebwright ~\u002F.hermes\u002Fskills\u002Fwebwright\n```\n\nNo Hermes-specific manifest is needed; only `SKILL.md` is loaded.\n\n### Use\n\nStart Hermes (`hermes`) and ask it to drive a web task in natural language — the skill auto-activates from its description. You can also invoke it explicitly with `\u002Fwebwright`.\n\nNote: the named subcommands shipped under [`skills\u002Fwebwright\u002Fcommands\u002F`](skills\u002Fwebwright\u002Fcommands\u002F) (`\u002Fwebwright:run`, `\u002Fwebwright:craft`) are a Claude Code \u002F Codex convention and are inert in Hermes; the skill itself still works end-to-end.\n\n\u003C\u002Fdetails>\n\n---\n\n## 📃 Trajectory Comparison & Viewer\n\nYou can run the same tasks with the Webwright harness and with its Codex \u002F GitHub Copilot skill variant, then compare token usage and trajectories across harnesses. The trajectory viewer supports Codex, GitHub Copilot, and Webwright harness traces.\n\n![Trajectory comparison](assets\u002Ftrajectory-compare.png)\n\n### How to use\n\n```bash\ncd assets\u002Fcompare_trajectory\u002F\npython3 -m http.server\n```\n\nOpen the page in your browser (`python3 -m http.server` serves on `http:\u002F\u002Flocalhost:8000` by default), upload the Webwright `raw_responses.jsonl`, and attach `trajectory.json` to view the run. 
Then, on the other side, upload your Codex or GitHub Copilot trace.\n\n### Obtaining Codex traces:\n\n```\nls ~\u002F.codex\u002Fsessions\u002F2026\u002FMONTH\u002FDAY\u002FSESSION_ID.jsonl\n```\n\n### Obtaining GitHub Copilot traces:\n\n```\n\u002Fexport file session\n-> session.md is the uploadable trace\n```\n\n### Quick Comparison\n\n#### \"Find the cheapest used 8-cylinder bmw made between 2005-2015 and priced from 25,000 to  50,000 dollars with mileage less than 50,000 miles or less.\"\n\n| Tokens | Webwright Harness (Local Browser Mode) | Codex Webwright Skill |\n| --- | ---: | ---: |\n| Input | 420,433 | 3,271,143 |\n| Output | 3,593 | 20,040 |\n| Reasoning | 0 | 4,410 |\n| Cached | 217,216 | 3,081,3440 |\n| Total | 424,026 | 3,291,183 |\n\nIndividual runs and results may vary.\n\n---\n\n## Credits\n\n- [SWE-agent\u002Fmini-swe-agent](https:\u002F\u002Fgithub.com\u002FSWE-agent\u002Fmini-swe-agent\u002Ftree\u002Fmain) — design inspiration for the minimal agent loop.\n- [Playwright](https:\u002F\u002Fplaywright.dev\u002F) — browser automation.\n\n## Citation\n\nIf you use Webwright in your research or build on it, please cite this repository:\n\n```bibtex\n@misc{webwright2026,\n  title        = {Webwright: A terminal is all you need for web agents},\n  author       = {Lu, Yadong and Xu, Lingrui and Huang, Chao and Awadallah, Ahmed},\n  year         = {2026},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FWebwright}},\n  note         = {GitHub repository}\n}\n```\n","Webwright is a browser agent framework for executing long-horizon web tasks. Through Python scripts, it lets a large language model launch multiple browser sessions, capture and inspect page screenshots or state on demand, and complete an entire web task within a single re-runnable Python script. The framework uses Chromium as its browser backend and is compatible with model services from platforms such as OpenAI, Anthropic, and OpenRouter. It is concise and efficient (about 1.5k lines of code) and does not rely on complex multi-agent systems or graph engines, making it well suited to scenarios that require automating complex tasks in a web environment, such as automated testing, data scraping, or browser-based task automation.",2,"2026-06-11 04:08:33","high_star"]