[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81550":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":10,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":12,"stars7d":12,"stars30d":12,"stars90d":12,"forks30d":12,"starsTrendScore":12,"compositeScore":13,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":14,"fork":14,"defaultBranch":15,"hasWiki":16,"hasPages":14,"topics":17,"createdAt":8,"pushedAt":8,"updatedAt":18,"readmeContent":19,"aiSummary":20,"trendingCount":12,"starSnapshotCount":12,"syncStatus":21,"lastSyncTime":22,"discoverSource":23},81550,"qa-core-agent-openclaw","sardar-usman\u002Fqa-core-agent-openclaw","sardar-usman",null,"TypeScript",25,8,0,2.86,false,"main",true,[],"2026-06-12 02:04:16","# QA-Core\n\n**Autonomous Playwright test generation, powered by Claude.**\n\nQA-Core is an AI agent that opens a real browser, explores your app, reviews its own work, and writes a Playwright test suite. Every test runs and passes once inside the agent before it is saved to disk, so you get specs that already work on day one.\n\nBuilt on [Claude](https:\u002F\u002Fwww.anthropic.com\u002F) by Anthropic. Distributed through [OpenClaw](https:\u002F\u002Fopenclaw.dev). Drives [Playwright](https:\u002F\u002Fplaywright.dev\u002F).\n\n## Table of contents\n\n1. [What it does](#what-it-does)\n2. [Why this is different](#why-this-is-different)\n3. [How it works](#how-it-works)\n4. [Quick start](#quick-start)\n5. [Commands](#commands)\n6. [Web UI](#web-ui)\n7. [MCP server](#mcp-server-for-claude-desktop-cursor-cline-continue)\n8. [Model routing and budgets](#model-routing-and-budgets)\n9. [Evaluation results](#evaluation-results)\n10. [Project layout](#project-layout)\n11. [Configuration files](#configuration-files)\n12. [Requirements](#requirements)\n13. [About the author](#about-the-author)\n14. 
[License](#license)\n\n## What it does\n\nQA-Core exposes three commands. Each one solves a different problem in test automation.\n\n| Command | What you give it | What you get back |\n| ------- | ---------------- | ----------------- |\n| `npm run explore` | A live URL | A full Playwright suite written from a verified browser session, with a Page Object Model framework |\n| `npm run generate` | A user story or Jira ticket | A Playwright spec built from acceptance criteria. You can run it to verify |\n| `npm run heal` | A spec that broke because the page changed | A patched copy with re-resolved selectors and confidence scores |\n\nGenerated files land under `output\u002F\u003Crun-id>\u002F`.\n\n## Why this is different\n\nMost \"AI test generators\" take a single DOM snapshot, hand it to an LLM, and hope the output works. QA-Core does not do that.\n\nIt runs a real agent pipeline:\n\n* The **Planner** uses Haiku to read one page snapshot and write a numbered scenario list.\n* The **Explorer** uses Opus and a tool-use loop to drive the browser. It navigates, clicks, fills, and asserts against the live page. Every action is verified before the next one.\n* The **Critic** uses Sonnet to review the trace and label each scenario as ship, weak, or fix.\n* The **Transcriber** is deterministic. It turns the verified trace into Playwright code.\n* The **Healer** is on-demand. 
When a real Playwright run fails because the page changed, it re-resolves the broken selectors live.\n\nThis means every line in the final spec corresponds to an action that already worked once against your real app.\n\n## How it works\n\n```text\n                         ┌── per-host memory ──┐\n                         │  (loaded as cached  │\n                         │   system block)     │\n                         └──────────┬──────────┘\n                                    │\n[1] Planner   (Haiku)  ─────────────┘\n    1 page snapshot then numbered scenario list\n\n[2] Explorer  (Opus)  ◀─ tool-use loop with prompt caching\n    navigate \u002F click \u002F fill \u002F assert \u002F get_dom \u002F finish\n    every action verified against the live page\n\n[3] Critic    (Sonnet)\n    reads the trace, returns ship \u002F weak \u002F fix verdicts\n\n       ↓\n\n  trace transcriber then output\u002F\u003Crun-id>\u002F\u003Cname>.spec.ts\n                       then run-report.json (plan, verdicts, cost, cascade)\n```\n\n```mermaid\nflowchart LR\n    classDef stage fill:#1a1a22,stroke:#b9a6ff,color:#f5f5f7\n    classDef optional fill:#131318,stroke:#f4c560,stroke-dasharray:5 5,color:#f4c560\n    classDef io fill:#0d0d10,stroke:#5b5b66,color:#9d9da7\n    classDef memory fill:#0d0d10,stroke:#5dd5a4,color:#5dd5a4\n\n    URL[\"URL or Story\"]:::io\n    P[\"Planner (Haiku 4.5)\"]:::stage\n    REV[\"Review checkpoint\"]:::optional\n    E[\"Explorer (Opus 4.7) tool-use loop\"]:::stage\n    C[\"Critic (Sonnet 4.6) ship, weak, fix\"]:::stage\n    T[\"Transcriber + axe-core\"]:::stage\n    H[\"Healer (Sonnet 4.6) on-demand\"]:::stage\n    SPEC[\"Spec file (.ts or .js)\"]:::io\n    CI[\"CI and GitHub Actions\"]:::io\n    MEM[\"Per-host memory\"]:::memory\n\n    URL --> P\n    P -.->|optional| REV\n    REV -.->|from plan| E\n    P --> E\n    E --> C\n    C --> T\n    T --> SPEC\n    SPEC --> CI\n    SPEC -.->|on failure| H\n    H -.->|patched| SPEC\n\n    MEM -.->|cached 
prompt| P\n    MEM -.->|cached prompt| E\n    E -.->|observed intents| MEM\n```\n\n### The selector cascade\n\nQA-Core picks selectors in this order: `getByRole`, then `getByLabel`, then `getByTestId`, then CSS as a last resort. The level that resolved each call is logged. The transcriber emits the most resilient selector available, and the Critic can flag overuse of CSS.\n\n### Auto-injected accessibility checks\n\nEvery generated spec ships with an `@axe-core\u002Fplaywright` accessibility check against the landing page. You get WCAG 2 AA coverage by default.\n\n### Per-host memory\n\nAfter each run, the agent saves what it learned about that site to `.qa-core\u002Fsites\u002F\u003Chost>.json`. This includes the intents it observed and the selector cascade level that worked. The next run against the same host loads this memory into the system prompt as a cached block. Repeat runs are typically 90 percent cheaper than the cold path.\n\n### Self-healing\n\nWhen a spec fails because the page changed, `npm run heal` re-resolves the broken selectors on the live page. Each replacement is verified to resolve to exactly one element before it lands in the patched copy at `\u003Cspec>.healed.\u003Cext>`. A comment annotation shows the original call and the model's confidence.\n\n### More reference material\n\n* Full reference: [`docs\u002FDOCUMENTATION.md`](.\u002Fdocs\u002FDOCUMENTATION.md). 
Every component, flag, env var, and file format.\n* Flow diagram in SVG: [`docs\u002Farchitecture.svg`](.\u002Fdocs\u002Farchitecture.svg).\n* Interactive HTML page: [`docs\u002Farchitecture.html`](.\u002Fdocs\u002Farchitecture.html).\n* MCP install guide: [`docs\u002FMCP.md`](.\u002Fdocs\u002FMCP.md).\n\n## Quick start\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsardarusmanjutt\u002Fqa-core-agent.git\ncd qa-core-agent\ncp .env.example .env          # then add your ANTHROPIC_API_KEY\nbash setup.sh                 # installs dependencies and Playwright Chromium\n```\n\nRequired environment variable: `ANTHROPIC_API_KEY`. Get one at [console.anthropic.com](https:\u002F\u002Fconsole.anthropic.com\u002Fsettings\u002Fkeys).\n\nOptional: `QA_CORE_AUTH_URL`, `QA_CORE_AUTH_USER`, `QA_CORE_AUTH_PASS` if you want a stored auth session reused across tests. See [`tests\u002Fauth.setup.ts`](.\u002Ftests\u002Fauth.setup.ts).\n\n## Commands\n\n### Explore a URL\n\n```bash\nnpm run explore -- https:\u002F\u002Fwww.saucedemo.com\u002F\nnpm run explore -- https:\u002F\u002Fwww.saucedemo.com\u002F --lang js      # JavaScript output\nnpm run explore -- https:\u002F\u002Fwww.saucedemo.com\u002F --name login   # custom filename\n```\n\nBy default `\u002Fexplore` emits a full Page Object Model framework. 
Output lands under `output\u002F\u003Ctimestamp>-\u003Chost>\u002F`:\n\n```text\noutput\u002F20260514-160000-saucedemo-com\u002F\n  pages\u002F\n    BasePage.ts                    # base class with goto + waitReady helpers\n    SaucedemoPage.ts               # typed Locator fields + loginAs(user, pass)\n  tests\u002F\n    saucedemo.spec.ts              # spec that uses the page object\n  a11y\u002F\n    landing.a11y.spec.ts           # auto-injected WCAG 2 AA check\n  run-report.json                  # cost, cascade stats, scenario list\n```\n\nThe page class looks like this:\n\n```typescript\nexport class SaucedemoPage extends BasePage {\n  readonly url = \"https:\u002F\u002Fwww.saucedemo.com\u002F\";\n  readonly username: Locator;\n  readonly password: Locator;\n  readonly loginButton: Locator;\n  readonly loginError: Locator;\n\n  constructor(page: Page) {\n    super(page);\n    this.username    = page.getByRole(\"textbox\", { name: \"Username\" });\n    this.password    = page.getByRole(\"textbox\", { name: \"Password\" });\n    this.loginButton = page.getByRole(\"button\",  { name: \"Login\" });\n    this.loginError  = page.locator(\"[data-test=error]\");\n  }\n\n  async loginAs(username: string, password: string): Promise\u003Cvoid> {\n    await this.username.fill(username);\n    await this.password.fill(password);\n    await this.loginButton.click();\n  }\n}\n```\n\nAnd the spec that uses it:\n\n```typescript\ntest(\"[happy] logged in with valid credentials\", async ({ page }) => {\n  await saucedemoPage.loginAs(\"standard_user\", \"secret_sauce\");\n  await expect(page).toHaveURL(\u002Finventory\u002F);\n});\n```\n\nIf you prefer a single-file output without the page object, pass `--no-pom`.\n\n### Review mode (sign-off before automation)\n\nFor team workflows where a lead needs to approve scenarios before the Explorer runs:\n\n```bash\nnpm run explore -- https:\u002F\u002Fwww.saucedemo.com\u002F --review\n# writes output\u002F\u003Crun-id>\u002Fplan.csv 
and exits\n```\n\nOpen `plan.csv` in Excel, Numbers, or Google Sheets. Set `Approve=no` on any row you want to skip. Then resume:\n\n```bash\nnpm run explore -- --from-plan output\u002F\u003Crun-id>\u002Fplan.csv\n# skips Planner, runs Explorer + Critic + Transcriber on approved scenarios only\n```\n\nThe Planner cost is paid only once. The CSV header preserves the target URL, so the resume command needs no extra arguments.\n\n### Generate tests from a user story\n\n```bash\nnpm run generate -- \"As a user I want to log in so I can access my dashboard\"\nnpm run generate -- \"...\" --lang js --base-url https:\u002F\u002Fstaging.example.com\n```\n\nThis one does not open a browser. It produces code from acceptance criteria. Run the spec to verify it works against your real app.\n\n### Heal a spec that broke\n\n```bash\nnpm run heal -- output\u002F\u003Crun-id>\u002F\u003Cname>.spec.ts\n```\n\nQA-Core runs the spec, finds selector-style failures, opens the URL in a fresh browser, and proposes replacements. Each replacement is verified to resolve to exactly one element before it is written to `\u003Cspec>.healed.\u003Cext>`. The patched file includes a comment with the original call and the model's confidence score.\n\n### Run the suite\n\n```bash\nnpx playwright test output\u002F\u003Crun-id>\u002F\u003Cname>.spec.ts\n```\n\nPlaywright is configured with Chromium, Firefox, WebKit, and mobile projects. CI mode adds retries, trace on first retry, and an HTML report.\n\n## Web UI\n\nThe chat-style UI at [`qa-core-ui.html`](.\u002Fqa-core-ui.html) talks to a WebSocket gateway that bridges the OpenClaw web surface to the agent runtime.\n\n```bash\nnpm run gateway              # starts ws:\u002F\u002F127.0.0.1:18789\nopen qa-core-ui.html         # in your browser\n```\n\nClick **Connect** in the header. 
Then type a slash command:\n\n* `\u002Fexplore https:\u002F\u002F...`\n* `\u002Fgenerate \"user story\"`\n* `\u002Fheal output\u002F\u003Crun-id>\u002F\u003Cname>.spec.ts`\n\nThe gateway streams progress messages as the Planner, Explorer, and Critic stages run. It then sends the generated spec as a final message that the UI renders as a copy and save code block. The Activity panel on the right has three tabs: Results (run history), Files (list of generated files with copy and download), and Log (live event stream). The refresh button re-syncs runs from the gateway.\n\nOptional auth: set `QA_CORE_GATEWAY_TOKEN` in your environment. The UI accepts the token via the page URL fragment, for example `qa-core-ui.html#token=\u003Cvalue>`.\n\n## MCP server (for Claude Desktop, Cursor, Cline, Continue)\n\nQA-Core ships an MCP (Model Context Protocol) server. Any MCP-aware client can use the three workflows as first-class tools, with no gateway, no UI, and no clone-and-run setup.\n\n```bash\nnpm run mcp                  # standalone, useful for debugging via MCP Inspector\n```\n\nFor real use, point your AI client at the server through its config file. The full install guide is [`docs\u002FMCP.md`](.\u002Fdocs\u002FMCP.md). An example Claude Desktop config is at [`docs\u002Fclaude_desktop_config.example.json`](.\u002Fdocs\u002Fclaude_desktop_config.example.json).\n\nOnce installed, in Claude Desktop you can just chat:\n\n> \"Use qa-core to explore `https:\u002F\u002Fwww.saucedemo.com\u002F` and show me the generated spec.\"\n\nClaude calls the `qa_explore` MCP tool. The server runs the multi-agent pipeline and returns the verified spec.\n\n**Tools exposed:** `qa_explore`, `qa_generate`, `qa_heal`.\n**Resources exposed:** `qa-core:\u002F\u002Fruns`, `qa-core:\u002F\u002Fmemory`.\n\n## Model routing and budgets\n\nEach stage of the pipeline uses a different model so cost stays low and quality stays high. 
You can override any of them with environment variables.\n\n| Setting | Default | Purpose |\n| ------- | ------- | ------- |\n| `QA_CORE_MODEL_PLANNER` | `claude-haiku-4-5` | Cheap scenario derivation pre-pass |\n| `QA_CORE_MODEL_EXPLORE` | `claude-opus-4-7` | Browser-driving tool-use loop. Use Opus for hard sites |\n| `QA_CORE_MODEL_CRITIC` | `claude-sonnet-4-6` | Post-run review with per-scenario verdicts |\n| `QA_CORE_MODEL_HEAL` | `claude-sonnet-4-6` | Selector re-resolution in `npm run heal` |\n| `QA_CORE_MODEL_TRANSCRIBE` | `claude-sonnet-4-6` | Story to spec in `npm run generate` |\n| `QA_CORE_MAX_STEPS` | `40` | Hard ceiling on tool calls per `\u002Fexplore` |\n| `QA_CORE_MAX_USD` | `2.00` | Hard ceiling on cost per run. The agent aborts if exceeded |\n\nPrompt caching is enabled on three cached blocks: the frozen behavior rules, the site memory for the target host, and the planner output. Repeat runs against the same host reuse the first two. Cost is typically 90 percent lower than a cold run.\n\n## Evaluation results\n\nQA-Core ships an evaluation suite that runs the agent against three public test sites, executes the generated specs, and publishes pass-rate, flake-rate, cost, and selector cascade distribution.\n\n```bash\nnpm run eval\n# writes eval-results\u002F\u003Ctimestamp>\u002Fsummary.md\n```\n\nLatest run is from 2026-05-14. First-run unfiltered, no self-healing applied.\n\nThe first column below shows the original inline output from the eval harness. The second column shows the same agent trace re-emitted through the Page Object Model framework. Same scenarios. Same browser session. Better code emission target.\n\n| Site | Pass-rate (inline) | Pass-rate (POM) |\n| ---- | -----------------: | --------------: |\n| saucedemo | 50% | 83% |\n| the-internet | 29% | 43% |\n| practice-todo | 17% | 67% |\n| **Aggregate** | **6 of 19 = 32%** | **12 of 19 = 63%** |\n\nPOM almost doubles the first-run pass-rate. The reason is consistency. 
When locators live as typed class fields, the same selector is used in every scenario and across reruns. Inline emission was free to pick a different selector flavour per test, and that introduced flake.\n\nTotal cost: $0.7697 across the three sites in 5 minutes 38 seconds. Remaining failures fall into three buckets: selector drift in dynamic DOMs (TodoMVC), strict URL assertions, and unhandled timing on JS-heavy widgets. Each one is a candidate for `npm run heal` to repair, or for the next round of Critic policy tuning. Full breakdown: [`eval-results\u002F2026-05-14T08-04-45-447Z\u002Fsummary.md`](.\u002Feval-results\u002F2026-05-14T08-04-45-447Z\u002Fsummary.md).\n\n> **A note on absolute pass-rates.** Single-run aggregate numbers are noisy. Public test sites sometimes rate-limit, sleep (Heroku free tier), or rotate selectors. A different eval run in our history showed saucedemo at 80 percent but the-internet at 0 percent, purely because a Heroku cold-start exceeded the default 15 second navigation timeout. The signal worth quoting is the inline-vs-POM delta on identical traces, because that comparison controls for site flakiness. The jump from 32 percent to 63 percent is real and reproducible. Any headline like \"we got X percent today\" is not. 
Treat any single eval run as one data point, not the truth.\n\n## Project layout\n\n```text\nsrc\u002F\n  agent\u002F\n    runtime.ts        # multi-agent pipeline (Planner, Explorer, Critic) + budgets\n    planner.ts        # Haiku pre-step: scenario derivation from one DOM snapshot\n    critic.ts         # Sonnet post-step: per-scenario ship\u002Fweak\u002Ffix verdicts\n    memory.ts         # per-host fingerprints + project memory, cached into prompt\n    heal.ts           # selector self-healing, re-resolves broken calls live\n    tools.ts          # Playwright tool surface exposed to Claude\n    selectors.ts      # role, label, testid, CSS cascade resolver\n    transcriber.ts    # legacy single-file emission (verified trace to inline spec)\n    pom.ts            # Page Object Model emitter (default): BasePage + per-page classes\n    trace.ts          # types: Scenario, TraceStep, Assertion, RunReport\n    generate.ts       # \u002Fgenerate: story to spec, no browser\n  cli\u002F\n    explore.ts        # npm run explore\n    generate.ts       # npm run generate\n    heal.ts           # npm run heal\n  server\u002F\n    gateway.ts        # WebSocket bridge between qa-core-ui.html and the runtime\n  mcp\u002F\n    server.ts         # MCP server: exposes qa_explore, qa_generate, qa_heal\ndocs\u002F\n  DOCUMENTATION.md    # full reference\n  architecture.html   # full-page architecture infographic\n  architecture.svg    # single-image flow diagram\n  MCP.md              # MCP install guide for Claude Desktop, Cursor, Cline\nscripts\u002F\n  eval.ts             # npm run eval\ntests\u002F\n  auth.setup.ts       # storage-state fixture for auth-gated apps\n.qa-core\u002F             # per-host memory cache (gitignored)\nqa-core-ui.html       # web UI client\nplaywright.config.ts\n.github\u002Fworkflows\u002Fqa-core.yml\n```\n\n## Configuration files\n\nThe agent's behavior is defined in plain markdown so OpenClaw can load it.\n\n| File | Purpose |\n| ---- | ------- 
|\n| [`agent\u002FSOUL.md`](.\u002Fagent\u002FSOUL.md) | Operating principles, hard rules, defaults |\n| [`agent\u002FIDENTITY.md`](.\u002Fagent\u002FIDENTITY.md) | What QA-Core is and what it does |\n| [`agent\u002FTOOLS.md`](.\u002Fagent\u002FTOOLS.md) | Tool surface and selector cascade |\n| [`agent\u002FMEMORY.md`](.\u002Fagent\u002FMEMORY.md) | Per-project persistent context |\n| [`skills\u002Fexplore-url.md`](.\u002Fskills\u002Fexplore-url.md) | `\u002Fexplore` command behavior |\n| [`skills\u002Fgenerate-tests.md`](.\u002Fskills\u002Fgenerate-tests.md) | `\u002Fgenerate` command behavior |\n\n## Requirements\n\n* Node.js 20 or newer\n* `ANTHROPIC_API_KEY`\n* Playwright Chromium (`npx playwright install chromium`)\n\n## About the author\n\n**Muhammad Usman**\nSenior QA Automation Engineer. AI Test Engineering Lead.\nISTQB CTFL Certified. Upwork Top Rated Plus (Top 3 percent).\n10+ years in QA automation.\n\n* Website: [sardarusmanjutt.com](https:\u002F\u002Fsardarusmanjutt.com)\n* LinkedIn: [linkedin.com\u002Fin\u002Fsardarusmanjutt](https:\u002F\u002Flinkedin.com\u002Fin\u002Fsardarusmanjutt)\n* Email: [muhammad.usman101@hotmail.com](mailto:muhammad.usman101@hotmail.com)\n\n## License\n\nMIT. Use it, fork it, build on it.\n","QA-Core is a Claude-based AI agent that automatically generates Playwright tests. It opens a real browser, explores the app, and writes a Playwright test suite. Its core features are a Haiku Planner that produces the scenario list, an Opus Explorer that drives the browser and verifies each action, a Sonnet Critic that reviews the results, and a Transcriber that turns the trace into Playwright code. In addition, the Healer repairs selectors when a page change causes tests to fail. The project is written in TypeScript and integrates Anthropic's Claude and Playwright. It suits web application development that needs to build and maintain high-quality end-to-end tests quickly.",2,"2026-06-11 04:05:28","CREATED_QUERY"]