[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11163":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},11163,"agent-ctrl","k4cper-g\u002Fagent-ctrl","k4cper-g","Desktop automation CLI for AI agents. Fast native Rust CLI.","https:\u002F\u002Fagent-ctrl.dev",null,"Rust",77,3,120,0,6,42.41,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:00:54","\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Flogo.png\" alt=\"agent-ctrl\" height=\"120\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  Desktop automation CLI for AI agents. Fast native Rust CLI.\n\u003C\u002Fp>\n\n> **Browser automation is out of scope.** agent-ctrl drives native UI; for Chromium-via-CDP use the sibling [agent-browser](https:\u002F\u002Fgithub.com\u002Fvercel-labs\u002Fagent-browser) project. The two are designed to compose in the same agent loop.\n\n## Installation\n\n### npm (recommended)\n\n```bash\nnpm install -g @agent-ctrl\u002Fcli\n```\n\nThe package ships a Node launcher; the postinstall step downloads the\nmatching native binary for your platform from the corresponding GitHub\nRelease. Supported in v0.1.x: Windows x64, macOS arm64, macOS x64. Linux\nis on the roadmap.\n\n### Windows binary\n\nFor tagged releases, download the Windows zip from GitHub Releases or run:\n\n```powershell\npowershell -ExecutionPolicy Bypass -File .\\scripts\\install-windows.ps1\n```\n\nThe installer downloads the latest `agent-ctrl.exe`, installs it under\n`%LOCALAPPDATA%\\agent-ctrl\\bin`, and adds that directory to the user PATH\nunless `-NoPath` is passed.\n\n### macOS binary\n\nFor tagged releases, download the macOS tarball from GitHub Releases (one\nasset per arch: `aarch64-apple-darwin` for Apple Silicon, `x86_64-apple-darwin`\nfor Intel) or run:\n\n```bash\ncurl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002Fk4cper-g\u002Fagent-ctrl\u002Fmain\u002Fscripts\u002Finstall-macos.sh | bash\n```\n\nThe installer detects the host arch, downloads the matching tarball, places\n`agent-ctrl` at `~\u002F.local\u002Fbin\u002Fagent-ctrl`, and runs `agent-ctrl info` to\nverify. Pass `--install-dir \u003Cpath>` to install elsewhere or `--no-path` to\nskip the PATH-update reminder.\n\nAfter install, grant Accessibility permission in *System Settings >\nPrivacy & Security > Accessibility* (and Screen Recording in the same pane\nif you'll use `screenshot`). Run `agent-ctrl doctor` to verify.\n\n### From source (recommended for v0.1)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fk4cper-g\u002Fagent-ctrl\ncd agent-ctrl\ncargo build --release -p agent-ctrl-cli\n# put target\u002Frelease\u002Fagent-ctrl on your PATH\n```\n\nThe Rust workspace crates are not published to crates.io in v0.1. The public\ndistribution paths are `npm install -g @agent-ctrl\u002Fcli`, the GitHub release\nbinaries (`agent-ctrl-*`), and source builds.\n\n### TypeScript client\n\n```bash\nnpm install @agent-ctrl\u002Fclient\n# expects `agent-ctrl` on PATH for the daemon transport\n```\n\n### Requirements\n\n- **Windows 10\u002F11** for UIA, **macOS 12+** for AX. Both surfaces ship the full action vocabulary; Linux \u002F Android \u002F iOS are not implemented yet. Other OSes build cleanly with stub surfaces.\n- **Rust 1.85+** (workspace MSRV; rustup will install it from `rust-toolchain.toml`).\n- **Node.js 20+** only when using the TypeScript client.\n\n## Quick start\n\n```bash\nagent-ctrl info                                  # OS, available surfaces, active sessions\nagent-ctrl open uia                              # spawn a daemon (background)\nagent-ctrl snapshot --target-process \u003Cname>      # tree of refs (@e0, @e1, ...)\nagent-ctrl click @e4                             # click by ref\nagent-ctrl get name @e4                          # inspect cached snapshot fields\nagent-ctrl is enabled @e4                        # boolean state checks\nagent-ctrl fill @e0 \"hello from agent-ctrl\"      # set value via UIA ValuePattern\nagent-ctrl press \"Ctrl+S\"                        # key chord via SendInput\nagent-ctrl screenshot result.png                 # PNG of the pinned window\nagent-ctrl close                                 # stop the daemon\n```\n\nEvery action follows the same pattern: `snapshot` once to learn what's on screen, then issue actions by ref. Refs are valid only for the snapshot that produced them - re-snapshot before acting on a tree that has changed.\n\n## Commands\n\n### Core\n\n```bash\nagent-ctrl open \u003Csurface>                # spawn a daemon (uia, mock, ...)\nagent-ctrl close                         # stop the daemon\nagent-ctrl list [--json]                 # active sessions\nagent-ctrl info [--json]                 # static facts about this binary\nagent-ctrl doctor [--json] [--fix] [--quick]  # diagnose the install + live probe\nagent-ctrl launch [--json] \u003Cpath> [--wait MS]  # spawn an app detached from this shell\n```\n\n### Snapshot\n\n```bash\nagent-ctrl snapshot                              # capture pinned window's a11y tree\nagent-ctrl snapshot --target-process \u003Cname>      # pin by process executable name\nagent-ctrl snapshot --target-pid \u003Cpid>           # pin by PID\nagent-ctrl snapshot --target-title \u003Csubstring>   # pin by window title (locale-dependent)\nagent-ctrl snapshot --json                       # full JSON for programmatic consumption\nagent-ctrl snapshot --compact false              # disable compact-tree filtering\n```\n\nThe first `snapshot` after `open` pins the session to a target window. Subsequent actions on the session target that window until a `focus-window` re-pins it.\n\n### Pointer \u002F focus\n\n```bash\nagent-ctrl click @eN                     # primary-button click on a ref\nagent-ctrl double-click @eN              # double-click\nagent-ctrl right-click @eN               # secondary-button click\nagent-ctrl hover @eN                     # cursor over element, no buttons\nagent-ctrl focus @eN                     # UIA SetFocus\nagent-ctrl highlight @eN                 # move cursor to element for human debugging\n```\n\n### Keyboard\n\n```bash\nagent-ctrl type \"hello\"                  # synthetic Unicode keystrokes\nagent-ctrl fill @eN \"value\"              # native value setting where supported\nagent-ctrl clear @eN                     # clear an editable field\nagent-ctrl press \"Ctrl+S\"                # key chord - Enter, Tab, Ctrl+A, Cmd+A, etc.\nagent-ctrl key-down \"Shift\"              # hold a modifier\nagent-ctrl key-up \"Shift\"                # release it\nagent-ctrl clipboard read                # read clipboard text\nagent-ctrl clipboard write \"text\"        # replace clipboard text\nagent-ctrl clipboard copy                # send Ctrl+C\nagent-ctrl clipboard paste               # send Ctrl+V\n```\n\n### Selection \u002F scroll\n\n```bash\nagent-ctrl select @eN \"Option name\"      # pick an item in a select \u002F combo \u002F list\nagent-ctrl select-all [@eN]              # select all in field; without ref, sends Ctrl+A to focus\nagent-ctrl check @eN                     # set a TogglePattern control on\nagent-ctrl uncheck @eN                   # set a TogglePattern control off\nagent-ctrl toggle @eN                    # toggle a TogglePattern control\nagent-ctrl scroll \u003CDX> \u003CDY> [--ref @eN]  # wheel scroll (positive DY = down)\nagent-ctrl scroll-into-view @eN          # UIA ScrollItemPattern\nagent-ctrl drag @eFROM @eTO              # source-to-destination drag\nagent-ctrl mouse move X Y                # raw mouse move\nagent-ctrl mouse down X Y --button left  # raw button down\nagent-ctrl mouse up X Y --button left    # raw button up\nagent-ctrl mouse wheel X Y --dy -120     # raw wheel\n```\n\n### Find\n\n```bash\nagent-ctrl find \"Save\"                   # case-insensitive substring on name\nagent-ctrl find \"Save\" --role button     # narrow by role (kebab-case)\nagent-ctrl find \"Save\" --exact           # case-sensitive equality\nagent-ctrl find --role menu-item         # all nodes of a role; no name filter\nagent-ctrl find \"OK\" --in @e2            # restrict to subtree under @e2\nagent-ctrl find \"Save\" --first           # bare ref for shell substitution\nagent-ctrl find --limit 5                # cap result count\n```\n\n`find` queries the *cached* snapshot - it does not re-walk the OS tree. With no match, writes `no match` to stderr and exits non-zero. `--first` prints just `@eN` so the canonical \"find then act\" pattern composes:\n\n```bash\nagent-ctrl click \"$(agent-ctrl find \"Save\" --role button --first)\"\n```\n\n### Inspect\n\n```bash\nagent-ctrl get text @eN                  # value if present, otherwise accessible name\nagent-ctrl get value @eN                 # editable\u002Fvalue-bearing field value\nagent-ctrl get name @eN                  # accessible name\nagent-ctrl get role @eN                  # canonical role\nagent-ctrl get state @eN                 # full state object\nagent-ctrl get bounds @eN                # logical screen bounds\nagent-ctrl get window                    # cached window context\n\nagent-ctrl is visible @eN\nagent-ctrl is enabled @eN\nagent-ctrl is focused @eN\nagent-ctrl is selected @eN\nagent-ctrl is checked @eN\nagent-ctrl is expanded @eN\n```\n\nInspect commands read the cached snapshot. They are fast and deterministic, but require a prior `snapshot`.\n\n### Wait\n\n```bash\nagent-ctrl wait \u003CMS>                                # dumb sleep on the daemon worker\nagent-ctrl wait-for \"Save\" --role button            # wait for a node to appear\nagent-ctrl wait-for \"Loading...\" --gone             # wait for a node to disappear\nagent-ctrl wait-for \"Agree\" --state checked         # wait for a boolean state\nagent-ctrl wait-for --role text-field --value-contains ready\nagent-ctrl wait-for --window-appears \"Dialog title\" # wait for a sibling window title\nagent-ctrl wait-for --stable [--idle-ms 500]        # wait for the tree signature to settle\nagent-ctrl wait-for ... --timeout 10000 --poll 250  # tune the poll loop\n```\n\nThree reliability tiers. Use `--stable` after a click to let the UI settle before the next action. Exit codes: 0 satisfied, 1 bad args, 2 timeout - branch on those in shell pipelines instead of parsing strings.\n\n### Windows\n\n```bash\nagent-ctrl window-list                            # all top-level windows owned by the pinned process\nagent-ctrl window-list --first-other              # bare hex id of the first non-pinned window\nagent-ctrl focus-window \u003Chex_id>                  # bring a window to the foreground; re-pins the session\nagent-ctrl switch-app \u003Capp_id>                    # foreground by app id (path or bare exe name)\n```\n\nWhen a file dialog, confirmation dialog, or popup appears as a sibling top-level window, `window-list` is how you find it. `focus-window` re-pins so subsequent `snapshot` \u002F `find` \u002F actions target the dialog. Mirrors agent-browser's `tab_list` \u002F `tab_switch`.\n\n```bash\nagent-ctrl press \"Ctrl+S\"                                 # may open a sibling dialog HWND\nagent-ctrl focus-window \"$(agent-ctrl window-list --first-other)\"\nagent-ctrl snapshot                                       # now sees the dialog\nagent-ctrl click \"$(agent-ctrl find \"OK\" --role button --first)\"\n```\n\nFor detailed Windows guidance on dialogs, elevation, stale refs, foreground focus, IME, screenshots, and app framework quirks, see [`docs\u002Fwindows-reliability.md`](docs\u002Fwindows-reliability.md).\n\n### Output\n\n```bash\nagent-ctrl screenshot                            # PNG of the pinned window to a temp path\nagent-ctrl screenshot result.png                 # to a specific path\nagent-ctrl screenshot --region X,Y,W,H           # crop in physical screen pixels\nagent-ctrl screenshot --target desktop           # virtual desktop\nagent-ctrl screenshot --target window            # pinned window\nagent-ctrl screenshot --target ref --ref @eN     # element bounds\nagent-ctrl screenshot --annotated                # draw @eN labels from cached snapshot bounds\n```\n\n`--annotated` draws cached snapshot refs onto the PNG. Run `snapshot` first so the screenshot has a current ref map and bounds.\n\n### Batch\n\n```bash\nagent-ctrl batch --file steps.json\nGet-Content steps.json | agent-ctrl batch --stdin      # PowerShell-friendly\nagent-ctrl batch '[{\"op\":\"find\",\"query\":{\"name\":\"Save\",\"limit\":1}}]'  # Unix-shell-friendly\n```\n\nBatch steps run in order on one daemon session and return structured per-step results. Supported step ops: `act`, `find`, `get`, `is`, `wait`, and `list_windows`.\n\n### JSON mode\n\n```bash\nagent-ctrl list --json\nagent-ctrl find \"Save\" --role button --json\nagent-ctrl get state @eN --json\nagent-ctrl is enabled @eN --json\nagent-ctrl click @eN --json\nagent-ctrl wait-for --stable --json\nagent-ctrl window-list --json\nagent-ctrl screenshot out.png --json\n```\n\nMost runtime commands accept `--json` for machine-readable output. `snapshot --json` returns the full snapshot; `get --json`, `is --json`, action commands, `wait-for --json`, and `window-list --json` return structured protocol results. `batch` output is always JSON, so `batch --json` is accepted as a compatibility no-op.\n\nSession commands redact the TCP auth token from JSON output. `screenshot --json` writes the PNG to disk and prints file metadata (`path`, `width`, `height`, `bytes`, `annotated`) instead of echoing the base64 image payload.\n\nWhen `--json` is present, parse and runtime failures are emitted as one structured object with `ok: false`, `error.code`, `error.message`, and, when available, `error.hint`. Exit codes still matter: 0 means success, 1 means command\u002Frequest failure, and `wait-for --json` keeps exit 2 for timeouts while printing the structured wait outcome.\n\n## Sessions\n\nRun multiple isolated UIA sessions side by side:\n\n```bash\nagent-ctrl open uia --session app1\nagent-ctrl open uia --session app2\n\nagent-ctrl snapshot --session app1 --target-process \u003Cprocess-a>\nagent-ctrl snapshot --session app2 --target-process \u003Cprocess-b>\n\nagent-ctrl list\n# SESSION         SURFACE   PID         ENDPOINT\n# app1            uia       12345       127.0.0.1:54001\n# app2            uia       12346       127.0.0.1:54002\n\nagent-ctrl close --session app1\nagent-ctrl close --session app2\n```\n\nThe default session is `default`, so most commands need no flag. Each session has its own daemon process, pinned target window, cached snapshot, and refs. Session metadata lives at `~\u002F.agent-ctrl\u002F\u003Cname>.json` while the daemon is running. TCP session files include a random per-session auth token, and every CLI TCP request sends it automatically. Stdio daemon clients, including the TypeScript client, do not need a token.\n\n## Mock surface\n\nThe `mock` surface returns a fixed two-button window - handy for testing the protocol without UIA permissions or a target app:\n\n```bash\nagent-ctrl open mock\nagent-ctrl snapshot\nagent-ctrl click @e0\nagent-ctrl close\n```\n\nAvailable on every OS, no setup required. Used by the integration tests under `packages\u002Fclient\u002Ftests\u002F`.\n\n## TypeScript client\n\n```typescript\nimport { AgentCtrl } from \"@agent-ctrl\u002Fclient\";\n\nconst ctrl = new AgentCtrl();              \u002F\u002F spawns `agent-ctrl daemon` over stdio\nconst session = await ctrl.openSession(\"uia\");\n\nawait ctrl.snapshot(session, {\n  target: { by: \"process-name\", name: \"target-app\" },\n});\n\nconst matches = await ctrl.find(session, {\n  name: \"Save\",\n  role: \"button\",\n});\n\nawait ctrl.act(session, { kind: \"click\", ref_id: matches[0]!.ref_id });\n\nconst outcome = await ctrl.waitFor(session, {\n  predicate: { kind: \"stable\", idle_ms: 500 },\n  timeout_ms: 5000,\n  poll_ms: 250,\n});\n\nawait ctrl.closeSession(session);\nawait ctrl.close();\n```\n\nMethod surface: `openSession`, `snapshot`, `act`, `find`, `waitFor`, `listWindows`, `closeSession`, `close`. Both transports (shell CLI and stdio TypeScript) talk the same wire protocol; agents can mix and match.\n\nSee [packages\u002Fclient\u002FREADME.md](packages\u002Fclient\u002FREADME.md) for the full API.\n\n## Architecture\n\nagent-ctrl uses a client-daemon architecture mirroring agent-browser:\n\n1. **Rust CLI** (`crates\u002Fcli`) - parses commands, dials the daemon, prints results.\n2. **Rust daemon** (`crates\u002Fdaemon`) - long-running process that owns surface sessions and dispatches snapshot \u002F action \u002F find \u002F wait \u002F list-windows requests.\n3. **Surface trait** (`crates\u002Fcore`) - cross-platform contract every backend implements. Per-platform crates (`crates\u002Fsurface-uia`, `surface-ax`) provide the implementations, gated by `target_os`.\n\nThe daemon starts via `agent-ctrl open \u003Csurface>` and persists across CLI invocations for fast subsequent operations. Each session has its own daemon process and writes a discovery file at `~\u002F.agent-ctrl\u002F\u003Csession>.json`.\n\n## Workspace layout\n\nThe repository is a **dual workspace** - a Cargo workspace for the Rust engine and an npm workspace for the TypeScript client.\n\n| Crate \u002F package | Purpose |\n|---|---|\n| [`crates\u002Fcore`](crates\u002Fcore) | Shared types and the `Surface` trait. Schema, role taxonomy, action vocabulary, errors. |\n| [`crates\u002Fdaemon`](crates\u002Fdaemon) | Long-running process that owns surface sessions and dispatches actions. |\n| [`crates\u002Fcli`](crates\u002Fcli) | The `agent-ctrl` binary - user-facing entrypoint. |\n| [`crates\u002Fsurface-uia`](crates\u002Fsurface-uia) | Windows UI Automation surface (Windows-only). |\n| [`crates\u002Fuia-fixture`](crates\u002Fuia-fixture) | Deterministic native Win32 fixture app for UIA reliability tests. |\n| [`crates\u002Fsurface-ax`](crates\u002Fsurface-ax) | macOS Accessibility surface (full action vocabulary; macOS-only). |\n| [`crates\u002Fax-fixture`](crates\u002Fax-fixture) | Deterministic native Cocoa fixture app for AX reliability tests. |\n| [`packages\u002Fclient`](packages\u002Fclient) | `@agent-ctrl\u002Fclient` - typed TypeScript wrapper over stdio JSON-RPC. |\n\nSurfaces gated by `target_os` compile to empty crates on other platforms, so the workspace builds on any host.\n\n## Platforms\n\nA **surface** is one accessibility protocol - UIA, AX, AT-SPI, etc. A **platform** is an operating system. They aren't 1-to-1: most platforms can be driven by more than one surface.\n\n| Platform | Native surface | Status |\n|---|---|---|\n| Windows | [`surface-uia`](crates\u002Fsurface-uia) - UI Automation | **ready** |\n| macOS | [`surface-ax`](crates\u002Fsurface-ax) - Accessibility \u002F AX | **ready** |\n| Linux | _planned_ `surface-atspi` (AT-SPI \u002F D-Bus) | not started |\n| Android | _planned_ `surface-accessibility-service` (JNI) | not started |\n| iOS | _planned_ `surface-xcuitest` (WebDriverAgent) | not started |\n\nFor browsers, run agent-ctrl alongside [agent-browser](https:\u002F\u002Fgithub.com\u002Fvercel-labs\u002Fagent-browser); the two are complementary, not competing.\n\nAcronyms in one line: **UIA** = Microsoft UI Automation, **AX** = macOS Accessibility, **AT-SPI** = the Linux GNOME accessibility bus, **XCUITest** = Apple's UI test framework.\n\nAX feature coverage is in [docs\u002Fmacos-ax.md](docs\u002Fmacos-ax.md); production\nguidance for macOS lives in [docs\u002Fmacos-ax-reliability.md](docs\u002Fmacos-ax-reliability.md).\nWindows production guidance lives in [docs\u002Fwindows-reliability.md](docs\u002Fwindows-reliability.md).\n\n## Build\n\n```bash\ncargo check --workspace                          # fast type-check\ncargo build --release -p agent-ctrl-cli          # the binary\ncargo test --workspace                           # all unit + integration tests\ncargo clippy --workspace --all-targets -- -D warnings   # lint, fail on warnings\ncargo fmt --all -- --check                       # format check\n```\n\nWindows UIA fixture:\n\n```powershell\ncargo build -p agent-ctrl-cli -p agent-ctrl-uia-fixture\n.\\target\\debug\\agent-ctrl-uia-fixture.exe --ready-file \"$env:TEMP\\agent-ctrl-fixture.ready\"\n.\\target\\debug\\agent-ctrl.exe open uia --session fixture\n.\\target\\debug\\agent-ctrl.exe snapshot --session fixture --target-process agent-ctrl-uia-fixture\n```\n\nThe fixture is the preferred real-UIA test target. It exposes common native controls through stable Win32\u002FUIA patterns so tests do not depend on Notepad, Calculator, localized strings, or Windows-version-specific app redesigns.\n\nOpt-in fixture integration test:\n\n```powershell\ncargo build -p agent-ctrl-uia-fixture\n$env:RUN_UIA_TESTS = \"1\"\ncargo test -p agent-ctrl-cli --test windows_uia_fixture\n```\n\nSuccessful UIA actions may print a method diagnostic such as `ok method=keyboard-space`, `ok method=selection-item-pattern`, or `ok method=toggle-pattern`. These are intended for agents and humans debugging cross-app behavior.\n\nmacOS AX fixture:\n\n```bash\ncargo build -p agent-ctrl-cli -p agent-ctrl-ax-fixture\ntarget\u002Fdebug\u002Fagent-ctrl-ax-fixture --ready-file \u002Ftmp\u002Fagent-ctrl-ax-fixture.ready &\ntarget\u002Fdebug\u002Fagent-ctrl open ax --session fixture\ntarget\u002Fdebug\u002Fagent-ctrl snapshot --session fixture --target-process agent-ctrl-ax-fixture\n```\n\nOpt-in AX fixture integration test:\n\n```bash\ncargo build -p agent-ctrl-ax-fixture\nRUN_AX_TESTS=1 cargo test -p agent-ctrl-cli --test macos_ax_fixture\n```\n\nThe AX fixture covers the deterministic macOS loop for snapshots, `find`,\n`click`, `fill`, `check`, `uncheck`, `toggle`, and `window-list`. Keyboard\nactions exist, but are still validated manually because host focus and event-tap\nbehavior can vary under the Rust test harness.\n\nTypeScript client:\n\n```bash\nnpm install\nnpm run build --workspace=@agent-ctrl\u002Fclient\nnpm run test  --workspace=@agent-ctrl\u002Fclient     # spawns the Rust daemon under cargo run\n```\n\nThe TS test suite spawns the Rust daemon under `cargo run` and exercises the full protocol against the mock surface - including `find`, `waitFor`, and `listWindows`.\n\n## Usage with AI agents\n\n### Just ask the agent\n\nThe simplest approach - tell your agent it can use it:\n\n```\nUse agent-ctrl to drive Windows apps. Run `agent-ctrl --help` to see the command list,\nand `agent-ctrl info` to check what's available on this machine.\n```\n\nThe `--help` output is comprehensive and most modern agents can figure out the rest from there.\n\n### AGENTS.md \u002F CLAUDE.md\n\nFor consistent results, add to your project or global instructions:\n\n```markdown\n## OS automation\n\nUse `agent-ctrl` for native UI automation on Windows and macOS. Core workflow:\n\n1. `agent-ctrl open uia` (Windows) or `agent-ctrl open ax` (macOS) - spawn a daemon\n2. `agent-ctrl snapshot --target-process \u003Cname>` - pin to the app and capture refs\n3. `agent-ctrl find \"Save\" --role button --first` - discover refs by name\u002Frole\n4. `agent-ctrl click @eN` \u002F `fill @eN \"text\"` \u002F `press \"Ctrl+S\"` (or `Cmd+S` on macOS) - interact\n5. `agent-ctrl wait-for --stable` - let the UI settle before the next action\n6. `agent-ctrl window-list` + `focus-window \u003Cid>` - switch to dialogs \u002F popups\n7. Re-`snapshot` after the tree changes\n```\n\n### Example flow\n\nThe recommended pattern is app-agnostic: launch or focus the target, snapshot by\nprocess\u002Fwindow, find by role\u002Fname, act, wait for stability, and re-snapshot after\nthe tree changes. A concrete Notepad walkthrough is available in\n[examples\u002Fnotepad-tour.sh](examples\u002Fnotepad-tour.sh), but production agents should\nprefer the generic loop above over app-specific assumptions.\n\n## Known limitations\n\nThese are real today - the goal is to fix or document them as the project matures.\n\n- **Windows and macOS are the supported surfaces.** Linux \u002F Android \u002F iOS \u002F browser flows are not implemented in this project yet. macOS additionally requires Screen Recording permission for `screenshot` and may require Automation permission for some Apple system apps (Notes, Calendar, Music) - see [docs\u002Fmacos-ax-reliability.md](docs\u002Fmacos-ax-reliability.md).\n- **Local TCP daemon auth is developer-machine scoped.** TCP session files include a random bearer token and the daemon rejects missing or incorrect tokens, but anyone who can read `~\u002F.agent-ctrl\u002F\u003Csession>.json` can still use that session. Treat sessions as a local developer-machine boundary, not a multi-user security sandbox.\n- **Refs are valid only against the snapshot that produced them.** If `wait-for` runs in parallel with another command on the same session (across two shells), the wait loop refreshes the cached refs on each poll, and a previously-issued ref may resolve to a different element. Sequential CLI usage in one shell - the realistic flow - doesn't trip this.\n- **Modern Win11 file dialogs and popup menus open as sibling top-level windows**, not as children of the app's main window. Use `window-list` + `focus-window` to discover and switch to them.\n- **`type` bypasses IME.** Synthetic Unicode keystrokes via `SendInput` are reliable for ASCII; CJK with IME composition is not supported yet. `fill` (UIA `ValuePattern`) is the right escape hatch for non-ASCII text input.\n- **HWND recycling.** Windows reassigns numeric HWNDs after a window closes; `window-list` shows whatever currently holds an id, with no UIA-runtime-id verification. Theoretical, never observed in practice.\n\n## License\n\nApache-2.0. See [LICENSE](LICENSE).\n","agent-ctrl 是一个用于桌面自动化的命令行工具，专为AI代理设计，基于Rust语言开发，提供快速的原生执行体验。其核心功能包括通过命令行界面控制和自动化桌面应用程序，支持Windows和macOS系统上的UIA和AX接口，允许用户以编程方式与操作系统交互。该项目不涵盖浏览器自动化，对于需要在Chromium上工作的场景推荐使用配套项目agent-browser。适合希望提高工作效率、进行软件测试或构建涉及复杂桌面操作的AI应用开发者使用。安装简单，支持npm全局安装以及直接下载对应平台的二进制文件。",2,"2026-06-11 03:31:16","CREATED_QUERY"]