[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80521":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":46,"readmeContent":47,"aiSummary":48,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":49,"discoverSource":50},80521,"computer-use-linux","agent-sh\u002Fcomputer-use-linux","agent-sh","Linux desktop control over MCP — AT-SPI, GNOME Shell, Wayland portals, ydotool","",null,"Rust",162,15,2,3,0,4,32,89,16,76.51,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45],"agent","ai","cargo","claude","codex","computer-use","gnome","hermes","linux","llm","mcp","npm","oss","rust","shell","skill","tools","wayland","ydotool","2026-06-12 04:01:28","\u003Cdiv align=\"center\">\n  \u003Ch1>computer-use-linux\u003C\u002Fh1>\n  \u003Cp>\u003Cstrong>Control a real Linux desktop from any MCP host.\u003C\u002Fstrong>\u003C\u002Fp>\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\" alt=\"CI\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcomputer-use-linux\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fcomputer-use-linux.svg\" alt=\"crates.io\">\u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@agent-sh\u002Fcomputer-use-linux\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002F@agent-sh\u002Fcomputer-use-linux.svg\" alt=\"npm\">\u003C\u002Fa>\n    \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg\" alt=\"License: MIT\">\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n`computer-use-linux` reads accessibility trees, takes screenshots, and drives clicks, scrolls, and keystrokes across GNOME, KDE\u002FKWin, Hyprland, i3, and COSMIC — Wayland-first, X11 best-effort.\n\n```bash\nnpm install -g @agent-sh\u002Fcomputer-use-linux\ncomputer-use-linux doctor | jq .readiness\n```\n\nThe Rust crate is published as [`computer-use-linux`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcomputer-use-linux) and the npm wrapper as [`@agent-sh\u002Fcomputer-use-linux`](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@agent-sh\u002Fcomputer-use-linux). Prebuilt binaries ship with the [latest release](https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\u002Freleases\u002Flatest).\n\n## What this is\n\n`computer-use-linux` is a Rust MCP server and CLI for Linux desktop control. The crate ships the main `computer-use-linux` binary plus a small `computer-use-linux-cosmic` helper used only for COSMIC Wayland window management. Any MCP host — Codex Desktop's Linux build, Claude Desktop, [Hermes Agent](https:\u002F\u002Fgithub.com\u002FNousResearch\u002Fhermes-agent), or your own client — can spawn it and gain full control of the local Linux desktop: read accessibility trees, list and focus windows, take screenshots, click, drag, scroll, type, and invoke semantic accessibility actions.\n\nMost computer-use MCP servers are macOS-only (they lean on AppKit, AXUIElement, CGEvent). The few that target Linux either drive `xdotool` against an X11 root window or shell out to OCR over screenshots. 
Four things set this one apart:\n\n- **Wayland actually works.** Pointer actions can use the `org.freedesktop.portal.RemoteDesktop` interface on Wayland, with `ydotool` \u002F `ydotoold` (uinput) as the deterministic fallback and keyboard\u002Ftext path. Screenshots use the GNOME Shell DBus screenshot method when present and `org.freedesktop.portal.Screenshot` otherwise.\n- **Window targeting is compositor-aware.** The window registry tries GNOME Shell extension, GNOME Shell Introspect, COSMIC Wayland helper, KWin DBus scripting, Hyprland `hyprctl`, and i3 IPC in order, then reports exactly which backend won or why each backend failed.\n- **Semantic selectors, not pixel coordinates.** Tools like `click`, `perform_action`, and `set_value` accept `role` \u002F `name` \u002F `text` \u002F `states` selectors backed by AT-SPI. Pixel coordinates remain available as a fallback for rendering-only surfaces (canvas, games, X clients without ATK).\n- **One JSON readiness report.** `computer-use-linux doctor` returns a structured document covering platform, portals, AT-SPI, windowing, input, and a `readiness` summary with explicit blockers and a recommended next step. MCP hosts can render or surface that to the user without parsing prose.\n\nThe crate was extracted from [`codex-desktop-linux`](https:\u002F\u002Fgithub.com\u002Favifenesh\u002Fcodex-desktop-linux) (the Linux distribution of Codex Desktop), which still bundles this binary as a built-in plugin. 
This standalone repo is the upstream.\n\n## Features\n\nMCP tools exposed by the server:\n\n**Diagnostics**\n- `doctor` — single-shot JSON readiness report (platform, portals, accessibility, windowing, input, readiness summary, and a capability map of available backends)\n- `setup_accessibility` — enables GNOME's `org.gnome.desktop.interface toolkit-accessibility` setting so toolkit apps expose AT-SPI trees\n- `setup_window_targeting` — installs and enables the bundled GNOME Shell extension when `org.gnome.Shell.Introspect` is locked down\n\n**Discovery**\n- `list_apps` — running desktop apps visible to the AT-SPI registry\n- `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland\u002FX11), and bounds\n- `focused_window` — the window currently holding keyboard focus\n- `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept\n- `screenshot` — capture the screen as a PNG; can target a window, which is raised to the front and cropped to just that window\n\n**Input**\n- `click` — by element index, semantic selector, or pixel coordinates\n- `drag` — pixel-coordinate drag (start \u002F end)\n- `scroll` — page-based scroll on an element or at a pixel location\n- `press_key` — keys \u002F chords; can focus a window or terminal first\n- `type_text` — literal text input, optionally targeted at a window or terminal\n\n**Semantic actions**\n- `perform_action` — invoke any AT-SPI action exposed by an element (`Press`, `Activate`, `Toggle`, …); defaults to the primary action\n- `set_value` — write to a settable accessibility element (text fields, sliders, spinners)\n\n**Navigation**\n- `activate_window` — focus a window by `window_id`, `pid`, `app_id`, `wm_class`, `title`, or terminal selectors\n\n### MCP safety contract\n\n`computer-use-linux` is not a read-only data source. 
It can observe the local desktop and, when a mutating tool is called, can change real application state. The `tools\u002Flist` response includes MCP `ToolAnnotations` so hosts can surface this distinction before invocation:\n\n| Class | Tools | Contract |\n| --- | --- | --- |\n| Read-only observation | `doctor`, `list_apps`, `list_windows`, `focused_window`, `get_app_state` | `readOnlyHint=true`; may reveal app, window, accessibility, and screenshot contents. `get_app_state` may trigger the desktop screenshot portal prompt. |\n| Local setup mutators | `setup_accessibility`, `setup_window_targeting` | `readOnlyHint=false`, `destructiveHint=false`, `idempotentHint=true`; modifies user desktop configuration by enabling accessibility or installing\u002Fenabling the GNOME window-targeting extension. |\n| UI state mutators | `activate_window`, `scroll`, `screenshot` | `readOnlyHint=false`, `destructiveHint=false`; changes focus or scroll position in the live desktop, or raises a window to capture it. |\n| Desktop action mutators | `click`, `drag`, `press_key`, `type_text`, `perform_action`, `set_value` | `readOnlyHint=false`, `destructiveHint=true`, `openWorldHint=true`; can trigger arbitrary actions in whatever local application is targeted. |\n\nAnnotations are safety hints, not an authorization system. 
MCP hosts should still ask the user before calls that could submit, delete, send, purchase, overwrite, or otherwise commit state.\n\nThe binary also exposes the same capabilities from the CLI for scripting and debugging:\n\n```\ncomputer-use-linux mcp                                  # stdio MCP server\ncomputer-use-linux doctor                               # JSON readiness report\ncomputer-use-linux setup                                # enable AT-SPI\ncomputer-use-linux setup-window-targeting               # install GNOME Shell extension\ncomputer-use-linux apps\ncomputer-use-linux state [APP_NAME]\ncomputer-use-linux screenshot                           # JSON screenshot summary\ncomputer-use-linux windows\n```\n\n## Support matrix\n\nValidated manually on Ubuntu 25.10 (GNOME Shell 50.1, Wayland). Other compositor backends are implemented and covered by parser \u002F contract tests, but real desktop behavior still depends on each session exposing its expected control API.\n\n| Desktop\u002Fsession | Window backend | Notes |\n| --- | --- | --- |\n| GNOME Wayland | GNOME Shell extension first, `org.gnome.Shell.Introspect` fallback | Full target. The extension provides exact window activation when GNOME blocks native introspection; Introspect can list windows and focus apps by `app_id` when allowed. |\n| GNOME X11 | `org.gnome.Shell.Introspect` when allowed | AT-SPI and `ydotool` work; the bundled GNOME Shell extension is only needed for GNOME Wayland. Exact per-window focus may be unavailable without the extension backend. |\n| KDE Plasma \u002F KWin | temporary KWin DBus scripting | Lists and focuses windows through `org.kde.KWin` scripting when the session bus exposes it. |\n| Hyprland | `hyprctl clients -j` and `hyprctl dispatch focuswindow` | Requires `hyprctl` in the desktop session. |\n| i3 | `i3-msg`; optional `xprop` for PID hydration | Lists and focuses i3 windows over the active i3 IPC socket. 
|\n| COSMIC Wayland | `computer-use-linux-cosmic` helper | Installed automatically by `.\u002Finstall.sh`, `cargo install`, and npm. For custom\u002Fmanual layouts, put the helper next to the main binary, on `PATH`, or point `COMPUTER_USE_LINUX_COSMIC_HELPER` at it. |\n| Sway \u002F generic wlroots | no dedicated backend yet | AT-SPI, screenshots, and global `ydotool` input can still work; exact window list\u002Ffocus is currently unavailable unless another backend applies. |\n| Generic X11 \u002F XFCE \u002F other WMs | no dedicated backend yet | AT-SPI plus `ydotool` global input only, unless running under i3. |\n\nIf you run on a desktop not covered above, or a covered backend does not come up cleanly, please open an issue with the output of `computer-use-linux doctor` so we can extend the matrix honestly.\n\n## Install\n\nCOSMIC users do not need a second package or a separate helper install when using `.\u002Finstall.sh`, `cargo install`, or the npm wrapper. Those paths install `computer-use-linux-cosmic` alongside the main binary automatically. Only manual prebuilt-binary installs need you to copy both release assets.\n\n### Option A — `.\u002Finstall.sh` from a clone\n\nInstalls system packages on Debian\u002FUbuntu, Fedora\u002FRHEL-like, or Arch-like distros; installs Rust if needed; builds both release binaries; installs them to `~\u002F.local\u002Fbin`; enables `ydotoold` as a user service; enables GNOME AT-SPI settings when running under GNOME; and installs the bundled GNOME Shell extension on GNOME Wayland.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\ncd computer-use-linux\n.\u002Finstall.sh\n# log out and back in if the GNOME extension was newly installed\ncomputer-use-linux doctor | jq .readiness\n```\n\n### Option B — `cargo install` (Rust binaries, no system setup)\n\nInstalls the Rust binaries from crates.io. 
You still handle the system-level pieces yourself: `ydotoold`, AT-SPI, desktop portals, and the GNOME extension if you need the GNOME Wayland exact-focus backend.\n\n```bash\ncargo install computer-use-linux\ncomputer-use-linux doctor\n```\n\nFor unreleased changes from `main`, install directly from Git:\n\n```bash\ncargo install --git https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\n```\n\nThen, as needed:\n\n```bash\nsudo apt install ydotool at-spi2-core         # or your distro's equivalent\nsystemctl --user enable --now ydotoold\ncomputer-use-linux setup                      # gsettings AT-SPI bridge\ncomputer-use-linux setup-window-targeting     # GNOME Shell extension\n```\n\n### Option C — npm wrapper (binary download)\n\nGood for users who already have Node.js and want a no-Rust install. The npm package downloads and verifies the matching main and COSMIC helper binaries during install, then the wrapper sets `COMPUTER_USE_LINUX_COSMIC_HELPER` to the bundled helper automatically.\n\n```bash\nnpm install -g @agent-sh\u002Fcomputer-use-linux\ncomputer-use-linux doctor\n```\n\nYou will still need `ydotoold` running and AT-SPI enabled (run `computer-use-linux setup` and the systemd commands above).\n\n### Option D — prebuilt binaries\n\nLinux x86_64 \u002F aarch64 builds are published with each tag. 
Each binary ships a `.sha256` next to it.\n\n- Latest release: \u003Chttps:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\u002Freleases\u002Flatest>\n\n```bash\ntarget=x86_64-unknown-linux-gnu\nbase=https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fcomputer-use-linux\u002Freleases\u002Flatest\u002Fdownload\nfor binary in computer-use-linux computer-use-linux-cosmic; do\n  asset=\"$binary-$target\"\n  # -f makes curl fail on HTTP errors instead of saving an error page\n  curl -fL -O \"$base\u002F$asset\"\n  curl -fL -O \"$base\u002F$asset.sha256\"\n  # stop before installing anything that fails verification\n  sha256sum -c \"$asset.sha256\" || break\n  # -D creates ~\u002F.local\u002Fbin if it does not exist yet\n  install -D -m 0755 \"$asset\" \"$HOME\u002F.local\u002Fbin\u002F$binary\"\ndone\n```\n\nYou will still need `ydotoold` running and AT-SPI enabled (run `computer-use-linux setup` and the systemd commands above).\n\n## Wire it into your MCP host\n\nThe binary speaks the MCP 2024-11-05 protocol over stdio, implemented with the `rmcp` crate. Pass `mcp` as the only argument; everything else is configured through MCP tool calls.\n\n### Codex Desktop (Linux build)\n\nThe Linux build of Codex Desktop already bundles this binary as a plugin. You don't need to wire it up manually — the plugin definition lives in [`codex-desktop-linux`](https:\u002F\u002Fgithub.com\u002Favifenesh\u002Fcodex-desktop-linux) under its `plugins\u002F` directory and is enabled by default. To upgrade the plugin in place, replace the binary it ships with the one from this repo's release assets.\n\n### Claude Code (CLI)\n\nUse the `claude mcp add` command to register the binary as a stdio MCP server. 
Pick a scope:\n\n- `--scope user` — available across all projects for your user.\n- `--scope project` — written to `.mcp.json` at the project root for team sharing.\n- `--scope local` (default) — only the current project, stored in `~\u002F.claude.json`.\n\n```bash\n# User-wide install (recommended for desktop control)\nclaude mcp add --scope user computer-use-linux -- computer-use-linux mcp\n\n# Verify the server is registered and reachable\nclaude mcp list\n```\n\nIf `computer-use-linux` is not on `PATH`, pass the absolute path (e.g. `~\u002F.local\u002Fbin\u002Fcomputer-use-linux`). Inside a Claude Code session, run `\u002Fmcp` to confirm the tools are loaded.\n\n### Claude Desktop\n\nEdit `~\u002F.config\u002FClaude\u002Fclaude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"computer-use-linux\": {\n      \"command\": \"computer-use-linux\",\n      \"args\": [\"mcp\"]\n    }\n  }\n}\n```\n\nRestart Claude Desktop. The tools should appear in the tools list.\n\n### Hermes Agent\n\nInstall the companion Hermes skill so Hermes has the desktop-specific runbook:\n\n```bash\nhermes skills tap add agent-sh\u002Fcomputer-use-linux\nhermes skills install agent-sh\u002Fcomputer-use-linux\u002Fcomputer-use-linux\n```\n\nThe skill is optional but recommended for Hermes users. It teaches Hermes how to install, configure, verify, and call the Linux desktop MCP safely. It follows the same `skills\u002F\u003Cname>\u002FSKILL.md` tap layout used by Hermes community skills.\n\nThen add the stdio MCP server:\n\n```bash\nhermes mcp add computer-use-linux --command computer-use-linux --args mcp\nhermes mcp test computer-use-linux\nhermes mcp configure computer-use-linux\n```\n\n`configure` opens Hermes' tool-selection UI for the server. 
The generated config should look like this:\n\n```yaml\nmcp_servers:\n  computer-use-linux:\n    command: computer-use-linux\n    args: [\"mcp\"]\n    timeout: 120\n    connect_timeout: 30\n\n# Optional: expose the tools to subagents as well.\ninherit_mcp_toolsets: true\n```\n\nIf you installed the binary somewhere that is not on `PATH`, pass the absolute path as `--command`.\n\nRestart Hermes after editing the config. Hermes registers the tools as `mcp_computer_use_linux_\u003Ctool>` and creates the `mcp-computer-use-linux` runtime toolset.\n\nYou can verify both sides before asking Hermes to use the desktop:\n\n```bash\ncomputer-use-linux doctor | jq .readiness\nhermes skills inspect agent-sh\u002Fcomputer-use-linux\u002Fcomputer-use-linux\nhermes chat --toolsets mcp-computer-use-linux -q \"List the current desktop windows.\"\n```\n\nFor one-off installs without adding the tap first, Hermes also accepts `hermes skills install agent-sh\u002Fcomputer-use-linux\u002Fskills\u002Fcomputer-use-linux`.\n\n### Generic MCP client\n\nSpawn the binary with `[\"mcp\"]` as the argv tail. It speaks JSON-RPC over stdio per the MCP 2024-11-05 protocol revision (implemented with `rmcp`); capability discovery happens through `tools\u002Flist` and the `doctor` tool. The server normally needs no MCP-specific configuration, but the desktop runtime environment still matters (`DBUS_SESSION_BUS_ADDRESS`, `XDG_RUNTIME_DIR`, portals, AT-SPI, `ydotoold`, and optionally `COMPUTER_USE_LINUX_COSMIC_HELPER`).\n\n## First-run checklist\n\n1. **Run `doctor`.**\n\n   ```bash\n   computer-use-linux doctor | jq .readiness\n   ```\n\n   Aim for `can_register_mcp_tools`, `can_build_accessibility_tree`, `can_send_development_input`, and `can_query_windows` all `true`. The `blockers` array should be empty.\n\n2. **If `accessibility.at_spi_bus.ok = false`** — run `computer-use-linux setup` (or call the `setup_accessibility` MCP tool). 
This sets:\n   - `org.gnome.desktop.interface toolkit-accessibility true`\n\n   You may need to restart toolkit-using apps for the change to take effect.\n\n3. **If `windowing.can_list_windows = false`** — inspect `doctor.windowing.backends`. On GNOME Wayland, run `computer-use-linux setup-window-targeting` (or call `setup_window_targeting`) to install the bundled `computer-use-linux@avifenesh.dev` Shell extension, then log out and back in so GNOME Shell loads it. On KDE, Hyprland, i3, or COSMIC, install or expose the matching compositor tool\u002Fhelper shown in the backend details.\n\n4. **Grant the screencast portal on first screenshot.** The first time `get_app_state` or any screenshot subcommand runs, GNOME will pop a portal dialog asking to share the screen. Accept once and tick \"remember\" to make it sticky for the session.\n\n5. **Confirm `ydotoold` is running.**\n\n   ```bash\n   systemctl --user status ydotoold\n   ```\n\n   Its socket should appear at `\u002Frun\u002Fuser\u002F$UID\u002F.ydotool_socket`.\n\n## Environment variables\n\nMost setups need none of these — `doctor` and the installers pick sensible defaults. They exist for overriding auto-detected paths and input backends.\n\n**Server runtime** (set in the MCP host's environment):\n\n| Variable | Effect |\n| --- | --- |\n| `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH` (`CODEX_COMPUTER_USE_COSMIC_HELPER` is also accepted by embedded Codex builds). |\n| `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves); embedded Codex builds may use `CODEX_COMPUTER_USE_DISABLE_ABS_POINTER`. 
|\n| `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` \u002F `…_KEYBOARD` | Always route pointer \u002F keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_PORTAL_POINTER` \u002F `…_KEYBOARD`. |\n| `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` \u002F `…_KEYBOARD` | Always route pointer \u002F keyboard through `ydotool`, skipping the portal and KDE clipboard paths; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_YDOTOOL_POINTER` \u002F `…_KEYBOARD`. |\n\n**Build-time identity overrides** (set while compiling a downstream embedded\nbundle): `CUL_GNOME_EXTENSION_UUID`, `CUL_DBUS_SERVICE`, and\n`CUL_DBUS_OBJECT_PATH` replace the default standalone GNOME Shell extension\nUUID and DBus endpoint in both the Rust probes and the generated extension\nfiles.\n\n**npm wrapper** (set during `npm install`, or before running):\n\n| Variable | Effect |\n| --- | --- |\n| `COMPUTER_USE_LINUX_BIN` | Run this binary instead of the one bundled by the npm package. |\n| `COMPUTER_USE_LINUX_DOWNLOAD_BASE` | Override the GitHub release base URL the installer downloads from (mirrors, air-gapped hosts). |\n| `COMPUTER_USE_LINUX_SKIP_DOWNLOAD=1` | Skip the post-install binary download entirely. |\n| `COMPUTER_USE_LINUX_LOCAL_BINARY` \u002F `…_LOCAL_COSMIC_HELPER` | Install from a local build instead of downloading (used by CI and local testing). |\n\n## Architecture\n\n- **Accessibility tree** — [`atspi`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fatspi) crate (tokio backend) talks to the AT-SPI registry on the user session bus. 
The tree is flattened to `(role, name, text, states, bounds)` tuples and indexed; element indices are stable for the duration of a `get_app_state` snapshot.\n- **DBus where desktops expose it** — [`zbus`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fzbus) for portal calls (`org.freedesktop.portal.Screenshot`, `…RemoteDesktop`, `…ScreenCast`), GNOME Shell screenshots (`org.gnome.Shell.Screenshot`), the bundled GNOME extension's `dev.avifenesh.ComputerUseLinux.WindowControl` service, and temporary KWin scripting.\n- **MCP transport** — [`rmcp`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Frmcp) with the `transport-io` feature; stdio framing, no network.\n- **Input fallback** — when the remote-desktop portal isn't available or the host wants deterministic injection, the binary writes to `ydotoold`'s socket, which writes to `\u002Fdev\u002Fuinput`. `install.sh` can configure `ydotoold`; the `setup` command only enables the GNOME AT-SPI bridge.\n- **Window registry** — `list_windows`, `focused_window`, `activate_window`, `press_key`, and `type_text` share a backend registry. It tries GNOME extension, GNOME Introspect, COSMIC helper, KWin scripting, Hyprland `hyprctl`, and i3 IPC in that order, skipping empty or failed backends so another compositor backend can answer.\n- **GNOME extension fallback** — recent GNOME builds deny `org.gnome.Shell.Introspect.GetWindows` to non-blessed clients. 
The bundled Shell extension exposes window data and exact activation under `dev.avifenesh.ComputerUseLinux.WindowControl`.\n- **COSMIC helper** — `computer-use-linux-cosmic` talks to COSMIC toplevel protocols and is resolved from `COMPUTER_USE_LINUX_COSMIC_HELPER`, next to the running binary, or from `PATH`.\n- **Terminal enrichment** — `list_windows` cross-references each terminal window with its controlling TTY and the foreground process on that TTY, so `type_text` \u002F `press_key` can target \"the terminal where `pytest` is running\" without the host ever knowing the window id.\n\n## Security\n\nComputer-use tooling is, by definition, a privilege-escalation surface. The threat model:\n\n- **`ydotoold` runs as a per-user systemd service** with read\u002Fwrite access to `\u002Fdev\u002Fuinput`. Any process that can connect to its socket (`\u002Frun\u002Fuser\u002F$UID\u002F.ydotool_socket`, mode `0600` by default) can synthesize arbitrary input — keypresses, clicks, anything. Keep the socket in the user runtime dir (the default), not in `\u002Ftmp` or any world-readable location. Do not run `ydotoold` as a system service.\n- **The screencast portal asks for permission once per session.** Granting it lets the calling MCP host capture the screen for the rest of the session. If you don't want that, decline the portal dialog and use `get_app_state` with `include_screenshot: false`.\n- **AT-SPI exposes window contents to any client on your session bus.** Enabling the AT-SPI bridge (`setup_accessibility`) is a prerequisite for this binary; it's also what screen readers use, and it shares the same trust boundary.\n- **The GNOME Shell extension** is loaded only into your user's GNOME Shell, runs in the Shell's JS sandbox, and exposes a single DBus interface on the user session bus. It does not request any extra permissions.\n- **No network.** This binary opens no TCP\u002FUDP listener, makes no outbound Internet connections, and ships no telemetry. 
It does use local session transports such as DBus and the per-user `ydotoold` Unix socket.\n- **Mutating tools are explicit.** The MCP tool list annotates read-only versus mutating tools, and CI fails if the published tool annotations drift from the table above. Treat those annotations as hints; the host is still responsible for user approval and policy.\n\nIf you're running this on a shared workstation, set `ydotoold`'s socket permissions to `0600` (the default) and audit which processes on your user can `connect()` to it.\n\n## Troubleshooting\n\n`computer-use-linux doctor` is the source of truth. Common failure modes and fixes:\n\n- **`accessibility.at_spi_bus.ok = false`** — AT-SPI registry isn't running or the toolkit bridge is off. Fix: `computer-use-linux setup` (or call the `setup_accessibility` MCP tool). Restart the apps you want to drive.\n- **`windowing.gnome_shell_introspect.ok = false` and `gnome_shell_extension_dbus.ok = false`** — GNOME blocks introspection and the extension isn't installed. Fix: `computer-use-linux setup-window-targeting`, then log out and log back in.\n- **`input.ydotool_socket.ok = false`** — daemon isn't running. Fix: `systemctl --user enable --now ydotoold`. If the unit doesn't exist, install the `ydotool` package and rerun `.\u002Finstall.sh` (or copy the unit from `systemd\u002Fydotoold.service` in this repo).\n- **`input.uinput.ok = false`** — `\u002Fdev\u002Fuinput` isn't accessible to your user. Fix: add yourself to the `input` group (`sudo usermod -aG input $USER`) and re-login. On distros that ship `uinput` as a kernel module without auto-loading it, add `uinput` to `\u002Fetc\u002Fmodules-load.d\u002F`.\n- **Portal calls hang or time out** — `xdg-desktop-portal` or its backend (`-gnome`, `-gtk`, `-kde`, `-wlr`) crashed. 
Fix: check `journalctl --user -u xdg-desktop-portal -u xdg-desktop-portal-gnome --since '5 min ago'` and restart the relevant unit.\n- **KWin \u002F Hyprland \u002F i3 \u002F COSMIC windowing is unavailable** — check `doctor.windowing.backends`. KWin needs session-bus scripting; Hyprland needs `hyprctl`; i3 needs `i3-msg` and its IPC socket. COSMIC needs `computer-use-linux-cosmic`, which the standard installers provide automatically; if you copied binaries by hand, copy the helper too or set `COMPUTER_USE_LINUX_COSMIC_HELPER`.\n- **Screenshots return black frames on multi-monitor setups** — known portal \u002F compositor edge case. Use `get_app_state` with `include_screenshot: false` and rely on AT-SPI until the portal backend is healthy.\n- **`type_text` types into the wrong window** — pass an explicit target (`window_id`, `pid`, `wm_class`, `title`, or for terminals `tty` \u002F `terminal_pid` \u002F `terminal_command` \u002F `terminal_cwd`). Without a target, input goes to whatever window currently has compositor focus.\n\nIf `doctor` is green and a specific tool still misbehaves, file an issue with the JSON output of `doctor` and the failing tool's request payload.\n\n## Related\n\n- [agent-workspace-linux](https:\u002F\u002Fgithub.com\u002Fagent-sh\u002Fagent-workspace-linux) — the sibling MCP that gives an agent its **own** isolated Linux desktop (a hidden Xvfb display with its own apps and browser) instead of driving yours. It is the inverse of this project: `computer-use-linux` automates the desktop you are already on; `agent-workspace-linux` sandboxes the agent in a separate one. Use them together.\n\n## Contributing\n\nContributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the local development workflow, CI gates, and PR expectations. 
Report security vulnerabilities through [SECURITY.md](SECURITY.md), not public issues.\n\n## Credits\n\nExtracted from [`codex-desktop-linux`](https:\u002F\u002Fgithub.com\u002Favifenesh\u002Fcodex-desktop-linux), the Linux distribution of Codex Desktop, which continues to ship this same binary as a bundled plugin. Maintained by [Avi Fenesh](https:\u002F\u002Fgithub.com\u002Favifenesh).\n\nBuilt on top of:\n\n- [`atspi`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fatspi) — AT-SPI bindings\n- [`zbus`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fzbus) — async DBus\n- [`rmcp`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Frmcp) — MCP runtime\n- [`ydotool`](https:\u002F\u002Fgithub.com\u002FReimuNotMoe\u002Fydotool) — Wayland-friendly uinput driver\n- [`cosmic-protocols`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcosmic-protocols) — COSMIC Wayland toplevel protocol bindings\n\n## Publishing\n\nPublishing is tag-driven from GitHub Actions. The repository needs these Actions secrets:\n\n```bash\ngh secret set CARGO_REGISTRY_TOKEN -R agent-sh\u002Fcomputer-use-linux\ngh secret set NPM_TOKEN -R agent-sh\u002Fcomputer-use-linux\n```\n\nThen bump `Cargo.toml` and `package.json` together, update `CHANGELOG.md`, and push a `vX.Y.Z` tag. CI runs the full Rust and MCP safety gates, builds release assets for both architectures, publishes `computer-use-linux` to crates.io, and publishes the npm wrapper after the GitHub release binaries are available.\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","`computer-use-linux` is a tool for controlling a real Linux desktop from any MCP host. Written in Rust, it uses AT-SPI, GNOME Shell, Wayland portals, and ydotool to read accessibility trees, take screenshots, and perform clicks, scrolls, and keystrokes across the GNOME, KDE\u002FKWin, Hyprland, i3, and COSMIC desktop environments. Its core features include Wayland-capable pointer actions, compositor-aware window targeting, and semantic selectors in place of pixel coordinates, which improve accuracy and compatibility across desktop environments. It suits scenarios that need remote or automated control of a Linux desktop, such as automated testing, assistive technology development, or remote administration.","2026-06-11 04:01:04","CREATED_QUERY"]