[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74972":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},74972,"droidclaw","unitedbyai\u002Fdroidclaw","unitedbyai","turn old phones into ai agents - give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done. ","https:\u002F\u002Fdroidclaw.ai",null,"TypeScript",1523,225,6,10,0,4,11,108,12,75.56,false,"main",[],"2026-06-12 04:01:16","# droidclaw\n\n> an ai agent that controls your android phone. give it a goal in plain english — it figures out what to tap, type, and swipe.\n\n**[Download Android APK (v0.5.3)](https:\u002F\u002Fgithub.com\u002Funitedbyai\u002Fdroidclaw\u002Freleases\u002Fdownload\u002Fv0.5.3\u002Fapp-debug.apk)** | **[Dashboard](https:\u002F\u002Fapp.droidclaw.ai)** | **[Discord](https:\u002F\u002Fdiscord.gg\u002FSaCs3cPQdY)**\n\ni wanted to turn my old android devices into ai agents. after a few hours reverse engineering accessibility trees and playing with tailscale.. it worked.\n\nthink of it this way — a few years back, we could automate android with predefined flows. now imagine that automation layer has an llm brain. it can read any screen, understand what's happening, decide what to do, and execute. you don't need api's. you don't need to build integrations. 
just install your favourite apps and tell the agent what you want done.\n\none of the coolest things it can do right now is delegate incoming requests to chatgpt, gemini, or google search on the device... and bring the result back. no api keys for those services needed — it just uses the apps like a human would.\n\n```\n$ bun run src\u002Fkernel.ts\nenter your goal: open youtube and search for \"lofi hip hop\"\n\n--- step 1\u002F30 ---\nthink: i'm on the home screen. launching youtube.\naction: launch (842ms)\n\n--- step 2\u002F30 ---\nthink: youtube is open. tapping search icon.\naction: tap (623ms)\n\n--- step 3\u002F30 ---\nthink: search field focused.\naction: type \"lofi hip hop\" (501ms)\n\n--- step 4\u002F30 ---\naction: enter (389ms)\n\n--- step 5\u002F30 ---\nthink: search results showing. done.\naction: done (412ms)\n```\n\n---\n\n## how it works\n\nthe core idea is dead simple — a **perception → reasoning → action** loop that repeats until the goal is done (or it runs out of steps).\n\n```\n                         ┌─────────────────────────────────────────┐\n                         │              your goal                  │\n                         │   \"send good morning to mom on whatsapp\"│\n                         └────────────────┬────────────────────────┘\n                                          │\n                                          ▼\n                    ┌─────────────────────────────────────────────────┐\n                    │                                                 │\n                    │              ┌──────────────┐                   │\n                    │              │  1. 
perceive  │                   │\n                    │              └──────┬───────┘                   │\n                    │                     │                           │\n                    │    dump accessibility tree via adb               │\n                    │    parse xml → interactive ui elements           │\n                    │    diff with previous screen (detect changes)    │\n                    │    optionally capture screenshot                 │\n                    │                     │                           │\n                    │                     ▼                           │\n                    │              ┌──────────────┐                   │\n                    │              │  2. reason    │                   │\n                    │              └──────┬───────┘                   │\n                    │                     │                           │\n                    │    send screen state + goal + history to llm     │\n                    │    llm returns { think, plan, action }           │\n                    │    \"i see the search icon at (890, 156).         │\n                    │     i should tap it.\"                            │\n                    │                     │                           │\n                    │                     ▼                           │\n                    │              ┌──────────────┐                   │\n                    │              │  3. act       │                   │\n                    │              └──────┬───────┘                   │\n                    │                     │                           │\n                    │    execute via adb: tap, type, swipe, etc.       
│\n                    │    feed result back to llm on next step          │\n                    │    check if goal is done                        │\n                    │                     │                           │\n                    │                     ▼                           │\n                    │               done? ─────── yes ──→ exit        │\n                    │                │                                │\n                    │                no                               │\n                    │                │                                │\n                    │                └─────── loop back to perceive   │\n                    │                                                 │\n                    └─────────────────────────────────────────────────┘\n```\n\n### what makes it not fall apart\n\nllms controlling ui's sounds fragile. and it is, if you don't handle the failure modes. here's what droidclaw does:\n\n- **stuck loop detection** — if the screen doesn't change for 3 steps, recovery hints get injected into the prompt. context-aware hints based on what type of action is failing (tap vs swipe vs wait).\n- **repetition tracking** — a sliding window of recent actions catches retry loops even across screen changes. if the agent taps the same coordinates 3+ times, it gets told to stop and try something else.\n- **drift detection** — if the agent spams navigation actions (swipe, back, wait) without interacting with anything, it gets nudged to take direct action.\n- **vision fallback** — when the accessibility tree is empty (webviews, flutter apps, games), a screenshot gets sent to the llm instead, with coordinate-based tap suggestions.\n- **action feedback** — every action result (success\u002Ffailure + message) gets fed back to the llm on the next step. 
the agent knows whether its last move worked.\n- **multi-turn memory** — conversation history is maintained across steps so the llm has context about what it already tried.\n\n---\n\n## setup\n\n### quick install\n\n```bash\ncurl -fsSL https:\u002F\u002Fdroidclaw.ai\u002Finstall.sh | sh\n```\n\nthis installs bun and adb if missing, clones the repo, and sets up `.env`.\n\n### manual install\n\n**prerequisites:**\n\n- [bun](https:\u002F\u002Fbun.sh) (required — node\u002Fnpm won't work. droidclaw uses bun-specific apis like `Bun.spawnSync` and native `.env` loading)\n- [adb](https:\u002F\u002Fdeveloper.android.com\u002Ftools\u002Fadb) (android debug bridge — comes with android sdk platform tools)\n- an android phone with usb debugging enabled\n- an llm provider api key (or ollama for fully local)\n\n```bash\n# install adb\n# macos:\nbrew install android-platform-tools\n# linux:\nsudo apt install android-tools-adb\n# windows:\n# download from https:\u002F\u002Fdeveloper.android.com\u002Ftools\u002Freleases\u002Fplatform-tools\n\n# install bun\ncurl -fsSL https:\u002F\u002Fbun.sh\u002Finstall | bash\n\n# clone and setup\ngit clone https:\u002F\u002Fgithub.com\u002Funitedbyai\u002Fdroidclaw.git\ncd droidclaw\nbun install\ncp .env.example .env\n```\n\n### configure your llm\n\nedit `.env` and pick a provider. fastest way to start is groq (free tier):\n\n```bash\nLLM_PROVIDER=groq\nGROQ_API_KEY=gsk_your_key_here\n```\n\nor run fully local with [ollama](https:\u002F\u002Follama.com) (no api key, no internet needed):\n\n```bash\nollama pull llama3.2\n# then in .env:\nLLM_PROVIDER=ollama\nOLLAMA_MODEL=llama3.2\n```\n\n### connect your phone\n\n1. go to **settings → about phone → tap \"build number\" 7 times** to enable developer options\n2. go to **settings → developer options → enable \"usb debugging\"**\n3. 
plug in via usb and tap \"allow\" on the phone when prompted\n\n```bash\nadb devices   # should show your device\n```\n\n### run it\n\n```bash\nbun run src\u002Fkernel.ts\n# type your goal and press enter\n```\n\n---\n\n## three ways to use it\n\ndroidclaw has three modes, each for a different use case:\n\n```\n┌─────────────────────────────────────────────────────────────────────┐\n│                                                                     │\n│   interactive mode          workflows             flows             │\n│   ─────────────────    ─────────────────    ─────────────────       │\n│                                                                     │\n│   type a goal and       chain goals          fixed sequences        │\n│   the agent figures     across multiple      of taps and types.     │\n│   it out on the fly.    apps with ai.        no llm, instant.       │\n│                                                                     │\n│   $ bun run              --workflow            --flow               │\n│     src\u002Fkernel.ts         file.json             file.yaml           │\n│                                                                     │\n│   best for:             best for:            best for:              │\n│   one-off tasks,        multi-app tasks,     things you do          │\n│   exploration,          recurring routines,  exactly the same       │\n│   quick commands        morning briefings    way every time         │\n│                                                                     │\n│   uses llm: yes         uses llm: yes        uses llm: no          │\n│                                                                     │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\n### interactive mode\n\njust type what you want:\n\n```bash\nbun run src\u002Fkernel.ts\n# enter your goal: open settings and turn on dark mode\n```\n\n### workflows (ai-powered, multi-app)\n\nworkflows are json 
files describing a sequence of sub-goals. each step can optionally switch to a different app. the llm decides how to navigate, what to tap, what to type.\n\n```bash\nbun run src\u002Fkernel.ts --workflow examples\u002Fworkflows\u002Fresearch\u002Fweather-to-whatsapp.json\n```\n\n```json\n{\n  \"name\": \"weather to whatsapp\",\n  \"steps\": [\n    {\n      \"app\": \"com.google.android.googlequicksearchbox\",\n      \"goal\": \"search for chennai weather today\"\n    },\n    {\n      \"goal\": \"share the result to whatsapp contact Sanju\"\n    }\n  ]\n}\n```\n\nyou can inject specific data into steps using `formData`:\n\n```json\n{\n  \"name\": \"slack standup\",\n  \"steps\": [\n    {\n      \"app\": \"com.Slack\",\n      \"goal\": \"open #standup channel, type the message and send it\",\n      \"formData\": {\n        \"Message\": \"yesterday: api integration\\ntoday: tests\\nblockers: none\"\n      }\n    }\n  ]\n}\n```\n\n### flows (no ai, instant execution)\n\nfor tasks where you don't need ai thinking — just a fixed sequence of taps and types. no llm calls, instant execution. 
think of it like a macro.\n\n```bash\nbun run src\u002Fkernel.ts --flow examples\u002Fflows\u002Fsend-whatsapp.yaml\n```\n\n```yaml\nappId: com.whatsapp\nname: Send WhatsApp Message\n---\n- launchApp\n- wait: 2\n- tap: \"Contact Name\"\n- wait: 1\n- tap: \"Message\"\n- type: \"hello from droidclaw\"\n- tap: \"Send\"\n- done: \"Message sent\"\n```\n\n### quick comparison\n\n| | workflows | flows |\n|---|---|---|\n| format | json | yaml |\n| uses ai | yes | no |\n| handles ui changes | yes | no |\n| speed | slower (llm calls) | instant |\n| best for | complex\u002Fmulti-app tasks | simple repeatable tasks |\n\n---\n\n## example workflows\n\n35 ready-to-use workflows organised by category:\n\n**[messaging](examples\u002Fworkflows\u002Fmessaging\u002F)** — whatsapp, telegram, slack, email\n- [slack-standup](examples\u002Fworkflows\u002Fmessaging\u002Fslack-standup.json) — post daily standup to a channel\n- [whatsapp-broadcast](examples\u002Fworkflows\u002Fmessaging\u002Fwhatsapp-broadcast.json) — send a message to multiple contacts\n- [telegram-send-message](examples\u002Fworkflows\u002Fmessaging\u002Ftelegram-send-message.json) — send a telegram message\n- [email-reply](examples\u002Fworkflows\u002Fmessaging\u002Femail-reply.json) — draft and send an email reply\n- [whatsapp-to-email](examples\u002Fworkflows\u002Fmessaging\u002Fwhatsapp-to-email.json) — forward whatsapp messages to email\n- [slack-check-messages](examples\u002Fworkflows\u002Fmessaging\u002Fslack-check-messages.json) — read unread slack messages\n- [email-digest](examples\u002Fworkflows\u002Fmessaging\u002Femail-digest.json) — summarise recent emails\n- [telegram-channel-digest](examples\u002Fworkflows\u002Fmessaging\u002Ftelegram-channel-digest.json) — digest a telegram channel\n- [whatsapp-reply](examples\u002Fworkflows\u002Fmessaging\u002Fwhatsapp-reply.json) — reply to a whatsapp message\n- [send-whatsapp-vi](examples\u002Fworkflows\u002Fmessaging\u002Fsend-whatsapp-vi.json) — send whatsapp to a 
specific contact\n\n**[social](examples\u002Fworkflows\u002Fsocial\u002F)** — instagram, youtube, cross-posting\n- [social-media-post](examples\u002Fworkflows\u002Fsocial\u002Fsocial-media-post.json) — post across platforms\n- [social-media-engage](examples\u002Fworkflows\u002Fsocial\u002Fsocial-media-engage.json) — like\u002Fcomment on posts\n- [instagram-post-check](examples\u002Fworkflows\u002Fsocial\u002Finstagram-post-check.json) — check recent instagram posts\n- [youtube-watch-later](examples\u002Fworkflows\u002Fsocial\u002Fyoutube-watch-later.json) — save videos to watch later\n\n**[productivity](examples\u002Fworkflows\u002Fproductivity\u002F)** — calendar, notes, github, notifications\n- [morning-briefing](examples\u002Fworkflows\u002Fproductivity\u002Fmorning-briefing.json) — read messages, calendar, weather across apps\n- [github-check-prs](examples\u002Fworkflows\u002Fproductivity\u002Fgithub-check-prs.json) — check open pull requests\n- [calendar-create-event](examples\u002Fworkflows\u002Fproductivity\u002Fcalendar-create-event.json) — create a calendar event\n- [notes-capture](examples\u002Fworkflows\u002Fproductivity\u002Fnotes-capture.json) — capture a quick note\n- [notification-cleanup](examples\u002Fworkflows\u002Fproductivity\u002Fnotification-cleanup.json) — clear and triage notifications\n- [screenshot-share-slack](examples\u002Fworkflows\u002Fproductivity\u002Fscreenshot-share-slack.json) — screenshot and share to slack\n- [translate-and-reply](examples\u002Fworkflows\u002Fproductivity\u002Ftranslate-and-reply.json) — translate a message and reply\n- [logistics-workflow](examples\u002Fworkflows\u002Fproductivity\u002Flogistics-workflow.json) — multi-app logistics coordination\n\n**[research](examples\u002Fworkflows\u002Fresearch\u002F)** — search, compare, monitor\n- [weather-to-whatsapp](examples\u002Fworkflows\u002Fresearch\u002Fweather-to-whatsapp.json) — get weather via google, share to whatsapp\n- 
[multi-app-research](examples\u002Fworkflows\u002Fresearch\u002Fmulti-app-research.json) — research across multiple apps\n- [price-comparison](examples\u002Fworkflows\u002Fresearch\u002Fprice-comparison.json) — compare prices across shopping apps\n- [news-roundup](examples\u002Fworkflows\u002Fresearch\u002Fnews-roundup.json) — collect news from multiple sources\n- [google-search-report](examples\u002Fworkflows\u002Fresearch\u002Fgoogle-search-report.json) — search google and save results\n- [check-flight-status](examples\u002Fworkflows\u002Fresearch\u002Fcheck-flight-status.json) — check flight status\n\n**[lifestyle](examples\u002Fworkflows\u002Flifestyle\u002F)** — food, transport, music, fitness\n- [food-order](examples\u002Fworkflows\u002Flifestyle\u002Ffood-order.json) — order food from a delivery app\n- [uber-ride](examples\u002Fworkflows\u002Flifestyle\u002Fuber-ride.json) — book an uber ride\n- [spotify-playlist](examples\u002Fworkflows\u002Flifestyle\u002Fspotify-playlist.json) — create or add to a spotify playlist\n- [maps-commute](examples\u002Fworkflows\u002Flifestyle\u002Fmaps-commute.json) — check commute time\n- [fitness-log](examples\u002Fworkflows\u002Flifestyle\u002Ffitness-log.json) — log a workout\n- [expense-tracker](examples\u002Fworkflows\u002Flifestyle\u002Fexpense-tracker.json) — log an expense\n- [wifi-password-share](examples\u002Fworkflows\u002Flifestyle\u002Fwifi-password-share.json) — share wifi password\n- [do-not-disturb](examples\u002Fworkflows\u002Flifestyle\u002Fdo-not-disturb.json) — toggle do not disturb with exceptions\n\n**[flows](examples\u002Fflows\u002F)** — 5 deterministic flow templates (no ai)\n- [send-whatsapp](examples\u002Fflows\u002Fsend-whatsapp.yaml) — send a whatsapp message\n- [google-search](examples\u002Fflows\u002Fgoogle-search.yaml) — run a google search\n- [create-contact](examples\u002Fflows\u002Fcreate-contact.yaml) — add a new contact\n- 
[clear-notifications](examples\u002Fflows\u002Fclear-notifications.yaml) — clear all notifications\n- [toggle-wifi](examples\u002Fflows\u002Ftoggle-wifi.yaml) — toggle wifi on\u002Foff\n\n---\n\n## actions\n\nthe agent has 28 actions it can use. these are the building blocks — each one maps to an adb command.\n\n**basic interactions:**\n`tap` `type` `enter` `longpress` `clear` `paste` `swipe` `scroll`\n\n**navigation:**\n`home` `back` `launch` `switch_app` `open_url` `open_settings` `notifications`\n\n**clipboard:**\n`clipboard_get` `clipboard_set`\n\n**multi-step skills** (compound actions that handle common patterns):\n`read_screen` `submit_message` `copy_visible_text` `wait_for_content` `find_and_tap` `compose_email`\n\n**system:**\n`screenshot` `shell` `keyevent` `pull_file` `push_file` `wait` `done`\n\nthe multi-step skills are interesting — they replace 5-10 manual actions with a single call. for example, `read_screen` auto-scrolls through the entire screen, collects all text, and copies it to clipboard. `compose_email` fills To, Subject, and Body fields in the correct order using android intents. these dramatically reduce the number of llm decisions needed.\n\n---\n\n## providers\n\n| provider | cost | vision | notes |\n|---|---|---|---|\n| groq | free tier | no | fastest to start, great for most tasks |\n| ollama | free (local) | yes* | no api key, runs entirely on your machine |\n| openrouter | per token | yes | 200+ models, single api |\n| openai | per token | yes | gpt-4o, strong reasoning |\n| bedrock | per token | yes | claude\u002Fllama on aws |\n\n*ollama vision requires a vision-capable model like `llama3.2-vision` or `llava`\n\n---\n\n## config\n\nall configuration lives in `.env`. 
here's what you can tweak:\n\n| key | default | what it does |\n|---|---|---|\n| `LLM_PROVIDER` | groq | which llm to use (groq\u002Fopenai\u002Follama\u002Fbedrock\u002Fopenrouter) |\n| `MAX_STEPS` | 30 | how many steps before the agent gives up |\n| `STEP_DELAY` | 2 | seconds to wait between actions (lets the ui settle) |\n| `STUCK_THRESHOLD` | 3 | how many unchanged steps before stuck recovery kicks in |\n| `VISION_MODE` | fallback | `off` \u002F `fallback` (only when accessibility tree is empty) \u002F `always` |\n| `MAX_ELEMENTS` | 40 | max ui elements sent to the llm per step (scored & ranked) |\n| `MAX_HISTORY_STEPS` | 10 | how many past steps to keep in conversation context |\n| `STREAMING_ENABLED` | true | stream llm responses (shows progress dots) |\n| `LOG_DIR` | logs | directory for session json logs |\n\n---\n\n## source code\n\nthe entire agent is ~10 files in `src\u002F`:\n\n```\nsrc\u002F\n├── kernel.ts          the main perception → reasoning → action loop\n├── actions.ts         28 action implementations (tap, type, swipe, etc.)\n├── skills.ts          6 multi-step skills (read_screen, compose_email, etc.)\n├── workflow.ts        workflow orchestration engine (multi-app sub-goals)\n├── flow.ts            yaml flow runner (deterministic, no llm)\n├── llm-providers.ts   5 providers + the system prompt that teaches the llm\n├── sanitizer.ts       accessibility xml parser → structured ui elements\n├── config.ts          env config loader with validation\n├── constants.ts       keycodes, swipe coordinates, defaults\n└── logger.ts          session logging (json, crash-safe partial writes)\n```\n\n### data flow through the codebase\n\n```\n                    kernel.ts\n                       │\n          ┌────────────┼────────────────┐\n          │            │                │\n          ▼            ▼                ▼\n     sanitizer.ts   llm-providers.ts   actions.ts\n     (parse screen)  (ask the llm)     (execute via adb)\n                          
              │\n                                        ├── skills.ts\n                                        │   (multi-step compound actions)\n                                        │\n     config.ts ◄────── all files read config\n     constants.ts ◄─── keycodes, coordinates\n\n     workflow.ts ── calls kernel.runAgent() per sub-goal\n     flow.ts ────── calls actions.executeAction() directly (no llm)\n     logger.ts ◄─── kernel writes step logs here\n```\n\n---\n\n## remote control with tailscale\n\nthe default setup is usb — phone plugged into your laptop. but you can go much further.\n\ninstall [tailscale](https:\u002F\u002Ftailscale.com) on both your android device and your laptop\u002Fserver. once they're on the same tailnet, connect adb over the network:\n\n```bash\n# on your phone: enable wireless debugging\n# settings → developer options → wireless debugging\n# note the ip:port shown\n\n# from anywhere in the world:\nadb connect \u003Cphone-tailscale-ip>:\u003Cport>\nadb devices   # should show your phone\n\nbun run src\u002Fkernel.ts\n```\n\nnow your phone is a remote ai agent. leave it on a desk plugged into power, and control it from a vps, your laptop at a cafe, or a cron job running workflows every morning at 8am. 
the phone doesn't need to be on the same wifi or even in the same country.\n\nthis is what makes old android devices useful again — they become always-on agents that can do things on apps that don't have api's.\n\n---\n\n## commands\n\n```bash\nbun run src\u002Fkernel.ts                          # interactive mode (prompts for goal)\nbun run src\u002Fkernel.ts --workflow file.json     # run a workflow\nbun run src\u002Fkernel.ts --flow file.yaml         # run a deterministic flow\nbun install                                    # install dependencies\nbun run build                                  # compile to dist\u002F\nbun run typecheck                              # type-check (tsc --noEmit)\n```\n\n---\n\n## troubleshooting\n\n**\"adb: command not found\"** — install adb (`brew install android-platform-tools` on mac) or set `ADB_PATH` in `.env` to point to your adb binary.\n\n**\"no devices found\"** — make sure usb debugging is enabled, you've tapped \"allow\" on the phone, and the cable supports data transfer (not just charging).\n\n**agent keeps repeating the same action** — stuck detection should handle this automatically. if it persists, try a stronger model (groq's llama-3.3-70b or openai's gpt-4o).\n\n**empty accessibility tree** — some apps (flutter, webviews, games) don't expose accessibility info. set `VISION_MODE=always` in `.env` to send screenshots every step instead.\n\n**swipe coordinates seem off** — droidclaw auto-detects screen resolution at startup. 
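if your device has an unusual resolution, check the console output on step 1 for the detected resolution.\n\n---\n\n## contributors\n\nbuilt by [unitedby.ai](https:\u002F\u002Funitedby.ai) — an open ai community\n\n- [sanju sivalingam](https:\u002F\u002Fsanju.sh)\n- [somasundaram mahesh](https:\u002F\u002Fmsomu.com)\n\n## license\n\nmit\n","Droidclaw is a project that turns old phones into AI agents: the user sets a goal in natural language, and it completes the task by reading the screen, reasoning about what to do, and executing actions. Its core features are screen reading and interaction control via ADB, and automation through a perception-reasoning-action loop. It can interact with on-device apps such as ChatGPT and Gemini without APIs or extra integrations. It suits scenarios that call for Android device automation, such as daily task management and information lookup, and is a particularly good way for users with idle Android devices to put them to use.",2,"2026-06-11 03:51:46","high_star"]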