[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75125":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":9,"languages":9,"totalLinesOfCode":9,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},75125,"VisionClaw","Intent-Lab\u002FVisionClaw","Intent-Lab","Real-time AI assistant for Meta Ray-Ban smart glasses -- voice + vision + agentic actions via Gemini Live and OpenClaw",null,2368,444,16,18,0,11,26,97,33,29.95,"Other",false,"main",true,[],"2026-06-12 02:03:32","# VisionClaw\n\n![VisionClaw](assets\u002Fteaserimage.png)\n\nA real-time AI assistant for Meta Ray-Ban smart glasses. 
See what you see, hear what you say, and take actions on your behalf -- all through voice.\n\n![Cover](assets\u002Fcover.png)\n\nBuilt on [Meta Wearables DAT SDK](https:\u002F\u002Fgithub.com\u002Ffacebook\u002Fmeta-wearables-dat-ios) (iOS) \u002F [DAT Android SDK](https:\u002F\u002Fgithub.com\u002Fnichochar\u002Fopenclaw) (Android) + [Gemini Live API](https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\u002Flive) + [OpenClaw](https:\u002F\u002Fgithub.com\u002Fnichochar\u002Fopenclaw) (optional).\n\n**Supported platforms:** iOS (iPhone) and Android (Pixel, Samsung, etc.)\n\n## What It Does\n\nPut on your glasses, tap the AI button, and talk:\n\n- **\"What am I looking at?\"** -- Gemini sees through your glasses camera and describes the scene\n- **\"Add milk to my shopping list\"** -- delegates to OpenClaw, which adds it via your connected apps\n- **\"Send a message to John saying I'll be late\"** -- routes through OpenClaw to WhatsApp\u002FTelegram\u002FiMessage\n- **\"Search for the best coffee shops nearby\"** -- web search via OpenClaw, results spoken back\n\nThe glasses camera streams at ~1fps to Gemini for visual context, while audio flows bidirectionally in real-time.\n\n## How It Works\n\n![How It Works](assets\u002Fhow.png)\n\n```\nMeta Ray-Ban Glasses (or phone camera)\n       |\n       | video frames + mic audio\n       v\niOS \u002F Android App (this project)\n       |\n       | JPEG frames (~1fps) + PCM audio (16kHz)\n       v\nGemini Live API (WebSocket)\n       |\n       |-- Audio response (PCM 24kHz) --> App --> Speaker\n       |-- Tool calls (execute) -------> App --> OpenClaw Gateway\n       |                                              |\n       |                                              v\n       |                                      56+ skills: web search,\n       |                                      messaging, smart home,\n       |                                      notes, reminders, etc.\n       |                                
              |\n       |\u003C---- Tool response (text) \u003C----- App \u003C-------+\n       |\n       v\n  Gemini speaks the result\n```\n\n**Key pieces:**\n- **Gemini Live** -- real-time voice + vision AI over WebSocket (native audio, not STT-first)\n- **OpenClaw** (optional) -- local gateway that gives Gemini access to 56+ tools and all your connected apps\n- **Phone mode** -- test the full pipeline using your phone camera instead of glasses\n- **WebRTC streaming** -- share your glasses POV live to a browser viewer\n\n---\n\n## Quick Start (iOS)\n\n### 1. Clone and open\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsseanliu\u002FVisionClaw.git\ncd VisionClaw\u002Fsamples\u002FCameraAccess\nopen CameraAccess.xcodeproj\n```\n\n### 2. Add your secrets\n\nCopy the example file and fill in your values:\n\n```bash\ncp CameraAccess\u002FSecrets.swift.example CameraAccess\u002FSecrets.swift\n```\n\nEdit `Secrets.swift` with your [Gemini API key](https:\u002F\u002Faistudio.google.com\u002Fapikey) (required) and optional OpenClaw\u002FWebRTC config.\n\n### 3. Build and run\n\nSelect your iPhone as the target device and hit Run (Cmd+R).\n\n### 4. Try it out\n\n**Without glasses (iPhone mode):**\n1. Tap **\"Start on iPhone\"** -- uses your iPhone's back camera\n2. Tap the **AI button** to start a Gemini Live session\n3. Talk to the AI -- it can see through your iPhone camera\n\n**With Meta Ray-Ban glasses:**\n\nFirst, enable Developer Mode in the Meta AI app:\n\n1. Open the **Meta AI** app on your iPhone\n2. Go to **Settings** (gear icon, bottom left)\n3. Tap **App Info**\n4. Tap the **App version** number **5 times** -- this unlocks Developer Mode\n5. Go back to Settings -- you'll now see a **Developer Mode** toggle. Turn it on.\n\n![How to enable Developer Mode](assets\u002Fdev_mode.png)\n\nThen in VisionClaw:\n1. Tap **\"Start Streaming\"** in the app\n2. Tap the **AI button** for voice + vision conversation\n\n---\n\n## Quick Start (Android)\n\n### 1. 
Clone and open\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsseanliu\u002FVisionClaw.git\n```\n\nOpen `samples\u002FCameraAccessAndroid\u002F` in Android Studio.\n\n### 2. Configure GitHub Packages (DAT SDK)\n\nThe Meta DAT Android SDK is distributed via GitHub Packages. You need a GitHub Personal Access Token with `read:packages` scope.\n\n1. Go to [GitHub > Settings > Developer Settings > Personal Access Tokens](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens) and create a **classic** token with `read:packages` scope\n2. In `samples\u002FCameraAccessAndroid\u002Flocal.properties`, add:\n\n```properties\ngithub_token=YOUR_GITHUB_TOKEN\n```\n\n> **Tip:** If you have the `gh` CLI installed, you can run `gh auth token` to get a valid token. Make sure it has `read:packages` scope -- if not, run `gh auth refresh -s read:packages`.\n>\n> **Note:** GitHub Packages requires authentication even for public repositories. The 401 error means your token is missing or invalid.\n\n### 3. Add your secrets\n\n```bash\ncd samples\u002FCameraAccessAndroid\u002Fapp\u002Fsrc\u002Fmain\u002Fjava\u002Fcom\u002Fmeta\u002Fwearable\u002Fdat\u002Fexternalsampleapps\u002Fcameraaccess\u002F\ncp Secrets.kt.example Secrets.kt\n```\n\nEdit `Secrets.kt` with your [Gemini API key](https:\u002F\u002Faistudio.google.com\u002Fapikey) (required) and optional OpenClaw\u002FWebRTC config.\n\n### 4. Build and run\n\n1. Let Gradle sync in Android Studio (it will download the DAT SDK from GitHub Packages)\n2. Select your Android phone as the target device\n3. Click Run (Shift+F10)\n\n> **Wireless debugging:** You can also install via ADB wirelessly. Enable **Wireless debugging** in your phone's Developer Options, then pair with `adb pair \u003Cip>:\u003Cport>`.\n\n### 5. Try it out\n\n**Without glasses (Phone mode):**\n1. Tap **\"Start on Phone\"** -- uses your phone's back camera\n2. Tap the **AI button** (sparkle icon) to start a Gemini Live session\n3. 
Talk to the AI -- it can see through your phone camera\n\n**With Meta Ray-Ban glasses:**\n\nEnable Developer Mode in the Meta AI app (same steps as iOS above), then:\n1. Tap **\"Start Streaming\"** in the app\n2. Tap the **AI button** for voice + vision conversation\n\n---\n\n## Setup: OpenClaw (Optional)\n\nOpenClaw gives Gemini the ability to take real-world actions: send messages, search the web, manage lists, control smart home devices, and more. Without it, Gemini is voice + vision only.\n\n### 1. Install and configure OpenClaw\n\nFollow the [OpenClaw setup guide](https:\u002F\u002Fgithub.com\u002Fnichochar\u002Fopenclaw). Make sure the gateway is enabled:\n\nIn `~\u002F.openclaw\u002Fopenclaw.json`:\n\n```json\n{\n  \"gateway\": {\n    \"port\": 18789,\n    \"bind\": \"lan\",\n    \"auth\": {\n      \"mode\": \"token\",\n      \"token\": \"your-gateway-token-here\"\n    },\n    \"http\": {\n      \"endpoints\": {\n        \"chatCompletions\": { \"enabled\": true }\n      }\n    }\n  }\n}\n```\n\nKey settings:\n- `bind: \"lan\"` -- exposes the gateway on your local network so your phone can reach it\n- `chatCompletions.enabled: true` -- enables the `\u002Fv1\u002Fchat\u002Fcompletions` endpoint (off by default)\n- `auth.token` -- the token your app will use to authenticate\n\n### 2. 
Configure the app\n\n**iOS** -- In `Secrets.swift`:\n```swift\nstatic let openClawHost = \"http:\u002F\u002FYour-Mac.local\"\nstatic let openClawPort = 18789\nstatic let openClawGatewayToken = \"your-gateway-token-here\"\n```\n\n**Android** -- In `Secrets.kt`:\n```kotlin\nconst val openClawHost = \"http:\u002F\u002FYour-Mac.local\"\nconst val openClawPort = 18789\nconst val openClawGatewayToken = \"your-gateway-token-here\"\n```\n\nTo find your Mac's Bonjour hostname: **System Settings > General > Sharing** -- it's shown at the top (e.g., `Johns-MacBook-Pro.local`).\n\n> Both iOS and Android also have an in-app Settings screen where you can change these values at runtime without editing source code.\n\n### 3. Start the gateway\n\n```bash\nopenclaw gateway restart\n```\n\nVerify it's running:\n\n```bash\ncurl http:\u002F\u002Flocalhost:18789\u002Fhealth\n```\n\nNow when you talk to the AI, it can execute tasks through OpenClaw.\n\n---\n\n## Architecture\n\n### Key Files (iOS)\n\nAll source code is in `samples\u002FCameraAccess\u002FCameraAccess\u002F`:\n\n| File | Purpose |\n|------|---------|\n| `Gemini\u002FGeminiConfig.swift` | API keys, model config, system prompt |\n| `Gemini\u002FGeminiLiveService.swift` | WebSocket client for Gemini Live API |\n| `Gemini\u002FAudioManager.swift` | Mic capture (PCM 16kHz) + audio playback (PCM 24kHz) |\n| `Gemini\u002FGeminiSessionViewModel.swift` | Session lifecycle, tool call wiring, transcript state |\n| `OpenClaw\u002FToolCallModels.swift` | Tool declarations, data types |\n| `OpenClaw\u002FOpenClawBridge.swift` | HTTP client for OpenClaw gateway |\n| `OpenClaw\u002FToolCallRouter.swift` | Routes Gemini tool calls to OpenClaw |\n| `iPhone\u002FIPhoneCameraManager.swift` | AVCaptureSession wrapper for iPhone camera mode |\n| `WebRTC\u002FWebRTCClient.swift` | WebRTC peer connection + SDP negotiation |\n| `WebRTC\u002FSignalingClient.swift` | WebSocket signaling for WebRTC rooms |\n\n### Key Files (Android)\n\nAll source 
code is in `samples\u002FCameraAccessAndroid\u002Fapp\u002Fsrc\u002Fmain\u002Fjava\u002F...\u002Fcameraaccess\u002F`:\n\n| File | Purpose |\n|------|---------|\n| `gemini\u002FGeminiConfig.kt` | API keys, model config, system prompt |\n| `gemini\u002FGeminiLiveService.kt` | OkHttp WebSocket client for Gemini Live API |\n| `gemini\u002FAudioManager.kt` | AudioRecord (16kHz) + AudioTrack (24kHz) |\n| `gemini\u002FGeminiSessionViewModel.kt` | Session lifecycle, tool call wiring, UI state |\n| `openclaw\u002FToolCallModels.kt` | Tool declarations, data classes |\n| `openclaw\u002FOpenClawBridge.kt` | OkHttp HTTP client for OpenClaw gateway |\n| `openclaw\u002FToolCallRouter.kt` | Routes Gemini tool calls to OpenClaw |\n| `phone\u002FPhoneCameraManager.kt` | CameraX wrapper for phone camera mode |\n| `webrtc\u002FWebRTCClient.kt` | WebRTC peer connection (stream-webrtc-android) |\n| `webrtc\u002FSignalingClient.kt` | OkHttp WebSocket signaling for WebRTC rooms |\n| `settings\u002FSettingsManager.kt` | SharedPreferences with Secrets.kt fallback |\n\n### Audio Pipeline\n\n- **Input**: Phone mic -> AudioManager (PCM Int16, 16kHz mono, 100ms chunks) -> Gemini WebSocket\n- **Output**: Gemini WebSocket -> AudioManager playback queue -> Phone speaker\n- **iOS iPhone mode**: Uses `.voiceChat` audio session for echo cancellation + mic gating during AI speech\n- **iOS Glasses mode**: Uses `.videoChat` audio session (mic is on glasses, speaker is on phone -- no echo)\n- **Android**: Uses `VOICE_COMMUNICATION` audio source for built-in acoustic echo cancellation\n\n### Video Pipeline\n\n- **Glasses**: DAT SDK video stream (24fps) -> throttle to ~1fps -> JPEG (50% quality) -> Gemini\n- **Phone**: Camera capture (30fps) -> throttle to ~1fps -> JPEG -> Gemini\n\n### Tool Calling\n\nGemini Live supports function calling. Both apps declare a single `execute` tool that routes everything through OpenClaw:\n\n1. User says \"Add eggs to my shopping list\"\n2. 
Gemini speaks \"Sure, adding that now\" (verbal acknowledgment before tool call)\n3. Gemini sends `toolCall` with `execute(task: \"Add eggs to the shopping list\")`\n4. `ToolCallRouter` sends HTTP POST to OpenClaw gateway\n5. OpenClaw executes the task using its 56+ connected skills\n6. Result returns to Gemini via `toolResponse`\n7. Gemini speaks the confirmation\n\n### WebRTC Live Streaming\n\nShare your glasses POV in real-time to a browser viewer with bidirectional audio and video.\n\n1. Tap the **Live** button in the app\n2. The app connects to a signaling server and gets a 6-character room code\n3. Share the code -- the viewer opens the server URL in a browser and enters it\n4. WebRTC peer connection is established (SDP + ICE via the signaling server)\n5. Media flows peer-to-peer: glasses video to browser, browser camera back to iOS PiP\n\n**Key details:**\n- **Signaling server**: Node.js + WebSocket, located at `samples\u002FCameraAccess\u002Fserver\u002F` -- serves the browser viewer and relays SDP\u002FICE\n- **NAT traversal**: Google STUN servers + ExpressTURN relay (fetched from `\u002Fapi\u002Fturn`)\n- **Video**: 24 fps, 2.5 Mbps max bitrate\n- **Background handling**: 60-second grace period for iOS app backgrounding -- room stays alive for reconnection\n- **Constraint**: Cannot run simultaneously with Gemini Live (audio device conflict)\n\nFor full details, see [`samples\u002FCameraAccess\u002FCameraAccess\u002FWebRTC\u002FREADME.md`](samples\u002FCameraAccess\u002FCameraAccess\u002FWebRTC\u002FREADME.md).\n\n---\n\n## Requirements\n\n### iOS\n- iOS 17.0+\n- Xcode 15.0+\n- Gemini API key ([get one free](https:\u002F\u002Faistudio.google.com\u002Fapikey))\n- Meta Ray-Ban glasses (optional -- use iPhone mode for testing)\n- OpenClaw on your Mac (optional -- for agentic actions)\n\n### Android\n- Android 14+ (API 34+)\n- Android Studio Ladybug or newer\n- GitHub account with `read:packages` token (for DAT SDK)\n- Gemini API key ([get one 
free](https:\u002F\u002Faistudio.google.com\u002Fapikey))\n- Meta Ray-Ban glasses (optional -- use Phone mode for testing)\n- OpenClaw on your Mac (optional -- for agentic actions)\n\n---\n\n## Troubleshooting\n\n### General\n\n**Gemini doesn't hear me** -- Check that microphone permission is granted. The app uses aggressive voice activity detection -- speak clearly and at normal volume.\n\n**OpenClaw connection timeout** -- Make sure your phone and Mac are on the same Wi-Fi network, the gateway is running (`openclaw gateway restart`), and the hostname matches your Mac's Bonjour name.\n\n**OpenClaw opens duplicate browser tabs** -- This is a known upstream issue in OpenClaw's CDP (Chrome DevTools Protocol) connection management ([#13851](https:\u002F\u002Fgithub.com\u002Fnichochar\u002Fopenclaw\u002Fissues\u002F13851), [#12317](https:\u002F\u002Fgithub.com\u002Fnichochar\u002Fopenclaw\u002Fissues\u002F12317)). Using `profile: \"openclaw\"` (managed Chrome) instead of the default extension relay may improve stability.\n\n### iOS-specific\n\n**\"Gemini API key not configured\"** -- Add your API key in `Secrets.swift` or in the in-app Settings.\n\n**Echo\u002Ffeedback in iPhone mode** -- The app mutes the mic while the AI is speaking. If you still hear echo, try turning down the volume.\n\n### Android-specific\n\n**Gradle sync fails with 401 Unauthorized** -- Your GitHub token is missing or doesn't have `read:packages` scope. Check that `local.properties` contains the `github_token` entry described in the Android Quick Start. Generate a new token at [github.com\u002Fsettings\u002Ftokens](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens).\n\n**Gemini WebSocket times out** -- The Gemini Live API sends binary WebSocket frames. If you're building a custom client, make sure to handle both text and binary frame types.\n\n**Audio not working** -- Ensure `RECORD_AUDIO` permission is granted. 
On Android 13+, you may need to grant this permission manually in Settings > Apps.\n\n**Phone camera not starting** -- Ensure `CAMERA` permission is granted. CameraX requires both the permission and a valid lifecycle.\n\nFor DAT SDK issues, see the [developer documentation](https:\u002F\u002Fwearables.developer.meta.com\u002Fdocs\u002Fdevelop\u002F) or the [discussions forum](https:\u002F\u002Fgithub.com\u002Ffacebook\u002Fmeta-wearables-dat-ios\u002Fdiscussions).\n\n## Citation\n\nIf you use VisionClaw in your research, please cite our paper:\n\n```bibtex\n@article{liu2026visionclaw,\n  title={VisionClaw: Always-On AI Agents through Smart Glasses},\n  author={Liu, Xiaoan and Lee, DaeHo and Gonzalez, Eric J and Gonzalez-Franco, Mar and Suzuki, Ryo},\n  journal={arXiv preprint arXiv:2604.03486},\n  year={2026}\n}\n```\n\n## License\n\nThis source code is licensed under the license found in the [LICENSE](LICENSE) file in the root directory of this source tree.\n","VisionClaw is a real-time AI assistant designed for Meta Ray-Ban smart glasses that carries out tasks through voice and vision interaction. Its core features include visual processing such as scene description and object recognition via the Gemini Live API, and connecting through the OpenClaw gateway to more than 56 tools and services, for example adding items to a shopping list, sending messages, and searching for information. Technically, the system streams camera frames to the Gemini Live API at about 1fps for analysis while supporting bidirectional real-time audio streaming. It also provides a phone mode and WebRTC live streaming, which make testing and demos easier for developers. The project suits users who want an augmented reality experience or who want to simplify everyday task management with smart wearables.",2,"2026-06-11 03:52:25","high_star"]