[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-85174":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":16,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":13,"lastSyncTime":36,"discoverSource":37},85174,"ScreenMind","ayushh0110\u002FScreenMind","ayushh0110"," AI-powered screen memory — captures, analyzes, and lets you search\u002Fchat your screen history. Powered by Gemma 4 . 100% local, 100% private.","",null,"Python",69,2,60,1,0,37.43,"MIT License",false,"main",true,[23,24,25,26,27,28,29,30,31,32,33],"ai","fastapi","gemma","llama-cpp","localai","multimodal","privacy","productivity","python","recall-alternative","screenrecorder","2026-06-15 10:05:13","\u003Cdiv align=\"center\">\n\n\u003Cbr>\n\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🧠_ScreenMind-Your_AI_Memory-8B5CF6?style=for-the-badge&labelColor=0a0e1a\" alt=\"ScreenMind\" height=\"40\">\n\n\u003Cbr>\u003Cbr>\n\n**Captures your screen → Analyzes with Gemma 4 → Builds a searchable AI memory**\u003Cbr>\n**100% local. 100% private. 
Zero cloud dependencies.**\n\n\u003Cbr>\n\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10+-3776AB?style=flat-square&logo=python&logoColor=white)](https:\u002F\u002Fpython.org)\n[![Gemma 4 E2B](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGemma_4-E2B_Vision+Audio-8B5CF6?style=flat-square&logo=google&logoColor=white)](https:\u002F\u002Fai.google.dev\u002Fgemma)\n[![llama.cpp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fllama.cpp-Local_Inference-333?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp)\n[![License MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-10B981?style=flat-square)](LICENSE)\n[![MCP Ready](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMCP-Claude_%7C_Cursor_%7C_VSCode-F59E0B?style=flat-square)](MCP_SETUP.md)\n\n\u003Cbr>\n\n[**Features**](#-features) · [**Gemma 4 Deep Dive**](#-how-gemma-4-is-used) · [**Quick Start**](#-quick-start) · [**Architecture**](#-architecture) · [**Agent Platform**](#-agent-platform) · [**MCP**](#-mcp-server-claude--cursor--vs-code) · [**API**](#-api-reference)\n\n\u003Cbr>\n\n![Timeline — AI-analyzed screen activity feed](docs\u002Fscreenshots\u002Fagents.png)\n\n| Agents | Chat with your memory |\n|:---:|:---:|\n| ![Agents](docs\u002Fscreenshots\u002Ftimeline.png) | ![Chat](docs\u002Fscreenshots\u002Fchat.png) |\n\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n> **Microsoft showed the world wants screen-aware AI with Recall.** But Recall stores data in plaintext, sends telemetry, and was met with massive privacy backlash. ScreenMind is the open-source, privacy-first alternative — every screenshot analyzed, every insight generated, every search result — all computed locally using Gemma 4's multimodal capabilities.\n>\n> It's not just a screen recorder. 
It's an **AI memory** you can talk to, search through, and build automations on top of.\n\n---\n\n\n## ✨ Features\n\n### 🧠 Core Intelligence\n\n- **📸 Smart Capture** — Content-change detection, not a fixed timer. Captures when your screen *actually* changes.\n- **🔬 Gemma 4 Vision Analysis** — Every screenshot analyzed: app detection, activity categorization, mood, scene description, spatial layout regions.\n- **🔍 Hybrid Search** — Semantic embeddings (MiniLM) + FTS5 keyword search. Find anything by *meaning*, not just keywords.\n- **💬 Chat with Memory** — Conversational RAG with follow-up support. Ask \"what did Ishaa say on Discord?\" → get the actual message.\n- **🎙️ Voice Memos** — Hold `Ctrl+Shift+V` → Gemma 4's native audio encoder transcribes. Screenshot captured alongside.\n- **🎤 Meeting Transcription** — Auto-detects Zoom\u002FTeams\u002FMeet, records audio, transcribes, generates structured summaries.\n- **📊 Analytics Dashboard** — Category breakdown, top apps, hourly heatmap, meeting stats, focus metrics.\n- **⏪ Day Rewind** — Timelapse playback of your entire day with play\u002Fpause\u002Fscrub\u002Fspeed controls.\n\n### ⚡ Performance\n\n- **Three Analysis Modes** — Accurate (~76s, deep thinking + layout), Balanced (~40s, thinking), or Fast (~12s, no thinking). You choose.\n- **Per-App pHash Cache** — 3-tier caching with app-aware staleness. Communication apps refresh faster than IDEs. Significantly fewer inference calls.\n- **Chat-First GPU Priority** — Chat cancels in-flight analysis instantly. GPU freed in \u003C1s.\n- **Auto-Pause Heavy Apps** — Games, video editors, 3D software detected → capture pauses automatically.\n\n### 🔒 Privacy & Security\n\n- **100% Local** — All data stays on your machine. Zero network calls after initial model download. No telemetry. 
Ever.\n- **Sensitive Data Filter** — Auto-redacts credit cards, SSNs, API keys, passwords before storage.\n- **Encryption at Rest** — AES encryption for screenshots (Fernet + OS keyring).\n- **Dashboard PIN Lock** — Session-based auth with configurable auto-lock timeout.\n- **Incognito Mode** — One-click pause. Nothing recorded.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>🔌 Integrations & Extensibility\u003C\u002Fb>\u003C\u002Fsummary>\n\n\u003Cbr>\n\n| Integration | Description |\n|---|---|\n| 🤖 **Agent Platform** | Build automations in Markdown (English) or Python. Drop a file, get an agent. |\n| 🔌 **MCP Server** | Expose screen history to Claude Desktop, Cursor, VS Code |\n| 📓 **Obsidian** | Auto-sync daily summaries to your vault |\n| 📋 **Notion** | Push summaries to a Notion database |\n| 🪝 **Webhooks** | Fire events to Slack, Discord, IFTTT (HMAC signed, auto-retry) |\n| 🔔 **Smart Notifications** | Distraction alerts, break reminders |\n| ⭐ **Auto-Bookmark** | Keyword triggers (`git push`, `deploy`) auto-flag important moments |\n\n\u003C\u002Fdetails>\n\n### ⌨️ System-Wide Hotkeys\n\n| Hotkey | Action |\n|---|---|\n| `Ctrl+Shift+B` | 📸 Instant bookmarked capture |\n| `Ctrl+Shift+P` | ⏸ Toggle pause\u002Fresume |\n| `Ctrl+Shift+V` | 🎤 Hold to record voice memo |\n\n> All hotkeys customizable from Settings.\n\n---\n\n## 🧠 How Gemma 4 Is Used\n\nGemma 4 E2B is not a bolt-on — it's architecturally load-bearing. ScreenMind uses **all three modalities**:\n\n### 1. Vision — Screenshot Analysis\nEvery screenshot is sent to Gemma 4 with OCR context. It returns structured JSON:\n- App name, activity category, summary, detailed context\n- Mood classification, confidence score\n- Rich scene description (every visible element inventoried)\n- Layout regions (sidebar, chat area, toolbar boundaries)\n\n**Three modes:**\n- **Accurate** — single call with thinking (~76s). Best layout detection.\n- **Balanced** — thinking enabled, analysis-only (~40s). 
Richer descriptions than Fast.\n- **Fast** — no-thinking prefill trick (~12s). Layout via OCR clustering instead.\n\n### 2. Audio — Voice Memos & Meeting Transcription\nGemma 4 E2B has a native audio encoder. ScreenMind uses it for:\n- Voice memo transcription (hold hotkey → speak → release)\n- Meeting transcription (15s chunks, map-reduce summarization for long meetings)\n\nNo Whisper dependency. One model handles everything.\n\n### 3. Reasoning — Summaries, Chat, Agents\n- **Daily summaries** with deep reasoning (`think=True`)\n- **Chat answers** grounded in actual screen data (text-first RAG with vision fallback)\n- **Agent execution** — Gemma processes markdown agent prompts with injected screen data\n\n### Why E2B Specifically?\n\n| Constraint | Why It Rules Out Alternatives |\n|---|---|\n| Must run **continuously in background** | Rules out 12B+ models (too heavy) |\n| Must understand **screenshots natively** | Rules out text-only models |\n| Must stay **100% local** for privacy | Rules out cloud APIs |\n| Must handle **audio natively** | Rules out models without audio encoder |\n| Must be **fast enough** for 30s cycle | E2B processes in 12-76s depending on mode |\n\nGemma 4 E2B is the only model that checks all five boxes.\n\n---\n\n## 🚀 Quick Start\n\n> **Requirements:** Python 3.10+ · GPU recommended (4GB+ VRAM) · ~5GB disk for model\n\n#### 1️⃣ Clone & Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fayushh0110\u002FScreenMind.git\ncd ScreenMind\n\npython -m venv venv\nvenv\\Scripts\\activate        # Windows\n# source venv\u002Fbin\u002Factivate   # macOS\u002FLinux\n\npip install -r requirements.txt\n```\n\n#### 2️⃣ Run\n\n```bash\npython main.py\n```\n\n#### 3️⃣ Open → **http:\u002F\u002F127.0.0.1:7777** \n\nOn first run, ScreenMind will:\n- Auto-download Gemma 4 E2B GGUF model (~5GB, one time)\n- Start `llama-server` in background\n- Show the welcome screen to set up an optional PIN\n- Create `~\u002F.screenmind\u002F` for data 
storage\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>⚙️ Optional: Configure via .env\u003C\u002Fb>\u003C\u002Fsummary>\n\n\u003Cbr>\n\n```bash\ncp .env.example .env\n# Edit capture interval, blocked apps, hotkeys, etc.\n```\n\nOr configure everything from the **Settings** tab in the dashboard.\n\n\u003C\u002Fdetails>\n\n---\n\n## 🏗️ Architecture\n\n```\n┌─────────────────────────────────────────────────────────────────────┐\n│                          ScreenMind                                  │\n│                                                                     │\n│  ┌────────────┐    ┌──────────────┐    ┌─────────────────────────┐ │\n│  │  Capture   │───▶│  Async Queue │───▶│    Analysis Worker      │ │\n│  │  Worker    │    │  (max: 100)  │    │                         │ │\n│  │            │    └──────────────┘    │  ┌───────────────────┐  │ │\n│  │ • Screen   │                        │  │  Per-App pHash    │  │ │\n│  │ • Window   │                        │  │  Cache (3-tier)   │  │ │\n│  │ • Dedup    │                        │  └───────────────────┘  │ │\n│  │ • A11y     │                        │           │             │ │\n│  │ • Privacy  │                        │           ▼             │ │\n│  └────────────┘                        │  ┌───────────────────┐  │ │\n│                                        │  │   EasyOCR         │  │ │\n│  ┌────────────┐                        │  │   (text extract)  │  │ │\n│  │   Audio    │                        │  └───────────────────┘  │ │\n│  │   Worker   │                        │           │             │ │\n│  │            │                        │           ▼             │ │\n│  │ • Meeting  │                        │  ┌───────────────────┐  │ │\n│  │   detect   │                        │  │   Gemma 4 E2B     │  │ │\n│  │ • Record   │                        │  │   (via llama.cpp) │  │ │\n│  │ • Transcr. 
│                        │  │   Vision + Audio  │  │ │\n│  └────────────┘                        │  └───────────────────┘  │ │\n│                                        │           │             │ │\n│  ┌────────────┐                        │           ▼             │ │\n│  │   Agent    │                        │  ┌───────────────────┐  │ │\n│  │  Scheduler │                        │  │  Layout Analyzer  │  │ │\n│  │            │                        │  │  (spatial OCR)    │  │ │\n│  │ • .md AI   │                        │  └───────────────────┘  │ │\n│  │ • .py code │                        │           │             │ │\n│  └────────────┘                        │           ▼             │ │\n│                                        │  ┌───────────────────┐  │ │\n│                                        │  │  MiniLM-L6-v2     │  │ │\n│                                        │  │  (embeddings)     │  │ │\n│                                        │  └───────────────────┘  │ │\n│                                        └─────────────────────────┘ │\n│                                                    │               │\n│                                                    ▼               │\n│                                        ┌───────────────────┐       │\n│                                        │   SQLite (WAL)    │       │\n│                                        │   + FTS5 index    │       │\n│                                        └─────────┬─────────┘       │\n│                                                  │                 │\n│  ┌───────────────────────────────────────────────┘                 │\n│  │                                                                 │\n│  ▼                                                                 │\n│  ┌───────────────────────────────────────────────────────────────┐ │\n│  │                    FastAPI REST Server                         │ │\n│  │  \u002Ftimeline · \u002Fsearch · \u002Fchat · \u002Fstats · 
\u002Fagents · \u002Fmcp       │ │\n│  │                                                               │ │\n│  │  ┌───────────────────────────────────────────────────────┐   │ │\n│  │  │           Web Dashboard (Vanilla JS SPA)               │   │ │\n│  │  │  Timeline · Chat · Search · Analytics · Agents · Settings │ │\n│  │  └───────────────────────────────────────────────────────┘   │ │\n│  └───────────────────────────────────────────────────────────────┘ │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\n### Multi-Model AI Pipeline\n\n```\nScreenshot → EasyOCR (text) → Gemma 4 E2B (understanding) → MiniLM (embeddings) → SQLite + FTS5\n                                     ↑\n                              OCR text fed as context\n                              (Gemma sees image + reads text)\n```\n\nThree AI models and one full-text index working in concert, with Gemma 4 as the brain:\n1. **EasyOCR** — extracts raw screen text\n2. **Gemma 4 E2B** — understands what you're doing (vision + reasoning)\n3. **MiniLM-L6-v2** — generates semantic vectors for natural language search\n4. **FTS5** — SQLite's full-text index (not a model) for instant keyword search\n\n---\n\n## 🤖 Agent Platform\n\nScreenMind includes a full agent\u002Fplugin system. 
Build any automation on top of your screen data.\n\n### Two Modes\n\n| Mode | File Type | For | Example |\n|---|---|---|---|\n| 🤖 AI Agent | `.md` | Everyone | Write a prompt in English → Gemma runs it on your data |\n| 🐍 Python Plugin | `.py` | Developers | Full code with SDK access, state persistence, LLM calls |\n\n### Markdown Agent Example\n\n```markdown\n---\nname: Daily Focus Report\nschedule: every 6h\ndata: timeline, apps, mood\noutput: local, obsidian\n---\n\nAnalyze my screen activity and generate a focus report:\n- How many hours of deep work vs shallow work?\n- What were my main distractions?\n- Give me a focus score out of 10.\n```\n\nDrop this file in `~\u002F.screenmind\u002Fagents\u002F` — it runs automatically.\n\n### Python Plugin SDK\n\n```python\nfrom screenmind_sdk import ScreenMindSDK\n\nsdk = ScreenMindSDK(\"my-tracker\")\n\n# Get today's activities filtered by app\nactivities = sdk.get_activities(app=\"Chrome\", limit=20)\n\n# Persistent state across runs\nlast_count = sdk.load_state(\"url_count\", 0)\nurls = sdk.get_urls_visited()\nsdk.save_state(\"url_count\", len(urls))\n\n# Ask Gemma (GPU-safe — waits for idle)\ninsight = sdk.ask_gemma(f\"Summarize these URLs: {urls}\")\nprint(insight)\n```\n\n### Data Selectors (Frontmatter)\n\nMarkdown agents declare what data they need:\n\n| Selector | Injects |\n|---|---|\n| `timeline` | Recent activities with timestamps, apps, summaries |\n| `apps` | App usage counts + category breakdown |\n| `urls` | URLs visited (extracted from browser address bars) |\n| `meetings` | Meeting summaries and durations |\n| `mood` | Mood\u002Fsentiment from screen analysis |\n\nData injection auto-scales to your model's context window.\n\n### 4 Agents Ship Built-In\n\n- **daily-journal.md** — First-person journal entry from your day\n- **focus-report.md** — Focus score, deep work hours, distractions\n- **meeting-actions.md** — Extract action items from meeting transcripts\n- **code-changelog.md** — Summarize coding 
activity (commits, files, repos)\n\n---\n\n## 🔌 MCP Server (Claude \u002F Cursor \u002F VS Code)\n\nScreenMind exposes your screen history to any MCP-compatible AI tool:\n\n```bash\npython mcp_server.py  # stdio transport\n```\n\n**Claude Desktop config** (`%APPDATA%\\Claude\\claude_desktop_config.json` on Windows, `~\u002FLibrary\u002FApplication Support\u002FClaude\u002Fclaude_desktop_config.json` on macOS):\n```json\n{\n  \"mcpServers\": {\n    \"screenmind\": {\n      \"command\": \"python\",\n      \"args\": [\"C:\u002Fpath\u002Fto\u002Fscreenmind\u002Fmcp_server.py\"]\n    }\n  }\n}\n```\n\n### Tools Available\n\n| Tool | Description |\n|---|---|\n| `search_screen` | Semantic + keyword search across all history |\n| `get_recent_activity` | Last N activities with full details |\n| `get_activity_by_time` | Activities for a specific date\u002Ftime range |\n| `get_daily_summary` | AI-generated daily summary |\n| `capture_now` | Trigger instant screenshot |\n| `get_stats` | Usage statistics |\n| `search_audio` | Search meeting transcripts |\n| `get_screenshot` | Retrieve screenshot path by activity ID |\n\n---\n\n## 📡 API Reference\n\nFull Swagger docs at `http:\u002F\u002F127.0.0.1:7777\u002Fdocs`\n\n### Key Endpoints\n\n| Method | Endpoint | Description |\n|--------|----------|-------------|\n| `GET` | `\u002Fapi\u002Fstatus` | System health, worker stats |\n| `GET` | `\u002Fapi\u002Ftimeline?date=2026-05-21` | Activities for a date |\n| `GET` | `\u002Fapi\u002Fsearch?q=debugging+auth` | Hybrid semantic + keyword search |\n| `POST` | `\u002Fapi\u002Fchat` | Conversational AI with screen memory (SSE stream) |\n| `GET` | `\u002Fapi\u002Fstats?range=day` | Analytics (categories, apps, meetings) |\n| `GET` | `\u002Fapi\u002Frewind?date=2026-05-21` | Timelapse frames |\n| `POST` | `\u002Fapi\u002Fsummary\u002Fgenerate` | Generate AI daily summary |\n| `GET` | `\u002Fapi\u002Fagents` | List all agents |\n| `POST` | `\u002Fapi\u002Fagents\u002F{name}\u002Frun` | Trigger agent execution |\n| `POST` | `\u002Fapi\u002Fcapture\u002Fpause` | Pause capture |\n| `POST` | 
`\u002Fapi\u002Fincognito\u002Ftoggle` | Toggle incognito mode |\n\n---\n\n\u003Cdetails>\n\u003Csummary>\u003Ch2>⚙️ Configuration\u003C\u002Fh2>\u003C\u002Fsummary>\n\n\u003Cbr>\n\nAll settings configurable via `.env`, environment variables, or the **Settings** dashboard (persists to `settings.json`).\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `CAPTURE_INTERVAL` | `40` | Seconds between captures |\n| `ANALYSIS_MODE` | `merged` | `merged` (accurate, ~76s) or `fast` (~12s) |\n| `PERFORMANCE_MODE` | `balanced` | GPU layers: `minimal` \u002F `balanced` \u002F `maximum` |\n| `BLOCKED_APPS` | *(empty)* | Comma-separated apps to never capture |\n| `MEETING_TRANSCRIPTION` | `false` | Auto-transcribe when meeting apps detected |\n| `RETENTION_DAYS` | `7` | Auto-delete data older than N days (0 = forever) |\n| `ENCRYPTION_ENABLED` | `false` | Encrypt screenshots at rest |\n| `SENSITIVE_FILTER_ENABLED` | `true` | Redact credit cards, SSNs, API keys |\n\n> See `.env.example` for the full list.\n\n\u003C\u002Fdetails>\n\n---\n\n## 🔧 Tech Stack\n\n| Layer | Technology | Why |\n|-------|-----------|-----|\n| **Vision + Audio AI** | Gemma 4 E2B (via llama.cpp) | Only model with vision + audio + reasoning that runs locally on 4GB VRAM |\n| **Inference Server** | llama-server (llama.cpp) | Direct GGUF inference, OpenAI-compatible API |\n| **OCR** | EasyOCR | Extracts screen text fed to Gemma as context |\n| **Embeddings** | all-MiniLM-L6-v2 | 80MB, runs on CPU, 384-dim vectors for semantic search |\n| **Backend** | FastAPI + Uvicorn | Async-first, auto-generated API docs |\n| **Database** | SQLite (WAL) + FTS5 | Zero-config, concurrent reads, full-text search |\n| **Capture** | mss + ctypes\u002FUI Automation | Native screen capture + accessibility text extraction |\n| **Frontend** | Vanilla JS + CSS | No build step, instant load, dark glassmorphism UI |\n| **Platform** | Windows \u002F macOS \u002F Linux | Abstraction layer with OS-specific 
adapters |\n\n---\n\n\u003Cdetails>\n\u003Csummary>\u003Ch2>📁 Project Structure\u003C\u002Fh2>\u003C\u002Fsummary>\n\n\u003Cbr>\n\n```\nscreenmind\u002F\n├── main.py                    # Entry point — starts all services\n├── config.py                  # Pydantic settings (env + runtime overrides)\n├── requirements.txt           # Python dependencies\n├── mcp_server.py              # MCP server for Claude\u002FCursor\u002FVS Code\n├── screenmind_sdk.py          # SDK for Python plugin agents\n│\n├── capture\u002F                   # Screenshot capture layer\n│   ├── screen.py              # mss-based capture + encryption\n│   ├── window.py              # Active window detection\n│   ├── dedup.py               # Perceptual hash deduplication\n│   ├── hotkey.py              # Global hotkeys (bookmark, pause, voice)\n│   └── voice_recorder.py      # Mic recording for voice memos\n│\n├── engine\u002F                    # AI & intelligence layer\n│   ├── analyzer.py            # Gemma 4 vision analysis (dual mode)\n│   ├── llm_client.py          # llama-server client (chat, vision, audio)\n│   ├── model_manager.py       # Server lifecycle, model download\u002Fswitch\n│   ├── embedder.py            # MiniLM semantic embeddings\n│   ├── ocr.py                 # EasyOCR text extraction\n│   ├── layout_analyzer.py     # Spatial OCR organization\n│   ├── dev_context.py         # Git repo\u002Fbranch\u002Fdiff detection\n│   ├── a11y_extractor.py      # Accessibility API text extraction\n│   └── agent_runner.py        # Agent scheduling & execution\n│\n├── workers\u002F                   # Background processing\n│   ├── capture_worker.py      # Smart capture loop + privacy filtering\n│   ├── analysis_worker.py     # OCR → Gemma → Layout → Embed → Store\n│   └── audio_worker.py        # Meeting detection & transcription\n│\n├── storage\u002F                   # Data persistence\n│   ├── database.py            # SQLite + FTS5 + migrations\n│   └── models.py              # 
Pydantic data models\n│\n├── privacy\u002F                   # Privacy & security\n│   ├── encryption.py          # Fernet AES encryption at rest\n│   └── data_filter.py         # Sensitive data redaction\n│\n├── platform_support\u002F          # Cross-platform abstraction\n│   ├── windows.py             # Win32 + UI Automation\n│   ├── macos.py               # AppKit + AXUIElement\n│   └── linux.py               # xdotool + AT-SPI\n│\n├── integrations\u002F              # External connections\n│   ├── obsidian.py            # Vault markdown export\n│   ├── notion.py              # Notion API export\n│   ├── webhooks.py            # HTTP webhooks (HMAC, retry)\n│   └── smart_notify.py        # Distraction\u002Fbreak notifications\n│\n├── api\u002F                       # REST API + dashboard\n│   ├── server.py              # FastAPI app + auth middleware\n│   ├── dependencies.py        # Shared state for routes\n│   ├── routes\u002F                # 16 route modules\n│   └── static\u002F                # Web dashboard (HTML + CSS + JS)\n│\n├── default_agents\u002F            # 4 built-in agents\n│   ├── daily-journal.md\n│   ├── focus-report.md\n│   ├── meeting-actions.md\n│   └── code-changelog.md\n│\n└── docs\u002F\n    └── BUILD_YOUR_OWN_AGENT.md\n```\n\n\u003C\u002Fdetails>\n\n---\n\n## 🛡️ Error Handling & Resilience\n\n| Scenario | Behavior |\n|----------|----------|\n| **llama-server not running** | Auto-starts on launch. Captures continue; analysis retried with backoff. |\n| **Model not downloaded** | Auto-downloads GGUF on first start via HuggingFace. |\n| **GPU out of memory** | Detects OOM, retries with delay, re-queues on persistent failure. |\n| **Duplicate frames** | pHash dedup skips identical screenshots (threshold: 8 hamming distance). |\n| **Stale queue items** | Captures >3 min old auto-skipped. Backfilled during idle. |\n| **App in blocklist** | Silently skips — no screenshot saved. 
|\n| **Meeting app closed** | Process-alive check + silence detection + 5-min hard timeout. |\n| **Chat during analysis** | Cancels in-flight inference, frees GPU in \u003C1s, re-queues analysis. |\n| **Crash recovery** | Stale meetings cleaned on startup. Unanalyzed entries backfilled. |\n\n---\n\n## 🎨 Dashboard\n\nThe web dashboard at `http:\u002F\u002F127.0.0.1:7777` features:\n\n- **Timeline** — Browse activities by date with thumbnails, AI summaries, category badges\n- **Chat** — Conversational AI with screen memory. Ask anything about your history.\n- **Search** — Semantic + keyword hybrid search with OCR highlighting on screenshots\n- **Analytics** — Category charts, top apps, hourly heatmap, meeting stats\n- **Rewind** — Timelapse player with play\u002Fpause\u002Fscrub\u002Fspeed controls\n- **Memos** — Voice memo list with audio player\n- **Agents** — Create, edit, run, and monitor agents\n- **Settings** — 8 organized sections: Shortcuts, Capture, AI, Audio, Privacy, Automation, Integrations, Storage\n\nDark glassmorphism UI. No build step. Instant load.\n\n---\n\n## 🤝 Contributing\n\nContributions welcome! 
Here are some high-impact areas:\n\n- 🍎 **macOS\u002FLinux testing** — platform adapters exist, need real hardware testing\n- 🐳 **Docker container** — one-command setup\n- 🧩 **Community agent registry** — share agents between users\n- 🌐 **Browser extension** — richer URL\u002Ftab context\n- 📤 **Export formats** — Markdown, CSV, JSON\n\n---\n\n## ⭐ Show Your Support\n\nIf you find ScreenMind useful, please consider:\n\n- **⭐ Star this repo** — it helps others discover the project\n- **🍴 Fork it** — build your own agents and features\n- **🐛 Report issues** — help us improve\n- **📣 Share it** — tell others about privacy-first AI\n\n\u003Cdiv align=\"center\">\n\n\u003Cbr>\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fayushh0110\u002FScreenMind\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fayushh0110\u002FScreenMind?style=social\" alt=\"Stars\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fayushh0110\u002FScreenMind\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fayushh0110\u002FScreenMind?style=social\" alt=\"Forks\">\u003C\u002Fa>\n\n\u003Cbr>\n\n\u003C\u002Fdiv>\n\n---\n\n## 📝 License\n\nMIT License — see [LICENSE](LICENSE) for details.\n\n---\n\n\u003Cdiv align=\"center\">\n\n\u003Cbr>\n\n**Built with 🧠 Gemma 4 E2B · 🔒 100% Local · 🚀 Zero Cloud Dependencies**\n\n*Vision + Audio + Reasoning — all three modalities, one model, your machine.*\n\n\u003Cbr>\n\n\u003Csub>Made with ❤️ by \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fayushh0110\">ayushh0110\u003C\u002Fa>\u003C\u002Fsub>\n\n\u003C\u002Fdiv>\n","2026-06-15 02:30:14","CREATED_QUERY"]