[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74051":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":38,"readmeContent":39,"aiSummary":40,"trendingCount":16,"starSnapshotCount":16,"syncStatus":41,"lastSyncTime":42,"discoverSource":43},74051,"sentrysearch","ssrajadh\u002Fsentrysearch","ssrajadh","Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.","",null,"Python",4270,404,19,7,0,8,31,178,24,29.82,"Apache License 2.0",false,"master",true,[27,28,29,30,31,32,33,34,35,36,37],"chromadb","dashcam","gemini","gemini-embedding-2","mp4","qwen3-vl","search","search-engine","semantic-search","tesla","video","2026-06-12 02:03:21","# SentrySearch\n\nSemantic search over video footage. Type what you're looking for, get a trimmed clip back.\n\n**Languages:** English · [简体中文](README.zh.md)\n\n**The Pipeline:**\n1. SentrySearch (find an event in your footage)\n2. [SentryMerge](https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentrymerge) (auto-cut the multi-cam footage into one video that follows the subject across cameras)\n3. 
[SentryBlur](https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentryblur) (auto-redact sensitive information)\n\n**New:** [`sentrysearch highlights`](#highlights): surface the most anomalous clips in your footage when you don't know what to search for.\n\n[\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentrysearch\u002Fraw\u002Fmain\u002Fdocs\u002Fdemo.mp4\" controls width=\"100%\">\u003C\u002Fvideo>](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbaf98fad-080b-48e1-97f5-a2db2cbd53f5)\n\n## Table of Contents\n\n- [How it works](#how-it-works)\n- [Getting Started](#getting-started)\n- [Usage](#usage)\n  - [Init](#init)\n  - [Index footage](#index-footage)\n  - [Search](#search)\n  - [Search by image](#search-by-image)\n  - [Highlights](#highlights)\n  - [Qwen Cloud (Alibaba DashScope)](#qwen-cloud-alibaba-dashscope)\n  - [Local Backend (no API key needed)](#local-backend-no-api-key-needed)\n  - [Why the local model is fast](#why-the-local-model-is-fast)\n  - [Tesla Metadata Overlay](#tesla-metadata-overlay)\n  - [Stitch with SentryMerge](#stitch-with-sentrymerge)\n  - [Redact with SentryBlur](#redact-with-sentryblur)\n  - [Managing the index](#managing-the-index)\n  - [Verbose mode](#verbose-mode)\n- [How is this possible?](#how-is-this-possible)\n- [Cost](#cost)\n- [Known Warnings (harmless)](#known-warnings-harmless)\n- [Limitations & Future Work](#limitations--future-work)\n- [Compatibility](#compatibility)\n- [Requirements](#requirements)\n\n## How it works\n\nSentrySearch splits your videos into overlapping chunks, embeds each chunk as video using Google's Gemini Embedding API, Alibaba DashScope (**qwen-cloud**), or a local Qwen3-VL model, and stores the vectors in a local ChromaDB database. When you search, your text query (or image, see [search by image](#search-by-image)) is embedded into the same vector space and matched against the stored video embeddings. 
The top match is automatically trimmed from the original file and saved as a clip.\n\n## Getting Started\n\n1. Install [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) (if you don't have it):\n\n**macOS\u002FLinux:**\n```bash\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n```\n\n**Windows:**\n```powershell\npowershell -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n```\n\n\n2. Clone and install:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentrysearch.git\ncd sentrysearch\nuv tool install .\n```\n\n> **Requires Python 3.11 or 3.12** (PyTorch wheels don't yet support 3.13+). If your default Python is newer, install a managed 3.12 and pin the tool install:\n> ```bash\n> uv python install 3.12\n> uv tool install --python 3.12 .\n> ```\n\n3. Set up your API key (or [use a local model instead](#local-backend-no-api-key-needed)) — **only needed for the default Gemini backend**; skip if you use `--backend local` or `--backend qwen-cloud` with `DASHSCOPE_API_KEY` in `.env`.\n\n```bash\nsentrysearch init\n```\n\nThis prompts for your Gemini API key, writes it to `.env`, and validates it with a test embedding.\n\n4. Index your footage:\n\n```bash\nsentrysearch index \u002Fpath\u002Fto\u002Ffootage\n```\n\n5. Search:\n\n```bash\nsentrysearch search \"red truck running a stop sign\"\n```\n\nffmpeg is required for video chunking and trimming. If you don't have it system-wide, the bundled `imageio-ffmpeg` is used automatically.\n\n> **Manual setup:** If you prefer not to use `sentrysearch init`, you can copy `.env.example` to `.env` and add your key from [aistudio.google.com\u002Fapikey](https:\u002F\u002Faistudio.google.com\u002Fapikey) manually.\n\n## Usage\n\n### Init\n\n```bash\n$ sentrysearch init\nEnter your Gemini API key (get one at https:\u002F\u002Faistudio.google.com\u002Fapikey): ****\nValidating API key...\nSetup complete. 
You're ready to go — run `sentrysearch index \u003Cdirectory>` to get started.\n```\n\nIf a key is already configured, you'll be asked whether to overwrite it.\n\n> **Tip:** Set a spending limit at [aistudio.google.com\u002Fbilling](https:\u002F\u002Faistudio.google.com\u002Fbilling) to prevent accidental overspending.\n\n### Index footage\n\n```bash\n$ sentrysearch index \u002Fpath\u002Fto\u002Fvideo\u002Ffootage\nIndexing file 1\u002F3: front_2024-01-15_14-30.mp4 [chunk 1\u002F4]\nIndexing file 1\u002F3: front_2024-01-15_14-30.mp4 [chunk 2\u002F4]\n...\nIndexed 12 new chunks from 3 files. Total: 12 chunks from 3 files.\n```\n\nOptions:\n\n- `--chunk-duration 30` — seconds per chunk\n- `--overlap 5` — overlap between chunks\n- `--no-preprocess` — skip downscaling\u002Fframe rate reduction (send raw chunks)\n- `--target-resolution 480` — target height in pixels for preprocessing\n- `--target-fps 5` — target frame rate for preprocessing\n- `--no-skip-still` — embed all chunks, even ones with no visual change\n- `--backend local` — use a local model instead of Gemini ([details below](#local-backend-no-api-key-needed))\n\n### Search\n\n```bash\n$ sentrysearch search \"red truck running a stop sign\"\n  #1 [0.87] front_2024-01-15_14-30.mp4 @ 02:15-02:45\n  #2 [0.74] left_2024-01-15_14-30.mp4 @ 02:10-02:40\n  #3 [0.61] front_2024-01-20_09-15.mp4 @ 00:30-01:00\n\nSaved clip: .\u002Fmatch_front_2024-01-15_14-30_02m15s-02m45s.mp4\n```\n\nIf the best result's similarity score is below the confidence threshold (default 0.41), you'll be prompted before trimming:\n\n```\nNo confident match found (best score: 0.28). Show results anyway? [y\u002FN]:\n```\n\nWith `--no-trim`, low-confidence results are shown with a note instead of a prompt.\n\nOptions: `--results N`, `--output-dir DIR`, `--no-trim` to skip auto-trimming, `--threshold 0.5` to adjust the confidence cutoff, `--save-top N` to save the top N clips instead of just the best match. 
Backend and model are auto-detected from the index — pass `--backend` or `--model` only to override.\n\n### Search by image\n\nUse a reference image as the query — useful for \"find clips that look like this\" when describing the scene in words is awkward (a screenshot of a specific car, a reference frame from another video, etc.).\n\n```bash\n$ sentrysearch img ~\u002FDownloads\u002Fimage.jpg\n  #1 [0.72] 2026-03-12_10-44-17-left_repeater.mp4 @ 00:00-00:30\n  #2 [0.69] 2026-03-12_10-44-17-left_repeater.mp4 @ 00:25-00:55\n  #3 [0.67] 2026-02-12_20-02-15-front.mp4 @ 00:00-00:18\n\nSaved clip: .\u002Fmatch_2026-03-12_10-44-17-left_repeater_00m00s-00m30s.mp4\n```\n\nThe image is embedded into the same vector space as the indexed video chunks and ranked by cosine similarity. All `search` flags are supported (`--results`, `--threshold`, `--save-top`, `--overlay`, `--no-trim`, `--backend`, `--model`).\n\nSupported formats: JPG, PNG, WEBP, GIF, HEIC\u002FHEIF on the Gemini backend; the local backend additionally accepts anything PIL can decode (BMP, TIFF, etc.).\n\n> **Note:** Image search returns *visually similar* matches, not necessarily the same object. A red sedan query may surface other red sedans of similar shape — calibrate expectations accordingly.\n\n### Highlights\n\nDon't know what to search for? `sentrysearch highlights` ranks the most anomalous clips in your index — chunks whose embeddings sit far from everything else — and trims them automatically. Good for skimming a fresh dump of footage.\n\n```bash\n$ sentrysearch highlights -n 3\n  #1 [0.165] 2026-02-12_20-02-15-back.mp4 @ 00:00-00:18\n  #2 [0.163] 2026-02-12_20-02-15-right_repeater.mp4 @ 00:00-00:18\n  #3 [0.149] 2026-02-12_20-02-15-front.mp4 @ 00:00-00:18\n...\n```\n\nScoring methods (`--method`):\n\n- **`knn`** (default) — mean cosine distance to a chunk's *k* nearest neighbors. Robust; surfaces clips with no near-twins.\n- **`centroid`** — distance from the index mean. 
Cheapest, biased toward whatever's underrepresented.\n- **`lof`** — Local Outlier Factor. Best when the index has multiple distinct \"normal\" modes (day vs. night vs. garage).\n\nRefinement options:\n\n- `--against \"\u003Cquery>\"` — score anomaly *relative to* a query. With `--against-mode within` (default), ranks anomalies among the top matches of the query (\"the weird pedestrians in pedestrian clips\"). With `--against-mode global`, finds clips that match the query *but* are unlike the rest of the index (\"rare events of this type\").\n- `--dedupe 0.9` — drop results too similar to a higher-ranked pick (default 0.9 cosine similarity). Prevents near-duplicate frames from filling the list.\n- `--exclude-baseline` — drop the half of the index nearest the centroid before scoring. Useful when the index is dominated by repetitive \"boring\" footage.\n- `-k, --neighbors 10` — *k* for `knn`\u002F`lof`.\n- `--no-trim` — print the ranking without writing clips.\n\n> **Caveat:** Statistically anomalous ≠ interesting. Sensor glitches, lens flare, night frames in a mostly-daytime index, and the lone garage clip all rank high. 
Use `--exclude-baseline` and `--dedupe` to filter the noise, or `--against` to constrain by topic.\n\n### Qwen Cloud (Alibaba DashScope)\n\nUse the optional **qwen-cloud** backend for [DashScope](https:\u002F\u002Fwww.alibabacloud.com\u002Fhelp\u002Fen\u002Fmodel-studio\u002Fqwen-api-via-dashscope) \u002F Model Studio multimodal embeddings (default model `qwen3-vl-embedding`, overridable with `--dashscope-model` or `DASHSCOPE_EMBEDDING_MODEL`):\n\n```bash\nuv tool install \".[qwen-cloud]\"\nexport DASHSCOPE_API_KEY=...\nsentrysearch index \u002Fpath\u002Fto\u002Ffootage --backend qwen-cloud\nsentrysearch search \"your query\" --backend qwen-cloud\n```\n\n**Video uploads:** local chunk files are sent to **DashScope-managed temporary OSS by the official Python SDK** before the API consumes them (the HTTP API expects a URL; the SDK handles upload for you).\n\n### Local Backend (no API key needed)\n\nIndex and search using a local Qwen3-VL-Embedding model instead of the Gemini API. Free, private, and runs entirely on your machine. For the best search quality, use the Gemini backend — the local 8B model is a solid alternative when you need offline\u002Fprivate search, and the 2B model is a fallback when hardware can't support 8B.\n\nThe model is **auto-detected from your hardware** — qwen8b for NVIDIA GPUs and Macs with 24 GB+ RAM, qwen2b for smaller Macs and CPU-only systems. You can override with `--model qwen2b` or `--model qwen8b`. 
Pick an install based on your hardware:\n\n| Hardware | Install command | Auto-detected model | Notes |\n|---|---|---|---|\n| **Apple Silicon, 24 GB+ RAM** | `uv tool install \".[local]\"` | qwen8b | Full float16 via MPS |\n| **Apple Silicon, 16 GB RAM** | `uv tool install \".[local]\"` | qwen2b | 8B won't fit; 2B uses ~6 GB |\n| **Apple Silicon, 8 GB RAM** | `uv tool install \".[local]\"` | qwen2b | Tight — may swap under load; Gemini API recommended instead |\n| **NVIDIA, 18 GB+ VRAM** | `uv tool install \".[local]\"` | qwen8b | Full bf16 precision (CUDA wheels pulled automatically on Linux\u002FWindows) |\n| **NVIDIA, 8–16 GB VRAM** | `uv tool install \".[local-quantized]\"` | qwen8b | 4-bit quantization (~6–8 GB) |\n\n> **Won't work well:** Intel Macs and machines without a dedicated GPU. These fall back to CPU with float32 — too slow and memory-hungry for practical use. Use the **Gemini API backend** (the default) instead.\n\n> **Not sure?** On Mac, use `\".[local]\"`. On NVIDIA, use `\".[local-quantized]\"` — 4-bit quantization works on the widest range of NVIDIA hardware with minimal quality loss. (bitsandbytes requires CUDA and does not work on Mac\u002FMPS.)\n\n**Python version:** PyTorch wheels lag behind new Python releases, so the local backend requires Python 3.11 or 3.12. If your default Python is 3.13+, install a managed 3.12 and pin the tool install to it:\n\n```bash\nuv python install 3.12\nuv tool install --python 3.12 \".[local]\"\n```\n\n**Mac prerequisite:** Install system FFmpeg (the local model's video processor requires it — the Gemini backend uses a bundled ffmpeg instead):\n\n```bash\nbrew install ffmpeg\n```\n\nIndex with `--backend local` and search — no extra flags needed:\n\n```bash\nsentrysearch index \u002Fpath\u002Fto\u002Ffootage --backend local\nsentrysearch search \"car running a red light\"\n```\n\nThe search command auto-detects the backend and model from whatever you indexed with. 
You can also use `--model` as a shorthand — it implies `--backend local`:\n\n```bash\nsentrysearch index \u002Fpath\u002Fto\u002Ffootage --model qwen2b   # same as --backend local --model qwen2b\nsentrysearch search \"car running a red light\"          # auto-detects local\u002Fqwen2b from index\n```\n\nOptions:\n- `--model qwen2b` — smaller model, lower quality but only ~6 GB memory (also accepts full HuggingFace IDs)\n- `--quantize` \u002F `--no-quantize` — force 4-bit quantization on or off (default: auto-detect based on whether bitsandbytes is installed)\n\nNotes:\n- First run downloads the model (~16 GB for 8B, ~4 GB for 2B).\n- Embeddings from different backends and models are **not compatible**. Each backend\u002Fmodel combination gets its own isolated index, so they can't accidentally mix. If you search with a model that has no indexed data, you'll be told which model was actually used.\n- Speed varies by GPU core count — base M-series chips are slower than Pro\u002FMax but produce identical results.\n\n### Why the local model is fast\n\nThe local backend stays fast and memory-efficient through a few techniques that compound:\n\n- **Preprocessing shrinks chunks before they hit the model.** Each 30s chunk is downscaled to 480p at 5fps via ffmpeg before embedding. A ~19 MB dashcam chunk becomes ~1 MB — a 95% reduction in pixels the model has to process. Model inference time scales with pixel count, not video duration, so this is the single biggest speedup.\n- **Low frame sampling.** The video processor sends at most 32 frames per chunk to the model (`fps=1.0`, `max_frames=32`). A 30-second chunk produces ~30 frames — not hundreds.\n- **MRL dimension truncation.** Qwen3-VL-Embedding supports [Matryoshka Representation Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.13147). 
Only the first 768 dimensions of each embedding are kept and L2-normalized, reducing storage and distance computation in ChromaDB.\n- **Auto-quantization.** On NVIDIA GPUs with limited VRAM, the 8B model is automatically loaded in 4-bit (bitsandbytes) — dropping from ~18 GB to ~6-8 GB with minimal quality loss. A 4090 (24 GB) runs the full bf16 model with headroom to spare.\n- **Still-frame skipping.** Chunks with no meaningful visual change (e.g. a parked car) are detected by comparing JPEG file sizes across sampled frames and skipped entirely — saving a full forward pass per chunk.\n\nWith all of this, expect ~2-5s per chunk on an A100 and ~3-8s on a T4. On a 4090, the 8B model in bf16 should be in the low single digits per chunk.\n\n### Tesla Metadata Overlay\n\nBurn speed, location, and time onto trimmed clips:\n\n```bash\nsentrysearch search \"car cutting me off\" --overlay\n```\n\nThis extracts telemetry embedded in Tesla dashcam files (speed, GPS) and renders a HUD overlay. The overlay shows:\n\n- **Top center:** speed and MPH label on a light gray card\n- **Below card:** date and time (12-hour with AM\u002FPM)\n- **Top left:** city and road name (via reverse geocoding)\n\n![tesla overlay](docs\u002Ftesla-overlay.png)\n\nRequirements:\n\n- Tesla firmware 2025.44.25 or later, HW3+\n- SEI metadata is only present in driving footage (not parked\u002FSentry Mode)\n- Reverse geocoding uses [OpenStreetMap's Nominatim API](https:\u002F\u002Fnominatim.openstreetmap.org\u002F) via geopy (optional)\n\nInstall with Tesla overlay support:\n\n```bash\nuv tool install \".[tesla]\"\n```\n\nWithout geopy, the overlay still works but omits the city\u002Froad name.\n\nSource: [teslamotors\u002Fdashcam](https:\u002F\u002Fgithub.com\u002Fteslamotors\u002Fdashcam)\n\n### Stitch with SentryMerge\n\n[SentryMerge](https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentrymerge) is a sibling tool that auto-cuts a single cross-camera video of one event from a SentrySearch result. 
Every time `sentrysearch search` runs, it caches the result list to `~\u002F.sentrysearch\u002Flast_search.json`; SentryMerge picks that up via `--last`, picks the best multi-camera clip-set, asks a VLM for sub-second visibility ranges per camera, and stitches one frame-accurate video that follows the subject across cameras:\n\n```bash\nsentrysearch search \"\u003Cquery>\"\nsentrymerge --last                      # → merge.mp4\n```\n\n`--last` works without re-running search; `sentrymerge --query \"...\"` re-runs search under the hood. See the [SentryMerge README](https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentrymerge#readme) for install instructions, VLM backend options (Gemini \u002F OpenAI \u002F local Qwen), and the modular cam-config system for non-Tesla dashcams.\n\n### Redact with SentryBlur\n\n[SentryBlur](https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentryblur) is a sibling tool for local face, license plate, and natural-language redaction of video. Every time `sentrysearch search` saves a clip, it caches the path to `~\u002F.sentrysearch\u002Flast_clip.json`; SentryBlur picks that up via `--last`, so search-then-redact is two commands and no path-passing:\n\n```bash\nsentrysearch search \"car cuts me off\"\nsentryblur prompt --last \"road signs\"   # → match_\u003C...>_blurred.mp4\n```\n\n`sentryblur faces --last` and `sentryblur plates --last` work the same way. Pick `faces` or `plates` for fast CPU detectors; use `prompt \"\u003Ctext>\"` for arbitrary objects (phone screens, monitors, name tags) — `prompt` requires an NVIDIA GPU or Apple Silicon. 
See the [SentryBlur README](https:\u002F\u002Fgithub.com\u002Fssrajadh\u002Fsentryblur#readme) for install instructions and hardware notes.\n\n### Managing the index\n\n```bash\n# Show index info (files marked [missing] no longer exist on disk)\nsentrysearch stats\n\n# Remove specific files by path substring\nsentrysearch remove path\u002Fto\u002Ffootage\n\n# Wipe the entire index\nsentrysearch reset\n```\n\n### Verbose mode\n\nAdd `--verbose` to either command for debug info (embedding dimensions, API response times, similarity scores).\n\n## How is this possible?\n\nBoth Gemini Embedding 2 and Qwen3-VL-Embedding can natively embed video — raw video pixels are projected into the same vector space as text queries. There's no transcription, no frame captioning, no text middleman. A text query like \"red truck at a stop sign\" is directly comparable to a 30-second video clip at the vector level. This is what makes sub-second semantic search over hours of footage practical.\n\n## Cost\n\n### Gemini\n\nIndexing 1 hour of footage costs ~$2.84 with Gemini's embedding API (default settings: 30s chunks, 5s overlap):\n\n> 1 hour = 3,600 seconds of video = 3,600 frames processed by the model.\n> 3,600 frames × $0.00079 = ~$2.84\u002Fhr\n\nThe Gemini API natively extracts and tokenizes exactly 1 frame per second from uploaded video, regardless of the file's actual frame rate. The preprocessing step (which downscales chunks to 480p at 5fps via ffmpeg) is a local\u002Fbandwidth optimization — it keeps payload sizes small so API requests are fast and don't time out — but does not change the number of frames the API processes.\n\nTwo built-in optimizations help reduce costs in different ways:\n\n- **Preprocessing** (on by default) — chunks are downscaled to 480p at 5fps before uploading. Since the API processes at 1fps regardless, this only reduces upload size and transfer time, not the number of frames billed. 
It primarily improves speed and prevents request timeouts.\n- **Still-frame skipping** (on by default) — chunks with no meaningful visual change (e.g. a parked car) are skipped entirely. This saves real API calls and directly reduces cost. The savings depend on your footage — Sentry Mode recordings with hours of idle time benefit the most, while action-packed driving footage may have nothing to skip.\n\nSearch queries are negligible (text embedding only).\n\n### Qwen Cloud (DashScope, Qwen3-VL-Embedding)\n\nDashScope bills **multimodal embedding** in **CNY per 1,000 input tokens**, by modality. For default model `qwen3-vl-embedding`, Alibaba’s published rates (check the doc below for your region and any updates) are along the lines of:\n\n- **Text input:** about **¥0.0007** per 1k input tokens  \n- **Image \u002F video input:** about **¥0.0018** per 1k input tokens  \n\nIndexing sends **video** chunks (video modality); each `search` \u002F `img` query is mostly **text** or **image** tokens, which are cheaper per token than video. Your real cost is the **token counts returned by DashScope** for each API call (depends on resolution, duration, sampling such as `DASHSCOPE_VIDEO_FPS`, etc.)—there is no fixed “$ per hour of footage” like Gemini’s published per-frame USD rate without measuring your workload.\n\nAlibaba also documents a **free token allowance** (e.g. 
1M tokens within a limited period after activation); confirm in the [DashScope multimodal embedding metering & billing](https:\u002F\u002Fhelp.aliyun.com\u002Fdashscope\u002Fdeveloper-reference\u002Fone-peace-multimodal-embedding-metering-and-billing) page and in the Model Studio \u002F billing console, since **pricing, regions, and promotions change**.\n\n### Indexing tuning (both backends)\n\nThese flags affect chunking and preprocessing for **both** Gemini and qwen-cloud:\n\n- `--chunk-duration` \u002F `--overlap` — longer chunks with less overlap = fewer API calls = lower cost\n- `--no-skip-still` — embed every chunk even if nothing is happening\n- `--target-resolution` \u002F `--target-fps` — adjust preprocessing quality\n- `--no-preprocess` — send raw chunks to the API\n\n## Known Warnings (harmless)\n\nThe local backend may print warnings during indexing and search. These are cosmetic and don't affect results:\n\n- **`MPS: nonzero op is not natively supported`** — A known PyTorch limitation on Apple Silicon. The operation falls back to CPU for one step; everything else stays on the GPU. No impact on output quality.\n- **`video_reader_backend torchcodec error, use torchvision as default`** — torchcodec can't find a compatible FFmpeg on macOS. The video processor falls back to torchvision automatically. This is expected and produces identical results.\n- **`You are sending unauthenticated requests to the HF Hub`** — The model downloads from Hugging Face without a token. Download speeds may be slightly lower, but the model loads fine. Set a `HF_TOKEN` environment variable to silence this if it bothers you.\n\n## Limitations & Future Work\n\n- **Still-frame detection is heuristic** — it uses JPEG file size comparison across sampled frames. It may occasionally skip chunks with subtle motion or embed chunks that are truly static. 
Disable with `--no-skip-still` if you need every chunk indexed.\n- **Search quality depends on chunk boundaries** — if an event spans two chunks, the overlapping window helps but isn't perfect. Smarter chunking (e.g. scene detection) could improve this.\n- **Gemini Embedding 2 is in preview** — API behavior and pricing may change.\n\n## Compatibility\n\nThis works with `.mp4` and `.mov` footage, not just Tesla Sentry Mode. The directory scanner recursively finds both file types regardless of folder structure.\n\n## Requirements\n\n- Python 3.11 or 3.12 (PyTorch wheels don't yet support 3.13+)\n- `ffmpeg` on PATH, or use bundled ffmpeg via `imageio-ffmpeg` (installed by default)\n- **Gemini backend:** Gemini API key ([get one free](https:\u002F\u002Faistudio.google.com\u002Fapikey))\n- **Local backend:**\n  - GPU with CUDA or Apple Metal (see [hardware table](#local-backend-no-api-key-needed) for VRAM\u002FRAM requirements)\n  - **macOS:** `brew install ffmpeg` (required by the video decoder)\n  - **Linux\u002FWindows:** no extra system dependencies\n","SentrySearch is a semantic search tool for video content, built on Gemini Embedding 2 or Qwen3-VL. It converts a text or image query into a vector and matches it against pre-processed video chunks, returning the most relevant clips. It supports multiple embedding models and local database storage, offers flexible search modes (text and image), and can automatically surface anomalous clips. SentrySearch suits scenarios that require quickly finding specific events or objects in large volumes of footage, such as dashcam recording analysis and surveillance video retrieval. The project also integrates video merging and sensitive-information redaction, which further increases its practical value.",2,"2026-06-11 03:48:36","high_star"]