[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-831":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},831,"claude-video","bradautomates\u002Fclaude-video","bradautomates","Give Claude the ability to watch any video. \u002Fwatch downloads, extracts frames, transcribes, hands it all to Claude.",null,"Python",1933,360,25,3,0,136,338,888,408,20.67,"MIT License",false,"main",true,[],"2026-06-12 02:00:19","# \u002Fwatch\n\n**Give Claude the ability to watch any video.**\n\nClaude Code:\n```\n\u002Fplugin marketplace add bradautomates\u002Fclaude-video\n\u002Fplugin install watch@claude-video\n```\n\nclaude.ai (web): [download `watch.skill`](https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video\u002Freleases\u002Flatest) and drop it into Settings → Capabilities → Skills.\n\nCodex \u002F generic skills:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video.git ~\u002F.codex\u002Fskills\u002Fwatch\n```\n\nZero config to start — `yt-dlp` and `ffmpeg` install on first run via `brew` on macOS (Linux\u002FWindows print exact commands). Captions cover most public videos for free. Whisper API key is only needed when a video has no captions.\n\n---\n\nClaude can read a webpage, run a script, browse a repo. What it can't do, out of the box, is *watch a video*. You paste a YouTube link and it has to either guess from the title or pull a transcript that's missing 90% of what's on screen.\n\nWith Claude Video `\u002Fwatch` you can paste a URL or a local path, ask a question, and Claude downloads the video, extracts frames at an auto-scaled rate, pulls a timestamped transcript (free captions when available, Whisper API as fallback), and `Read`s every frame as an image. By the time it answers, it has *seen* the video and *heard* the audio.\n\n```\n\u002Fwatch https:\u002F\u002Fyoutu.be\u002FdQw4w9WgXcQ what happens at the 30 second mark?\n```\n\n## Why this exists\n\nI built this because I'm constantly using video to keep up with content. If I see a YouTube video that's blowing up, I want to know how the creator structured the hook — what's on screen in the first 3 seconds, what they said, why it worked. That used to mean watching it myself with a notepad. Now I just paste the URL and ask.\n\nThe other half is summarization. Most YouTube videos don't deserve 20 minutes of my attention. I hand the URL to Claude, it pulls the transcript, and tells me what actually happened. If the visual matters, frames come along too. If it's a podcast or a talking head, transcript is enough.\n\nClaude is great at reading and synthesizing — but until now, video was the one input I couldn't hand it. Pasting a YouTube link got you nothing useful. `\u002Fwatch` closes that gap.\n\n## What people actually use it for\n\n**Analyze someone else's content.** `\u002Fwatch https:\u002F\u002Fyoutu.be\u002F\u003Cviral-video> what hook did they open with?` Claude looks at the first frames, reads the opening transcript, breaks down the structure. Same for ad creative, competitor launches, podcast intros, anything where the *how* matters as much as the *what*.\n\n**Diagnose a bug from a video.** Someone sends you a screen recording of something broken. `\u002Fwatch bug-repro.mov what's going wrong?` Claude watches the recording, finds the frame where the issue appears, describes what's on screen, often catches the cause without you ever opening the file.\n\n**Summarize a video.** `\u002Fwatch https:\u002F\u002Fyoutu.be\u002F\u003Clong-thing> summarize this` does the obvious thing — pulls the structure, the key moments, what was actually said and shown. Faster than watching at 2x.\n\n## How it works\n\n1. **You paste a video and a question.** URL (anything yt-dlp supports — YouTube, Loom, TikTok, X, Instagram, plus a few hundred more) or a local path (`.mp4`, `.mov`, `.mkv`, `.webm`).\n2. **`yt-dlp` downloads it.** For URLs, into a temp working directory. For local files, no download — just probed in place.\n3. **`ffmpeg` extracts frames at an auto-scaled rate.** The frame budget is duration-aware: ≤30s gets ~30 frames, 30-60s gets ~40, 1-3min gets ~60, 3-10min gets ~80, longer gets 100 sparsely. Hard ceilings: 2 fps, 100 frames. JPEGs at 512px wide by default — bump with `--resolution 1024` if Claude needs to read on-screen text.\n4. **The transcript comes from one of two places.** First try: `yt-dlp` pulls native captions (manual or auto-generated) from the source. Free, instant, accurate-ish. Fallback: extract a mono 16 kHz audio clip and ship it to Whisper — Groq's `whisper-large-v3` (preferred — cheaper and faster) or OpenAI's `whisper-1`.\n5. **Frames + transcript are handed to Claude.** The script prints frame paths with `t=MM:SS` markers and the transcript with timestamps. Claude `Read`s each frame in parallel — JPEGs render directly as images in its context.\n6. **Claude answers grounded in what's actually on screen and in the audio.** Not \"based on the description\" or \"according to the title.\" It saw the frames. It heard the transcript. It answers the way someone who watched the video would.\n7. **Cleanup.** The script prints a working directory at the end. If you're not asking follow-ups, Claude removes it.\n\n## Frame budget — why it matters\n\nToken cost is dominated by frames. Every frame is an image; image tokens add up fast. The script's auto-fps logic exists so you don't blow your context budget on a sparse scan of a 30-minute video that would have been better answered by a focused 30-second window.\n\n| Duration | Default frame budget | What you get |\n|----------|---------------------|--------------|\n| ≤30 s | ~30 frames | Dense — basically every key moment |\n| 30 s - 1 min | ~40 frames | Still dense |\n| 1 - 3 min | ~60 frames | Comfortable |\n| 3 - 10 min | ~80 frames | Sparse but workable |\n| > 10 min | 100 frames | \"Sparse scan\" warning — re-run focused |\n\nWhen the user names a moment (\"around 2:30\", \"the last 30 seconds\", \"from 0:45 to 1:00\"), pass `--start` \u002F `--end`. Focused mode gets denser per-second budgets, capped at 2 fps. Far more useful than a sparse pass over the whole thing.\n\n## Install\n\n| Surface | Install |\n|---------|---------|\n| **Claude Code** | `\u002Fplugin marketplace add bradautomates\u002Fclaude-video` then `\u002Fplugin install watch@claude-video` |\n| **claude.ai** (web) | [Download `watch.skill`](https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video\u002Freleases\u002Flatest) → Settings → Capabilities → Skills → `+` |\n| **Codex** | `git clone https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video.git ~\u002F.codex\u002Fskills\u002Fwatch` |\n| **Manual \u002F dev** | `git clone https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video.git ~\u002F.claude\u002Fskills\u002Fwatch` |\n\n### Claude Code\n\n```\n\u002Fplugin marketplace add bradautomates\u002Fclaude-video\n\u002Fplugin install watch@claude-video\n```\n\nUpdate later with `\u002Fplugin update watch@claude-video`.\n\n### claude.ai (web)\n\n1. [Download `watch.skill`](https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video\u002Freleases\u002Flatest) from the latest release.\n2. Go to Settings → Capabilities → Skills.\n3. Click `+` and drop the file in.\n\nEnable \"Code execution and file creation\" under Capabilities first — the skill shells out to `ffmpeg` and `yt-dlp`, so it won't run without it.\n\n### Codex\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video.git ~\u002F.codex\u002Fskills\u002Fwatch\n```\n\n### Manual (developer)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video.git ~\u002F.claude\u002Fskills\u002Fwatch\n```\n\n## First run\n\nOn the first `\u002Fwatch` call, the skill runs `scripts\u002Fsetup.py --check`. If `ffmpeg` \u002F `yt-dlp` aren't on your PATH, or no Whisper API key is set, it walks you through fixing it:\n\n- **macOS** — auto-runs `brew install ffmpeg yt-dlp`.\n- **Linux** — prints the exact `apt` \u002F `dnf` \u002F `pipx` commands.\n- **Windows** — prints the `winget` \u002F `pip` commands.\n- **API key** — scaffolds `~\u002F.config\u002Fwatch\u002F.env` (mode `0600`) with commented placeholders for `GROQ_API_KEY` (preferred) and `OPENAI_API_KEY`.\n\nAfter setup, preflight is silent and `\u002Fwatch` just works. The check is a sub-100ms lookup, so it doesn't slow you down on subsequent runs.\n\n## Bring your own keys\n\nCaptions cover the majority of public videos for free. The Whisper fallback only kicks in when a video genuinely has no caption track — typically local files, TikToks, some Vimeos, and the occasional caption-less YouTube upload.\n\n| Capability | What you need | Cost |\n|------------|---------------|------|\n| Download + native captions | `yt-dlp` + `ffmpeg` | Free |\n| Whisper fallback (preferred) | [Groq API key](https:\u002F\u002Fconsole.groq.com\u002Fkeys) — `whisper-large-v3` | Cheap, fast |\n| Whisper fallback (alt) | [OpenAI API key](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys) — `whisper-1` | Standard pricing |\n| Disable Whisper entirely | `--no-whisper` | Free, frames-only when no captions |\n\n## Usage\n\n```\n\u002Fwatch https:\u002F\u002Fyoutu.be\u002FdQw4w9WgXcQ what happens at the 30 second mark?\n\u002Fwatch https:\u002F\u002Fwww.tiktok.com\u002F@user\u002Fvideo\u002F123 summarize this\n\u002Fwatch ~\u002FMovies\u002Fscreen-recording.mp4 when does the UI break?\n\u002Fwatch https:\u002F\u002Fvimeo.com\u002F123 what tools does she mention?\n```\n\nFocused on a specific section — denser frame budget, lower token cost:\n```\n\u002Fwatch https:\u002F\u002Fyoutu.be\u002Fabc --start 2:15 --end 2:45\n\u002Fwatch video.mp4 --start 50 --end 60\n\u002Fwatch \"$URL\" --start 1:12:00            # from 1h12m to end\n```\n\nOther knobs (passed to `scripts\u002Fwatch.py`):\n\n- `--max-frames N` — lower the frame cap for a tighter token budget.\n- `--resolution W` — bump frame width to 1024 px when Claude needs to read on-screen text (slides, terminals, code).\n- `--fps F` — override the auto-fps calculation (still capped at 2 fps).\n- `--whisper groq|openai` — force a specific Whisper backend.\n- `--no-whisper` — disable transcription entirely; frames only.\n- `--out-dir DIR` — keep working files somewhere specific (default: auto-generated tmp dir).\n\n## Limits\n\n- **Best accuracy: under 10 minutes.** Past that the script prints a \"sparse scan\" warning — re-run focused on the part you actually care about with `--start`\u002F`--end`.\n- **Hard caps: 2 fps, 100 frames.** Frame count drives token cost; the script enforces this even when the auto-fps math would imply higher.\n- **Whisper upload limit: 25 MB.** At mono 16 kHz that's about 50 minutes of audio. Longer videos need either captions or `--start`\u002F`--end` to a smaller window.\n- **No private platforms.** This skill doesn't log into anything. Public URLs and local files only. If yt-dlp can't reach it without auth, neither can `\u002Fwatch`.\n\n## Structure\n\n```\n.\n├── SKILL.md                 # skill contract — loaded by all three surfaces\n├── scripts\u002F\n│   ├── watch.py             # entry point — orchestrates download → frames → transcript\n│   ├── download.py          # yt-dlp wrapper\n│   ├── frames.py            # ffmpeg frame extraction + auto-fps logic\n│   ├── transcribe.py        # VTT parsing + dedupe + Whisper orchestration\n│   ├── whisper.py           # Groq \u002F OpenAI clients (pure stdlib)\n│   ├── setup.py             # preflight + installer\n│   └── build-skill.sh       # build dist\u002Fwatch.skill for claude.ai upload\n├── hooks\u002F                   # SessionStart status hook (Claude Code only)\n├── .claude-plugin\u002F          # plugin.json + marketplace.json (Claude Code)\n├── .codex-plugin\u002F           # codex packaging\n└── .github\u002Fworkflows\u002F       # release.yml — auto-builds watch.skill on tag push\n```\n\n## Develop\n\n```bash\n# Build the claude.ai upload bundle:\nbash scripts\u002Fbuild-skill.sh      # → dist\u002Fwatch.skill\n```\n\nReleasing: tag `vX.Y.Z`, push the tag. The workflow builds `dist\u002Fwatch.skill` and attaches it to the GitHub release.\n\nSee [CHANGELOG.md](CHANGELOG.md) for version history.\n\n## Open source\n\nMIT license.\n\nBuilt on `yt-dlp`, `ffmpeg`, and Claude's multimodal `Read` tool. Whisper transcription via [Groq](https:\u002F\u002Fgroq.com) or [OpenAI](https:\u002F\u002Fopenai.com).\n\n---\n\n[github.com\u002Fbradautomates\u002Fclaude-video](https:\u002F\u002Fgithub.com\u002Fbradautomates\u002Fclaude-video) · [LICENSE](LICENSE)\n","该项目赋予Claude观看并分析视频的能力。其核心功能包括自动下载视频、提取关键帧、生成带时间戳的转录文本（优先使用免费字幕，必要时调用Whisper API），并将这些信息提供给Claude进行分析。项目采用Python编写，并利用了yt-dlp和ffmpeg等工具处理视频文件。适用于需要快速理解或总结视频内容的场景，如分析竞争对手的创意视频、诊断屏幕录制中的软件问题或为长视频生成摘要。通过简单的命令行操作即可启动服务，极大提高了工作效率。",2,"2026-06-11 02:39:37","CREATED_QUERY"]