[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76106":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":29,"discoverSource":30},76106,"watch-cli","sonpiaz\u002Fwatch-cli","sonpiaz","Watch any social video → get an architecture diagram, working component, runnable notebook, or step-by-step cheat sheet — automatically.","https:\u002F\u002Fkymaapi.com",null,"Shell",222,57,2,1,0,3,16,6,5.29,"MIT License",false,"main",true,[],"2026-06-12 02:03:40","# watch-cli\n\nGive your AI agent eyes and ears for any social video.\n\n```bash\nwatch https:\u002F\u002Ftwitter.com\u002Fanyone\u002Fstatus\u002F12345\n```\n\nYou get back: a video file, evenly-spaced frames as JPGs, and the full\naudio transcript. Your agent reads them and \"watches\" the video — works\non YouTube, X, LinkedIn, TikTok, Reddit, Vimeo, and Facebook. Login-walled\nposts (LinkedIn, X, FB) fall back to your browser cookies automatically.\n\n---\n\n## Why this exists\n\nLarge language models can't watch video natively — they read text and\nlook at still images. Modern multimodal APIs *will* analyze a full video\nfor you, but they're slow and expensive. The trick: **you almost never\nneed them**.\n\nA video is just frames + audio. 
Each piece has a fast, near-free tool\nalready:\n\n- `yt-dlp` downloads from any social platform\n- `ffmpeg` extracts evenly-spaced frames\n- An ASR model transcribes the audio\n- A multimodal LLM hears tone, music, SFX, language, mood\n\nCompose them and your agent has video understanding — without burning a\nmultimodal LLM on every frame.\n\n---\n\n## What it looks like\n\n```text\n$ watch https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fsome-talk_activity-12345\n\nVIDEO: \u002Ftmp\u002Fdl-video\u002Fabc123.mp4\nDURATION: 218\nFRAMES:\n  \u002Ftmp\u002Fframes_abc123\u002Fframe_01.jpg\n  \u002Ftmp\u002Fframes_abc123\u002Fframe_02.jpg\n  …\nTRANSCRIPT:\n  Today I want to talk about how decomposition unlocks 10× cost reduction in\n  multimodal pipelines …\n```\n\nYour agent reads the JPGs and the transcript. That's the whole watch.\n\n---\n\n## Benchmark\n\nAfter noticing how much we burned on multimodal calls, we measured:\n\n| Approach | Cost \u002F 1-hour video | Time |\n|---|---|---|\n| Multimodal LLM on full video | ~$5 | 30–60s |\n| watch-cli (Kyma audio) | \u003C $0.10 | ~10–15s |\n| **Ratio** | **~50× cheaper** | **~5× faster** |\n\nThe savings compound when an agent watches dozens of videos in a session.\n\n---\n\n## Install\n\n```bash\ncurl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002Fsonpiaz\u002Fwatch-cli\u002Fmain\u002Finstall.sh | bash\n```\n\nOr from a clone:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsonpiaz\u002Fwatch-cli ~\u002F.watch-cli\ncd ~\u002F.watch-cli && .\u002Finstall.sh\n```\n\nThe installer checks for `yt-dlp`, `ffmpeg`, `jq`, `curl`, `python3` and\nsymlinks the commands into `~\u002F.local\u002Fbin`. 
On macOS:\n\n```bash\nbrew install yt-dlp ffmpeg jq\n```\n\nOn Debian\u002FUbuntu:\n\n```bash\nsudo apt install yt-dlp ffmpeg jq python3 curl\n```\n\n---\n\n## Setup\n\n```bash\nexport KYMA_API_KEY=kyma-xxxxxxxx\n```\n\nGet a Kyma key at [kymaapi.com](https:\u002F\u002Fkymaapi.com) — 60 seconds, no card.\n\nPrefer bring-your-own-keys? Comment in `GROQ_API_KEY` and `GOOGLE_AI_KEY`\nin `.env.example` and watch-cli falls back to direct provider calls.\n\n---\n\n## Why Kyma\n\nwatch-cli uses Kyma as its AI backend. A few things you get for free:\n\n![models](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fapi.kymaapi.com\u002Fapi\u002Fbadge\u002Fmodels.json)\n![creators](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fapi.kymaapi.com\u002Fapi\u002Fbadge\u002Fcreators.json)\n![free credit](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fapi.kymaapi.com\u002Fapi\u002Fbadge\u002Ffree-credit.json)\n\n- **One key, every model in this CLI.** `transcribe` today is Whisper v3\n  turbo. When Kyma swaps in Voxtral or Whisper v4, your watch-cli scripts\n  keep working with zero changes — the alias stays.\n- **Per-call cost in the response.** Every transcribe gives you a real\n  number, not an end-of-month dashboard surprise.\n- **Auto-fallback across providers.** If the underlying audio provider is\n  throttling or down, Kyma routes through another. Your script never sees\n  the outage.\n- **Free credit at signup.** About an hour of audio. Enough to know if\n  you like it before you spend a cent.\n\nThe badges above pull live from `api.kymaapi.com\u002Fapi\u002Fstats`, so the model\ncount and free-credit number stay current without a watch-cli release.\n\n---\n\n## Commands\n\n```text\nwatch \u003Curl> [frame-count] [--cookies \u003Cfile>]\n  Orchestrator. Downloads, extracts frames, transcribes — one block out.\n\ndl-video \u003Curl> [out-dir] [--cookies \u003Cfile>]\n  Just download the video. 
Returns the local mp4 path.\n\nextract-frames \u003Cvideo> [count] [out-dir]\n  Pull N evenly-spaced JPG frames. Default 8.\n\ntranscribe \u003Caudio-or-video> [language]\n  Speech-to-text. Auto-extracts audio from video first.\n\naudio-q \u003Caudio-or-video> \"\u003Cquestion>\"\n  Audio scene Q&A — tone, music, SFX, language, emotion.\n  Beyond pure transcription.\n\nmodels [--all]\n  List audio models available on Kyma (live, no hardcoded list).\n  --all to see every Kyma SKU (text + image + video + audio).\n```\n\n### How `transcribe` and `audio-q` stay current\n\nThe scripts call Kyma using the `transcribe` and `audio-understand` aliases,\nnot raw model IDs. When Kyma swaps the underlying model (Whisper v4,\nVoxtral, a faster ASR), watch-cli keeps working without an update — the\nalias points to whichever model is current. Run `watch-cli models` any time\nto see what's behind the alias today.\n\n---\n\n## Login-walled videos\n\nMost YouTube \u002F TikTok \u002F Reddit \u002F Vimeo \u002F public X work without setup.\nLinkedIn, private X posts, and Facebook need a session.\n\nwatch-cli auto-detects cookies from any signed-in browser\n(Chrome → Firefox → Safari → Edge → Brave → Chromium). Just sign in\nnormally and re-run.\n\nFor servers \u002F CI without browsers, pass a manual cookies file:\n\n```bash\nwatch \u003Curl> --cookies ~\u002Fcookies.txt\n```\n\nFull setup walkthrough: [docs\u002Fcookies.md](docs\u002Fcookies.md).\n\n---\n\n## Use with Claude Code (or any agent)\n\n```text\nYou have access to a `watch` command that takes a URL and returns\na video, 8 frames, and the transcript. 
Read the frames as images and\nthe transcript as text — that's enough to \"watch\" any social video.\n```\n\nThe output block is structured so an agent can parse it without help:\n`VIDEO:` line, `FRAMES:` block (one path per line), `TRANSCRIPT:` block.\n\n---\n\n## Prompt library\n\nBeyond the generic prompt above, five copy-paste prompts in\n[`prompts\u002F`](prompts\u002F) turn `watch` output into a specific artifact:\n\n| Goal | File |\n|---|---|\n| Coding walkthrough → working project | [`implement-from-video.md`](prompts\u002Fimplement-from-video.md) |\n| System talk → interactive architecture diagram | [`extract-architecture.md`](prompts\u002Fextract-architecture.md) |\n| UI \u002F motion demo → working React component | [`clone-ux.md`](prompts\u002Fclone-ux.md) |\n| Paper \u002F research talk → runnable notebook | [`paper-to-code.md`](prompts\u002Fpaper-to-code.md) |\n| Long tutorial → step-by-step cheat sheet | [`tutorial-walkthrough.md`](prompts\u002Ftutorial-walkthrough.md) |\n\nPaste the chosen prompt above the `watch` output, hand the whole thing\nto your agent.\n\n### Use as a Claude Code skill\n\nDrop [`skills\u002Fwatch-cli\u002F`](skills\u002Fwatch-cli\u002F) into your\n`~\u002F.claude\u002Fskills\u002F` folder and the agent will pick up `\u002Fwatch \u003Curl>`\nas a first-class command, including the prompt library above.\n\n```bash\nmkdir -p ~\u002F.claude\u002Fskills\ncp -r skills\u002Fwatch-cli ~\u002F.claude\u002Fskills\u002F\n```\n\n---\n\n## How it works\n\n```text\nURL ──▶ yt-dlp ──▶ video.mp4 ──┬──▶ ffmpeg ──▶ frames\u002F*.jpg\n                                │\n                                └──▶ ffmpeg ──▶ audio.mp3 ──┬──▶ Kyma \u002Fv1\u002Faudio\u002Ftranscriptions\n                                                            │     (Whisper Large v3 Turbo, 228× realtime)\n                                                            │\n                                                            └──▶ Kyma 
\u002Fv1\u002Faudio\u002Funderstand\n                                                                  (Gemini 3 Flash audio — tone\u002Fmusic\u002FSFX)\n```\n\nEach step is a primitive. None of them needs a vision LLM.\n\n---\n\n## Show what you build\n\nBuilt something cool from a video? Drop it in\n[Discussions](https:\u002F\u002Fgithub.com\u002Fsonpiaz\u002Fwatch-cli\u002Fdiscussions) under\n**Show and tell**. Post the source URL, the prompt you used, and your\nartifact. Curated highlights make it back into the README.\n\n---\n\n## Limitations and cost\n\nWatch-cli is fast and cheap because it composes primitives instead of\ncalling a video LLM. The tradeoffs are honest.\n\n### Cost per video\n\nTranscription is the only paid step. Frame extraction is local ffmpeg,\nfree.\n\n| Video length | Transcribe cost |\n|---|---|\n| 5 minutes (tweet, short demo) | ~$0.003 |\n| 1 hour (LinkedIn talk, podcast) | ~$0.04 |\n| 2 hours (conference talk) | ~$0.08 |\n\nFree credit at Kyma signup covers roughly 25 hours of audio. Bring your\nown Groq key and the price stays the same (Whisper v3 turbo, $0.04\u002Fhour\nboth ways).\n\n### What works well\n\n- Talking-head content: tutorials, conference talks, lectures, walkthroughs\n- Architecture and system diagrams shown for at least 3 seconds\n- Code that stays on screen long enough to read\n- ~95 languages (anything Whisper v3 turbo supports)\n\n### What works poorly\n\n- Music videos, action movies, fast-cut content. Eight evenly-spaced\n  frames miss key moments. Bump count: `watch \u003Curl> 24`.\n- Editor sessions that scroll fast through code. Same fix.\n- Audio with heavy background music and overlapping speakers. Transcript\n  quality drops. Use `audio-q` for a scene description instead.\n- Videos longer than ~2 hours. The transcribe provider has a 25MB audio\n  cap. 
Watch-cli auto-downsamples but a 3-hour talk may still exceed it.\n  Workaround: split via `ffmpeg -ss` before piping.\n\n### What does not work yet\n\n- Region-locked videos (some YouTube, TikTok). yt-dlp returns an error;\n  watch-cli surfaces it.\n- Live streams. Download finishes only after the stream ends.\n- Silent screencasts. Transcribe returns empty. Increase frame count and\n  use `audio-q` for any sound design instead.\n\n### Frame count guidance\n\n| Video type | Recommended `frame-count` |\n|---|---|\n| Short tweet \u002F clip (\u003C2 min) | 4 to 8 (default) |\n| Standard tutorial \u002F talk (5–20 min) | 8 to 16 |\n| Long talk \u002F lecture (20–60 min) | 16 to 24 |\n| Conference talk \u002F multi-hour (>1 hr) | 24 to 32 |\n| Fast-cut or dense UI demo | Double the recommendation for that length |\n\n---\n\n## License\n\nMIT. © 2026 Son Piaz.\n","watch-cli is a tool that gives AI agents the ability to process social media videos, delivering video understanding at a far lower cost than calling a multimodal API directly. Its core features are downloading videos from multiple social platforms, extracting evenly spaced frames, and generating a full audio transcript, so that an AI can \"watch\" and understand the video. It relies on open-source tools such as `yt-dlp` and `ffmpeg` plus an automatic speech recognition model, and suits scenarios that need to analyze many videos on a limited budget, such as social media monitoring, content moderation, or video-based data mining. By combining these efficient, low-cost components, watch-cli offers a cheaper and faster alternative to conventional multimodal APIs.","2026-06-11 03:54:30","CREATED_QUERY"]