[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74068":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":46,"readmeContent":47,"aiSummary":48,"trendingCount":16,"starSnapshotCount":16,"syncStatus":49,"lastSyncTime":50,"discoverSource":51},74068,"OpenMontage","calesthio\u002FOpenMontage","calesthio","World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.","https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage",null,"Python",4611,931,44,24,0,171,443,966,513,30.91,"GNU Affero General Public License v3.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45],"agent","agentic-ai","ai","claude","copilot","cursor","elevenlabs","ffmpeg","flux","image-generation","open-source","openai","python","remotion","stable-diffusion","text-to-speech","text-to-video","video-generation","video-production","2026-06-12 02:03:21","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" alt=\"OpenMontage\" width=\"200\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">OpenMontage\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\u003Cstrong>The first open-source, agentic video production system.\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#start-from-a-video-you-already-love\">Paste A Video\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"#quick-start\">Quick Start\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca 
href=\"#try-these-prompts\">Try These Prompts\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"#pipelines\">Pipelines\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"#how-it-works\">How It Works\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"docs\u002FPROVIDERS.md\">Providers\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"AGENT_GUIDE.md\">Agent Guide\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-AGPLv3-blue.svg\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cstrong>Follow The Build\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@OpenMontage\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-%40OpenMontage-FF0000?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"YouTube\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Fcalesthioailabs\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-%40calesthioailabs-111111?style=for-the-badge&logo=x&logoColor=white\" alt=\"X\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fdiscussions\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCommunity-GitHub%20Discussions-0b1220?style=for-the-badge&logo=github&logoColor=white\" alt=\"GitHub Discussions\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\nTurn your AI coding assistant into a full video production studio. Describe what you want in plain language — your agent handles research, scripting, asset generation, editing, and final composition.\n\n**Important distinction:** OpenMontage can make image-based videos, but it can also make a real **video video** for free\u002Fopen-source workflows: the agent builds a corpus from free stock footage and open archives, retrieves actual motion clips, edits them into a timeline, and renders a finished piece. 
That is not the usual \"animate a handful of stills and call it video\" trick.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff77ce7a4-68b8-4f94-a287-e94bf50a32e1\" width=\"100%\" controls>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n> **\"SIGNAL FROM TOMORROW\"** — a cinematic sci-fi trailer fully produced through OpenMontage: concept, script, scene plan, Veo-generated motion clips, soundtrack, and Remotion composition.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8daca07f-cdf8-4bec-89c3-9dc2176363fa\" width=\"100%\" controls>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n> **\"THE LAST BANANA\"** — a 60-second Pixar-style animated short about a lonely banana who finds friendship with a kiwi. 6 Kling v3-generated motion clips (via fal.ai), Google Chirp3-HD narration, royalty-free piano music, TikTok-style word-level captions, and Remotion composition. Total cost: **$1.33**.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8a6d2cc3-7ad2-46f5-922f-a8e3e5848d9f\" width=\"100%\" controls>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n> **\"VOID — Neural Interface\"** — a product ad produced with just one API key (OpenAI). 4 AI-generated images (gpt-image-1), TTS narration, auto-sourced royalty-free music, word-level subtitles via WhisperX, and Remotion data visualizations. Total cost: **$0.69**. Zero manual asset work.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3c5d7122-7198-43e2-a97d-ed27558dd324\" width=\"100%\" controls>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n> **\"Afternoon in Candyland\"** — a Ghibli-style anime animation. A little girl's whimsical afternoon adventure through candy gates, gumdrop rivers, and lollipop gardens. 
12 FLUX-generated images with multi-image crossfade, cinematic camera motion (zoom, pan, Ken Burns), sparkle\u002Fpetal\u002Ffirefly particle overlays, and ambient music with auto-detected energy offset. Total cost: **$0.15**. No video generation, no manual editing.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe8dc5e32-5c70-46de-bd52-eef887719d13\" width=\"100%\" controls>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n> **\"Mori no Seishin\"** — a Ghibli-style anime animation of a forest spirit's journey through ancient woods. 12 FLUX-generated images with parallax crossfade, drift and pan camera motion, firefly and petal particles, cinematic vignette lighting, and ambient forest soundtrack. Total cost: **$0.15**. Still images brought to life through Remotion's animation engine.\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9cf633d9-c264-4961-bfd0-b1db188654aa\" width=\"100%\" controls>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n> **\"Into the Abyss\"** — a deep ocean exploration rendered in anime style. Bioluminescent gardens, coral cathedrals, and creatures of light — 12 FLUX-generated images with sparkle and mist particle overlays, light-ray effects, smooth camera motion, and ambient oceanic soundtrack. Total cost: **$0.15**. 
Zero video generation APIs needed.\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@OpenMontage?sub_confirmation=1\">\u003Cstrong>Subscribe to @OpenMontage on YouTube\u003C\u002Fstrong>\u003C\u002Fa> to see new videos as they ship — every video includes the full prompt, pipeline, tools used, and cost so you can reproduce it yourself.\n\u003C\u002Fp>\n\n---\n\n## Start From A Video You Already Love\n\nStarting from a reference video is often faster than starting from a blank prompt.\n\nOpenMontage can start from a **YouTube video, Short, Reel, TikTok, or local clip** and turn it into a grounded production plan:\n\n1. **Paste a reference video**\n2. **The agent analyzes transcript, pacing, scenes, keyframes, and style**\n3. **You get 2-3 differentiated concepts, an honest tool path, cost estimates, and a sample before full production**\n\n```text\n\"Here's a YouTube Short I love. Make me something like this, but about quantum computing.\"\n```\n\nWhat you get back is not \"best guess prompt spaghetti.\" You get:\n\n- **What it keeps** from the reference: pacing, hook style, structure, tone\n- **What it changes**: topic, visual treatment, angle, narration approach\n- **What it will cost** at your target duration, before asset generation starts\n- **What it will actually look like** with your currently available tools\n\nWorks with **Claude Code, Cursor, Copilot, Windsurf, Codex** — any AI coding assistant that can read files and run code.\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n- **Python 3.10+** — [python.org](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n- **FFmpeg** — `brew install ffmpeg` \u002F `sudo apt install ffmpeg` \u002F [ffmpeg.org](https:\u002F\u002Fffmpeg.org\u002Fdownload.html)\n- **Node.js 18+** — [nodejs.org](https:\u002F\u002Fnodejs.org\u002F)\n- **An AI coding assistant** — Claude Code, Cursor, Copilot, Windsurf, or Codex\n\n### Install & Run\n\n```bash\ngit clone 
https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage.git\ncd OpenMontage\nmake setup\n```\n\nOpen the project in your AI coding assistant and tell it what you want:\n\n```\n\"Make a 60-second animated explainer about how neural networks learn\"\n```\n\nOr if you want the real-footage path:\n\n```text\n\"Make a 75-second documentary montage about city life in the rain. Use real footage only, no narration, elegiac tone, with music.\"\n```\n\nThat's it. The agent researches your topic with live web search, generates AI images, writes and narrates the script with voice direction, finds royalty-free background music automatically, burns in word-level subtitles, and renders the final video. Before you see anything, the system runs a multi-point self-review — ffprobe validation, frame sampling, audio level analysis, delivery promise verification, and subtitle checks. Every provider selection is scored across 7 dimensions with an auditable decision log. Every creative decision gets your approval.\n\n> **No `make`?** Run manually: `pip install -r requirements.txt && cd remotion-composer && npm install && cd .. && pip install piper-tts && cp .env.example .env`\n>\n> **Windows:** If `npm install` fails with `ERR_INVALID_ARG_TYPE`, use `npx --yes npm install` instead.\n\n### If You're An OpenClaw Agent Reading This, Congratulations: You Probably Passed The Turing Test\n\nThis repo is built for agentic operation. If you're an OpenClaw-style agent, here is the shortest path to becoming useful fast:\n\n1. **Read the contract first**\n   Start with [`AGENT_GUIDE.md`](AGENT_GUIDE.md), then [`PROJECT_CONTEXT.md`](PROJECT_CONTEXT.md).\n2. **Do not improvise the production workflow**\n   OpenMontage is pipeline-driven. Real work goes through `pipeline_defs\u002F`, stage director skills in `skills\u002Fpipelines\u002F`, and tool discovery via the registry.\n3. 
**Check the actual capability envelope**\n   Run:\n   ```bash\n   python -c \"from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.support_envelope(), indent=2))\"\n   python -c \"from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_menu(), indent=2))\"\n   ```\n4. **Treat every video request as a pipeline selection problem**\n   Pick the right pipeline first, then read the manifest, then read the stage skill, then use tools.\n\n### Add API Keys (optional — more keys = more tools)\n\n```bash\n# .env — every key is optional, add what you have\n\n# Image + video gateway:\nFAL_KEY=your-key               # FLUX images + Google Veo, Kling, MiniMax video + Recraft images\n\n# Free stock media:\nPEXELS_API_KEY=your-key        # Free stock footage and images\nPIXABAY_API_KEY=your-key       # Free stock footage and images\nUNSPLASH_ACCESS_KEY=your-key   # Free stock images\n\n# Music:\nSUNO_API_KEY=your-key          # Full songs, instrumentals, any genre\n\n# Voice & images:\nELEVENLABS_API_KEY=your-key    # Premium TTS, AI music, sound effects\nOPENAI_API_KEY=your-key        # OpenAI TTS, DALL-E 3 images\nXAI_API_KEY=your-key           # xAI Grok image edits\u002Fgeneration + Grok video generation\nGOOGLE_API_KEY=your-key        # Google Imagen images, Google TTS (700+ voices)\n\n# More video providers:\nHEYGEN_API_KEY=your-key        # HeyGen — VEO, Sora, Runway, Kling via single gateway\nRUNWAY_API_KEY=your-key        # Runway Gen-4 direct\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Have a GPU? 
Unlock free local video generation\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n```bash\nmake install-gpu\n\n# Then add to .env:\nVIDEO_GEN_LOCAL_ENABLED=true\nVIDEO_GEN_LOCAL_MODEL=wan2.1-1.3b  # or wan2.1-14b, hunyuan-1.5, ltx2-local, cogvideo-5b\n```\n\n\u003C\u002Fdetails>\n\n---\n\n## What You Get With Zero API Keys\n\nYou don't need paid API keys to make real videos. Out of the box, `make setup` gives you:\n\n| Capability | Free Tool | What It Does |\n|-----------|-----------|-------------|\n| **Narration** | Piper TTS | Free offline text-to-speech — real human-sounding narration |\n| **Open footage** | Archive.org + NASA + Wikimedia Commons | Free\u002Fopen archival footage, educational media, and documentary texture |\n| **Extra stock** | Pexels + Unsplash + Pixabay | Free stock footage\u002Fimages (developer keys are free to get) |\n| **Composition (React)** | Remotion | React-based rendering — spring-animated image scenes, text cards, stat cards, charts, TikTok-style word-level captions, TalkingHead |\n| **Composition (HTML\u002FGSAP)** | HyperFrames | HTML\u002FCSS\u002FGSAP rendering — kinetic typography, product promos, launch reels, registry blocks, website-to-video, rigged SVG character animation |\n| **Post-production** | FFmpeg | Encoding, subtitle burn-in, audio mixing, color grading |\n| **Subtitles** | Built-in | Auto-generated captions with word-level timing |\n\nOpenMontage picks between Remotion and HyperFrames at proposal time (locked as `render_runtime`). Remotion is the default for data-driven explainers and anything using the existing React scene stack; HyperFrames is the default for motion-graphics-heavy briefs that express naturally as HTML + GSAP, including the `character-animation` pipeline's SVG\u002FGSAP rig output. 
See `skills\u002Fcore\u002Fhyperframes.md` for the full decision matrix.\n\n**Three free-ish paths:**\n\n- **Image-based video:** Piper narrates your script, images provide the visuals, and Remotion animates them into a polished edit.\n- **Local character animation:** SVG rigs, pose libraries, GSAP timelines, and HyperFrames render cartoon character acting to `projects\u002F\u003Cproject-name>\u002Frenders\u002Ffinal.mp4`.\n- **Real-footage video:** the documentary montage pipeline builds a CLIP-searchable corpus from Archive.org, NASA, Wikimedia Commons, and optional free-key sources like Pexels and Unsplash, then cuts together actual motion footage into a finished video.\n\nIf you want the real-footage path, prompt for a **documentary montage**, **tone poem**, or **stock-footage collage**, and explicitly say **use real footage only**.\n\n---\n\n## Try These Prompts\n\nCopy any of these into your AI coding assistant after setup. Each one runs a full production pipeline.\n\n### Start from a reference video\n\n> \"Here's a YouTube Short I love. Make me something like this, but about CRISPR for high school students.\"\n\n> \"Analyze this Reel and give me 3 original variants I could make for my own product launch.\"\n\n> \"I like the pacing and hook in this video. Keep that energy, but turn it into a 45-second explainer about black holes.\"\n\n### Zero keys needed\n\n> \"Make a 45-second animated explainer about why the sky is blue\"\n\n> \"Create a 60-second video about the history of the internet, with narration and captions\"\n\n> \"Make a data-driven explainer about coffee consumption around the world\"\n\n### Free real-footage documentary path\n\n> \"Make a 90-second documentary montage about what a city feels like at 4am. Use real footage only, no narration, elegiac tone.\"\n\n> \"Create a 60-second Adam-Curtis-style archival collage about 1950s consumer optimism. 
Prefer Archive.org and Wikimedia footage.\"\n\n> \"Cut together a dreamlike montage about coming home in the rain using real stock footage only. Music yes, narration no.\"\n\n### With an image\u002Fvideo provider configured (~$0.15–$1.50)\n\n> \"Create a 30-second Ghibli-style animated video of a magical floating library in the clouds at golden hour\"\n\n> \"Make a 30-second anime-style animation of an underwater temple with bioluminescent coral and ancient ruins\"\n\n> \"Create an animated explainer about how CRISPR gene editing works, using AI-generated visuals\"\n\n> \"Make a product launch teaser for a fictional smart water bottle called AquaPulse\"\n\n### Full setup (~$1–$3)\n\n> \"Create a cinematic 30-second trailer for a sci-fi concept: humanity receives a warning from 1000 years in the future\"\n\n> \"Make a 90-second animated explainer about quantum computing for middle school students, with a fun narrator voice and custom soundtrack\"\n\nWant more? See the full **[Prompt Gallery](PROMPT_GALLERY.md)** for tested prompts with expected costs and output examples, or run `make demo` to render zero-key demo videos instantly.\n\n---\n\n## Pipelines\n\nEach pipeline is a complete production workflow, from idea to finished video.\n\n| Pipeline | What It Produces | Best For |\n|----------|-----------------|----------|\n| **Animated Explainer** | AI-generated explainer with research, narration, visuals, music | Educational content, tutorials, topic breakdowns |\n| **Animation** | Motion graphics, kinetic typography, animated sequences | Social media, product demos, abstract concepts |\n| **Avatar Spokesperson** | Avatar-driven presenter videos | Corporate comms, training, announcements |\n| **Cinematic** | Trailer, teaser, and mood-driven edits | Brand films, teasers, promotional content |\n| **Clip Factory** | Batch of ranked short-form clips from one long source | Repurposing long content for social media |\n| **Documentary Montage** | Thematic montage cut from a 
CLIP-indexed corpus of free stock footage and open archives (Pexels, Archive.org, NASA, Wikimedia, Unsplash) | Video essays, mood pieces, retrieval-first B-roll edits, real-footage videos without paid generation APIs |\n| **Hybrid** | Source footage + AI-generated support visuals | Enhancing existing footage with graphics |\n| **Localization & Dub** | Subtitle, dub, and translate existing video | Multi-language distribution |\n| **Podcast Repurpose** | Podcast highlights to video | Podcast marketing, audiogram videos |\n| **Screen Demo** | Polished software screen recordings and walkthroughs | Product demos, tutorials, documentation |\n| **Talking Head** | Footage-led speaker videos | Presentations, vlogs, interviews |\n\nEvery pipeline follows the same structured flow:\n\n```\nresearch -> proposal -> script -> scene_plan -> assets -> edit -> compose\n```\n\nEach stage has a dedicated **director skill** — a markdown instruction file that teaches the agent exactly how to execute that stage. The agent reads the skill, uses the tools, self-reviews, checkpoints state, and asks for human approval at creative decision points.\n\n> **Web research is a first-class stage.** Before writing a single word of script, the agent searches YouTube, Reddit, Hacker News, news sites, and academic sources. It gathers data points, audience questions, trending angles, and visual references — then cites everything in a structured research brief. Your videos are grounded in real, current information, not hallucinated facts.\n\n---\n\n## Why OpenMontage?\n\nMost AI video tools give you a single clip from a prompt. 
OpenMontage gives you an **end-to-end production pipeline** — the same structured process a real production team follows, automated by your AI agent.\n\nMost \"free AI video\" stacks quietly mean \"animate still images.\" OpenMontage can do that too, but it can also build a finished video from **real footage** pulled from free\u002Fopen sources, ranked semantically, edited intentionally, and rendered as a proper timeline.\n\nEdit your own talking-head footage. Generate a fully animated explainer from scratch. Cut a 2-hour podcast into a dozen social clips. Translate and dub your content into 10 languages. Build a cinematic brand teaser from stock footage and AI-generated scenes. **If a production team can make it, OpenMontage can orchestrate it.**\n\n- **12 production pipelines** — explainers, talking heads, screen demos, cinematic trailers, animations, podcasts, localization, documentary montages, and more\n- **52 production tools** — spanning video generation, image creation, text-to-speech, music, audio mixing, subtitles, enhancement, and analysis\n- **400+ agent skills** — production skills, pipeline directors, creative techniques, quality checklists, and deep technology knowledge packs that teach the agent how to use every tool like an expert\n- **Reference-driven creation** — paste a video you like and the agent turns it into a grounded, differentiated production plan instead of forcing you to invent the perfect prompt from scratch\n- **Real-footage documentary creation without paid video models** — build actual edited videos from free\u002Fopen motion footage and archival sources, not just Ken Burns over images\n- **Live web research built in** — before writing a single word of script, the agent runs 15-25+ web searches across YouTube, Reddit, news sites, and academic sources to ground your video in real, current data\n- **Both free\u002Flocal AND cloud providers** — every capability supports open-source local alternatives alongside premium APIs. 
Use what you have.\n- **No vendor lock-in** — swap providers freely. The scored selector ranks every provider across 7 dimensions (task fit, output quality, control, reliability, cost efficiency, latency, continuity) and picks the best match automatically.\n- **Production-grade quality gates** — delivery promise enforcement blocks slideshow-looking renders, pre-compose validation catches broken plans before wasting GPU time, and mandatory post-render self-review (ffprobe + frame extraction + audio analysis) ensures the agent never presents garbage. Every provider choice, style decision, and fallback gets logged in an auditable decision trail.\n- **Budget governance built in** — cost estimation before execution, spend caps, per-action approval thresholds. No surprise bills.\n\n---\n\n## How It Works\n\nOpenMontage uses an **agent-first architecture**. There is no code orchestrator. Your AI coding assistant IS the orchestrator.\n\n```\nYou: \"Make an explainer video about how black holes form\"\n |\n v\nAgent reads pipeline manifest (YAML) -- stages, tools, review criteria, success gates\n |\n v\nAgent reads stage director skill (Markdown) -- HOW to execute each stage\n |\n v\nAgent calls Python tools -- scored provider selection ranks every tool across 7 dimensions\n |\n v\nAgent self-reviews using reviewer skill -- schema validation, playbook compliance, quality checks\n |\n v\nAgent checkpoints state (JSON) -- resumable, with decision log and cost snapshot\n |\n v\nAgent presents for your approval -- you stay in control at every creative decision\n |\n v\nPre-compose validation gate -- delivery promise, slideshow risk, renderer governance\n |\n v\nRender (Remotion or FFmpeg) -- composition engine matched to visual grammar\n |\n v\nPost-render self-review -- ffprobe, frame extraction, audio analysis, promise verification\n |\n v\nFinal video output -- only if self-review passes\n```\n\n**Python provides tools and persistence.** All creative decisions, orchestration 
logic, review criteria, and quality standards live in readable instruction files (YAML manifests + Markdown skills) that you can inspect and customize. Every decision is logged with alternatives considered, confidence scores, and the reasoning behind each choice.\n\n---\n\n## Architecture\n\n```\nOpenMontage\u002F\n├── tools\u002F              # 48 Python tools (the agent's hands)\n│   ├── video\u002F          # 13 video gen tools + compose, stitch, trim\n│   ├── audio\u002F          # 4 TTS providers + Suno\u002FElevenLabs music, mixing, enhancement\n│   ├── graphics\u002F       # 9 image\u002Fgraphics generation tools + diagrams, code snippets, math\n│   ├── enhancement\u002F    # Upscale, bg remove, face enhance, color grade\n│   ├── analysis\u002F       # Transcription, scene detect, frame sampling\n│   ├── avatar\u002F         # Talking head, lip sync\n│   └── subtitle\u002F       # SRT\u002FVTT generation\n│\n├── pipeline_defs\u002F      # YAML pipeline manifests (the agent's playbook)\n├── skills\u002F             # Markdown skill files (the agent's knowledge)\n│   ├── pipelines\u002F      # Per-pipeline stage director skills\n│   ├── creative\u002F       # Creative technique skills\n│   ├── core\u002F           # Core tool skills\n│   └── meta\u002F           # Reviewer, checkpoint protocol\n│\n├── schemas\u002F            # 15 JSON Schemas (contract validation)\n├── styles\u002F             # Visual style playbooks (YAML)\n├── remotion-composer\u002F  # React\u002FRemotion video composition engine\n├── lib\u002F                # Core infrastructure (config, checkpoints, pipeline loader)\n└── tests\u002F              # Contract tests, QA integration tests, eval harness\n```\n\n### Three-Layer Knowledge Architecture\n\n```\nLayer 1: tools\u002F + pipeline_defs\u002F     \"What exists\" — executable capabilities + orchestration\nLayer 2: skills\u002F                     \"How to use it\" — OpenMontage conventions and quality bars\nLayer 3: 
.agents\u002Fskills\u002F             \"How it works\" — external technology knowledge packs\n```\n\nEach tool declares which Layer 3 skills it relies on. The agent reads Layer 1 to know what's available, Layer 2 to know how OpenMontage wants it used, and Layer 3 for deep technical knowledge when needed.\n\n---\n\n## Supported Providers\n\n> **Full setup guide with pricing and free tiers:** [`docs\u002FPROVIDERS.md`](docs\u002FPROVIDERS.md)\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Video Generation — 14 providers\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n| Provider | Type | Notes |\n|----------|------|-------|\n| **Kling** | Cloud API | High quality, fast |\n| **Runway Gen-4** | Cloud API | Cinematic quality, Gen-3 Alpha Turbo \u002F Gen-4 Turbo \u002F Gen-4 Aleph |\n| **Google Veo 3** | Cloud API | Long-form, cinematic. Via fal.ai or HeyGen. |\n| **Grok Imagine Video** | Cloud API | Strong reference-image video and xAI-native short-form generation |\n| **Higgsfield** | Cloud API | Multi-model orchestrator with Soul ID for character consistency |\n| **MiniMax** | Cloud API | Cost-effective |\n| **HeyGen** | Cloud API | Multi-model gateway |\n| **WAN 2.1** | Local GPU | Free, 1.3B and 14B variants |\n| **Hunyuan** | Local GPU | Free, high quality |\n| **CogVideo** | Local GPU | Free, 2B and 5B variants |\n| **LTX-Video** | Local GPU \u002F Modal | Free locally, or self-hosted cloud |\n| **Pexels** | Stock | Free stock footage |\n| **Pixabay** | Stock | Free stock footage |\n| **Wikimedia Commons** | Stock | Free\u002Fopen stock footage and archival video |\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Image Generation — 10 tools\u002Fproviders\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n| Provider | Type | Notes |\n|----------|------|-------|\n| **FLUX** | Cloud API | State-of-the-art quality |\n| **Google Imagen** | Cloud API | Imagen 4 — high-quality, multiple aspect ratios |\n| **Grok Imagine Image** | Cloud API | Strong image edits, 
style transfer, and multi-image compositing |\n| **DALL-E 3** | Cloud API | OpenAI's image model |\n| **Recraft** | Cloud API | Design-focused generation |\n| **Local Diffusion** | Local GPU | Stable Diffusion, free |\n| **Pexels** | Stock | Free stock images |\n| **Pixabay** | Stock | Free stock images |\n| **Unsplash** | Stock | Free stock images |\n| **ManimCE** | Local | Mathematical animations |\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Text-to-Speech — 4 providers\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n| Provider | Type | Notes |\n|----------|------|-------|\n| **ElevenLabs** | Cloud API | Premium voice quality |\n| **Google TTS** | Cloud API | 700+ voices, 50+ languages — best for localization |\n| **OpenAI TTS** | Cloud API | Fast, affordable |\n| **Piper** | Local | Completely free, offline |\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Music, Sound & Post-Production\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n**Music & Sound:**\n\n| Provider | Type | Notes |\n|----------|------|-------|\n| **Suno AI** | Cloud API | Full song generation with vocals, lyrics, any genre. Up to 8 minutes. 
|\n| **ElevenLabs Music** | Cloud API | AI music generation |\n| **ElevenLabs SFX** | Cloud API | Sound effect generation |\n\n**Post-Production (always available, always free):**\n\n| Tool | What It Does |\n|------|-------------|\n| **FFmpeg** | Video composition, encoding, subtitle burn-in, audio muxing |\n| **Video Stitch** | Multi-clip assembly, crossfades, picture-in-picture, spatial layouts |\n| **Video Trimmer** | Precision cutting and extraction |\n| **Audio Mixer** | Multi-track mixing, ducking, fades |\n| **Audio Enhance** | Noise reduction, normalization |\n| **Color Grade** | LUT-based color grading |\n| **Subtitle Gen** | SRT\u002FVTT generation from timestamps |\n\n**Enhancement:**\n\n| Tool | What It Does |\n|------|-------------|\n| **Upscale** | Real-ESRGAN image\u002Fvideo upscaling |\n| **Background Remove** | rembg \u002F U2Net background removal |\n| **Face Enhance** | Face quality enhancement |\n| **Face Restore** | CodeFormer \u002F GFPGAN face restoration |\n\n**Analysis:**\n\n| Tool | What It Does |\n|------|-------------|\n| **Transcriber** | WhisperX speech-to-text with word-level timestamps |\n| **Scene Detect** | Automatic scene boundary detection |\n| **Frame Sampler** | Intelligent frame extraction |\n| **Video Understand** | CLIP\u002FBLIP-2 vision-language analysis |\n\n**Avatar & Lip Sync:**\n\n| Tool | What It Does |\n|------|-------------|\n| **Talking Head** | SadTalker \u002F MuseTalk avatar animation |\n| **Lip Sync** | Wav2Lip audio-driven lip synchronization |\n\n**Composition & Rendering:**\n\n| Engine | Type | What It Does |\n|--------|------|-------------|\n| **Remotion** | Local (Node.js) | React-based programmatic video — spring-animated image scenes, stat reveals, section titles, hero cards, TikTok-style word-by-word captions, scene transitions (fade\u002Fslide\u002Fwipe\u002Fflip), Google Fonts, audio with fade curves, and the TalkingHead avatar composition. 
**When no video generation providers are configured, the agent generates still images and Remotion turns them into fully animated video.** |\n| **HyperFrames** | Local (Node.js ≥ 22) | HTML\u002FCSS\u002FGSAP programmatic video — kinetic typography, product promos, launch reels, custom motion graphics, registry blocks (data charts, grain overlays, shader transitions), website-to-video workflows, and rigged SVG character animation. Consumed via `npx hyperframes`; no monorepo checkout needed. |\n| **FFmpeg** | Local | Core video assembly, encoding, subtitle burn, audio muxing, color grading |\n\nRuntime is chosen at proposal (`render_runtime`) and locked through `edit_decisions`. Silent swaps between runtimes are a governance violation — see `skills\u002Fcore\u002Fhyperframes.md`.\n\n\u003C\u002Fdetails>\n\n---\n\n## Style System\n\nStyle playbooks define the visual language for your productions:\n\n| Playbook | Best For |\n|----------|----------|\n| **Clean Professional** | Corporate, educational, SaaS |\n| **Flat Motion Graphics** | Social media, TikTok, startups |\n| **Minimalist Diagram** | Technical deep-dives, architecture |\n\nPlaybooks control typography, color palettes, motion styles, audio profiles, and quality rules. 
The agent reads the playbook and applies it consistently across all generated assets.\n\n---\n\n## Platform Output Profiles\n\nBuilt-in render profiles for every major platform:\n\n| Profile | Resolution | Aspect Ratio |\n|---------|-----------|--------------|\n| YouTube Landscape | 1920x1080 | 16:9 |\n| YouTube 4K | 3840x2160 | 16:9 |\n| YouTube Shorts | 1080x1920 | 9:16 |\n| Instagram Reels | 1080x1920 | 9:16 |\n| Instagram Feed | 1080x1080 | 1:1 |\n| TikTok | 1080x1920 | 9:16 |\n| LinkedIn | 1920x1080 | 16:9 |\n| Cinematic | 2560x1080 | 21:9 |\n\n---\n\n## Production Governance\n\nOpenMontage treats video production like real engineering — with quality gates, audit trails, and enforcement at every stage.\n\n### Quality Gates\n\n- **Pre-compose validation** — blocks render if the delivery promise is violated (e.g. \"motion-led\" video with 80% still images), slideshow risk score is critical, or renderer family is missing. Catches broken plans before wasting GPU time.\n- **Post-render self-review** — after every render, the runtime runs ffprobe validation, extracts frames at 4 positions to check for black frames and broken overlays, analyzes audio levels for silence and clipping, verifies the delivery promise was honored, and checks subtitle presence. If the review fails, the video is not presented.\n- **Slideshow risk scoring** — 6-dimension analysis (repetition, decorative visuals, weak motion, shot intent, typography overreliance, unsupported cinematic claims) prevents \"animated PowerPoint\" outputs.\n- **Source media inspection** — when users supply their own footage, the system probes every file (resolution, codec, audio channels, duration) and builds planning implications before a single creative decision is made. 
No hallucinating content from filenames.\n\n### Scored Provider Selection\n\nEvery tool selection (video generation, image generation, TTS, music) runs through a 7-dimension scoring engine: task fit (30%), output quality (20%), control features (15%), reliability (15%), cost efficiency (10%), latency (5%), continuity (5%). The winning provider and its score are logged in the decision trail with all alternatives considered.\n\nSelectors normalize loose brief context before scoring. If the agent only knows something like \"Pixar-style animated short with character consistency,\" the selector expands that into scorer-friendly intent and style signals instead of requiring a perfectly pre-shaped `task_context`.\n\nSelector outputs also surface the chosen provider's `agent_skills`, so the agent can immediately read the right Layer 3 provider skill before writing prompts.\n\n### Decision Audit Trail\n\nEvery major creative and technical choice — provider selection, style\u002Fplaybook choice, music track, voice selection, renderer family, any fallback or downgrade — is logged with alternatives considered, confidence scores, and reasoning. The cumulative decision log persists across all stages so you can trace exactly why the output looks the way it does.\n\n### Budget Controls\n\n- **Estimate** before execution — see what it will cost\n- **Reserve** budget — lock funds before the call\n- **Reconcile** after — record actual spend\n- **Configurable modes** — `observe` (track only), `warn` (log overruns), `cap` (hard limit)\n- **Per-action approval** — pause for confirmation above a threshold (default: $0.50)\n- **Total budget cap** — default $10, fully configurable\n\nNo surprise bills. The agent tells you what it will cost before it spends.\n\n---\n\n## Agent Compatibility\n\nOpenMontage works with any AI coding assistant that can read files and execute Python. 
Dedicated instruction files are included for:\n\n| Platform | Config File |\n|----------|------------|\n| **Claude Code** | `CLAUDE.md` |\n| **Cursor** | `CURSOR.md` + `.cursor\u002Frules\u002F` |\n| **GitHub Copilot** | `COPILOT.md` + `.github\u002Fcopilot-instructions.md` |\n| **Codex** | `CODEX.md` |\n| **Windsurf** | `.windsurfrules` |\n\nAll platform files point to the shared `AGENT_GUIDE.md` (operating guide and agent contract) and `PROJECT_CONTEXT.md` (architecture reference).\n\n> **Coming soon:** Local LLM support via **Ollama** and **LM Studio** — run the full production pipeline without any cloud LLM.\n\n---\n\n## Contributing\n\nOpenMontage is built to be extended. The two most common contributions:\n\n### Adding a New Tool\n\n1. Create a Python file in the appropriate `tools\u002F` subdirectory\n2. Inherit from `BaseTool` and implement the tool contract\n3. The registry auto-discovers it — no manual registration needed\n4. Add a skill file if the tool needs usage guidance\n\n### Adding a New Pipeline\n\n1. Create a YAML manifest in `pipeline_defs\u002F`\n2. Create stage director skills in `skills\u002Fpipelines\u002F\u003Cyour-pipeline>\u002F`\n3. 
Reference existing tools — or add new ones if needed\n\nSee `docs\u002FARCHITECTURE.md` for the full technical reference, `docs\u002FPROVIDERS.md` for the complete provider guide (setup, pricing, free tiers), and `AGENT_GUIDE.md` for the agent contract.\n\n### Join the Community\n\nWe use [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fdiscussions) to share work and ideas:\n\n- **[Show and Tell](https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fdiscussions\u002Fcategories\u002Fshow-and-tell)** — Share videos you've made, prompts that worked well, or creative workflows you've discovered\n- **[Ideas](https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fdiscussions\u002Fcategories\u002Fideas)** — Suggest new pipelines, tools, style playbooks, or integrations\n- **[Q&A](https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fdiscussions\u002Fcategories\u002Fq-a)** — Ask questions about setup, pipelines, or troubleshooting\n\nMade something cool? 
Post it in Show and Tell — we'd love to see what you build.\n\n---\n\n## Contact\n\nFor updates, releases, and behind-the-scenes build notes, follow [@calesthioailabs](https:\u002F\u002Fx.com\u002Fcalesthioailabs).\n\nFor bugs, feature requests, and workflow discussions, use [GitHub Issues](https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fissues) and [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Fcalesthio\u002FOpenMontage\u002Fdiscussions) so everything stays visible and actionable.\n\n---\n\n## Testing\n\n```bash\n# Run contract tests (no API keys needed)\nmake test-contracts\n\n# Run all tests\nmake test\n```\n\n---\n\n## License\n\n[GNU AGPLv3](LICENSE)\n\n---\n\n**OpenMontage** — Production-grade video with real quality enforcement, orchestrated by your AI assistant.\n\nIf this project looks useful to you, a star would really mean a lot — it helps others discover it too.\n","OpenMontage is the first open-source agentic video production system, turning your AI coding assistant into a complete video studio. The project has 12 pipelines, 52 tools, and more than 500 agent skills, and supports fully automated video production from research and scripting through asset generation, editing, and final composition. It is written in Python and integrates technologies such as ElevenLabs text-to-speech and Stable Diffusion image generation. It suits any scenario that calls for quickly producing high-quality video content, whether commercial ads, educational material, or creative shorts, delivering professional-grade results at low cost.",2,"2026-06-11 03:48:39","high_star"]