[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75752":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":9,"totalLinesOfCode":9,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":9,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},75752,"video-search-and-summarization","NVIDIA-AI-Blueprints\u002Fvideo-search-and-summarization","NVIDIA-AI-Blueprints","The NVIDIA VSS Blueprint is a suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications.",null,"https:\u002F\u002Fgithub.com\u002FNVIDIA-AI-Blueprints\u002Fvideo-search-and-summarization","Python",1544,324,20,15,0,23,61,209,69,20.54,false,"main",[25,26,27,28,29,30,31],"llm","rag","vlm","agents","skills","video-analytics","video-search","2026-06-12 02:03:35","\u003Ch2>NVIDIA AI Blueprint: Video Search and Summarization (VSS)\u003C\u002Fh2>\n\n### Table of Contents\n- [Overview](#overview)\n- [Use Case \u002F Problem Description](#use-case--problem-description)\n- [Agent Workflows](#agent-workflows)\n- [Software Components](#software-components)\n- [Target Audience](#target-audience)\n- [Repository Structure Overview](#repository-structure-overview)\n- [Documentation](#documentation)\n- [Prerequisites](#prerequisites)\n- [Hardware Requirements](#hardware-requirements)\n- [Quickstart Guide](#quickstart-guide)\n- [License](#license)\n\n## Overview\n\nThe [NVIDIA Blueprint for Video Search and Summarization 
(VSS)](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002Flatest\u002Findex.html) provides a suite of reference architectures for building vision agents and AI-powered video analytics applications. These architectures bring together accelerated vision microservices, vision language models (VLMs), and large language models (LLMs) so you can use them in existing applications, as standalone microservices, or as part of a larger vision agent.\n\nVSS is organized into three areas of processing and analysis: **real-time video intelligence** (feature extraction, embeddings, and stream understanding with results published to a message broker), **downstream analytics** (enrichment of metadata into trajectories, incidents, and verified alerts), and **agentic and offline processing** (orchestrated tools for search, Q&A, summarization, and clip retrieval, including via the Model Context Protocol).\n\nThis repository implements the blueprint and powers the [NVIDIA build experience](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fvideo-search-and-summarization) for natural-language video agents (search, summarization, visual Q&A, and related workflows), backed by generative AI, VLMs and LLMs, and [NVIDIA NIM](https:\u002F\u002Fbuild.nvidia.com\u002F) microservices as configured in the stacks below.\n\n## Use Case \u002F Problem Description\n\nThe NVIDIA AI Blueprint for Video Search and Summarization addresses the challenge of deploying visual agents capable of interacting with large volumes of video data, both stored and streamed. The blueprint can be used to create vision AI agents for use cases such as monitoring smart spaces, warehouse automation, and SOP validation. 
This is important where quick and accurate video analysis can lead to better decision-making and enhanced operational efficiency.\n\n## Agent Workflows\nWe provide multiple reference [Agent Workflows](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fadding-workflows.html) which demonstrate how the individual components can be leveraged by an agent:\n\n| Workflow | Description |\n|----------|-------------|\n| [Q&A and Report Generation (Quickstart)](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fquickstart.html) | Video retrieval, VLM-based Q&A, and report generation on short video clips |\n| [Alert Verification](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fagent-workflow-alert-verification.html) | Realtime processing of videos using perception (object detection, tracking) and behavior analytics to generate alerts, which are subsequently verified with VLM to reduce false positives |\n| [Real-Time Alerts](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fagent-workflow-rt-alert.html) | Continuous processing of video streams through VLM for anomaly detection |\n| [Video Search](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fagent-workflow-search.html) | Natural language search across video archives using video embeddings (alpha) |\n| [Long Video Summarization](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fagent-workflow-lvs.html) | Analysis and summarization of extended video recordings through chunking and aggregation of dense captions |\n\n## Software Components\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FNVIDIA-AI-Blueprints\u002Fvideo-search-and-summarization\u002Fraw\u002Fmain\u002Fassets\u002Fvss-architecture.png\" width=\"800\">\n\u003C\u002Fdiv>\n\n1. 
**NIM microservices**: Here are models used in this blueprint:\n\n    - [Cosmos-Reason2-8B](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fcosmos-reason2-8b)\n    - [NVIDIA Nemotron-Nano-9B-v2](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fnvidia-nemotron-nano-9b-v2)\n\n2. **Real-time video intelligence**: The Real-Time Video Intelligence layer extracts rich visual features, semantic embeddings, and contextual understanding from video data in real-time, publishing results to a message broker for downstream analytics and agentic workflows. It provides three core microservices for processing video streams.  \n\n3. **Downstream analytics**: The Downstream Analytics layer processes and enriches the metadata streams generated by real-time video intelligence microservices, transforming raw detections into actionable insights and verified alerts.\n\n4. **Agent and offline processing**: The top-level agent leverages the Model Context Protocol (MCP) to access video analytics data, incident records, and vision processing capabilities through a unified tool interface. It integrates multiple vision-based tools including video understanding with Vision Language Models (VLMs), semantic video search using embeddings, long video summarization for extended footage analysis, and video snapshot\u002Fclip retrieval. \n\n## Target Audience\nThis blueprint is designed for ease of setup with extensive configuration options, requiring technical expertise. It is intended for:\n\n1. **Video Analysts and IT Engineers:** Professionals focused on analyzing video data and ensuring efficient processing and summarization. The blueprint offers 1-click deployment steps, easy-to-manage configurations, and plug-and-play models, making it accessible for early developers.\n\n2. **GenAI Developers \u002F Machine Learning Engineers:** Experts who need to customize the blueprint for specific use cases. This includes modifying the pipelines for unique datasets and fine-tuning LLMs as needed. 
For advanced users, the blueprint provides detailed configuration options and custom deployment possibilities, enabling extensive customization and optimization.\n\n## Repository Structure Overview\n\n| Directory | Description |\n|-----------|-------------|\n| `agent\u002F` | Video search and summarization agent (Python). Contains `src\u002Fvss_agents\u002F` (tools, agents, APIs, embeddings, evaluators, video analytics), `tests\u002F`, `stubs\u002F`, `docker\u002F`, and `3rdparty\u002F`. See [agent\u002FREADME.md](agent\u002FREADME.md). |\n| `deployments\u002F` | Deployment configs and Docker Compose: NIM model configs (`nim\u002F`), developer workflows (`developer-workflow\u002F` — dev-profile-base, dev-profile-search, dev-profile-alerts, dev-profile-lvs), foundational services, LVS, RTVI, VLM-as-verifier, VST, and root `compose.yml`. |\n| `scripts\u002F` | Deployment and patch scripts, including the Brev launchable notebook (`deploy_vss_launchable.ipynb`) and dev-profile \u002F patch scripts. |\n| `skills\u002F` | [agentskills.io](https:\u002F\u002Fagentskills.io\u002Fspecification)-compatible agent skills for VSS: one self-contained subdirectory per skill with `SKILL.md` frontmatter. Covers deploy and usage of search, summarization, alerts, VIOS, RT-VLM, LVS, and other related workflows—see the catalog and install notes in [skills\u002FREADME.md](skills\u002FREADME.md). |\n| `ui\u002F` | Frontend monorepo (Next.js, Turbo): `apps\u002F` (nemo-agent-toolkit-ui, nv-metropolis-bp-vss-ui) and shared `packages\u002F`. See [ui\u002FREADME.md](ui\u002FREADME.md). 
|\n\n## Documentation\n\nFor detailed instructions and additional information about this blueprint, please refer to the [official documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Findex.html).\n\n## Prerequisites\n\n### Obtain API Key\n\n- An NVIDIA AI Enterprise developer license is required to host NVIDIA NIM locally.\n- API catalog keys:\n   - NVIDIA [API catalog](https:\u002F\u002Fbuild.nvidia.com\u002F) or [NGC](https:\u002F\u002Forg.ngc.nvidia.com\u002Fsetup\u002Fapi-keys) ([steps to generate key](https:\u002F\u002Fdocs.nvidia.com\u002Fngc\u002Fgpu-cloud\u002Fngc-user-guide\u002Findex.html#generating-api-key))\n\n## Hardware Requirements\n\nPlatform requirements vary with the configuration and deployment topology used for VSS and its dependencies, such as the VLM and LLM. For a list of validated GPU topologies and the configuration to use with each, see the [GPU requirements](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fprerequisites.html#development-profile-gpu-requirements).\n\n## Quickstart Guide\n\n### Launchable Deployment\n\n**Ideal for:** Quickly getting started with your own videos without worrying about hardware and software requirements.\n\nFollow the steps in the [documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fcloud-brev.html) and the notebook in the [scripts](scripts\u002F) directory to complete all prerequisites and deploy the blueprint using a Brev Launchable on a 2xRTX PRO 6000 SE AWS instance.\n- [scripts\u002Fdeploy_vss_launchable.ipynb](scripts\u002Fdeploy_vss_launchable.ipynb): This notebook is tailored to the AWS CSP, which uses ephemeral storage.\n\n### Docker Compose Deployment\n\n**Ideal for:** Deploying a VSS agent on your own hardware or bare metal cloud instance.\n\n#### System Requirements\n\n- OS:\n    - x86 hosts: Ubuntu 22.04 or Ubuntu 24.04\n    - DGX-SPARK: DGX OS 7.4.0\n    - IGX-THOR: Jetson Linux BSP (Rel 38.5)\n    - AGX-THOR: Jetson Linux BSP (Rel 38.4)\n- NVIDIA 
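Driver:\n    - 580.105.08 (x86 hosts with Ubuntu 24.04)\n    - 580.65.06 (x86 hosts with Ubuntu 22.04)\n    - 580.95.05 (DGX-SPARK)\n    - 580.00 (IGX-THOR and AGX-THOR)\n- NVIDIA Container Toolkit: 1.17.8+\n- Docker: 27.2.0+\n- Docker Compose: v2.29.0+\n- NGC CLI: 4.10.0+\n\nFor installation details, see the [Prerequisites section](https:\u002F\u002Fdocs.nvidia.com\u002Fvss\u002F3.1.0\u002Fprerequisites.html).\n\n\n## License\nRefer to [LICENSE](LICENSE)\n","The NVIDIA-AI-Blueprints\u002Fvideo-search-and-summarization project provides a set of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications. Its core capabilities are real-time video intelligence, downstream analytics, and agentic and offline processing; by combining accelerated vision microservices, vision language models (VLMs), and large language models (LLMs), it supports video search, Q&A, summarization, and related functions. It draws on NVIDIA hardware acceleration and suits scenarios that require efficient processing of large volumes of video data, such as smart space monitoring, warehouse automation, and standard operating procedure validation, where it can significantly improve decision speed and operational efficiency.",2,"2026-06-11 03:53:15","trending"]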