[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72095":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":28,"discoverSource":29},72095,"vision-agent","landing-ai\u002Fvision-agent","landing-ai","This tool has been deprecated. Use Agentic Document Extraction instead.","",null,"Python",5294,598,60,8,0,2,17,67.03,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:03","\u003Cdiv align=\"center\">\n    \u003Cpicture>\n        \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fgithub.com\u002Flanding-ai\u002Fvision-agent\u002Fblob\u002Fmain\u002Fassets\u002Flogo_light.svg?raw=true\">\n        \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fgithub.com\u002Flanding-ai\u002Fvision-agent\u002Fblob\u002Fmain\u002Fassets\u002Flogo_dark.svg?raw=true\">\n        \u003Cimg alt=\"VisionAgent\" height=\"200px\" src=\"https:\u002F\u002Fgithub.com\u002Flanding-ai\u002Fvision-agent\u002Fblob\u002Fmain\u002Fassets\u002Flogo_light.svg?raw=true\">\n    \u003C\u002Fpicture> \n    \n_Prompt with an image\u002Fvideo → Get runnable vision code → Build Visual AI App in 
minutes_\n\n\n[![](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FwPdN8RCYew?compact=true&style=flat)](https:\u002F\u002Fdiscord.gg\u002FwPdN8RCYew)\n![ci_status](https:\u002F\u002Fgithub.com\u002Flanding-ai\u002Fvision-agent\u002Factions\u002Fworkflows\u002Fci_cd.yml\u002Fbadge.svg)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fvision-agent.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fvision-agent)\n![version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fvision-agent)\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.com\u002Finvite\u002FRVcW3j9RgR\" target=\"_blank\">\u003Cstrong>Discord\u003C\u002Fstrong>\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Flanding.ai\u002Fblog\u002Fvisionagent-an-agentic-approach-for-complex-visual-reasoning\" target=\"_blank\">\u003Cstrong>Architecture\u003C\u002Fstrong>\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLrKGAzovU85fvo22OnVtPl90mxBygIf79\" target=\"_blank\">\u003Cstrong>YouTube\u003C\u002Fstrong>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cbr \u002F>\n\n**VisionAgent** is the Visual AI pilot from LandingAI. Give it a prompt and an image, and it automatically picks the right vision models and outputs ready‑to‑run code—letting you build vision‑enabled apps in minutes. 
You can play around with VisionAgent using our local webapp in `examples\u002Fchat` by following the directions in the `README.md`:\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F752632b3-dda5-44f1-b27e-5cb4c97757ac>\n\n\n## Steps to Set Up the Library  \n\n### Get Your VisionAgent API Key\nThe most important step is to [create an account](https:\u002F\u002Fva.landing.ai\u002Fhome) and obtain your [API key](https:\u002F\u002Fva.landing.ai\u002Fsettings\u002Fapi-key).\n\n### Other Prerequisites\n- Python version 3.9 or higher\n- [Anthropic API key](#get-an-anthropic-api-key)\n- [Google API key](#get-a-google-api-key)\n\n### Why do I need Anthropic and Google API Keys?\nVisionAgent uses models from Anthropic and Google to respond to prompts and generate code. \n\n\nWhen you run VisionAgent, the app uses your API keys to access the Anthropic and Google models. This ensures that your projects aren’t constrained by the rate limits on the LandingAI accounts, and it keeps many users from collectively exhausting those limits.\n\nAnthropic and Google each have their own rate limits and paid tiers. Refer to their documentation and pricing to learn more.\n\n> **_NOTE:_** In VisionAgent v1.0.2 and earlier, VisionAgent was powered by Anthropic Claude-3.5 and OpenAI o1. If you use one of these versions, you must get an OpenAI API key and set it as an environment variable.\n\n\n### Get an Anthropic API Key\n1. If you don’t have one yet, create an [Anthropic Console account](https:\u002F\u002Fconsole.anthropic.com\u002F).\n2. In the Anthropic Console, go to the [API Keys](https:\u002F\u002Fconsole.anthropic.com\u002Fsettings\u002Fkeys) page.\n3. Generate an API key.\n\n### Get a Google API Key\n1. If you don’t have one yet, create a [Google AI Studio account](https:\u002F\u002Faistudio.google.com\u002F).\n2. 
In Google AI Studio, go to the [Get API Key](https:\u002F\u002Faistudio.google.com\u002Fapp\u002Fapikey) page.\n3. Generate an API key.\n\n\n## Installation\n\nInstall with uv:\n```bash\nuv add vision-agent\n```\n\nInstall with pip:\n\n```bash\npip install vision-agent\n```\n\n## Quickstart: Prompt VisionAgent\nFollow this quickstart to learn how to prompt VisionAgent. After learning the basics, customize your prompt and workflow to meet your needs.\n\n1. Get your Anthropic, Google, and VisionAgent API keys.\n2. [Set the Anthropic, Google, and VisionAgent API keys as environment variables](#set-api-keys-as-environment-variables).\n3. [Install VisionAgent](#installation).\n4. Create a folder called `quickstart`. \n5. Find an image you want to analyze and save it to the `quickstart` folder.\n6. Copy the [Sample Script](#sample-script-prompt-visionagent) to a file called `source.py`. Save the file to the `quickstart` folder.  \n7. Run `source.py`. \n8. VisionAgent creates a file called `generated_code.py` and saves the generated code there.  \n\n### Set API Keys as Environment Variables\nBefore running VisionAgent code, you must set the Anthropic, Google, and VisionAgent API keys as environment variables. 
Each operating system offers different ways to do this.\n\nHere is the code for setting the variables:\n```bash\nexport VISION_AGENT_API_KEY=\"your-api-key\"\nexport ANTHROPIC_API_KEY=\"your-api-key\"\nexport GOOGLE_API_KEY=\"your-api-key\" \n```\n### Sample Script: Prompt VisionAgent\nTo use VisionAgent to generate code, use the following script as a starting point:\n\n```python\n# Import the classes you need from the VisionAgent package\nfrom vision_agent.agent import VisionAgentCoderV2\nfrom vision_agent.models import AgentMessage\n\n# Enable verbose output \nagent = VisionAgentCoderV2(verbose=True)\n\n# Add your prompt (content) and image file (media)\ncode_context = agent.generate_code(\n    [\n        AgentMessage(\n            role=\"user\",\n            content=\"Describe the image\",\n            media=[\"friends.jpg\"]\n        )\n    ]\n)\n\n# Write the output to a file\nwith open(\"generated_code.py\", \"w\") as f:\n    f.write(code_context.code + \"\\n\" + code_context.test)\n```\n### What to Expect When You Prompt VisionAgent\nWhen you submit a prompt, VisionAgent performs the following tasks.\n\n1. Generates a plan for the code generation task. If verbose output is on, the numbered steps for this plan display.\n2. Generates code and a test case based on the plan. \n3. Tests the generated code with the test case. If the test case fails, VisionAgent iterates on the code generation process until the test case passes.\n\n## Example: Count Cans in an Image\nCheck out how to use VisionAgent in this Jupyter Notebook to learn how to count the number of cans in an image:\n\n[Count Cans in an Image](https:\u002F\u002Fgithub.com\u002Flanding-ai\u002Fvision-agent\u002Fblob\u002Fmain\u002Fexamples\u002Fnotebooks\u002Fcounting_cans.ipynb)\n\n## Use Specific Tools from VisionAgent\nThe VisionAgent library includes a set of [tools](vision_agent\u002Ftools), which are standalone models or functions that complete specific tasks. 
When you prompt VisionAgent, VisionAgent selects one or more of these tools to complete the tasks outlined in your prompt.\n\nFor example, if you prompt VisionAgent to “count the number of dogs in an image”, VisionAgent might use the `florence2_object_detection` tool to detect all the dogs, and then the `countgd_object_detection` tool to count the number of detected dogs.\n\nAfter installing the VisionAgent library, you can also use the tools in your own scripts. For example, if you’re writing a script to track objects in videos, you can call the `owlv2_sam2_video_tracking` function. In other words, you can use the VisionAgent tools outside of simply prompting VisionAgent. \n\nThe tools are in the [vision_agent.tools](vision_agent\u002Ftools) API.\n\n### Sample Script: Use Specific Tools for Images\nYou can call the `countgd_object_detection` function to count the number of objects in an image. \n\nTo do this, you could run this script:\n```python\n# Import the VisionAgent Tools library; import Matplotlib to visualize the results\nimport vision_agent.tools as T\nimport matplotlib.pyplot as plt\n\n# Load the image\nimage = T.load_image(\"people.png\")\n\n# Call the function to count objects in an image, and specify that you want to count people\ndets = T.countgd_object_detection(\"person\", image)\n\n# Visualize the countgd bounding boxes on the image\nviz = T.overlay_bounding_boxes(image, dets)\n\n# Save the visualization to a file\nT.save_image(viz, \"people_detected.png\")\n\n# Display the visualization\nplt.imshow(viz)\nplt.show()\n\n```\n### Sample Script: Use Specific Tools for Videos\nYou can call the `countgd_sam2_video_tracking` function to track people in a video and pair it with the `extract_frames_and_timestamps` function to return the frames and timestamps in which those people appear.\n\nTo do this, you could run this script:\n```python\n# Import the VisionAgent Tools library\nimport vision_agent.tools as T\n\n# Call the function to get the frames and 
timestamps\nframes_and_ts = T.extract_frames_and_timestamps(\"people.mp4\")\n\n# Extract the frames from the frames_and_ts list\nframes = [f[\"frame\"] for f in frames_and_ts]\n\n# Call the function to track objects, and specify that you want to track people\ntracks = T.countgd_sam2_video_tracking(\"person\", frames)\n\n# Visualize the countgd tracking results on the frames and save the video\nviz = T.overlay_segmentation_masks(frames, tracks)\nT.save_video(viz, \"people_detected.mp4\")\n```\n\n\n## Use Other LLM Providers\nVisionAgent uses [Anthropic Claude 3.7 Sonnet](https:\u002F\u002Fwww.anthropic.com\u002Fclaude\u002Fsonnet) and [Gemini Flash 2.0 Experimental](https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\u002Fmodels\u002Fexperimental-models) (`gemini-2.0-flash-exp`) to respond to prompts and generate code. We’ve found that these provide the best performance for VisionAgent and are available on the free tiers (with rate limits) from their providers.\n\nIf you prefer to use only one of these models or a different set of models, you can change the selected LLM provider in this file: `vision_agent\u002Fconfigs\u002Fconfig.py`. You must also add the provider’s API Key as an [environment variable](#set-api-keys-as-environment-variables).\n\nFor example, if you want to use **only** the Anthropic model, run this command:\n```bash\ncp vision_agent\u002Fconfigs\u002Fanthropic_config.py vision_agent\u002Fconfigs\u002Fconfig.py\n```\n\nOr, you can manually enter the model details in the `config.py` file. 
For example, if you want to change the planner model from Anthropic to OpenAI, you would replace this code:\n```python\n    planner: Type[LMM] = Field(default=AnthropicLMM)\n    planner_kwargs: dict = Field(\n        default_factory=lambda: {\n            \"model_name\": \"claude-3-7-sonnet-20250219\",\n            \"temperature\": 0.0,\n            \"image_size\": 768,\n        }\n    )\n```\n\nwith this code:\n\n```python\n    planner: Type[LMM] = Field(default=OpenAILMM)\n    planner_kwargs: dict = Field(\n        default_factory=lambda: {\n            \"model_name\": \"gpt-4o-2024-11-20\",\n            \"temperature\": 0.0,\n            \"image_size\": 768,\n            \"image_detail\": \"low\",\n        }\n    )\n```\n\n## Resources\n- [Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002FRVcW3j9RgR): Check out our community of VisionAgent users to share use cases and learn about updates.\n- [VisionAgent Library Docs](https:\u002F\u002Flanding-ai.github.io\u002Fvision-agent\u002F): Learn how to use this library.\n- [Video Tutorials](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLrKGAzovU85fvo22OnVtPl90mxBygIf79): Watch the latest video tutorials to see how VisionAgent is used in a variety of use cases.\n","VisionAgent is a visual AI agent tool developed by LandingAI that automatically generates runnable vision code from a prompt and an image, helping users build vision applications in minutes. Its core functionality includes automatically selecting suitable vision models and outputting ready-to-use code, which greatly simplifies the development workflow. Technically, VisionAgent uses models from Anthropic and Google to process prompts and generate code, and it requires the corresponding API keys to access those services. Although this project has been deprecated and Agentic Document Extraction is recommended as its replacement, VisionAgent remains suitable for rapid prototyping and the early development stages of small visual recognition projects.","2026-06-11 03:40:21","high_star"]