[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71964":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},71964,"sam3","facebookresearch\u002Fsam3","facebookresearch","The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.","https:\u002F\u002Fai.meta.com\u002Fsam3\u002F",null,"Python",10498,1575,47,252,0,75,205,413,225,119.59,"Other",false,"main",[],"2026-06-12 04:01:03","# SAM 3: Segment Anything with Concepts\n\nMeta Superintelligence Labs\n\n[Nicolas Carion](https:\u002F\u002Fwww.nicolascarion.com\u002F)\\*,\n[Laura Gustafson](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=c8IpF9gAAAAJ&hl=en)\\*,\n[Yuan-Ting Hu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=E8DVVYQAAAAJ&hl=en)\\*,\n[Shoubhik Debnath](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=fb6FOfsAAAAJ&hl=en)\\*,\n[Ronghang Hu](https:\u002F\u002Fronghanghu.com\u002F)\\*,\n[Didac Suris](https:\u002F\u002Fwww.didacsuris.com\u002F)\\*,\n[Chaitanya Ryali](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=4LWx24UAAAAJ&hl=en)\\*,\n[Kalyan Vasudev Alwala](https:\u002F\u002Fscholar.google.co.in\u002Fcitations?user=m34oaWEAAAAJ&hl=en)\\*,\n[Haitham Khedr](https:\u002F\u002Fhkhedr.com\u002F)\\*, 
Andrew Huang,\n[Jie Lei](https:\u002F\u002Fjayleicn.github.io\u002F),\n[Tengyu Ma](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=VeTSl0wAAAAJ&hl=en),\n[Baishan Guo](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=BC5wDu8AAAAJ&hl=en),\nArpit Kalla, [Markus Marks](https:\u002F\u002Fdamaggu.github.io\u002F),\n[Joseph Greer](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=guL96CkAAAAJ&hl=en),\nMeng Wang, [Peize Sun](https:\u002F\u002Fpeizesun.github.io\u002F),\n[Roman Rädle](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Tpt57v0AAAAJ&hl=en),\n[Triantafyllos Afouras](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~afourast\u002F),\n[Effrosyni Mavroudi](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=vYRzGGEAAAAJ&hl=en),\n[Katherine Xu](https:\u002F\u002Fk8xu.github.io\u002F)°,\n[Tsung-Han Wu](https:\u002F\u002Fpatrickthwu.com\u002F)°,\n[Yu Zhou](https:\u002F\u002Fyu-bryan-zhou.github.io\u002F)°,\n[Liliane Momeni](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Lb-KgVYAAAAJ&hl=en)°,\n[Rishi Hazra](https:\u002F\u002Frishihazra.github.io\u002F)°,\n[Shuangrui Ding](https:\u002F\u002Fmark12ding.github.io\u002F)°,\n[Sagar Vaze](https:\u002F\u002Fsgvaze.github.io\u002F)°,\n[Francois Porcher](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=LgHZ8hUAAAAJ&hl=en)°,\n[Feng Li](https:\u002F\u002Ffengli-ust.github.io\u002F)°,\n[Siyuan Li](https:\u002F\u002Fsiyuanliii.github.io\u002F)°,\n[Aishwarya Kamath](https:\u002F\u002Fashkamath.github.io\u002F)°,\n[Ho Kei Cheng](https:\u002F\u002Fhkchengrex.com\u002F)°,\n[Piotr Dollar](https:\u002F\u002Fpdollar.github.io\u002F)†,\n[Nikhila Ravi](https:\u002F\u002Fnikhilaravi.com\u002F)†,\n[Kate Saenko](https:\u002F\u002Fai.bu.edu\u002Fksaenko.html)†,\n[Pengchuan Zhang](https:\u002F\u002Fpzzhang.github.io\u002Fpzzhang\u002F)†,\n[Christoph Feichtenhofer](https:\u002F\u002Ffeichtenhofer.github.io\u002F)†\n\n\\* core contributor, ° intern, † project lead, order is random within 
groups\n\n[[`Paper`](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fsam-3-segment-anything-with-concepts\u002F)]\n[[`Project`](https:\u002F\u002Fai.meta.com\u002Fsam3)]\n[[`Demo`](https:\u002F\u002Fsegment-anything.com\u002F)]\n[[`Blog`](https:\u002F\u002Fai.meta.com\u002Fblog\u002Fsegment-anything-model-3\u002F)]\n[[`BibTeX`](#citing-sam-3)]\n\n![SAM 3 architecture](assets\u002Fmodel_diagram.png?raw=true) SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Compared to its predecessor [SAM 2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsam2), SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase or exemplars. Unlike prior work, SAM 3 can handle a vastly larger set of open-vocabulary prompts. It achieves 75-80% of human performance on our new [SA-CO benchmark](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsam3?tab=readme-ov-file#sa-co-dataset) which contains 270K unique concepts, over 50 times more than existing benchmarks.\n\nThis breakthrough is driven by an innovative data engine that has automatically annotated over 4 million unique concepts, creating the largest high-quality open-vocabulary segmentation dataset to date. In addition, SAM 3 introduces a new model architecture featuring a presence token that improves discrimination between closely related text prompts (e.g., “a player in white” vs. “a player in red”), as well as a decoupled detector–tracker design that minimizes task interference and scales efficiently with data.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fdog.gif\" width=380 \u002F>\n  \u003Cimg src=\"assets\u002Fplayer.gif\" width=380 \u002F>\n\u003C\u002Fp>\n\n## Latest updates\n\n**03\u002F27\u002F2026 -- SAM 3.1 Object Multiplex is released. 
It introduces a shared-memory approach for joint multi-object tracking that is significantly faster without sacrificing accuracy.**\n\n- A new suite of improved model checkpoints (denoted as **SAM 3.1**) has been released on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fsam3.1). See [`RELEASE_SAM3p1.md`](RELEASE_SAM3p1.md) for full details.\n  * To use the new SAM 3.1 checkpoints, you need the latest model code from this repo. If you have installed an earlier version of this repo, pull the latest code (with `git pull`) and then reinstall it following the [Installation](#installation) steps below.\n\n## Installation\n\n### Prerequisites\n\n- Python 3.12 or higher\n- PyTorch 2.7 or higher\n- CUDA-compatible GPU with CUDA 12.6 or higher\n\n1. **Create a new Conda environment:**\n\n```bash\nconda create -n sam3 python=3.12\nconda deactivate\nconda activate sam3\n```\n\n2. **Install PyTorch with CUDA support:**\n\n```bash\npip install torch==2.10.0 torchvision --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n```\n\n3. **Clone the repository and install the package:**\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsam3.git\ncd sam3\npip install -e .\n```\n\n4. **Install additional dependencies for example notebooks or development:**\n\n```bash\n# For running example notebooks\npip install -e \".[notebooks]\"\n\n# For development\npip install -e \".[train,dev]\"\n```\n\n5. **Install optional dependencies for faster inference:**\n\n```bash\npip install einops ninja && pip install flash-attn-3 --no-deps --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\npip install git+https:\u002F\u002Fgithub.com\u002Fronghanghu\u002Fcc_torch.git\n```\n\n## Getting Started\n\n⚠️ Before using SAM 3, please request access to the checkpoints on the SAM 3\nHugging Face [repo](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fsam3). 
Once accepted, you\nneed to be authenticated to download the checkpoints. You can do this by running\nthe following [steps](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fen\u002Fquick-start#authentication)\n(e.g. `hf auth login` after generating an access token.)\n\n### Basic Usage\n\n```python\nimport torch\n#################################### For Image ####################################\nfrom PIL import Image\nfrom sam3.model_builder import build_sam3_image_model\nfrom sam3.model.sam3_image_processor import Sam3Processor\n# Load the model\nmodel = build_sam3_image_model()\nprocessor = Sam3Processor(model)\n# Load an image\nimage = Image.open(\"\u003CYOUR_IMAGE_PATH.jpg>\")\ninference_state = processor.set_image(image)\n# Prompt the model with text\noutput = processor.set_text_prompt(state=inference_state, prompt=\"\u003CYOUR_TEXT_PROMPT>\")\n\n# Get the masks, bounding boxes, and scores\nmasks, boxes, scores = output[\"masks\"], output[\"boxes\"], output[\"scores\"]\n\n#################################### For Video ####################################\n\nfrom sam3.model_builder import build_sam3_video_predictor\n\nvideo_predictor = build_sam3_video_predictor()\nvideo_path = \"\u003CYOUR_VIDEO_PATH>\" # a JPEG folder or an MP4 video file\n# Start a session\nresponse = video_predictor.handle_request(\n    request=dict(\n        type=\"start_session\",\n        resource_path=video_path,\n    )\n)\nresponse = video_predictor.handle_request(\n    request=dict(\n        type=\"add_prompt\",\n        session_id=response[\"session_id\"],\n        frame_index=0, # Arbitrary frame index\n        text=\"\u003CYOUR_TEXT_PROMPT>\",\n    )\n)\noutput = response[\"outputs\"]\n```\n\n## Examples\n\nThe `examples` directory contains notebooks demonstrating how to use SAM3 with\nvarious types of prompts:\n\n- [`sam3_image_predictor_example.ipynb`](examples\u002Fsam3_image_predictor_example.ipynb)\n  : Demonstrates how to prompt SAM 3 with text and visual 
box prompts on images.\n- [`sam3_video_predictor_example.ipynb`](examples\u002Fsam3_video_predictor_example.ipynb)\n  : Demonstrates how to prompt SAM 3 with text prompts on videos, and how to make\n  further interactive refinements with points.\n- [`sam3_image_batched_inference.ipynb`](examples\u002Fsam3_image_batched_inference.ipynb)\n  : Demonstrates how to run batched inference with SAM 3 on images.\n- [`sam3_agent.ipynb`](examples\u002Fsam3_agent.ipynb): Demonstrates the use of SAM\n  3 Agent to segment complex text prompts on images.\n- [`saco_gold_silver_vis_example.ipynb`](examples\u002Fsaco_gold_silver_vis_example.ipynb)\n  : Shows a few examples from the SA-Co image evaluation set.\n- [`saco_veval_vis_example.ipynb`](examples\u002Fsaco_veval_vis_example.ipynb) :\n  Shows a few examples from the SA-Co video evaluation set.\n\nThere are additional notebooks in the examples directory that demonstrate how to\nuse SAM 3 for interactive instance segmentation in images and videos (SAM 1\u002F2\ntasks), or as a tool for an MLLM, and how to run evaluations on the SA-Co\ndataset.\n\nTo run the Jupyter notebook examples:\n\n```bash\n# Make sure you have the notebook dependencies installed\npip install -e \".[notebooks]\"\n\n# Start Jupyter notebook\njupyter notebook examples\u002Fsam3_image_predictor_example.ipynb\n```\n\n## Model\n\nSAM 3 consists of a detector and a tracker that share a vision encoder. It has 848M parameters. The\ndetector is a DETR-based model conditioned on text, geometry, and image\nexemplars. 
The tracker inherits the SAM 2 transformer encoder-decoder\narchitecture, supporting video segmentation and interactive refinement.\n\n## Image Results\n\n\u003Cdiv align=\"center\">\n\u003Ctable style=\"min-width: 80%; border: 2px solid #ddd; border-collapse: collapse\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"3\" style=\"border-right: 2px solid #ddd; padding: 12px 20px\">Model\u003C\u002Fth>\n      \u003Cth colspan=\"3\" style=\"text-align: center; border-right: 2px solid #ddd; padding: 12px 20px\">Instance Segmentation\u003C\u002Fth>\n      \u003Cth colspan=\"5\" style=\"text-align: center; padding: 12px 20px\">Box Detection\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth colspan=\"2\" style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">LVIS\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 2px solid #ddd; padding: 12px 20px\">SA-Co\u002FGold\u003C\u002Fth>\n      \u003Cth colspan=\"2\" style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">LVIS\u003C\u002Fth>\n      \u003Cth colspan=\"2\" style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">COCO\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">SA-Co\u002FGold\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">cgF1\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">AP\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 2px solid #ddd; padding: 12px 20px\">cgF1\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">cgF1\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">AP\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">AP\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; 
border-right: 1px solid #eee; padding: 12px 20px\">AP\u003Csub>o\u003C\u002Fsub>\n\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">cgF1\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">Human\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 2px solid #ddd; padding: 10px 20px\">72.8\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">74.0\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">OWLv2*\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px; color: #999\">29.3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px; color: #999\">43.4\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 2px solid #ddd; padding: 10px 20px\">24.6\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px; color: #999\">30.2\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px; color: #999\">45.5\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">46.1\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; 
padding: 10px 20px\">23.9\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">24.5\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">DINO-X\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">38.5\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 2px solid #ddd; padding: 10px 20px\">21.3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">52.4\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">56.0\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">22.5\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">Gemini 2.5\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">13.4\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 2px solid #ddd; padding: 10px 20px\">13.0\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">16.1\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">14.4\u003C\u002Ftd>\n    \u003C\u002Ftr>\n   
 \u003Ctr style=\"border-top: 2px solid #b19c9cff\">\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">SAM 3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">37.2\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">48.5\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 2px solid #ddd; padding: 10px 20px\">54.1\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">40.6\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">53.6\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">56.4\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">55.7\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">55.7\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\u003Cp style=\"text-align: center; margin-top: 10px; font-size: 0.9em; color: #ddd;\">* Partially trained on LVIS, AP\u003Csub>o\u003C\u002Fsub> refers to COCO-O accuracy\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n## Video Results\n\n\u003Cdiv align=\"center\">\n\u003Ctable style=\"min-width: 80%; border: 2px solid #ddd; border-collapse: collapse\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\" style=\"border-right: 2px solid #ddd; padding: 12px 20px\">Model\u003C\u002Fth>\n      \u003Cth colspan=\"2\" style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">SA-V test\u003C\u002Fth>\n      \u003Cth colspan=\"2\" style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">YT-Temporal-1B test\u003C\u002Fth>\n      \u003Cth colspan=\"2\" style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">SmartGlasses test\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; 
padding: 12px 20px\">LVVIS test\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">BURST test\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">cgF1\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">pHOTA\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">cgF1\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">pHOTA\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">cgF1\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">pHOTA\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; border-right: 1px solid #eee; padding: 12px 20px\">mAP\u003C\u002Fth>\n      \u003Cth style=\"text-align: center; padding: 12px 20px\">HOTA\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">Human\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">53.1\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">70.5\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">71.2\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">78.4\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">58.5\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">72.3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">-\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">-\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px 
solid #b19c9cff\">\n      \u003Ctd style=\"border-right: 2px solid #ddd; padding: 10px 20px\">SAM 3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">30.3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">58.0\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">50.8\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">69.9\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">36.4\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">63.6\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; border-right: 1px solid #eee; padding: 10px 20px\">36.3\u003C\u002Ftd>\n      \u003Ctd style=\"text-align: center; padding: 10px 20px\">44.5\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\u003C\u002Fdiv>\n\n## SA-Co Dataset\n\nWe release 2 image benchmarks, [SA-Co\u002FGold](scripts\u002Feval\u002Fgold\u002FREADME.md) and\n[SA-Co\u002FSilver](scripts\u002Feval\u002Fsilver\u002FREADME.md), and a video benchmark\n[SA-Co\u002FVEval](scripts\u002Feval\u002Fveval\u002FREADME.md). The datasets contain images (or videos) with annotated noun phrases. Each image\u002Fvideo and noun phrase pair is annotated with instance masks and unique IDs of each object matching the phrase. Phrases that have no matching objects (negative prompts) have no masks, shown in red font in the figure. 
See the linked READMEs for more details on how to download and run evaluations on the datasets.\n\n* HuggingFace host: [SA-Co\u002FGold](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ffacebook\u002FSACo-Gold), [SA-Co\u002FSilver](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ffacebook\u002FSACo-Silver) and [SA-Co\u002FVEval](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ffacebook\u002FSACo-VEval)\n* Roboflow host: [SA-Co\u002FGold](https:\u002F\u002Funiverse.roboflow.com\u002Fsa-co-gold), [SA-Co\u002FSilver](https:\u002F\u002Funiverse.roboflow.com\u002Fsa-co-silver) and [SA-Co\u002FVEval](https:\u002F\u002Funiverse.roboflow.com\u002Fsa-co-veval)\n\n![SA-Co dataset](assets\u002Fsa_co_dataset.jpg?raw=true)\n\n## Development\n\nTo set up the development environment:\n\n```bash\npip install -e \".[dev,train]\"\n```\n\nTo format the code:\n\n```bash\nufmt format .\n```\n\n## Contributing\n\nSee [contributing](CONTRIBUTING.md) and the\n[code of conduct](CODE_OF_CONDUCT.md).\n\n## License\n\nThis project is licensed under the SAM License - see the [LICENSE](LICENSE) file\nfor details.\n\n## Acknowledgements\n\nWe would like to thank the following people for their contributions to the SAM 3 project: Alex He, Alexander Kirillov,\nAlyssa Newcomb, Ana Paula Kirschner Mofarrej, Andrea Madotto, Andrew Westbury, Ashley Gabriel, Azita Shokpour,\nBen Samples, Bernie Huang, Carleigh Wood, Ching-Feng Yeh, Christian Puhrsch, Claudette Ward, Daniel Bolya,\nDaniel Li, Facundo Figueroa, Fazila Vhora, George Orlin, Hanzi Mao, Helen Klein, Hu Xu, Ida Cheng, Jake Kinney,\nJiale Zhi, Jo Sampaio, Joel Schlosser, Justin Johnson, Kai Brown, Karen Bergan, Karla Martucci, Kenny Lehmann,\nMaddie Mintz, Mallika Malhotra, Matt Ward, Michelle Chan, Michelle Restrepo, Miranda Hartley, Muhammad Maaz,\nNisha Deo, Peter Park, Phillip Thomas, Raghu Nayani, Rene Martinez Doehner, Robbie Adkins, Ross Girshik, Sasha\nMitts, Shashank Jain, Spencer Whitehead, Ty Toledano, Valentin Gabeur, 
Vincent Cho, Vivian Lee, William Ngan,\nXuehai He, Yael Yungster, Ziqi Pang, Ziyi Dou, Zoe Quake.\n\n## Citing SAM 3\n\nIf you use SAM 3 or the SA-Co dataset in your research, please use the following BibTeX entry.\n\n```bibtex\n@misc{carion2025sam3segmentconcepts,\n      title={SAM 3: Segment Anything with Concepts},\n      author={Nicolas Carion and Laura Gustafson and Yuan-Ting Hu and Shoubhik Debnath and Ronghang Hu and Didac Suris and Chaitanya Ryali and Kalyan Vasudev Alwala and Haitham Khedr and Andrew Huang and Jie Lei and Tengyu Ma and Baishan Guo and Arpit Kalla and Markus Marks and Joseph Greer and Meng Wang and Peize Sun and Roman Rädle and Triantafyllos Afouras and Effrosyni Mavroudi and Katherine Xu and Tsung-Han Wu and Yu Zhou and Liliane Momeni and Rishi Hazra and Shuangrui Ding and Sagar Vaze and Francois Porcher and Feng Li and Siyuan Li and Aishwarya Kamath and Ho Kei Cheng and Piotr Dollár and Nikhila Ravi and Kate Saenko and Pengchuan Zhang and Christoph Feichtenhofer},\n      year={2025},\n      eprint={2511.16719},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.16719},\n}\n```\n","SAM 3 is a unified foundation model for promptable segmentation in images and videos that can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Its core capability is exhaustive instance segmentation of open-vocabulary concepts, which lets users specify the objects to segment with a short text phrase or exemplars. Technically, SAM 3 improves on its predecessor in understanding and handling complex scenes. The project suits applications that require high-precision object recognition and segmentation, such as autonomous driving, medical image analysis, and video content editing.",2,"2026-06-11 03:39:44","high_star"]