[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80057":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":14,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":27,"discoverSource":28},80057,"InstructSAM","DCDmllm\u002FInstructSAM","DCDmllm","The code for \"InstructSAM: Segment Any Instance with Any Instructions\"","",null,"Python",80,6,2,0,14,2.54,false,"main",[21,22,23],"instruction-driven-segmentation","mllm","multi-instance-segmentation","2026-06-12 02:03:57","\u003Cdiv align=\"center\">\n\n# InstructSAM: Segment Any Instance with Any Instructions\n\n[![arXiv preprint](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2605.26102-ECA8A7?logo=arxiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.26102)\n[![Huggingface](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-HuggingFace-E6A151)](https:\u002F\u002Fhuggingface.co\u002FCircleRadon\u002FInstructSAM-2B)\n[![Data](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-Huggingface-7EBDC2)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FCircleRadon\u002FInst2Seg)\n[![Benchmark](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBenchmark-Huggingface-B8A1E6)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FCircleRadon\u002FInst2Seg-Bench)\n[![YouTube](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FVideo-YouTube-B22222?logo=youtube&logoColor=white)](https:\u002F\u002Fyoutu.be\u002F26-yJqE8wBQ)\n\n\u003C\u002Fdiv>\n\n\u003Ctable>\n  \u003Ctr>\n    
\u003Ctd>\u003Cimg src=\"assets\u002Fvisual_demo\u002F1.gif\" width=\"250\">\u003C\u002Ftd>\n    \u003Ctd>\u003Cimg src=\"assets\u002Fvisual_demo\u002F2.gif\" width=\"250\">\u003C\u002Ftd>\n    \u003Ctd>\u003Cimg src=\"assets\u002Fvisual_demo\u002F3.gif\" width=\"250\">\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd>\u003Cimg src=\"assets\u002Fvisual_demo\u002F5.gif\" width=\"250\">\u003C\u002Ftd>\n    \u003Ctd>\u003Cimg src=\"assets\u002Fvisual_demo\u002F6.gif\" width=\"250\">\u003C\u002Ftd>\n    \u003Ctd>\u003Cimg src=\"assets\u002Fvisual_demo\u002F7.gif\" width=\"250\">\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n## Overview\n\nInstructSAM is an instruction-driven multi-instance segmentation framework designed to segment arbitrary target instances from natural-language instructions.\n\n\u003Cimg src=\"assets\u002Fmodel.png\" width=\"95%\">\n\nKey features:\n\n- **Flexible instructions**: supports category prompts, referring expressions, and reasoning-style instructions.\n- **Instance-aware outputs**: predicts a set of instance masks instead of a single semantic region.\n- **Efficient inference**: avoids multi-round agentic prompting and repeated SAM calls.\n- **Inst2Seg dataset support**: includes training and evaluation scripts for instruction-based instance segmentation.\n\n## News\n\n* **[2026.5.26]** 🔥 We release InstructSAM.\n\n## Setup\n\nCreate a conda environment with Python 3.10:\n\n```bash\nconda create -n instructsam python=3.10 -y\nconda activate instructsam\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\n```\n\n## Training\n### Download data\nWe provide the training annotation JSON files on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FCircleRadon\u002Finstructsam_training_eval_data). 
Download all JSON files and place them under `data\u002Ftraining`.\n\nRaw images should be downloaded from the official source of each dataset.\n\n### Stage 1\n\nStage 1 starts from the base [Qwen3-VL](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Instruct) checkpoint:\n\n```bash\nbash scripts\u002Ftrain\u002Fstage1.sh\n```\n\n### Merge Stage 1 Checkpoint\n\nAfter Stage 1 finishes, merge the Stage 1 LoRA weights into the base checkpoint:\n\n```bash\npython3 -m instructsam.merge_ckpt \\\n  --base_dir .\u002Fwork_dirs \\\n  --model_path instructsam_stage1_2b \\\n  --save_path instructsam_stage1_merged\n```\n\n\n### Stage 2: Reasoning Fine-tuning\n\nStage 2 starts from the merged Stage 1 checkpoint:\n\n```bash\nbash scripts\u002Ftrain\u002Fstage2.sh\n```\n\n## Inference\n\n### Download data\nWe provide the evaluation annotation JSON files on [Hugging Face](). Download all JSON files and place them under `data\u002Feval`.\n\nRun single-image inference:\n\n```bash\npython3 -m instructsam.infer \\\n  --model_path work_dirs\u002Fstage2 \\\n  --image-path path\u002Fto\u002Fimage.jpg \\\n  --query \"Please segment the object in the image.\" \\\n  --output-dir vis\n```\n\nThe script prints the generated text and mask scores, then writes mask overlays to `vis\u002F`.\n\n\n## Evaluation\n\nAvailable evaluation entry points:\n\n```bash\nbash evaluation\u002Fscripts\u002Feval_inst2seg.sh\nbash evaluation\u002Fscripts\u002Feval_reasonseg.sh\nbash evaluation\u002Fscripts\u002Feval_grefcoco_ap.sh\nbash evaluation\u002Fscripts\u002Feval_roborefit.sh\n```\n\nEdit the dataset roots in each script before running. 
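The Python evaluation files require explicit `--image_folder` and `--question_file` arguments instead of relying on machine-specific defaults.\n\n## Citation\n\nIf you find this project useful, please cite it with the following BibTeX entry:\n\n```bibtex\n@article{yuan2026instructsam,\n  title     = {InstructSAM: Segment Any Instance with Any Instructions},\n  author    = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},\n  year      = {2026},\n  journal   = {arXiv},\n}\n```\n","InstructSAM is an instruction-driven multi-instance segmentation framework that segments arbitrary target instances from an image according to natural-language instructions. Its core features are support for multiple instruction types (category prompts, referring expressions, and reasoning-style instructions), instance-aware outputs (a set of instance masks rather than a single semantic region), and efficient inference (no multi-round agentic prompting or repeated SAM calls). The project also provides training and evaluation scripts for the Inst2Seg dataset. InstructSAM suits image-processing scenarios that demand high flexibility and precision, especially when the task involves complex or diverse object recognition.","2026-06-11 03:59:04","CREATED_QUERY"]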