[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79202":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},79202,"spatial-vqa-bench","sitodowubb\u002Fspatial-vqa-bench","sitodowubb","Spatial-VQA-Bench: a focused benchmark of spatial visual reasoning for multimodal LLMs.",null,"Python",220,7621,6,0,195,10,"Other",false,"main",true,[],"2026-06-12 02:03:50","# Spatial-VQA-Bench\n> A small, focused benchmark of *spatial* visual reasoning for multimodal LLMs — left\u002Fright, behind\u002Fin-front, near\u002Ffar, and \"if I rotated this 90°\".\n\n## Overview\n\nMost VQA benchmarks treat spatial reasoning as a sub-task — a thin slice on top of object recognition and counting. The numbers therefore wash out: a model that aces \"what colour is the umbrella\" can score similarly to a model that actually understands that *the umbrella is behind the bench*. 
Spatial-VQA-Bench tries to isolate that signal.\n\nIt contains 3,200 hand-vetted items across **five** task families:\n\n- **2D-relations**: left of \u002F right of \u002F above \u002F below \u002F between\n- **3D-relations**: in front of \u002F behind \u002F nearer \u002F farther\n- **Rotation**: \"if I rotated this object 90° clockwise, which way does X point?\"\n- **Occlusion**: questions about hidden \u002F partially-occluded objects\n- **Viewpoint**: \"what would the back of this look like?\"\n\nImages are sourced from indoor scenes (ScanNet renders, OpenImages indoor) and synthetic 3D scenes for the rotation\u002Fviewpoint tasks.\n\n## Architecture\n\n```\nitems.jsonl ──▶ runner ──▶ model adapter ──▶ predictions.jsonl\n                                                    │\n                                                    ▼\n                                              scorer ──▶ {acc\u002Ffamily}\n```\n\nEach item is a dict with `id`, `task`, `image`, `question`, `choices`, `answer`. Items are open-ended but always have a multiple-choice version available.\n\n## Installation\n\n```bash\npip install -e \".[full]\"\n```\n\n## Quick Start\n\n```bash\n# evaluate one model\nsvb run --model qwen2-vl-7b --output predictions\u002Fqwen2-vl-7b.jsonl\n\n# score predictions\nsvb score predictions\u002Fqwen2-vl-7b.jsonl --report\n\n# reproduce Table 2 (all baseline models, all tasks)\nbash scripts\u002Frepro_table2.sh\n```\n\n## Benchmarks\n\n| Model              | 2D-rel | 3D-rel | Rot. | Occl. | Viewp. | Avg  |\n|--------------------|--------|--------|------|-------|--------|------|\n| Random             | 25.0   | 25.0   | 25.0 | 25.0  | 25.0   | 25.0 |\n| LLaVA-1.5-7B       | 51.2   | 38.4   | 30.2 | 47.0  | 32.1   | 39.8 |\n| LLaVA-1.5-13B      | 54.0   | 41.7   | 32.4 | 50.5  | 34.9   | 42.7 |\n| Qwen2-VL-7B        | 66.3   | 52.0   | 42.1 | 61.2  | 45.7   | 53.5 |\n| InternVL2-8B       | 65.1   | 50.6   | 41.4 | 60.0  | 44.9   | 52.4 |\n| GPT-4o-mini        | 70.4   | 56.7   | 47.8 | 65.3  | 51.0   | 58.2 |\n| GPT-4o             | 76.5   | 63.4   | 55.2 | 71.1  | 58.9   | 65.0 |\n| Human              | 92.8   | 89.5   | 84.7 | 91.0  | 86.4   | 88.9 |\n\nPatterns we see:\n\n- All models struggle most on **rotation** and **viewpoint**; these tasks require mental simulation, not just direct perception.\n- The absolute gap between LLaVA-1.5-7B and Qwen2-VL-7B is similar on 3D-relations (+13.6 pts) and 2D-relations (+15.1 pts), but the relative gain is larger on 3D-relations (about 35% vs. 29%), which may suggest that Qwen2-VL has a stronger 3D prior.\n\n## Citation\n\n```bibtex\n@misc{spatialvqabench,\n  author = {Boyang Ma},\n  title  = {Spatial-VQA-Bench: A Benchmark of Spatial Visual Reasoning for MLLMs},\n  year   = {2025},\n  url    = {https:\u002F\u002Fgithub.com\u002Fsitodowubb\u002Fspatial-vqa-bench}\n}\n```\n\n## License\n\nMIT.\n","Spatial-VQA-Bench is a benchmark project focused on spatial visual reasoning for multimodal large language models. Using 3,200 hand-vetted items across five task families (2D relations, 3D relations, rotation, occlusion, and viewpoint), it evaluates how well models understand the spatial relationships between objects. Its core features include support for generating and evaluating several types of visual questions, along with a simple architecture for scoring model predictions. It is particularly suited to research settings that require in-depth testing of the spatial reasoning of multimodal AI systems, such as judging the relative positions of objects in indoor scenes or recognizing objects under complex viewpoint changes.",2,"2026-06-06 03:58:39","CREATED_QUERY"]