[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80677":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":12,"stars7d":14,"stars30d":15,"stars90d":12,"forks30d":12,"starsTrendScore":12,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":12,"starSnapshotCount":12,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},80677,"vlm-probe","marived\u002Fvlm-probe","marived","Probing fine-grained perception in open-source vision-language models — companion code for a writeup.",null,"Python",217,0,5,54,174,70,"MIT License",false,"main",[],"2026-06-12 04:01:29","# VLM-Probe: When Do VLMs Actually Look?\n\n> Companion code for our (in-progress) report on what fine-grained perceptual tasks\n> open-source vision-language models fail on, and whether the failure is in the\n> visual encoder or the language head.\n\nThis repo contains the evaluation harness, probe templates, and per-task scoring\nscripts used in the writeup. Models are loaded through `transformers`; tasks are\nspecified as YAML.\n\n## Citation\n\nIf you find any of the scripts here useful, please cite the report (preprint pending):\n\n```bibtex\n@misc{xu2026vlmprobe,\n  title  = {When Do VLMs Actually Look? Probing fine-grained perception in\n            open-source vision-language models},\n  author = {Xu, Mingrui},\n  year   = {2026},\n  note   = {Technical report, Beihang University},\n}\n```\n\n## Reproducing the numbers\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmarived\u002Fvlm-probe.git\ncd vlm-probe\npip install -e .\n\n# Download the small image set (~500 MB)\npython -m vlmprobe.data.fetch --out data\u002Fimages\u002F\n\n# Evaluate a model on all tasks\npython -m vlmprobe.run \\\n    --model llava-hf\u002Fllava-1.5-7b-hf \\\n    --tasks tasks\u002F*.yaml \\\n    --out   results\u002Fllava15.json\n```\n\nThe numbers reported in the writeup were produced on 2 x A100 40 GB GPUs using the package versions pinned in\n`requirements.lock`. Stochastic decoding is off by default (`temperature=0`); per-task\nseeds live in the task YAMLs.\n\n## Tasks\n\n| Task             | Items | What it asks                                    |\n| ---------------- | ----- | ----------------------------------------------- |\n| `count_objects`  |  400  | how many X are there? (1..8)                    |\n| `spatial_rel`    |  300  | is X to the left of Y? above? in front of?      |\n| `colour_attr`    |  300  | what colour is the X?                           |\n| `text_in_image`  |  200  | what does the sign say? (small text, OCR-ish)   |\n| `partial_occl`   |  250  | is X fully visible? partially occluded? hidden? |\n\nAll five are multiple-choice. Scoring is exact-match on the parsed answer.\n\n## Layout\n\n```\nvlm-probe\u002F\n├── vlmprobe\u002F\n│   ├── run.py            # main eval driver\n│   ├── model.py          # model loaders (LLaVA, Qwen-VL, InternVL, ...)\n│   ├── tasks.py          # task loader \u002F scorer\n│   └── data\u002F\n├── tasks\u002F                # one YAML per task\n├── results\u002F              # JSON outputs go here\n└── scripts\u002F              # ad-hoc post-processing\n```\n\n## License\n\nMIT, see [LICENSE](LICENSE).\n","VLM-Probe is a tool for evaluating how open-source vision-language models perform on fine-grained perception tasks. It aims to analyze why these models fail on specific tasks and to determine whether the failure lies in the visual encoder or the language head. The project is written in Python, loads models through the `transformers` library, and defines tasks in YAML. Its core components are an evaluation harness, probe templates, and per-task scoring scripts. It suits research that needs a deeper understanding of the limitations of current vision-language models, in particular performance testing on concrete tasks such as object counting, spatial-relation judgment, colour recognition, text reading, and partial-occlusion detection. The project is released under the MIT License, which permits use in both academic and commercial settings.",2,"2026-06-11 04:01:37","CREATED_QUERY"]