[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80110":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":17,"hasPages":19,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},80110,"Video-Metaphorical-Understanding","LiQiiiii\u002FVideo-Metaphorical-Understanding","LiQiiiii","[arxiv] ViMU: Benchmarking Video Metaphorical Understanding",null,"Python",56,9,8,0,3,"MIT License",false,"main",true,[],"2026-06-12 02:03:58","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"assets\u002Foverall.png\" width=\"100%\"\u002F>\n\n# ViMU: Benchmarking Video Metaphorical Understanding\n\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white)](https:\u002F\u002Fliqiiiii.github.io\u002FVideo-Metaphorical-Understanding\u002F)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.14607)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Dataset-yellow?style=for-the-badge&logo=huggingface&logoColor=black)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FLIQIIIII\u002FViMU)\n\n\n[Qi Li](https:\u002F\u002Fliqiiiii.github.io\u002F), [Xinchao Wang](https:\u002F\u002Fsites.google.com\u002Fsite\u002Fsitexinchaowang\u002F)\u003Csup>*\u003C\u002Fsup>\n\n\u003Csup>*\u003C\u002Fsup>Corresponding author\n\n[xML 
Lab](https:\u002F\u002Fsites.google.com\u002Fview\u002Fxml-nus), National University of Singapore\n\n\u003C\u002Fdiv>\n\nThis repository contains the evaluation scripts for ViMU, a benchmark for video metaphorical understanding. The code evaluates multimodal models on four tasks:\n\n1. Open-ended interpretation (OE)\n2. Evidence grounding (EG)\n3. Rhetoric mechanism identification (RM)\n4. Social value signal identification (SV)\n\n## Directory Structure\n\nExpected project structure:\n\n```text\nViMU\u002F\n├── videos\u002F\n│   ├── vimu_000001.mp4\n│   └── ...\n├── metadata\u002F\n│   ├── vimu_oe.jsonl\n│   ├── vimu_eg.jsonl\n│   ├── vimu_ss.jsonl\n│   ├── video_evidence.jsonl\n│   └── cache\u002F\n├── scripts\u002F\n│   ├── 00-vimu_oe.py\n│   ├── 01-vimu_oe_judge.py\n│   ├── 02-vimu_oe_score.py\n│   ├── 10-vimu_eg.py\n│   ├── 11-vimu_eg_score.py\n│   ├── 20-vimu_ss.py\n│   ├── 21-vimu_ss_score.py\n│   └── utils.py\n└── output\u002F\n````\n\n## Setup\n\nInstall dependencies:\n\n```bash\npip install openai requests numpy pandas tqdm\n```\n\nDepending on the models used, additional API keys may be required.\n\nSet API keys:\n\n```bash\nexport OPENAI_API_KEY=\"your_openai_key\"\nexport OPENROUTER_API_KEY=\"your_openrouter_key\"\nexport GOOGLE_API_KEY=\"your_google_key\"\n```\n\nNot all keys are required if you only run a subset of models.\n\n## Path Configuration\n\nBefore running, edit each script and set:\n\n```python\nPROJECT_ROOT = \"\u002FYour\u002FPath\u002FTo\u002FViMU\"\n```\n\n## Recommended Running Order\n\nFor a full evaluation, run:\n\n```bash\n# Open-ended interpretation\npython scripts\u002F00-vimu_oe.py\npython scripts\u002F01-vimu_oe_judge.py\npython scripts\u002F02-vimu_oe_score.py\n\n# Evidence grounding\npython scripts\u002F10-vimu_eg.py\npython scripts\u002F11-vimu_eg_score.py\n\n# Structured subtext tasks without guidance\npython scripts\u002F20-vimu_ss.py --prompt_mode without_guidance\npython scripts\u002F21-vimu_ss_score.py --prompt_mode 
without_guidance\n\n# Structured subtext tasks with guidance\npython scripts\u002F20-vimu_ss.py --prompt_mode with_guidance\npython scripts\u002F21-vimu_ss_score.py --prompt_mode with_guidance\n```\n\n## Model Configuration\n\nModels are configured in the `MODEL_SPECS` list inside the inference scripts.\n\nTo enable or disable a model, edit:\n\n```python\n\"enabled\": True\n```\n\nor\n\n```python\n\"enabled\": False\n```\n\nFor OpenRouter models, make sure the model ID and API key are valid.\n\n## Output Files\n\nThe main output files are:\n\n```text\noutput\u002Fvimu_oe_summary.json\noutput\u002Fvimu_eg_summary.json\noutput\u002Fvimu_ss_without_guidance_summary.json\noutput\u002Fvimu_ss_with_guidance_summary.json\n```\n\nThese files contain aggregated evaluation results.\n\n## Scoring Rules\n\n### Open-ended Interpretation\n\nOpen-ended answers are evaluated using an LLM-as-a-judge protocol. The judge scores semantic understanding based on:\n\n```text\ncore intent\nimplicit signal\ntarget or social meaning\nhallucination penalty\nliteral-only penalty\n```\n\n### Evidence Grounding\n\nEvidence grounding is scored as a multi-label prediction problem. If the prediction contains any incorrect option, the score is 0. Otherwise, the prediction is a subset of the gold answer, and the score is: `score = number of correctly selected options \u002F number of gold options`.\n\n### Rhetoric Mechanism and Social Value Signal\n\nThe rhetoric mechanism and social value signal tasks use the same multi-label scoring rule: if no incorrect option is selected, the score is the fraction of gold options selected; otherwise, `score = 0`.\n\n## Notes\n\nThe dataset contains socially sensitive video memes. 
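The benchmark is intended for research use only.\n\n## Citation\n\nIf you find our work interesting or helpful, please cite it as follows:\n\n```\n@article{li2026vimu,\n  title={ViMU: Benchmarking Video Metaphorical Understanding},\n  author={Li, Qi and Wang, Xinchao},\n  journal={arXiv preprint arXiv:2605.14607},\n  year={2026}\n}\n```\n\n","ViMU is a benchmark project for evaluating video metaphorical understanding. It assesses multimodal models on four tasks: open-ended interpretation, evidence grounding, rhetoric mechanism identification, and social value signal identification. It is written in Python and relies on APIs such as OpenAI's. It suits scenarios that require deep semantic understanding and analysis of video content, such as multimedia content analysis and intelligent video processing.",2,"2026-06-11 03:59:16","CREATED_QUERY"]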