[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79197":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},79197,"mm-probe-kit","edmicho\u002Fmm-probe-kit","edmicho","A small, hackable toolkit for probing multimodal LLMs — attention, hidden states, alignment, and causal tracing.",null,"Python",223,8644,6,0,194,10,"Other",false,"main",[],"2026-06-12 02:03:49","# MM-Probe — A Probing Toolkit for Multimodal LLMs\n\n> Small, hackable utilities for peeking inside vision-language models.\n\n`mm-probe-kit` collects the bits I kept rewriting whenever I wanted to look at\nattention maps, hidden states, or modality alignment in models like LLaVA,\nQwen-VL, and BLIP-2.  Nothing fancy — forward hooks, a few standard probes,\nand some matplotlib helpers — but packaged so I can stop copy-pasting them\nbetween notebooks.\n\n## Why\n\nA lot of interpretability tooling assumes a single decoder-only LM.  MLLMs\nmix a vision encoder, a connector \u002F projector, and a language backbone, and\nthe interesting things often happen *across* those boundaries.  
This package\nmakes it less painful to:\n\n- register hooks on the right submodule across model families\n- pull out per-layer attention \u002F hidden states with consistent shapes\n- compute simple cross-modality metrics (text↔image alignment, attention\n  entropy, attention rollout)\n- patch activations for causal tracing-style experiments\n\n## Supported models\n\n| Family       | Status | Notes                                  |\n|--------------|--------|----------------------------------------|\n| LLaVA-1.5    | ✅      | tested on 7B \u002F 13B                     |\n| LLaVA-Next   | ✅      | dynamic resolution handled             |\n| Qwen-VL      | ✅      | original + Qwen2-VL                    |\n| BLIP-2       | ✅      | Q-Former outputs exposed               |\n| InternVL     | ✅      | ViT-H feature extraction               |\n| MiniCPM-V    | partial | tokenizer quirks, see issue #11        |\n\n## Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fedmicho\u002Fmm-probe-kit\ncd mm-probe-kit\npip install -e .\n```\n\nPyTorch ≥ 2.0 and `transformers` ≥ 4.36 are required.  
For CUDA, install the\nmatching torch wheel first.\n\n## Quick start\n\n```python\nfrom mm_probe import load_model\nfrom mm_probe.probes import attention, alignment\n\nmodel = load_model(\"llava-hf\u002Fllava-1.5-7b-hf\", dtype=\"bf16\")\nout = model.generate(\"What is in this image?\", image=\"examples\u002Fcat.jpg\")\n\n# per-layer attention over the visual tokens\nattn = attention.visual_attention(model, layer=15)\n\n# rough text↔image cosine alignment, layer-by-layer\nscore = alignment.text_image_cosine(model)\n```\n\nThere is a small CLI for running a single probe end-to-end:\n\n```bash\nmm-probe run configs\u002Fllava-7b.yaml --image examples\u002Fcat.jpg \\\n    --prompt \"Describe the scene.\" --probe attention_rollout\n```\n\n## Architecture\n\n```\n              ┌──────────────┐\n              │  load_model  │   ─ thin wrapper around HF AutoModel\u002Fprocessor\n              └──────┬───────┘\n                     │\n              ┌──────▼───────┐\n              │ HookManager  │   ─ registers forward hooks on named modules\n              └──────┬───────┘\n                     │\n       ┌──────┬──────┼──────┬────────┐\n       │      │      │      │        │\n   attention hidden align rollout  causal\n```\n\nEach probe is a function that takes a model wrapper (with hooks already\nattached) and returns a tensor or dict of tensors.  
No fancy framework — you\ncan usually copy a probe file into a notebook and tweak it.\n\n## Configuration\n\nCLI runs read YAML configs:\n\n```yaml\n# configs\u002Fllava-7b.yaml\nmodel:\n  name: llava-hf\u002Fllava-1.5-7b-hf\n  dtype: bf16\n  device: cuda:0\n\nprobe:\n  layers: [10, 15, 20, 25]\n  return_image_tokens: true\n\nviz:\n  overlay: true\n  cmap: magma\n```\n\n## Roadmap\n\n- [x] Forward hook + activation cache\n- [x] Attention \u002F hidden state \u002F rollout \u002F alignment \u002F causal probes\n- [x] LLaVA, Qwen-VL, BLIP-2, InternVL\n- [ ] Better Qwen2-VL position-id handling for dynamic resolution\n- [ ] Probes for audio-conditioned multimodal models (Qwen2-Audio etc.)\n- [ ] More notebooks\n\n## Citation\n\nIf this happens to be useful in a paper:\n\n```bibtex\n@misc{mmprobekit,\n  author = {Zihao Wei},\n  title  = {mm-probe-kit: a probing toolkit for multimodal LLMs},\n  year   = {2025},\n  url    = {https:\u002F\u002Fgithub.com\u002Fedmicho\u002Fmm-probe-kit}\n}\n```\n\n## License\n\nMIT.\n","mm-probe-kit is a small toolkit for probing the internals of multimodal large language models (such as LLaVA and Qwen-VL). Its main features are inspecting attention distributions, hidden states, and modality alignment. By providing a set of concise, reusable Python utilities, such as forward-hook registration, standard probes, and matplotlib helpers, the project simplifies extracting and analyzing data across the boundaries between vision and language model components. It is aimed at researchers and developers who need a deeper understanding of how multimodal models work, and is especially useful for causal tracing experiments.",2,"2026-06-01 03:48:13","CREATED_QUERY"]