[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80768":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":36,"discoverSource":37},80768,"mercury-mcp","norika1207-lab\u002Fmercury-mcp","norika1207-lab","Cross-architecture LLM internal observation database (23 models, 13 architecture families). Exposed as MCP tools for any AI coding agent.","https:\u002F\u002Fgithub.com\u002Fnorika1207-lab\u002Fmercury-mcp",null,"Python",42,13,40,0,2,41.64,false,"main",true,[22,23,24,25,26,27,28,29,30,31,32],"ai-agents","anchor-dimensions","consumer-hardware","cross-architecture","frankenstein-merge","llm","mcp","mechanistic-interpretability","model-context-protocol","open-data","transformer","2026-06-12 04:01:30","# Mercury MCP: Cross-Architecture LLM Internal Observation, as Agent Tools\n\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002FDOI\u002F10.5281\u002Fzenodo.20352085.svg)](https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.20352085)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n\n\n> \"Most AI coding agents don't know what's inside the model they're talking to. 
Mercury does.\"\n\nMercury MCP exposes a 23-LLM cross-architecture observation database to any agent that speaks the [Model Context Protocol](https:\u002F\u002Fmodelcontextprotocol.io\u002F) (Claude Code, Cursor, Cline, Goose, etc.).\n\nBuilt entirely on consumer hardware (one Mac mini + one NVIDIA DGX Spark) at near-zero compute cost.\n\n---\n\n## What it answers\n\nTry these prompts with any MCP-aware agent after installing:\n\n- *\"What hidden dimensions are universally hot across LLM families?\"*\n- *\"In qwen-7B, which layer has the same functional fingerprint as falcon-7B layer 16?\"*\n- *\"Compose a layer recipe for a Chinese-writing + reasoning hybrid model.\"*\n- *\"Show me OLMo2's anchor dimensions and how they overlap with the qwen anchor set.\"*\n\nThe agent calls Mercury MCP tools; Mercury answers from precomputed observation data.\n\n---\n\n## Why this matters\n\n### For mech interp researchers\n23-LLM cross-architecture survey across 13 architecture families, two observation tiers per model.\n\nTier-A (output-layer logit hooks): cheap, fast, candidate-signal screening. May surface artifacts from shared tokenization (vocab-id mod hidden_size collisions). Useful as hypothesis-generating layer.\n\nTier-B (HF output_hidden_states across all layers): the actual finding layer. Per-layer residual stream fingerprints. Cross-architecture functional layer alignment qwen-7B L15 to falcon-7B L16 reaches 0.868 similarity. 54\u002F84 model-pairs aligned at the middle layers (50-60% depth).\n\nEarlier \"dim 11 universal across families\" claim from Tier-A is being re-framed as candidate signal, not validated residual stream feature. Working through this in Paper A draft.\n\n### For LLM application developers\nStop guessing which model to fine-tune. 
Mercury tells you structurally which layers carry which capability and which models' middle layers are functionally interchangeable.\n\n### For Frankenstein \u002F model-merge people\nCross-architecture per-layer alignment matrix included (12 Tier-B models, 84 pairwise). Strongest single match: qwen-7B L15 to falcon-7B L16 at 0.868 similarity. mistral \u002F yi \u002F gemma occupy alternative residual subspace, possible non-interfering composition substrate.\n\n---\n\n## Install (3 lines)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fnorika1207-lab\u002Fmercury-mcp\ncd mercury-mcp\npip install -e .\n```\n\nThen add to your MCP client config:\n\n```jsonc\n\u002F\u002F ~\u002F.claude\u002Fmcp_settings.json  (Claude Code)\n{\n  \"mcpServers\": {\n    \"mercury\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"mercury_mcp.server\"]\n    }\n  }\n}\n```\n\nRestart your agent. You now have 7 new tools: `mercury_list_models`, `mercury_anchor_dims`, `mercury_universal_anchors`, `mercury_layer_fingerprint`, `mercury_cross_arch_equivalent`, `mercury_compose_recipe`, `mercury_about`.\n\n---\n\n## The 23 models in v0.1 (13 architecture families)\n\n| Family | Models | Tier-B done |\n|---|---|---|\n| qwen2 | 3B, 7B, 14B, 32B, coder-32B | 3B, 7B, 14B ✓ |\n| qwen2-distill (DeepSeek-R1) | 7B, 32B, 70B | 7B ✓ |\n| llama | 3.1-8B, 3.3-70B | pending |\n| phi3 | medium-14B | ✓ |\n| falcon3 | 7B | ✓ |\n| internlm2 | 7B | running |\n| mistral | 7B-v0.3, small-3.2-24B | 7B ✓ |\n| olmo2 | 7B (AllenAI fully-open) | ✓ |\n| gemma2 | 9B | ✓ |\n| granite | 3.1-dense-8B (IBM) | ✓ |\n| yi | 1.5-9B | ✓ |\n| starcoder2 | 15B | pending |\n| codestral | 22B (mistral coder variant) | pending |\n| command-r | 35B (Cohere) | pending |\n\nTier-A all 23 done. Tier-B 12 of 23 done, rest in progress.\n\n---\n\n## Findings (still evolving, see Paper A draft for current framing)\n\n1. Cross-architecture per-layer functional alignment. qwen-7B L15 to falcon-7B L16 reaches sim 0.868. 
54 of 84 pairwise comparisons show sim above 0.7 at middle layers (50-60% depth). This is the strongest finding right now. Comes from Tier-B (residual stream level), so not vulnerable to tokenizer artifact critique.\n\n2. Distillation appears to transplant anchor structure. DeepSeek-R1:70b uses a llama base but Tier-A anchor structure aligns with qwen at 44.7x random baseline. Note this is Tier-A so caveats apply (see point 4), but the magnitude is hard to dismiss as pure vocab artifact at 44.7x.\n\n3. Three families show alternative residual stream geometry. mistral-7B, mistral-small, yi-1.5 all show 0\u002F11 qwen-anchor presence in Tier-A and structurally different hot dims in Tier-B. They occupy a different subspace, possible substrate for non-interfering cross-architecture composition.\n\n4. Methodology caveat being worked through openly. Tier-A \"dim 11 universal across families\" (originally framed as residual stream feature) may be a vocab-token-id mod hidden_size collision artifact from shared tokenizer training. Tier-B on OLMo2 contradicts the Tier-A reading. Paper A is being re-framed with Tier-A as screening layer (candidate signals) and Tier-B as finding layer (validated residual stream observations). 
Discussion welcome.\n\nFull data + analysis: see [`anchor-survival-MASTER.txt`](https:\u002F\u002Fgithub.com\u002Fnorika1207-lab\u002Fmercury-paper-handoff\u002Fblob\u002Fmain\u002Fanchor-survival-MASTER.txt) and the [`mercury-paper-handoff`](https:\u002F\u002Fgithub.com\u002Fnorika1207-lab\u002Fmercury-paper-handoff) repo.\n\n---\n\n## Tool reference\n\n| Tool | Purpose |\n|---|---|\n| `mercury_list_models` | List all 23 observed models |\n| `mercury_anchor_dims(model, top_k=50)` | This model's hot dims + qwen-anchor overlap analysis |\n| `mercury_universal_anchors(min_models=3)` | Cross-model universal anchor presence ranking |\n| `mercury_layer_fingerprint(model, layer)` | Per-layer functional fingerprint (Tier-B only) |\n| `mercury_cross_arch_equivalent(model, layer, top_k=5)` | Find functionally-similar layers in OTHER models |\n| `mercury_compose_recipe(capabilities)` | Generate composable layer recipe for given capabilities |\n| `mercury_about` | Paper status, citations, contribution info |\n\n---\n\n## How the data was collected\n\n- **Observation protocol**: 10 multilingual prompts per model, ~60 tokens each, greedy decoding\n- **Tier-A**: hooked into `llama_cpp.Llama.logits_processor` to capture output-layer hot dims per token\n- **Tier-B**: `transformers.AutoModelForCausalLM.from_pretrained` with `output_hidden_states=True`, per-layer residual stream activation magnitudes binned into 10 quantiles\n- **Cell grid scheme**: addressable `(layer, dim, quantile_idx)` mmap-able binary, ~24MB-840MB depending on model size\n- **Source code**: [paper-handoff `paper-G-composable-llm\u002Ftools\u002F`](https:\u002F\u002Fgithub.com\u002Fnorika1207-lab\u002Fmercury-paper-handoff)\n\nReproducibility: ~4 hours wall-clock per model on a Mac mini M4 Pro. 
No GPU required.\n\n---\n\n## Cite\n\n```bibtex\n@dataset{oda_mercury_2026,\n  author = {Chen, Ho Yiing (norika)},\n  title  = {Mercury: Cross-Architecture Hot-Dim Geometry Database for 23 LLMs},\n  year   = {2026},\n  doi    = {10.5281\u002Fzenodo.20352085},\n  url    = {https:\u002F\u002Fgithub.com\u002Fnorika1207-lab\u002Fmercury-mcp}\n}\n```\n\n---\n\n## License\n\nMIT for code. Data CC-BY-4.0.\n\n---\n\n## Author\n\nnorika (Chen Ho Yiing), Taiwan independent researcher.\n\n- ORCID: [0009-0006-6816-9891](https:\u002F\u002Forcid.org\u002F0009-0006-6816-9891)\n- Google Scholar: [wrTR3VMAAAAJ](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=wrTR3VMAAAAJ)\n- GitHub: [@norika1207-lab](https:\u002F\u002Fgithub.com\u002Fnorika1207-lab)\n\nBuilt solo, no institutional affiliation, no grant, no GPU. Just curiosity and stubborn nights.\n","Mercury MCP is a cross-architecture internal observation database for large language models, covering 23 models across 13 architecture families. Through the Model Context Protocol (MCP), the project provides tools that let AI coding agents query a model's internal structure and properties, such as which hidden dimensions are universally active and which layers share a functional fingerprint. The project is built entirely on consumer hardware and runs at very low compute cost. It is intended for mechanistic interpretability researchers analyzing commonalities and differences between models, LLM application developers choosing a model to fine-tune, and model-merge developers looking for functionally similar layers to combine.","2026-06-11 04:01:57","CREATED_QUERY"]