[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75114":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},75114,"tribev2","facebookresearch\u002Ftribev2","facebookresearch","This repository contains the code to train and evaluate TRIBE v2, a multimodal model for brain response prediction",null,"Jupyter Notebook",2848,619,28,16,0,36,84,302,108,110.38,"Other",false,"main",[],"2026-06-12 04:01:17","\u003Cdiv align=\"center\">\n\n# TRIBE v2\n\n**A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience**\n\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Ftribev2\u002Fblob\u002Fmain\u002Ftribe_demo.ipynb)\n[![License: CC BY-NC 4.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-CC%20BY--NC%204.0-lightgrey.svg)](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002F)\n[![Python 3.11+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11%2B-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n\n📄 [Paper](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fa-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience\u002F) ▶️ [Demo](https:\u002F\u002Faidemos.atmeta.com\u002Ftribev2\u002F) | 🤗 
[Weights](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Ftribev2)\n\n\u003C\u002Fdiv>\n\nTRIBE v2 is a deep multimodal brain encoding model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines state-of-the-art text, audio and video models into a unified Transformer architecture that maps multimodal representations onto the cortical surface.\n\n## Quick start\n\nLoad a pretrained model from HuggingFace and predict brain responses to a video:\n\n```python\nfrom tribev2 import TribeModel\n\nmodel = TribeModel.from_pretrained(\"facebook\u002Ftribev2\", cache_folder=\".\u002Fcache\")\n\ndf = model.get_events_dataframe(video_path=\"path\u002Fto\u002Fvideo.mp4\")\npreds, segments = model.predict(events=df)\nprint(preds.shape)  # (n_timesteps, n_vertices)\n```\n\nPredictions are for the \"average\" subject (see paper for details) and live on the **fsaverage5** cortical mesh (~20k vertices).\nThey are offset by 5 seconds in the past, in order to compensate for the hemodynamic lag.\n\nYou can also pass `text_path` or `audio_path` to `model.get_events_dataframe` — text is automatically converted to speech and transcribed to obtain word-level timings.\n\nFor a full walkthrough with brain visualizations, see the [Colab demo notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Ftribev2\u002Fblob\u002Fmain\u002Ftribe_demo.ipynb).\n\n## Installation\n\n**Basic** (inference only):\n```bash\npip install -e .\n```\n\n**With brain visualization**:\n```bash\npip install -e \".[plotting]\"\n```\n\n**With training dependencies** (PyTorch Lightning, W&B, etc.):\n```bash\npip install -e \".[training]\"\n```\n\n## Training a model from scratch\n\n### 1. 
Set environment variables\n\nConfigure data\u002Foutput paths and Slurm partition (or edit `tribev2\u002Fgrids\u002Fdefaults.py` directly):\n\n```bash\nexport DATAPATH=\"\u002Fpath\u002Fto\u002Fstudies\"\nexport SAVEPATH=\"\u002Fpath\u002Fto\u002Foutput\"\n```\n\n\n### 2. Run training\n\n**Local test run:**\n```bash\npython -m tribev2.grids.test_run\n```\n\n**Grid search on Slurm:**\n```bash\npython -m tribev2.grids.run_cortical\npython -m tribev2.grids.run_subcortical\n```\n\n## Project structure\n\n```\ntribev2\u002F\n├── main.py              # Experiment pipeline: Data, TribeExperiment\n├── model.py             # FmriEncoder: Transformer-based multimodal→fMRI model\n├── pl_module.py         # PyTorch Lightning training module\n├── demo_utils.py        # TribeModel and helpers for inference from text\u002Faudio\u002Fvideo\n├── eventstransforms.py  # Custom event transforms (word extraction, chunking, …)\n├── utils.py             # Multi-study loading, splitting, subject weighting\n├── utils_fmri.py        # Surface projection (MNI \u002F fsaverage) and ROI analysis\n├── grids\u002F\n│   ├── defaults.py      # Full default experiment configuration\n│   └── test_run.py      # Quick local test entry point\n├── plotting\u002F            # Brain visualization (PyVista & Nilearn backends)\n└── studies\u002F             # Dataset definitions (Algonauts2025, Lahner2024, …)\n```\n\n## Contributing to open science\n\nIf you use this software, please share your results with the broader research community using the following citation:\n\n```bibtex\n@article{dascoli2026foundation,\n  title={A foundation model of vision, audition, and language for in-silico neuroscience},\n  author={d'Ascoli, St{\\'e}phane and Rapin, J{\\'e}r{\\'e}my and Benchetrit, Yohann and Brooks, Teon and Begany, Katelyn and Raugel, Jos{\\'e}phine and Banville, Hubert and King, Jean-R{\\'e}mi},\n  journal={arXiv preprint arXiv:2605.04326},\n  year={2026}\n}\n```\n\n## License\n\nThis project is licensed 
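under CC-BY-NC-4.0. See [LICENSE](LICENSE) for details.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.\n","TRIBE v2 is a multimodal model for predicting brain responses to naturalistic stimuli (video, audio, text). It integrates state-of-the-art text, audio, and video models into a unified Transformer architecture that maps multimodal representations onto the cortical surface to predict brain responses. The project supports quickly generating brain activity predictions for a given video, audio, or text input from a pretrained model, and provides a detailed Colab demo notebook to help users understand and use it. TRIBE v2 also lets researchers train models from scratch on their own datasets, which suits neuroscience researchers running in-silico experiments or studying how the brain processes multisensory information.",2,"2026-06-11 03:52:23","high_star"]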