Meta's TRIBE v2 Maps fMRI Responses Across 70,000 Voxels
Meta FAIR has released TRIBE v2, a trimodal foundation model that simulates high-resolution fMRI responses to media without requiring live brain scans.
Meta’s Fundamental AI Research (FAIR) team has released TRIBE v2, a trimodal foundation model built to predict whole-brain neural responses. The release introduces a computational method for running neuroscientific experiments in software, bypassing the need for physical functional magnetic resonance imaging (fMRI) scans. By analyzing text, audio, and video inputs, the model outputs high-resolution 3D voxel maps of predicted human brain activity.
Architectural Scaling and Resolution
The second iteration of the Transformer for In-silico Brain Experiments architecture represents a 70x increase in prediction resolution. While the original model yielded roughly 1,000 cortical predictions, TRIBE v2 scales to 70,000 voxels. This expansion maps the entire human cortex with unprecedented granularity.
Training required more than 1,000 hours of fMRI data collected from 720 healthy volunteers, a steep increase from the four subjects used to train the original system. During data collection, subjects consumed naturalistic stimuli, including movies, podcasts, and written text, to elicit complex neural activation patterns.
The system processes inputs through a three-stage pipeline. Pretrained AI encoders first extract features from the source media: the text backbone relies on Llama 3.2, video processing uses Video-JEPA 2, and audio is handled by Wav2Vec-Bert-2.0 or Seamless Communication models. A transformer then integrates these multimodal inputs into universal representations, and a final person-specific layer maps those shared representations onto each subject's fMRI voxels.
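The three stages can be sketched schematically. The snippet below is a minimal, illustrative NumPy mock-up, not Meta's implementation: the frozen pretrained encoders and the shared transformer trunk are stood in for by fixed linear maps, and all dimensions and subject IDs are hypothetical. Only the overall shape of the pipeline (modality encoders → shared trunk → person-specific head over 70,000 voxels) follows the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the real model's).
D_TEXT, D_VIDEO, D_AUDIO = 2048, 1024, 1024
D_SHARED = 512          # width of the shared "universal representation"
N_VOXELS = 70_000       # TRIBE v2's whole-cortex output resolution

def encode(features: np.ndarray, out_dim: int, seed: int) -> np.ndarray:
    """Stand-in for a frozen pretrained encoder: a fixed random linear map."""
    w = np.random.default_rng(seed).normal(size=(features.shape[-1], out_dim))
    return features @ w / np.sqrt(features.shape[-1])

def fuse(text_f: np.ndarray, video_f: np.ndarray, audio_f: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: fuse the three streams into one shared representation.
    (The real model uses a transformer here; this is a single linear map.)"""
    stacked = np.concatenate([text_f, video_f, audio_f], axis=-1)
    return encode(stacked, D_SHARED, seed=1)

# Stage 3: one person-specific linear head per subject (hypothetical IDs).
subject_heads = {s: rng.normal(size=(D_SHARED, N_VOXELS)) * 0.01
                 for s in ("sub-01", "sub-02")}

def predict_voxels(text_raw, video_raw, audio_raw, subject: str) -> np.ndarray:
    """Stage 1 -> 2 -> 3: raw modality features to predicted voxel responses."""
    shared = fuse(encode(text_raw, D_SHARED, 2),
                  encode(video_raw, D_SHARED, 3),
                  encode(audio_raw, D_SHARED, 4))
    return shared @ subject_heads[subject]

# One time point of (random) features per modality.
pred = predict_voxels(rng.normal(size=(1, D_TEXT)),
                      rng.normal(size=(1, D_VIDEO)),
                      rng.normal(size=(1, D_AUDIO)),
                      subject="sub-01")
print(pred.shape)  # (1, 70000)
```

The design point the sketch captures is that only the final head is subject-specific; swapping `subject_heads["sub-01"]` for another subject's head reuses the same shared representation, which is what makes cross-subject generalization cheap.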
Zero-Shot Generalization
TRIBE v2 achieves zero-shot generalization across new subjects, unseen languages, and novel tasks without retraining. The architecture delivers a 2-3x improvement in prediction accuracy over previous state-of-the-art linear encoding models.
This capability enables researchers to simulate cognitive processing across diverse populations computationally. Cognitive scientists report that the model accurately recovers known specialized brain regions, such as the fusiform face area, and localizes language networks more reliably than individual, noise-prone fMRI scans.
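For context on the baseline TRIBE v2 is compared against: classic linear encoding models fit a regularized regression from stimulus features to each voxel's response and score it by per-voxel correlation on held-out data. The sketch below is a generic ridge-regression version of that baseline on synthetic data; the dimensions, regularization strength, and data are all illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: stimulus features X (time x features) and "recorded"
# fMRI responses Y (time x voxels). Sizes are illustrative only.
T, D, V = 200, 64, 1000
W_true = rng.normal(size=(D, V))
X = rng.normal(size=(T, D))
Y = X @ W_true + 0.5 * rng.normal(size=(T, V))

# Voxelwise linear encoding model: ridge regression in closed form,
# W = (X'X + lam*I)^-1 X'Y, fit jointly for all voxels at once.
lam = 1.0
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)

# Standard evaluation metric: per-voxel Pearson correlation between
# predicted and observed responses (held-out data in practice; the
# training set is reused here for brevity).
Y_hat = X @ W_hat
r = [np.corrcoef(Y[:, v], Y_hat[:, v])[0, 1] for v in range(V)]
print(round(float(np.mean(r)), 3))
```

The reported 2-3x gain is relative to this family of models: a deep trimodal trunk replaces the single linear map `W_hat`, while the per-voxel correlation metric stays the same.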
Clinical and Commercial Considerations
Meta positions the release as a tool for in silico neuroscience. This allows researchers to test hypotheses about brain function computationally, reducing the dependency on expensive physical lab environments.
Clinical applications center on accelerating treatments for neurological conditions like stroke, autism, and Alzheimer’s disease. Researchers can simulate how these disorders alter neural pathways under varying stimuli.
The model is distributed under a Creative Commons Attribution-NonCommercial license. Meta provides the research paper, model weights, GitHub codebase, and an interactive demo. The non-commercial restriction aligns with community concerns regarding computational neuromarketing, where commercial platforms could theoretically use the architecture to optimize content for maximal neural engagement.
If you are developing biologically inspired neural networks, studying the TRIBE v2 codebase provides a structural reference for how modern foundation models integrate three distinct input modalities into a single shared representation.