
Researchers at Yale University, Dartmouth College, and the University of Cambridge have developed MindLLM, a subject-agnostic model for decoding functional magnetic resonance imaging (fMRI) signals into text.
By integrating a neuroscience-informed attention mechanism with a large language model (LLM), MindLLM outperforms prior models such as UMBRAE, BrainChat, and UniBrain, with a 12.0% improvement on downstream tasks, a 16.4% gain in generalization to unseen subjects, and a 25.0% boost in adaptation to novel tasks.
Decoding brain activity into natural language has significant implications for neuroscience and brain-computer interface applications. Previous attempts have struggled with predictive performance, limited task variety, and poor generalization across subjects, largely because existing approaches often require subject-specific parameters that tie a trained model to particular individuals.
In the study “MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding,” posted to the preprint server arXiv, MindLLM was evaluated on comprehensive fMRI-to-text benchmarks built from the Natural Scenes Dataset (NSD), a widely used fMRI dataset comprising recordings from eight individuals.
The MindLLM design consists of an fMRI encoder and a large language model.
fMRI scans divide the brain into tiny 3D units called voxels, the volumetric equivalent of pixels. Because no two brains match exactly, even after alignment to a standardized brain atlas, the number and arrangement of active voxels varies from person to person (12,682 to 17,907 across the individuals in the study), so each subject presents a different input dimension to the model.
Because brain function is broadly consistent across individuals even when voxel layouts differ, a neuroscience-informed activity mapping inside the fMRI encoder, implemented as a modified attention mechanism, allows a single set of model parameters to accommodate these varying input shapes across subjects.
By separating a voxel’s functional information from its raw fMRI value, the model leverages pre-existing knowledge from neuroscience research, improving consistency across individuals.
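In practice, this design can be pictured as cross-attention in which a fixed set of learned query tokens attends over however many voxels a subject has. The PyTorch sketch below is a minimal illustration under assumed dimensions, not the authors' code: the keys carry each voxel's positional/functional prior while the values carry its raw activity, mirroring the separation described above.

import torch
import torch.nn as nn

class SubjectAgnosticEncoder(nn.Module):
    def __init__(self, d_model=256, n_queries=64, n_heads=8):
        super().__init__()
        # One shared set of learned queries yields a fixed-size output
        # regardless of how many voxels a given subject has.
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.value_proj = nn.Linear(1, d_model)  # embeds each voxel's raw activity value
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, voxel_values, voxel_pos_emb):
        # voxel_values:  (batch, n_voxels)            raw fMRI values; n_voxels varies by subject
        # voxel_pos_emb: (batch, n_voxels, d_model)   positional/functional prior per voxel
        values = self.value_proj(voxel_values.unsqueeze(-1))    # activity ("what") stream
        keys = voxel_pos_emb                                    # position/function stream
        q = self.queries.unsqueeze(0).expand(voxel_values.size(0), -1, -1)
        tokens, _ = self.attn(query=q, key=keys, value=values)  # fixed-size token set
        return tokens                                           # (batch, n_queries, d_model)

# Two subjects with different voxel counts pass through the same weights.
enc = SubjectAgnosticEncoder()
tok_a = enc(torch.randn(1, 12682), torch.randn(1, 12682, 256))
tok_b = enc(torch.randn(1, 17907), torch.randn(1, 17907, 256))
print(tok_a.shape, tok_b.shape)  # both: torch.Size([1, 64, 256])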

Brain Instruction Tuning (BIT) further enhances the system’s ability to extract diverse semantic representations from fMRI signals. BIT is an instruction-tuning approach that uses large-scale fMRI datasets, which contain fMRI recordings from multiple people viewing the same images. This multi-subject fMRI data and associated textual annotations strengthen the model’s semantic understanding.
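As a rough illustration of how such instruction-tuning data might be organized (the field names and placeholder token below are made up for the example, not taken from the paper), each sample pairs an fMRI recording with a task instruction and a textual target:

recordings = [[0.12, -0.31, 0.07], [0.44, 0.02, -0.58]]   # stand-ins for per-subject voxel vectors
captions = ["a dog running on grass", "a plate of food on a table"]

def build_bit_sample(fmri_recording, instruction, answer):
    prompt = f"<fmri> {instruction}"   # "<fmri>" is a hypothetical placeholder for encoder outputs
    return {"fmri": fmri_recording, "prompt": prompt, "target": answer}

bit_samples = [
    build_bit_sample(rec, "Describe the image the subject was viewing.", cap)
    for rec, cap in zip(recordings, captions)
]
print(bit_samples[0]["prompt"])  # "<fmri> Describe the image the subject was viewing."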
The model was evaluated on comprehensive fMRI-to-text benchmarks, where it demonstrated superior results in brain captioning, question answering, and reasoning tasks.
MindLLM generalizes to new subjects 16.4% better than previous subject-agnostic models and adapts to novel tasks 25% more effectively.
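Benchmarks of this kind typically score generated text against reference annotations using standard captioning metrics. A minimal example using BLEU via NLTK, not the paper's exact evaluation pipeline, looks like this:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["a", "dog", "running", "on", "grass"]       # tokenized ground-truth caption
candidate = ["a", "dog", "runs", "on", "the", "grass"]   # tokenized model output
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")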
The model’s attention patterns show connections between specific brain regions and cognitive functions like perception and reasoning.
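One way to inspect such patterns, assuming access to the encoder's per-voxel attention weights and an atlas label for every voxel, is to sum attention mass per region; the snippet below is a simple illustration with toy data rather than the study's analysis code.

import numpy as np

def attention_by_region(attn_weights, region_labels):
    # attn_weights:  (n_voxels,) attention mass over voxels for one query token
    # region_labels: (n_voxels,) integer atlas region id for each voxel
    return {int(r): float(attn_weights[region_labels == r].sum())
            for r in np.unique(region_labels)}

attn = np.random.dirichlet(np.ones(1000))     # toy attention over 1,000 voxels
labels = np.random.randint(0, 8, size=1000)   # toy atlas with 8 regions
print(attention_by_region(attn, labels))      # total attention mass per region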
Many prior models focus exclusively on generating captions from fMRI signals recorded while subjects view images. MindLLM overcomes these limitations by integrating datasets that support knowledge retrieval, symbolic language processing, and complex reasoning.
The inclusion of memory-based tasks, such as retrieving descriptions of previously seen images, strengthens the model’s applicability to cognitive neuroscience. Open-ended question-answering capabilities further extend the range of possible applications, benefiting both medical and research settings.
Established neuroscientific atlases, including those by Glasser and Rolls, provide functional priors that help the model differentiate between voxel positions and activity values. By integrating these standardized mappings, the model maintains both subject generalization and neuroscientific integrity.
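A sketch of how atlas labels could be turned into per-voxel functional priors, with the embedding size and toy labels assumed for illustration (the 360-region count matches the Glasser parcellation), might look like this:

import torch
import torch.nn as nn

n_regions, d_model = 360, 256
region_embed = nn.Embedding(n_regions, d_model)

voxel_region_ids = torch.randint(0, n_regions, (17907,))  # atlas label per voxel (toy)
voxel_pos_emb = region_embed(voxel_region_ids)            # (n_voxels, d_model) functional prior
print(voxel_pos_emb.shape)                                 # torch.Size([17907, 256])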
Current implementations process static fMRI snapshots, limiting the system’s ability to capture thought progression over time. Future advancements may involve incorporating temporal modeling techniques, such as recurrent architectures or sequential attention mechanisms, to analyze how brain activity patterns evolve.
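As a loose sketch of one such direction, not part of the published model, a recurrent layer could run over encoder outputs from successive fMRI frames:

import torch
import torch.nn as nn

d_model, n_queries, n_frames = 256, 64, 10
gru = nn.GRU(input_size=d_model, hidden_size=d_model, batch_first=True)

frame_tokens = torch.randn(1, n_frames, n_queries, d_model)  # per-frame encoder outputs (toy)
frame_summary = frame_tokens.mean(dim=2)                     # pool query tokens per frame
sequence_repr, _ = gru(frame_summary)                        # (1, n_frames, d_model)
print(sequence_repr.shape)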
MindLLM provides interpretable insights into how brain activity translates into semantic information, reinforcing its role as a tool for neuroscientific research. Expanding into real-time fMRI decoding could open new possibilities for neuroprosthetics, mental state tracking, and brain-computer interfaces.
More information:
Weikang Qiu et al, MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding, arXiv (2025). DOI: 10.48550/arxiv.2502.15786
© 2025 Science X Network
Citation:
Direct translation of brain imaging to text with MindLLM (2025, February 28)
retrieved 28 February 2025
from https://medicalxpress.com/news/2025-02-brain-imaging-text-mindllm.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
