publications
2025
- What does it mean to understand language?
  Colton Casto, Anna Ivanova, Evelina Fedorenko, and Nancy Kanwisher. arXiv, 2025
Language understanding entails not just extracting the surface-level meaning of the linguistic input, but constructing rich mental models of the situation it describes. Here we propose that because processing within the brain’s core language system is fundamentally limited, deeply understanding language requires exporting information from the language system to other brain regions that compute perceptual and motor representations, construct mental models, and store our world knowledge and autobiographical memories. We review the existing evidence for this hypothesis, and argue that recent progress in cognitive neuroscience provides both the conceptual foundation and the methods to directly test it, thus opening up a new strategy to reveal what it means, cognitively and neurally, to understand language.
@article{Casto2025,
  title = {What does it mean to understand language?},
  author = {Casto, Colton and Ivanova, Anna and Fedorenko, Evelina and Kanwisher, Nancy},
  journal = {arXiv},
  pages = {1-118},
  year = {2025},
  url = {https://arxiv.org/abs/2511.19757},
  doi = {10.48550/arXiv.2511.19757},
}

- The cerebellar components of the human language network
  Colton Casto, Moshe Poliak, Greta Tuckute, Hannah Small, Patrick Sherlock, Agata Wolna, Benjamin Lipkin, Anila M. D’Mello, and Evelina Fedorenko. bioRxiv, 2025
The cerebellum’s capacity for neural computation is arguably unmatched. Yet despite now ample evidence of cerebellar contributions to cognition, including language, its precise role in language processing remains debated. Here, we systematically characterize cerebellar language-responsive regions using precision fMRI. We identify four cerebellar regions that respond to language across modalities (Experiments 1a-b, n=754). One region—spanning Crus I/II/lobule VIIb—is selective for language relative to diverse non-linguistic perceptual, cognitive, and motor tasks (Experiments 2a-f, n=732), and the rest exhibit mixed-selective profiles, responding strongly to language but also to one or more of the non-linguistic conditions. Similar to the neocortical language system, the language-selective region is engaged by sentence-level meanings during comprehension and production (Experiments 3a-b, n=100) and shows fine-grained sensitivity to linguistic processing difficulty (Experiment 3c, n=5). Further, this region’s response to language is not due to the frequent presence of social content in language, as it is strongly engaged by both social and nonsocial sentences (Experiment 3d, n=10). Finally, all four regions, but especially Crus I/II/VIIb, are functionally connected to the neocortical language system (Experiment 4, n=85). We propose that these cerebellar regions constitute components of the extended language network, with one region supporting linguistic semantic processing and closely mirroring the selectivity of the neocortical language network, and the other three plausibly integrating information from diverse neocortical regions.
@article{Casto2026,
  title = {The cerebellar components of the human language network},
  author = {Casto, Colton and Poliak, Moshe and Tuckute, Greta and Small, Hannah and Sherlock, Patrick and Wolna, Agata and Lipkin, Benjamin and D'Mello, Anila M. and Fedorenko, Evelina},
  journal = {bioRxiv},
  pages = {1-118},
  year = {2025},
  url = {https://www.biorxiv.org/content/10.1101/2025.04.14.645351v2.abstract},
  doi = {10.1101/2025.04.14.645351},
}

- The extended language network: Language selective brain areas whose contributions to language remain to be discovered
  Agata Wolna, Aaron Wright, Colton Casto, Benjamin Lipkin, and Evelina Fedorenko. bioRxiv, 2025
Although language neuroscience has largely focused on ‘core’ left frontal and temporal brain areas and their right-hemisphere homotopes, numerous other areas—cortical, subcortical, and cerebellar—have been implicated in linguistic processing. However, these areas’ contributions to language remain unclear given that the evidence for their recruitment comes from diverse paradigms, many of which conflate language processing with perceptual, motor, or task-related cognitive processes. Using fMRI data from 772 participants performing an extensively-validated language ‘localizer’ paradigm that isolates language processing from other processes, we a) delineate a comprehensive set of areas that respond reliably to language across written and auditory modalities, and b) evaluate these areas’ selectivity for language relative to a demanding non-linguistic task. In line with prior claims, many areas outside the core fronto-temporal network respond during language processing, and most of them show selectivity for language relative to general task demands. These language-selective areas of the extended language network include areas around the temporal poles, in the medial frontal cortex, in the hippocampus, and in the cerebellum, among others. Although distributed across many parts of the brain, the extended language-selective network still only comprises ∼1.2% of the brain’s volume and is about the size of a strawberry, challenging the view that language processing is broadly distributed across the cortical surface. These newly identified language-selective areas can now be systematically characterized to decipher their contributions to language processing, including testing whether these contributions differ from those of the core language areas.
@article{Wolna2025,
  title = {The extended language network: Language selective brain areas whose contributions to language remain to be discovered},
  author = {Wolna, Agata and Wright, Aaron and Casto, Colton and Lipkin, Benjamin and Fedorenko, Evelina},
  journal = {bioRxiv},
  pages = {1-33},
  year = {2025},
  url = {https://www.biorxiv.org/content/10.1101/2025.04.02.646835v2.abstract},
  doi = {10.1101/2025.04.02.646835},
}
2024
- Universality of representation in biological and artificial neural networks
  Eghbal A. Hosseini, Colton Casto, Noga Zaslavsky, Colin Conwell, Mark Richardson, and Evelina Fedorenko. bioRxiv, 2024
Many artificial neural networks (ANNs) trained with ecologically plausible objectives on naturalistic data align with behavior and neural representations in biological systems. Here, we show that this alignment is a consequence of convergence onto the same representations by high-performing ANNs and by brains. We developed a method to identify stimuli that systematically vary the degree of inter-model representation agreement. Across language and vision, we then showed that stimuli from high- and low-agreement sets predictably modulated model-to-brain alignment. We also examined which stimulus features distinguish high- from low-agreement sentences and images. Our results establish representation universality as a core component in the model-to-brain alignment and provide a new approach for using ANNs to uncover the structure of biological representations and computations.
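The abstract does not spell out how per-stimulus inter-model agreement is quantified, so the sketch below shows one generic way such a score could be computed, under the assumption that agreement for a stimulus can be read off how similarly its representational-dissimilarity profile looks across models. The function name, distance metric, and set sizes are illustrative, not the paper's actual pipeline.

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.stats import spearmanr

    def agreement_scores(embeddings_per_model):
        """Score inter-model agreement for each stimulus (illustrative only).

        embeddings_per_model: one (n_stimuli, n_features) array per model;
        feature dimensionality may differ across models. Returns an
        (n_stimuli,) array where higher values mean the models 'agree' more
        about how that stimulus relates to the rest of the stimulus set.
        """
        # A model's view of a stimulus = its dissimilarity profile to all
        # other stimuli (one row of that model's representational matrix).
        rdms = [cdist(E, E, metric="correlation") for E in embeddings_per_model]
        n_models, n_stimuli = len(rdms), rdms[0].shape[0]
        scores = np.zeros(n_stimuli)
        for i in range(n_stimuli):
            rows = [np.delete(rdm[i], i) for rdm in rdms]   # drop self-distance
            corrs = [spearmanr(rows[a], rows[b]).correlation
                     for a in range(n_models) for b in range(a + 1, n_models)]
            scores[i] = np.mean(corrs)
        return scores

    # Stimuli at the top / bottom of this ranking would play the role of the
    # high- / low-agreement sets used to probe model-to-brain alignment.
    # high_set = np.argsort(agreement_scores(embeddings))[-100:]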
@article{Hosseini2024,
  title = {Universality of representation in biological and artificial neural networks},
  author = {Hosseini, Eghbal A. and Casto, Colton and Zaslavsky, Noga and Conwell, Colin and Richardson, Mark and Fedorenko, Evelina},
  journal = {bioRxiv},
  pages = {1-70},
  year = {2024},
  url = {https://www.biorxiv.org/content/10.1101/2024.12.26.629294v1.abstract},
  doi = {10.1101/2024.12.26.629294},
}

- Neural populations in the language network differ in the size of their temporal receptive windows
  Tamar Regev*, Colton Casto*, Eghbal A. Hosseini, Markus Adamek, Anthony L. Ritaccio, Jon T. Willie, Peter Brunner, and Evelina Fedorenko. Nature Human Behaviour, 2024
Despite long knowing what brain areas support language comprehension, our knowledge of the neural computations that these frontal and temporal regions implement remains limited. One important unresolved question concerns functional differences among the neural populations that comprise the language network. Here we leveraged the high spatiotemporal resolution of human intracranial recordings (n = 22) to examine responses to sentences and linguistically degraded conditions. We discovered three response profiles that differ in their temporal dynamics. These profiles appear to reflect different temporal receptive windows, with average windows of about 1, 4 and 6 words, respectively. Neural populations exhibiting these profiles are interleaved across the language network, which suggests that all language regions have direct access to distinct, multiscale representations of linguistic input—a property that may be critical for the efficiency and robustness of language processing.
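As a rough intuition for what a temporal receptive window (TRW) of k words means, the toy simulation below (an assumption-laden illustration, not the paper's intracranial analyses or fitted models) shows how a unit that integrates over only its last k words saturates immediately when k = 1 but builds up gradually across a sentence when k = 6.

    import numpy as np

    def toy_response(trw, n_words=8, coherent=True):
        """Toy response of a unit with a temporal receptive window of `trw` words.

        For a coherent sentence, the amount of usable context grows with word
        position (capped at `trw`); for a scrambled word list, only the current
        word is interpretable, so the response stays flat.
        """
        positions = np.arange(1, n_words + 1)
        usable = np.minimum(positions, trw) if coherent else np.ones(n_words)
        return usable / trw                     # normalized response per word

    for trw in (1, 4, 6):                       # the three reported window sizes
        print(f"TRW={trw}:", np.round(toy_response(trw), 2))
    # TRW=1 saturates at the first word; TRW=6 keeps climbing for six words,
    # mirroring the qualitative buildup differences among the three profiles.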
@article{RegevCasto2024,
  title = {Neural populations in the language network differ in the size of their temporal receptive windows},
  author = {Regev, Tamar and Casto, Colton and Hosseini, Eghbal A. and Adamek, Markus and Ritaccio, Anthony L. and Willie, Jon T. and Brunner, Peter and Fedorenko, Evelina},
  journal = {Nature Human Behaviour},
  volume = {8},
  pages = {1924-1942},
  year = {2024},
  url = {https://www.nature.com/articles/s41562-024-01944-2},
  doi = {10.1038/s41562-024-01944-2},
}

- Information-making processes in the speaker’s brain drive human conversations forward
  Ariel Goldstein, Haocheng Wang, Tom Sheffer, Mariano Schain, Zaid Zada, Leonard Niekerken, Bobbi Aubrey, Samuel A. Nastase, Harshvardhan Gazula, Colton Casto, Werner K. Doyle, Daniel Friedman, Sasha Devore, Patricia Dugan, Avinatan Hassidim, Michael Brenner, Yossi Matias, Orrin Devinsky, Adeen Flinker, and Uri Hasson. bioRxiv, 2024
A conversation following an overly predictable pattern is likely boring and uninformative; conversely, if it lacks structure, it is likely nonsensical. The delicate balance between predictability and surprise has been well studied using information theory during speech perception, focusing on how listeners predict upcoming words based on context and respond to unexpected information. However, less is known about how speakers’ brains generate structured yet surprisingly informative speech. This study uses continuous electrocorticography (ECoG) recordings during free, 24/7 conversations to investigate the neural basis of speech production and comprehension. We employed large language models (Llama-2 and GPT-2) to calculate word probabilities based on context and categorized words into probable (top 30%) and improbable (bottom 30%) groups. We then extracted word embeddings from the LLMs and used encoding models to estimate the neural activity while producing or listening to probable and improbable words. Our findings indicate that before word-onset, the human brain functions in opposing, perhaps complementary, ways while listening and speaking. Results show that listeners exhibit increased neural encoding for predictable words before word onset, while speakers show increased encoding for surprising, improbable words. Speakers also show a lower speech production rate before articulating unexpected words, suggesting additional cognitive processes are involved in producing novel information. This indicates that human speech production includes information-making processes for generating informative words that are absent in language models, which primarily rely on statistical probabilities to generate contextually appropriate speech.
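A minimal sketch of the word-probability step described above, using GPT-2 through the Hugging Face transformers library; the 30% cutoffs follow the abstract, while the example sentence, the first-sub-token scoring shortcut, and all variable names are assumptions for illustration (the study's Llama-2 runs, alignment to word onsets, and ECoG encoding models are not shown).

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def word_probabilities(words):
        """Probability of each word given its left context (illustrative only:
        a word is scored by the probability of its first sub-token)."""
        probs, context_ids = [], [tokenizer.bos_token_id]
        for word in words:
            word_ids = tokenizer.encode(" " + word)
            with torch.no_grad():
                logits = model(torch.tensor([context_ids])).logits[0, -1]
            probs.append(torch.softmax(logits, dim=-1)[word_ids[0]].item())
            context_ids += word_ids              # the word joins the context
        return probs

    words = "the weather in Boston is unusually pleasant today".split()
    ranked = sorted(zip(words, word_probabilities(words)), key=lambda wp: wp[1])
    k = int(0.3 * len(words))
    improbable, probable = ranked[:k], ranked[-k:]   # bottom / top 30%
    print("improbable:", improbable)
    print("probable:", probable)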
@article{Goldstein2024,
  title = {Information-making processes in the speaker's brain drive human conversations forward},
  author = {Goldstein, Ariel and Wang, Haocheng and Sheffer, Tom and Schain, Mariano and Zada, Zaid and Niekerken, Leonard and Aubrey, Bobbi and Nastase, Samuel A. and Gazula, Harshvardhan and Casto, Colton and Doyle, Werner K. and Friedman, Daniel and Devore, Sasha and Dugan, Patricia and Hassidim, Avinatan and Brenner, Michael and Matias, Yossi and Devinsky, Orrin and Flinker, Adeen and Hasson, Uri},
  journal = {bioRxiv},
  pages = {1-21},
  year = {2024},
  url = {https://www.biorxiv.org/content/10.1101/2024.08.27.609946v1.abstract},
  doi = {10.1101/2024.08.27.609946},
}

- Distributed sensitivity to syntax and semantics throughout the language network
  Cory Shain*, Hope Kean*, Colton Casto, Benjamin Lipkin, Josef Affourtit, Matthew Siegelman, Francis Mollica, and Evelina Fedorenko. Journal of Cognitive Neuroscience, 2024
Human language is expressive because it is compositional: The meaning of a sentence (semantics) can be inferred from its structure (syntax). It is commonly believed that language syntax and semantics are processed by distinct brain regions. Here, we revisit this claim using precision fMRI methods to capture separation or overlap of function in the brains of individual participants. Contrary to prior claims, we find distributed sensitivity to both syntax and semantics throughout a broad frontotemporal brain network. Our results join a growing body of evidence for an integrated network for language in the human brain within which internal specialization is primarily a matter of degree rather than kind, in contrast with influential proposals that advocate distinct specialization of different brain areas for different types of linguistic functions.
@article{ShainKean2024,
  title = {Distributed sensitivity to syntax and semantics throughout the language network},
  author = {Shain, Cory and Kean, Hope and Casto, Colton and Lipkin, Benjamin and Affourtit, Josef and Siegelman, Matthew and Mollica, Francis and Fedorenko, Evelina},
  journal = {Journal of Cognitive Neuroscience},
  volume = {36},
  issue = {7},
  pages = {1427-1471},
  year = {2024},
  url = {https://direct.mit.edu/jocn/article/36/7/1427/120796},
  doi = {10.1162/jocn_a_02164},
}
2022
- Shared computational principles for language processing in humans and deep language models
  Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Samuel A. Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvardhan Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Lora Fanda, Werner Doyle, Daniel Friedman, Patricia Dugan, Lucia Melloni, Roi Reichart, Sasha Devore, Adeen Flinker, Liat Hasenfratz, Omer Levy, Avinatan Hassidim, Michael Brenner, Yossi Matias, Kenneth A. Norman, Orrin Devinsky, and Uri Hasson. Nature Neuroscience, 2022
Departing from traditional linguistic models, advances in deep learning have resulted in a new type of predictive (autoregressive) deep language models (DLMs). Using a self-supervised next-word prediction task, these models generate appropriate linguistic responses in a given context. In the current study, nine participants listened to a 30-min podcast while their brain responses were recorded using electrocorticography (ECoG). We provide empirical evidence that the human brain and autoregressive DLMs share three fundamental computational principles as they process the same natural narrative: (1) both are engaged in continuous next-word prediction before word onset; (2) both match their pre-onset predictions to the incoming word to calculate post-onset surprise; (3) both rely on contextual embeddings to represent words in natural contexts. Together, our findings suggest that autoregressive DLMs provide a new and biologically feasible computational framework for studying the neural basis of language.
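The three shared principles correspond to quantities that can be read directly off an autoregressive model; the fragment below (an illustrative sketch with GPT-2, not the study's ECoG encoding analyses, and with a made-up example sentence) extracts, for each token of a passage, the pre-onset next-word prediction, the post-onset surprise as the negative log-probability of the actual token, and the final-layer contextual embedding.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    ids = tok("the monkey climbed the tree to reach the ripe fruit",
              return_tensors="pt").input_ids                  # shape (1, T)
    with torch.no_grad():
        out = lm(ids, output_hidden_states=True)
    logp = torch.log_softmax(out.logits, dim=-1)               # (1, T, vocab)

    # (1) pre-onset prediction: the model's best guess for token t+1 given 1..t
    predicted_next = logp[0, :-1].argmax(dim=-1)
    # (2) post-onset surprise: -log p(actual token t+1 | tokens 1..t)
    surprise = -logp[0, :-1].gather(1, ids[0, 1:, None]).squeeze(1)
    # (3) contextual embeddings: final-layer hidden state for every token
    embeddings = out.hidden_states[-1][0]                      # (T, 768)

    for token, s in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), surprise):
        print(f"{token:>12}  surprisal = {s.item():.2f} nats")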
@article{Goldstein2022,
  title = {Shared computational principles for language processing in humans and deep language models},
  author = {Goldstein, Ariel and Zada, Zaid and Buchnik, Eliav and Schain, Mariano and Price, Amy and Nastase, Samuel A. and Feder, Amir and Emanuel, Dotan and Cohen, Alon and Jansen, Aren and Gazula, Harshvardhan and Choe, Gina and Rao, Aditi and Kim, Catherine and Casto, Colton and Fanda, Lora and Doyle, Werner and Friedman, Daniel and Dugan, Patricia and Melloni, Lucia and Reichart, Roi and Devore, Sasha and Flinker, Adeen and Hasenfratz, Liat and Levy, Omer and Hassidim, Avinatan and Brenner, Michael and Matias, Yossi and Norman, Kenneth A. and Devinsky, Orrin and Hasson, Uri},
  journal = {Nature Neuroscience},
  volume = {25},
  pages = {369-380},
  year = {2022},
  url = {https://www.nature.com/articles/s41593-022-01026-4},
  doi = {10.1038/s41593-022-01026-4},
}