Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models
Yassine Jamaa, Badr AlKhamissi, Satrajit S Ghosh, Martin Schrimpf
Identifiers and access
- DOI: 10.48550/arxiv.2508.08276
- arXiv: 2508.08276
- Open-access copy →
Key findings
A neuroscientific contrast localizer was adapted to 11 LLMs and 5 VLMs (3B to 90B parameters). Ablating the functionally selected units sometimes hurt task accuracy less than ablating low-activation units, and math-localizer units often impaired Theory-of-Mind (ToM) performance more than ToM-localizer units. Both results call into question the causal relevance of contrast-based localizers in language models.
Abstract
Source: arXiv
This work adapts a neuroscientific contrast localizer to pinpoint causally relevant units for Theory of Mind (ToM) and mathematical reasoning tasks in large language models (LLMs) and vision-language models (VLMs). Across 11 LLMs and 5 VLMs ranging in size from 3B to 90B parameters, we localize top-activated units using contrastive stimulus sets and assess their causal role via targeted ablations. We compare the effect of lesioning functionally selected units against low-activation and randomly selected units on downstream accuracy across established ToM and mathematical benchmarks. Contrary to expectations, low-activation units sometimes produced larger performance drops than the highly activated ones, and units derived from the mathematical localizer often impaired ToM performance more than those from the ToM localizer. These findings call into question the causal relevance of contrast-based localizers and highlight the need for broader stimulus sets to more accurately capture task-specific units.
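The localize-then-ablate procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that units are ranked by a Welch t-statistic between target and control activations, that "low-activation" units are those at the bottom of that ranking, and that ablation means zeroing a unit's activation. All names (`welch_t`, `localize`, `ablate`) and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed contrast measure)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def localize(target_acts, control_acts, k):
    """Return (top_k, bottom_k) unit indices ranked by target-vs-control contrast.

    target_acts[i][u] is the activation of unit u on target stimulus i.
    """
    n_units = len(target_acts[0])
    scores = [
        welch_t([row[u] for row in target_acts], [row[u] for row in control_acts])
        for u in range(n_units)
    ]
    order = sorted(range(n_units), key=lambda u: scores[u], reverse=True)
    return order[:k], order[-k:]


def ablate(activations, units):
    """Zero the selected units in one activation vector (a simple lesion)."""
    lesioned = set(units)
    return [0.0 if u in lesioned else x for u, x in enumerate(activations)]


# Toy data: 8 hypothetical units, 20 stimuli per condition.
# Units 0 and 1 are constructed to respond more strongly to target stimuli.
rng = random.Random(0)
N_UNITS, N_STIMULI, K = 8, 20, 2
control = [[rng.gauss(0, 1) for _ in range(N_UNITS)] for _ in range(N_STIMULI)]
target = [
    [rng.gauss(5 if u < 2 else 0, 1) for u in range(N_UNITS)]
    for _ in range(N_STIMULI)
]

top_units, low_units = localize(target, control, K)
random_units = rng.sample(range(N_UNITS), K)  # random-selection baseline
lesioned = ablate(target[0], top_units)
```

In a real model, the lesion would be applied to hidden activations during the forward pass, and benchmark accuracy would be compared across the three unit selections (top, low, random).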
Topics
- ml-nlp-knowledge
Lab authors
This record was curated from the lab's CV, NCBI MyBibliography, and OpenAlex. See PROJECTS.md for how to add or correct an entry via a pull request.