Adrien Doerig | Visual representations in the human brain align with large language models

Guest Lecture

  • Date: Apr 12, 2024
  • Time: 10:00 AM - 11:00 AM (local time, Germany)
  • Speaker: Adrien Doerig
  • Affiliation: University of Osnabrück
  • Location: MPI for Human Cognitive and Brain Sciences
  • Room: Wilhelm Wundt Room (A400) + Zoom Meeting (hybrid mode)
  • Host: CBS CoCoNUT
  • Contact: pn-katja.seeliger@cbs.mpg.de
Please join via Zoom: https://zoom.us/j/98507777689
An intriguing recent finding in artificial intelligence is that linguistic representations improve the processing of visual inputs, suggesting a deep connection between vision and language. Here, we uncover a similar connection between vision and language representations in the human brain. We demonstrate that large language models (LLMs), trained solely on textual data, yield representations that can be linearly mapped to brain representations evoked by visually presented natural scenes. This mapping captures the selectivities of different brain areas and is sufficiently robust that accurate scene captions can be reconstructed from visually evoked activity alone. Using carefully controlled model comparisons, we then show that the accuracy with which LLM representations match brain representations derives from their ability to integrate complex information contained in scene captions, beyond that conveyed by individual words. Finally, we train deep neural network models to transform raw image inputs into LLM representations. Remarkably, these networks learn representations that are better aligned with brain representations than a large number of alternative models, despite being trained on orders of magnitude less data. Overall, our results suggest that the computations of the visual brain may converge towards a representational format that can be derived from language alone.
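To make the central analysis concrete, the sketch below illustrates the general idea of a linear mapping from LLM caption embeddings to visually evoked brain responses, as described in the abstract. It is not the speaker's actual pipeline: the data, array shapes, regularization choice, and variable names are placeholder assumptions for illustration only.

```python
# Illustrative sketch (assumed setup, not the authors' code): fit a ridge-
# regression encoder from LLM caption embeddings to simulated voxel responses
# and score each voxel by prediction-measurement correlation on held-out data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: one LLM embedding per scene caption and one fMRI
# response pattern per viewed image (dimensions chosen arbitrarily).
n_images, embed_dim, n_voxels = 1000, 768, 500
llm_embeddings = rng.standard_normal((n_images, embed_dim))   # caption embeddings
brain_responses = rng.standard_normal((n_images, n_voxels))   # voxel responses

X_train, X_test, y_train, y_test = train_test_split(
    llm_embeddings, brain_responses, test_size=0.2, random_state=0
)

# A single regularized linear map from language-model space to voxel space.
encoder = Ridge(alpha=1.0)
encoder.fit(X_train, y_train)
pred = encoder.predict(X_test)

# Correlate predicted and measured responses per voxel on the test set.
voxel_r = np.array([
    np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_voxels)
])
print(f"Median prediction correlation across voxels: {np.median(voxel_r):.3f}")
```

With real caption embeddings and recorded brain responses in place of the random placeholders, the same per-voxel correlation score is one common way to quantify how well a linear mapping of this kind captures visually evoked activity.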