Adrien Doerig | Visual representations in the human brain align with large language models

Guest Lecture

  • Date: Apr 12, 2024
  • Time: 10:00 AM - 11:00 AM (local time, Germany)
  • Speaker: Adrien Doerig
  • Affiliation: University of Osnabrück
  • Location: MPI for Human Cognitive and Brain Sciences
  • Room: Wilhelm Wundt Room (A400) + Zoom Meeting (hybrid mode)
  • Host: CBS CoCoNUT
  • Contact: pn-katja.seeliger@cbs.mpg.de
Please join via Zoom: https://zoom.us/j/98507777689
An intriguing recent finding in artificial intelligence is that linguistic representations improve the processing of visual inputs, suggesting a deep connection between vision and language. Here, we uncover a similar connection between vision and language representations in the human brain. We demonstrate that large language models (LLMs), trained solely on textual data, yield representations that can be linearly mapped to brain representations evoked by visually presented natural scenes. This mapping captures the selectivities of different brain areas and is sufficiently robust that accurate scene captions can be reconstructed from visually evoked activity alone. Using carefully controlled model comparisons, we then show that the accuracy with which LLM representations match brain representations derives from their ability to integrate complex information contained in scene captions, beyond that conveyed by individual words. Finally, we train deep neural network models to transform raw image inputs into LLM representations. Remarkably, these networks learn representations that are better aligned with brain representations than a large number of alternative models, despite being trained on orders of magnitude less data. Overall, our results suggest that the computations of the visual brain may converge towards a representational format that can be derived from language alone.
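To make the central analysis concrete, the sketch below illustrates the general idea of a linear mapping from LLM caption embeddings to visually evoked brain responses, as described in the abstract. It is not the speaker's actual pipeline: the data, array shapes, regularization choice, and variable names are placeholder assumptions for illustration only.

```python
# Illustrative sketch (assumed setup, not the authors' code): fit a ridge-
# regression encoder from LLM caption embeddings to simulated voxel responses
# and score each voxel by prediction-measurement correlation on held-out data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: one LLM embedding per scene caption and one fMRI
# response pattern per viewed image (dimensions chosen arbitrarily).
n_images, embed_dim, n_voxels = 1000, 768, 500
llm_embeddings = rng.standard_normal((n_images, embed_dim))   # caption embeddings
brain_responses = rng.standard_normal((n_images, n_voxels))   # voxel responses

X_train, X_test, y_train, y_test = train_test_split(
    llm_embeddings, brain_responses, test_size=0.2, random_state=0
)

# A single regularized linear map from language-model space to voxel space.
encoder = Ridge(alpha=1.0)
encoder.fit(X_train, y_train)
pred = encoder.predict(X_test)

# Correlate predicted and measured responses per voxel on the test set.
voxel_r = np.array([
    np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_voxels)
])
print(f"Median prediction correlation across voxels: {np.median(voxel_r):.3f}")
```

With real caption embeddings and recorded brain responses in place of the random placeholders, the same per-voxel correlation score is one common way to quantify how well a linear mapping of this kind captures visually evoked activity.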