Title: Understanding the Role of Speech vs. Language in Communication Using Deep-Learning Models
Authors: Singh, Aditi
Advisors: Hasson, Uri
Department: Computer Science
Class Year: 2023
Abstract: Humans have an impressive capacity to use speech to communicate with each other. Infants quickly learn this mapping from speech to language, predominantly by example and without explicit instruction. While speech and language are inextricably interdependent (barring non-verbal methods of communication such as sign language), the study of communication has largely dichotomized the two. This is particularly true of the deep-learning models used to understand brain behavior during communication, where the emphasis has been on text-based language models such as GPT-2. In this thesis, to understand the relationship between speech and language, I considered Whisper, a new multimodal deep-learning model trained on two modalities: speech and language. Using an electrocorticography (ECoG) dataset of brain recordings from nine participants as they listened to a 30-minute narrative, I compared how well a unimodal model (GPT-2) and a multimodal model (Whisper) predict brain activity during language comprehension. This research showed that Whisper outperforms GPT-2 in most cases, even though Whisper was trained on a considerably smaller dataset. My results also revealed a direct correspondence between model architecture and the brain areas where the model best predicts activity, and that brain areas previously understood to carry out either high- or low-level language processing may in fact do both. This work reaffirms the theory that the brain and deep-learning models share common principles, provides evidence of overlap between acoustic and semantic information in language, and suggests that future deep-learning models should adopt a multimodal training framework to better capture the neural basis of language.
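
For readers unfamiliar with this kind of analysis, the sketch below illustrates the standard encoding-model approach implied by the abstract: a regularized linear regression maps each model's contextual embeddings to the recorded neural signal, and performance is scored as the correlation between predicted and held-out activity. This is a minimal illustration under assumed conventions, not the thesis's actual pipeline; the embedding dimensions, synthetic data, and function names are all hypothetical.

    # Minimal sketch of an embedding-to-ECoG encoding comparison (illustrative only).
    # Assumes precomputed word-level embeddings from GPT-2 and Whisper and an
    # electrode-by-word matrix of ECoG responses; all data here are synthetic.
    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import KFold
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    n_words, n_electrodes = 1000, 9               # placeholder sizes
    gpt2_emb = rng.normal(size=(n_words, 768))    # GPT-2 hidden states (hypothetical)
    whisper_emb = rng.normal(size=(n_words, 512)) # Whisper encoder states (hypothetical)
    ecog = rng.normal(size=(n_words, n_electrodes))

    def encoding_score(embeddings, neural, n_splits=5):
        """Mean held-out correlation between predicted and actual activity, per electrode."""
        scores = np.zeros(neural.shape[1])
        for train, test in KFold(n_splits=n_splits).split(embeddings):
            # Ridge regression with cross-validated regularization strength.
            model = RidgeCV(alphas=np.logspace(-2, 4, 7))
            model.fit(embeddings[train], neural[train])
            pred = model.predict(embeddings[test])
            for e in range(neural.shape[1]):
                scores[e] += pearsonr(pred[:, e], neural[test][:, e])[0] / n_splits
        return scores

    # Higher mean correlation indicates better prediction of neural activity.
    print("GPT-2 mean r:   ", encoding_score(gpt2_emb, ecog).mean())
    print("Whisper mean r: ", encoding_score(whisper_emb, ecog).mean())
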
URI: http://arks.princeton.edu/ark:/88435/dsp01c821gp08n
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: SINGH-ADITI-THESIS.pdf
Size: 4.58 MB
Format: Adobe PDF


Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.