Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01c821gp08n
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hasson, Uri | - |
dc.contributor.author | Singh, Aditi | - |
dc.date.accessioned | 2023-07-28T18:19:55Z | - |
dc.date.available | 2023-07-28T18:19:55Z | - |
dc.date.created | 2023-04-20 | - |
dc.date.issued | 2023-07-28 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01c821gp08n | - |
dc.description.abstract | Humans have an impressive capacity to use speech to communicate with one another. Infants quickly learn the mapping from speech to language, predominantly by example and without explicit instruction. While speech and language are inextricably interdependent (barring non-verbal methods of communication such as sign language), the study of communication has largely dichotomized the two. This is particularly true of the deep-learning models used to understand brain behavior during communication, where emphasis has been placed on text-based language models such as GPT-2. In this paper, to understand the relationship between speech and language, I consider Whisper, a multimodal deep-learning model trained on two modalities: speech and language. Using an electrocorticography (ECoG) dataset of brain recordings from nine participants as they listened to a 30-minute narrative, I compare how well a unimodal model (GPT-2) and a multimodal model (Whisper) predict brain behavior during language comprehension. The results show that Whisper outperforms GPT-2 in most cases, even though Whisper was trained on a considerably smaller dataset. They also reveal a direct correspondence between model architecture and the brain areas where the model best predicts behavior, and suggest that brain areas previously understood to carry out either high- or low-level language processing may in fact do both. This work reaffirms the theory that the brain and deep-learning models share computational principles, provides evidence of overlap between acoustic and semantic information in language, and suggests that future deep-learning models should adopt a multimodal training framework to better capture the neural behavior of language. | en_US |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | en_US |
dc.title | Understanding the Role of Speech vs. Language in Communication Using Deep-Learning Models | en_US |
dc.type | Princeton University Senior Theses | |
pu.date.classyear | 2023 | en_US |
pu.department | Computer Science | en_US |
pu.pdf.coverpage | SeniorThesisCoverPage | |
pu.contributor.authorid | 920227662 | |
pu.mudd.walkin | No | en_US |
Appears in Collections: | Computer Science, 1987-2024 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
SINGH-ADITI-THESIS.pdf | | 4.58 MB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.