Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01vd66w3092
Title: | ML-Based Misinformation Detection in Podcasts |
Authors: | Han, Cindy |
Advisors: | Mayer, Jonathan |
Department: | Computer Science |
Class Year: | 2022 |
Abstract: | In recent years, the ubiquity of misinformation has spurred a lot of research around using machine learning to classify text-based misinformation in news articles and social media posts. However, there has been little prior work on classifying audio-based misinformation such as podcasts, despite the large quantity of misinformation they facilitate. Using the Spotify Podcast Dataset, I compile a new dataset (PAWcast) containing transcript snippets, their misinformation labels, and other metadata. Using this PAWcast dataset, in addition to five other textual misinformation datasets, I train two state-of-the-art classifiers, one based on LIWC features and one using the BERT model. I then design a new ML classifier (TIGER) that finds a balance between training on the combined datasets and training on a single dataset. The TIGER model achieves a 74% F1 score on the PAWcast dataset (both with and without podcast-specific features). It also achieves a 75% average F1 score across all the datasets, which matches or exceeds the existing state-of-the-art models on each dataset. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01vd66w3092 |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Computer Science, 1987-2024 |
Files in This Item:
File | Size | Format |
---|---|---|
HAN-CINDY-THESIS.pdf | 1.3 MB | Adobe PDF |