Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01vd66w3092
Title: ML-Based Misinformation Detection in Podcasts
Authors: Han, Cindy
Advisors: Mayer, Jonathan
Department: Computer Science
Class Year: 2022
Abstract: In recent years, the ubiquity of misinformation has spurred extensive research on using machine learning to classify text-based misinformation in news articles and social media posts. However, there has been little prior work on classifying audio-based misinformation, such as that found in podcasts, despite the large quantity of misinformation they facilitate. Using the Spotify Podcast Dataset, I compile a new dataset (PAWcast) containing transcript snippets, their misinformation labels, and other metadata. Using this PAWcast dataset, in addition to five other textual misinformation datasets, I train two state-of-the-art classifiers, one based on LIWC features and one using the BERT model. I then design a new ML classifier (TIGER) that finds a balance between training on the combined datasets and training on a single dataset. The TIGER model achieves a 74% F1 score on the PAWcast dataset (both with and without podcast-specific features). It also achieves a 75% average F1 score across all the datasets, which matches or exceeds the existing state-of-the-art models on each dataset.
URI: http://arks.princeton.edu/ark:/88435/dsp01vd66w3092
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024

Files in This Item:
File: HAN-CINDY-THESIS.pdf
Size: 1.3 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.