Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01cj82kb49k
Title: Ready, Dataset, Go: Using Rule- and Semantic-Based Augmentation to Improve Disease Prediction ML Models
Authors: Njoku, Raphael
Advisors: Jha, Niraj
Department: Electrical and Computer Engineering
Certificate Program: Program in Cognitive Science
Robotics & Intelligent Systems Program
Class Year: 2022
Abstract: Disease diagnosis has traditionally been conducted manually by health professionals, who assess conditions and then provide patients with medical solutions. Machine learning methods can consume large volumes of data and provide a diagnosis or decision based on that data faster and more accurately. While efforts to optimize machine learning algorithms exist, there has been little exploration of optimizing the datasets themselves. This thesis explores different methods for augmenting disease datasets to improve prediction accuracy when they are used to train machine learning models. Manual rule-based augmentation, SECRET, and latent semantic analysis are reviewed and tested. The experiment finds that SECRET is successful for disease datasets, that latent semantic analysis requires more optimization than this thesis achieved, and that manual rule-based augmentation is ineffective. Additionally, this thesis reviews popular machine learning models such as decision trees, multilayer perceptrons (MLPs), random forests (RFs), and k-nearest neighbors (KNN). It concludes that decision tree models are the most effective on tabular data and that semantic representation methods are the most promising for increasing prediction accuracy through dataset augmentation.
URI: http://arks.princeton.edu/ark:/88435/dsp01cj82kb49k
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Electrical and Computer Engineering, 1932-2023
Robotics and Intelligent Systems Program

Files in This Item:
File: NJOKU-RAPHAEL-THESIS.pdf
Size: 748.12 kB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.