Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01cj82kb49k
Title: | Ready, Dataset, Go: Using Rule- and Semantic-Based Augmentation to Improve Disease Prediction ML Models |
Authors: | Njoku, Raphael |
Advisors: | Jha, Niraj |
Department: | Electrical and Computer Engineering |
Certificate Program: | Program in Cognitive Science; Robotics & Intelligent Systems Program |
Class Year: | 2022 |
Abstract: | Disease diagnosis has traditionally been conducted manually by health professionals, who assess conditions and then provide patients with medical solutions. Machine learning methods are faster and more accurate at consuming large volumes of data and providing a diagnosis or decision based on that data. While efforts to optimize machine learning algorithms exist, there has been barely any exploration into optimizing the datasets themselves. This thesis explores different methods for augmenting disease datasets to improve prediction accuracy when they are used to train machine learning models. Manual rule-based augmentation, SECRET, and latent semantic analysis are reviewed and tested. The experiment finds that SECRET is successful for disease datasets, that latent semantic analysis requires more optimization than this thesis achieved, and that manual rule-based augmentation is ineffective. Additionally, this thesis reviews popular machine learning models such as decision trees, multilayer perceptrons (MLPs), random forests (RF), and k-nearest neighbors (KNN). This thesis concludes that decision tree models are the most effective on tabular data and that semantic representation methods are the most promising for increasing prediction accuracy through dataset augmentation. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01cj82kb49k |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Electrical and Computer Engineering, 1932-2023; Robotics and Intelligent Systems Program |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
NJOKU-RAPHAEL-THESIS.pdf | | 748.12 kB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.