Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01wp988n13p
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Walker, David | -
dc.contributor.author | Fillmore, Mark | -
dc.date.accessioned | 2015-06-26T13:44:37Z | -
dc.date.available | 2015-06-26T13:44:37Z | -
dc.date.created | 2015-04-30 | -
dc.date.issued | 2015-06-26 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01wp988n13p | -
dc.description.abstract | Data modeling is an already difficult task that is further exacerbated by data-entry errors. Inconsistencies in large quantities of data can make it difficult to perform any kind of automated analysis. We motivate our investigation into improved data-cleaning methods by revealing disastrous non-uniformity in data related to the controversial Stop and Frisk policy as implemented by the NYPD. These inconsistencies guide our construction of workflow F, which consults multiple similarity measurements in order to dictate proper transformations of non-uniform data into standardized values. F increases the volume of non-standardized data that is correctly transformed by 887% in comparison to common existing methods, such as the Levenshtein distance. We conclude by presenting additional pathways for improvement and describing how to most effectively apply workflow F as part of an interactive tool. | en_US
dc.format.extent | 44 pages | en_US
dc.language.iso | en_US | en_US
dc.title | Algorithms for Data Normalization with Applications to Stop and Frisk | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2015 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
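The abstract cites the Levenshtein distance as the common baseline that workflow F improves upon. The thesis itself is not reproduced in this record, so the sketch below only illustrates that baseline: normalizing a noisy entry by mapping it to the nearest canonical value under edit distance. The borough list and the misspelled input are hypothetical examples, not data from the thesis.

```python
# Sketch of the Levenshtein-distance baseline named in the abstract;
# workflow F's multi-measure approach is not reproduced here.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalize(value: str, canonical: list[str]) -> str:
    """Map a noisy entry to the closest canonical value by edit distance."""
    return min(canonical, key=lambda c: levenshtein(value.upper(), c))

# Hypothetical example in the spirit of the Stop and Frisk data:
# standardizing an inconsistently entered borough name.
boroughs = ["MANHATTAN", "BROOKLYN", "QUEENS", "BRONX", "STATEN ISLAND"]
print(normalize("Brookln", boroughs))  # → BROOKLYN
```

A single edit-distance measure like this can misfire on abbreviations and transpositions, which is the kind of gap a workflow consulting multiple similarity measurements is meant to close.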
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File | Size | Format
PUTheses2015-Fillmore_Mark.pdf | 858.51 kB | Adobe PDF (Request a copy)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.