Title: A Comparison of Clustering Algorithms in the Study of Hate Crime and Discrimination in India
Authors: Joshi, Prachi
Advisors: Wang, Mengdi
Department: Operations Research and Financial Engineering
Certificate Program: Applications of Computing Program
Class Year: 2019
Abstract: In the past several years, India has seen a rise in well-publicized incidents of hate crime and discrimination. While there are no official reports on these incidents, a number of organizations have recently started to collect and publish data on them. This thesis looks at one of these datasets in an attempt to understand any patterns in contemporary incidents of hate crime and discrimination in India. In the process of doing so, we compare the performance of k-means clustering against k-medians and k-medoids, two algorithms that offer more representative cluster centers for the heavily categorical data. We find that k-means is by far the most stable on our dataset. We find five clusters in the data, with incidents primarily grouped together by cause, the nature of the violence, and the identity of the victims. In addition, this thesis examines the relationship between the number of victims per incident and the other features of an incident using sparse linear regression. We find that there are 17 significant binary variables that explain 16% of the variability in the number of victims per incident. Specifically, our top three variables, all describing the nature of the violence, explain 9% of the variability. Variables describing the cause, the identity of the victims, and the state in which the incident took place were also significant.
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2020

Files in This Item:
File: JOSHI-PRACHI-THESIS.pdf
Size: 1.75 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.