Skip navigation
Please use this identifier to cite or link to this item:
Title: A practical clustering algorithm based on k-NN density mode estimation with provable learning bounds
Authors: Jiang, Hanxi Heinrich
Advisors: Kpotufe, Samory
Contributors: Singer, Amit
Department: Mathematics
Class Year: 2015
Abstract: We present a new clustering algorithm based on first estimating the modes of the unknown density f from sample points generated by f. Our notion of mode is generalized to include points with densities similar to that of a local mode of f. Thus, we first identify sample points which can be confidently clustered together. We call these the cluster cores. The next step then consists of assigning the remaining points to cluster cores. We make mild assumptions on the underlying distribution and the shapes of the clusters. We develop mode estimation procedures based on the k-NN density estimate and use high probability finite-sample uniform bounds on the k-NN density estimator to obtain theoretically strong guarantees. The procedure can be implemented in O(nk log n) time. We show that the clustering algorithm is competitive against the state-of-art (including DBSCAN, mean-shift, and spectral clustering) based on simulations and real data experiments.
Extent: 69 pages
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections:Mathematics, 1934-2016

Files in This Item:
File SizeFormat 
PUTheses2015-Jiang_Hanxi_Heinrich.pdf12.11 MBAdobe PDF    Request a copy

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.