Title: Nonnegative Matrix Factorization: An Empirical Analysis
Authors: Collins, Liam
Advisors: Chen, Yuxin
Brinton, Christopher
Department: Electrical Engineering
Certificate Program: Applications of Computing Program
Class Year: 2019
Abstract: Dimensionality-reduction and information-extraction techniques are becoming increasingly valuable as more data is collected. Yet popular techniques such as principal component analysis (PCA) and the singular value decomposition (SVD) generate factors with negative entries, severely diminishing their interpretability for nonnegative data. Nonnegative matrix factorization (NMF) is a dimensionality-reduction and information-extraction technique that aims to obtain two low-dimensional, nonnegative factors of a nonnegative dataset, where one factor is a matrix of features and the other is a matrix of weights. These features and weights reveal useful properties of the data, as their nonnegativity implies an additive features model with insightful physical interpretations in many applications. However, with greater potential insight comes greater computational difficulty: the nonnegativity constraints on the features and weights make computing globally optimal factors a nonconvex, NP-hard problem. Many iterative heuristics for NMF tend to converge to an effective solution in practice but lack general performance guarantees; their behavior is not yet thoroughly understood, especially in relation to the type of data being factored. Conversely, numerous algorithms have recently been developed with provable error bounds under certain assumptions on the data, but their practicality is questionable. Meanwhile, many initialization methods have been developed to augment the performance of NMF algorithms, yet comparative experimentation and analysis quantifying their efficacy remain insufficient in the literature. In this paper, we comprehensively evaluate the performance of the most popular NMF algorithms and initialization techniques over varying characteristics of synthetic and real datasets in order to paint a clearer picture of when certain methods perform better or worse than others.
Our analysis includes extensive background on the theory and intuition behind each technique and an assessment of how that theory and intuition play out in practice. We also evaluate how well NMF algorithms solve a particular practical problem, namely extracting latent variables from an educational dataset. Our results suggest that no single algorithm or initialization technique always performs best; for optimal performance, the NMF algorithm and initialization must be chosen based on the specific problem setting, and our work provides guidance for doing so.
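To make the setup concrete: NMF approximates a nonnegative data matrix V by the product of a nonnegative weight matrix W and a nonnegative feature matrix H. The thesis itself compares many algorithms and initializations; the sketch below is not the author's method, just a minimal illustration using scikit-learn's NMF solver with NNDSVD initialization (one commonly compared initialization strategy). The data here is randomly generated for demonstration only.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical nonnegative dataset: 6 samples, 4 attributes
rng = np.random.default_rng(0)
V = rng.random((6, 4))

# Rank-2 factorization V ≈ W @ H with both factors constrained
# to be nonnegative; init="nndsvd" is an SVD-based initialization
model = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(V)   # nonnegative weights (one row per sample)
H = model.components_        # nonnegative features (one row per component)

# Frobenius-norm reconstruction error of the factorization
err = np.linalg.norm(V - W @ H)
```

Because the problem is nonconvex, different initializations (e.g. `init="random"` versus `init="nndsvd"`) can yield different local optima, which is precisely the sensitivity the thesis investigates.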
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Electrical and Computer Engineering, 1932-2023

Files in This Item:
File: COLLINS-LIAM-THESIS.pdf (4.23 MB, Adobe PDF)

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.