Please use this identifier to cite or link to this item:
|Title:||Predicting Netflix Movie Ratings using a Topic Modeling Algorithm|
|Abstract:||Latent factor models and matrix factorization algorithms were some of the most successful stand-alone algorithms used for predicting movie ratings in the Netflix Prize. To address the sparsity in the movie rating training set, many matrix factorization algorithms train only on the observed ratings and use regularization to avoid overfitting. Topic modeling algorithms must also be able to handle high sparsity. Given a collection of documents, the purpose of topic modeling is to discover the high-level thematic structure that best explains the collection of documents as a whole. In the same way, we might hope that given a collection of movie ratings, we can uncover the high-level movie genres that best explain the collection of movie ratings as a whole. Mathematically, topic modeling can be interpreted as recovering the first factor in a matrix factorization, subject to some constraints. By this view, perhaps a topic modeling algorithm can be the first step in a matrix factorization algorithm that predicts Netflix movie ratings. In this thesis, we develop a three-step algorithm for predicting movie ratings using a matrix factorization of the form M = AW: first we obtain a collection of genres using a topic modeling algorithm, then we generate a suitable A matrix from the collection of genres, and finally we use the A matrix to get the W matrix.|
|Type of Material:||Princeton University Senior Theses|
|Appears in Collections:||Mathematics, 1934-2016|
Files in This Item:
|Michael Zhu thesis.pdf||607.5 kB||Adobe PDF||Request a copy|
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.