Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018p58pd125
Title: Predicting Netflix Movie Ratings using a Topic Modeling Algorithm
Authors: Zhu, Michael
Advisors: Arora, Sanjeev
Contributors: Singer, Amit
Department: Mathematics
Class Year: 2014
Abstract: Latent factor models and matrix factorization algorithms were some of the most successful stand-alone algorithms used for predicting movie ratings in the Netflix Prize. To address the sparsity in the movie rating training set, many matrix factorization algorithms train only on the observed ratings and use regularization to avoid overfitting. Topic modeling algorithms must also be able to handle high sparsity. Given a collection of documents, the purpose of topic modeling is to discover the high-level thematic structure that best explains the collection of documents as a whole. In the same way, we might hope that given a collection of movie ratings, we can uncover the high-level movie genres that best explain the collection of movie ratings as a whole. Mathematically, topic modeling can be interpreted as recovering the first factor in a matrix factorization, subject to some constraints. By this view, perhaps a topic modeling algorithm can be the first step in a matrix factorization algorithm that predicts Netflix movie ratings. In this thesis, we develop a three-step algorithm for predicting movie ratings using a matrix factorization of the form M = AW: first we obtain a collection of genres using a topic modeling algorithm, then we generate a suitable A matrix from the collection of genres, and finally we use the A matrix to get the W matrix.
Extent: 23 pages
URI: http://arks.princeton.edu/ark:/88435/dsp018p58pd125
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Mathematics, 1934-2016
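
The last step named in the abstract, using the A matrix to get the W matrix, can be illustrated with a minimal sketch. This is not the thesis's actual procedure, which this record does not describe. It assumes that M is movies by users, A is movies by genres, and W is genres by users, and it fits one user's column of W by regularized least squares on that user's observed ratings only, in the spirit of the matrix factorization approach the abstract mentions. The function names (`solve_linear`, `fit_user_weights`, `predict`) and the parameter `lam` are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_user_weights(A, ratings, lam=0.1):
    """Fit one user's column of W from observed ratings only (assumed setup).

    A       -- list of rows, one per movie, each of length k (genres)
    ratings -- dict mapping movie index -> observed rating for this user
    lam     -- ridge penalty; lam > 0 keeps the system solvable when the
               user has rated fewer movies than there are genres
    """
    k = len(A[0])
    # Normal equations: (A_obs^T A_obs + lam * I) w = A_obs^T r
    gram = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    rhs = [0.0] * k
    for movie, rating in ratings.items():
        row = A[movie]
        for i in range(k):
            rhs[i] += row[i] * rating
            for j in range(k):
                gram[i][j] += row[i] * row[j]
    return solve_linear(gram, rhs)


def predict(A, w, movie):
    """Predicted rating: the (movie, user) entry of A W."""
    return sum(a * x for a, x in zip(A[movie], w))
```

For example, with a toy A of three movies and two genres, `fit_user_weights(A, {0: 4.0, 1: 2.0})` uses only the two observed ratings, and `predict(A, w, 2)` then estimates the unobserved rating for the third movie.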
