Title: Streaming Wars Explained: Pre-Release Classification, Prediction, and Feature Selection of Movie Release Popularity on Streaming Services
Authors: Xie, Annie
Advisors: Carmona, Rene
Department: Operations Research and Financial Engineering
Certificate Program: Applications of Computing Program
Class Year: 2020
Abstract: Streaming movies and TV shows are increasingly mainstream. Consumers now have an overwhelming number of choices, both in content and in providers. As preferences shift toward online streaming, providers must create or license popular content to attract and retain customers. How can companies create or license movies that are popular at release and stay popular post-release? Most past research predicted success as measured by box office performance or TV show ratings. While these metrics are representative of traditional cinematic releases and TV show airings, they do not accurately quantify streaming popularity, the metric of success in streaming. Many titles released on streaming providers have never been released in theaters and have no measurable box office performance. In addition, high ratings do not mean high popularity, as evidenced by low-rated but highly watched titles. Little work has been done to predict streaming popularity, especially when limited to pre-release data, even as viewership moves further toward streaming services and understanding it becomes more important. This thesis focuses on pre-release classification and prediction of streaming success, as defined by release popularity in theaters and on providers. Data known only after the initial release date, such as critic ratings, is of little use to streaming providers, who increasingly want to create original content instead of licensing existing content for millions of dollars. Instead, this thesis uses pre-release data, measured at least three months before the initial release date, to understand what makes a film or TV show the one that people watch. I used IMDb Pro's MOVIEMeter as a proxy for movie popularity, my chosen metric of success.
In an analysis of 60,000 movies and 250,000 cast and crew members, I focused on classifying the top 10% of the raw dataset, translating into the top 400 movies at theatrical release and the top 3,000 movies at provider release. To identify this successful 10%, I ran support vector machines, random forests, and neural networks on 12 permutations of the dataset, each trading a wider variety of features against less training data. I also used oversampling and weighted classes to compensate for the imbalanced data. With these models, I classified the top 10% of movies with up to 92% recall and 89% precision by traditional theatrical release and up to 91% recall and 81% precision by streaming provider release. This translates into large revenue gains and cost reductions for streaming providers and movie studios. I also classified movies by increasing, decreasing, or unchanging popularity, measuring the “lifetime” of a movie. I found that highly popular movies had more constant popularity because they stayed popular and did not experience the fluctuations of unpopular movies. In additional analyses, I used linear, ridge, and lasso regressions to predict, instead of classify, movie popularity. I predicted theatrical release popularity with a coefficient of determination of 72.4% and provider release popularity with a coefficient of determination of 63.2%. I also selected the most pertinent features, which included the past movie release popularity of actors, directors, and actor-director collaborations, and the pre-release popularity of the cast as a whole and of the individual cast and crew members. These features are consistent with past literature, with the exception of the composer's popularity and award recognition, for which prior researchers had not collected data. The pertinent features relate to an audience's familiarity with the cast, crew, or franchise, which is understandable for a high-cost, rare-success industry.
However, none of these selected features individually “predicts” movie popularity. It is only when these features are taken in combination, with their connections and dependencies analyzed in the classification and prediction models, that we can begin to forecast movie success. These models may even help companies take risks on new artists and ideas.
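As an illustration of the evaluation described in the abstract, the following is a minimal sketch of how a "top 10%" label and the resulting precision and recall could be computed. It is not the thesis's code: the function names `top_fraction_labels` and `precision_recall`, the toy scores, and the convention that a higher score means a more popular movie are all assumptions made for this example.

```python
def top_fraction_labels(scores, fraction=0.10):
    """Label the highest-scoring `fraction` of items 1 and the rest 0.

    Assumption for this sketch: a higher score means a more popular movie.
    """
    n_top = max(1, int(len(scores) * fraction))
    # Indices ordered from the highest score to the lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n_top])
    return [1 if i in top else 0 for i in range(len(scores))]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class, labeled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical): 20 movies with popularity scores 0..19.
scores = list(range(20))
y_true = top_fraction_labels(scores)   # the two highest scores are positives
y_pred = [0] * 20
y_pred[19] = 1                         # a correct positive prediction
y_pred[17] = 1                         # a false positive
precision, recall = precision_recall(y_true, y_pred)
```

With only one item in ten labeled positive, a model that predicts "not popular" for every movie is 90% accurate yet has zero recall, which is why precision and recall on the top 10% are more informative than accuracy for data this imbalanced.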
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2020

Files in This Item:
File: XIE-ANNIE-THESIS.pdf
Size: 4.57 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.