Patricia Chen | 2026 I.S. Symposium

Name: Patricia Chen
Title: Deriving Predictive Models for Film Success Using Statistical and Machine-Learning Methods
Major: Mathematics
Minor: Statistical and Data Sciences
Advisor: Robert Kelvey
Predicting the success of films has been a central challenge for the movie industry, where billions of dollars are invested in projects whose outcomes remain highly uncertain. This independent study explores whether pre-release factors help forecast film success, using mathematically derived statistical and machine-learning techniques. Data from IMDb, Rotten Tomatoes, Metacritic, the Academy Awards, the Golden Globes, and budget and revenue records were combined into a single dataset capturing a wide range of film characteristics, such as release month, runtime, MPAA rating, and genre. Two predictive models were developed. The first is a risk-scoring model derived from logistic regression, in which the regression's coefficients are transformed into an interpretable points-based scoring system to predict breakeven proportion. The resulting scores show a clear positive relationship with breakeven outcomes, and grouping films by score range highlights which predictors correspond to stronger or weaker performance. The second approach uses the CatBoostRegressor, a gradient-boosting machine-learning method designed to handle categorical variables and nonlinear interactions, to predict proportional revenue. By iteratively building decision trees that correct prior errors, the model achieves useful predictive accuracy and provides feature importances that identify influential variables. Together, these two methods provide a fuller understanding of how pre-release predictors shape a film's success and demonstrate the value of using both interpretable statistical frameworks and machine-learning techniques.
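As a rough illustration of the points-based idea described above, the following Python sketch converts logistic-regression coefficients into integer points by dividing each coefficient by a base unit and rounding. It is a minimal sketch under stated assumptions, not the study's actual model: the predictor names, coefficient values, intercept, and the choice of the smallest absolute coefficient as the base unit are all hypothetical.

```python
import math

# Hypothetical log-odds coefficients from a fitted logistic regression.
# These values are illustrative only and are NOT results from the study.
COEFFICIENTS = {
    "summer_release": 0.42,
    "rated_pg13": 0.21,
    "runtime_over_120_min": -0.21,
    "genre_horror": 0.63,
}
INTERCEPT = -0.50  # hypothetical intercept


def base_unit(coefs):
    """Assumed convention: one point equals the smallest nonzero |coefficient|."""
    return min(abs(b) for b in coefs.values() if b != 0)


def coefficients_to_points(coefs, base):
    """Divide each coefficient by the base unit and round to integer points."""
    return {name: round(b / base) for name, b in coefs.items()}


def film_score(film, points):
    """Sum the points of every predictor that is present for this film."""
    return sum(points[name] for name, present in film.items() if present)


def approx_probability(score, base, intercept):
    """Map a total score back to an approximate probability via the logistic function."""
    return 1.0 / (1.0 + math.exp(-(intercept + base * score)))


BASE = base_unit(COEFFICIENTS)
POINTS = coefficients_to_points(COEFFICIENTS, BASE)

# A hypothetical film described by binary pre-release indicators.
example_film = {
    "summer_release": True,
    "rated_pg13": True,
    "runtime_over_120_min": False,
    "genre_horror": True,
}
example_score = film_score(example_film, POINTS)
example_prob = approx_probability(example_score, BASE, INTERCEPT)
print(POINTS, example_score, round(example_prob, 3))
```

Because every coefficient is expressed in multiples of the same base unit, a film's total score is approximately proportional to its log-odds, which is what lets a simple point tally stand in for the full regression equation.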
Posted in Symposium 2026 on May 1, 2026.