SMU Data Science Review


In this paper, we present a forecasting analysis of San Francisco Bay Area Rapid Transit (BART) ridership data using several different time series methods. BART is a major public transportation system in the Bay Area that relies heavily on fare revenue, so models that forecast ridership accurately help the agency project revenue and manage future expenses. To predict monthly ridership, we used autoregressive integrated moving average (ARIMA) models, deep neural networks (DNN), state space models, and long short-term memory (LSTM) networks. Because such a wide range of time series techniques is in use across applications today, we explore several of the most commonly used methods to understand their strengths and weaknesses on our data set in particular. We also apply several novel transformations to the data to improve forecast accuracy. One of our primary transformations decouples the time series into multiple component series by weekday and region, and we find that different models perform best on different weekday and regional series. Our transformations increased overall accuracy by roughly 550% across models. Decoupling the series into component series also makes it possible to fit a different model to each component, which can increase accuracy further. We therefore see this as a powerful transformation technique that can be applied to great effect where the data allow.
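The decoupling idea can be illustrated with a minimal sketch. It assumes daily records of the form (date, region, riders), groups days into three hypothetical day types (weekday, Saturday, Sunday), and recombines component forecasts by summing them. The function names, region labels, and the naive "last month" model are illustrative placeholders, not the ARIMA, DNN, state space, or LSTM models used in the study.

```python
from collections import defaultdict
from datetime import date


def day_type(d: date) -> str:
    """Map a date to a hypothetical day-type label (assumed grouping)."""
    wd = d.weekday()  # Monday == 0 ... Sunday == 6
    if wd < 5:
        return "weekday"
    return "saturday" if wd == 5 else "sunday"


def decouple(records):
    """Split (date, region, riders) records into monthly component series.

    Returns {(day_type, region): {(year, month): total_riders}}.
    """
    components = defaultdict(lambda: defaultdict(int))
    for d, region, riders in records:
        key = (day_type(d), region)
        components[key][(d.year, d.month)] += riders
    # Convert nested defaultdicts to plain dicts ordered by month.
    return {k: dict(sorted(v.items())) for k, v in components.items()}


def naive_forecast(series):
    """Placeholder model: predict next month as the latest observed month."""
    return series[max(series)]


def forecast_total(components, models):
    """Apply a (possibly different) model to each component and sum the forecasts.

    `models` maps a component key to a callable; unmapped components
    fall back to naive_forecast. Summation is an assumed recombination rule.
    """
    total = 0
    for key, series in components.items():
        model = models.get(key, naive_forecast)
        total += model(series)
    return total


records = [
    (date(2019, 1, 7), "SF", 100),        # Monday
    (date(2019, 1, 12), "SF", 40),        # Saturday
    (date(2019, 1, 13), "East Bay", 30),  # Sunday
    (date(2019, 2, 4), "SF", 120),        # Monday
]
components = decouple(records)
print(forecast_total(components, {}))
```

Because each component is stored separately, a better-suited model can be assigned to any single series (for example, `{("saturday", "SF"): some_model}`) without changing how the others are forecast.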

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License