SMU Data Science Review


With growing energy usage, power outages affect millions of households. This case study gathers historical power outage data, attaches weather attributes to it, and collects ERCOT energy market conditions for the Dallas-Fort Worth and Houston metropolitan areas of Texas. The transformed data is then analyzed using machine learning algorithms including, but not limited to, regression, Random Forests, and XGBoost, which use current weather and ERCOT features to predict the power outage percentage for a location. Time series models that assume serial correlation, including Autoregression and Vector Autoregression, are also trained on the transformed data. The study also compares traditional machine learning models, which assume sample independence, with models that assume serial correlation. The results show that machine learning models using both weather features and ERCOT data yield a lower root mean squared error (RMSE) and higher prediction accuracy than models using either feature set alone. In addition, multivariate Vector Autoregressive models have lower RMSE than univariate Autoregressive, univariate Random Forest, and univariate neural network models when weather and ERCOT data are included to predict power outages. The top-performing traditional machine learning models are packaged into an external-facing web application that the public can use to determine current power outage risk.
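Since the model comparisons above rest on RMSE, a minimal sketch of that comparison may help readers. It is not the study's code: the function names `rmse` and `rank_by_rmse`, the model labels, and every number below are assumptions invented for illustration, and only the Python standard library is used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_by_rmse(actual, predictions_by_model):
    """Return (model_name, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {
        name: rmse(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up outage percentages, for illustration only (not results from the study).
actual_outage_pct = [0.0, 1.0, 4.0, 2.0]

# Hypothetical predictions from two hypothetical models.
toy_predictions = {
    "weather_only": [1.0, 1.0, 2.0, 2.0],
    "weather_plus_ercot": [0.0, 1.0, 3.0, 2.0],
}

ranking = rank_by_rmse(actual_outage_pct, toy_predictions)
for model_name, score in ranking:
    print(f"{model_name}: RMSE = {score:.3f}")
```

The model listed first has the lowest RMSE on the toy data. In practice the predictions would come from models evaluated on held-out outage records, not from hand-written lists.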

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Included in

Data Science Commons