SMU Data Science Review

Abstract

This research investigates the application of machine learning techniques to assist in the execution of a synthetic control model. The model was used to analyze counties within the United States that shifted from a Democratic majority vote share to a Republican majority between the 2012 and 2016 election cycles. The study applies two steps of machine learning analysis. The first step, the treatment discovery process, leverages a Random Forest to evaluate feature importance. The second step is the execution of the synthetic control model with two predictor variable lists. The first list follows the parametric method: a hand-curated set of predictor variables based on domain knowledge. The second follows the non-parametric method: all available predictor (descriptive) variables are used. The Random Forest treatment discovery process yielded two uncommon variables, which were applied as treatment effects: WIC women enrollment and a decrease in vegetable farm acreage. Researching these atypical treatment variables creates the potential to surface counterfactual arguments for further research, and using both the parametric and non-parametric methods provides a basis for comparison. For the decrease in vegetable farm acreage, the non-parametric model produced a negative result. The parametric model, however, showed strong statistical evidence of a treatment effect from the decrease in farm acreage. It is likely that the decrease in vegetable farm acreage is a proxy for poverty or for a population density metric. These results suggest that the model was likely suffering from omitted variable bias, with one or both of these metrics not represented in the predictor variable list. For the WIC women enrollment treatment variable, the synthetic control model had difficulty forming a synthetic control comparison.
These results suggest there is a fundamental difference between the counties used to create the synthetic control and the other counties that saw a treatment effect. Additional research needs to be performed, and it could lead to a different application of the data in a synthetic control model. The results of this study, while not surfacing causal inference, did open questions for further research. Given the opportunity, these combined causal inference and machine learning practices could continue to be developed and could potentially assist traditional causal modeling methods, allowing researchers to understand the data and the relationships within it more intimately and, in theory, to discover new causal inferences.
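
To make the synthetic control step concrete, the following is a minimal sketch of how donor weights can be fitted. It is an illustration under stated assumptions, not the authors' implementation: the function names (`project_to_simplex`, `synthetic_control_weights`), the projected gradient solver, and the toy numbers are all hypothetical. The idea is to find non-negative weights that sum to one, so that the weighted combination of control ("donor") units reproduces the treated unit's pre-treatment predictors as closely as possible.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index satisfying the condition
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(X0, x1, n_iter=5000):
    """Fit donor weights by projected gradient descent (hypothetical helper).

    X0: array of shape (k predictors, J donors), pre-treatment predictors.
    x1: array of shape (k,), the treated unit's pre-treatment predictors.
    Minimizes ||x1 - X0 @ w||^2 subject to w >= 0 and sum(w) = 1.
    """
    J = X0.shape[1]
    w = np.full(J, 1.0 / J)                   # start from equal weights
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    L = 2.0 * np.linalg.norm(X0, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * X0.T @ (X0 @ w - x1)
        w = project_to_simplex(w - grad / L)
    return w

# Toy data, made up for illustration: 4 predictors, 3 donor units.
X0 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 1.0, 1.0]])
# The treated unit is constructed as an even mix of donors 0 and 1.
x1 = X0 @ np.array([0.5, 0.5, 0.0])

weights = synthetic_control_weights(X0, x1)
synthetic_unit = X0 @ weights
```

When no weighted combination of donors matches the treated unit well, the remaining distance between `x1` and `synthetic_unit` stays large. That is one way a model can have difficulty forming a synthetic control comparison.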

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Included in

Data Science Commons
