SMU Data Science Review

Adjusting Community Survey Data Benchmarks for External Factors

Abstract

Using U.S. resident survey data from the National Community Survey, combined with public data from the U.S. Census and additional sources, a voting regressor model was developed to establish fair benchmark values for city performance. These benchmarks were adjusted for characteristics that a city cannot easily influence but that contribute to confidence in local government, such as population size, demographics, and income. This adjustment allows for a more meaningful comparison and interpretation of survey results among individual cities. Methods explored for the benchmark adjustment included cluster analysis, anomaly detection, and a variety of regression techniques: random forest, ridge, decision tree, support vector, gradient boosting, k-nearest neighbors (KNN), and ensembles. The final models used ensemble regression methods to predict trust in government and identify important features, and cluster analysis to assign similar cities to clusters for comparison. The voting regressor model's predictions were compared to the actual raw scores, and cities that scored significantly above or below predictions were identified. These overperformers and underperformers may have additional factors, not accounted for in the model, contributing to their scores.
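The comparison of predicted and actual scores described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two base models (a one-feature least-squares fit and a k-nearest-neighbors regressor), the function names, the sample cities, and the 1.5 standard deviation cutoff are all assumptions chosen for illustration. The only idea taken from the abstract is that a voting regressor averages the predictions of several base regressors and that cities scoring far from the prediction are flagged.

```python
from statistics import mean, stdev


def fit_linear(xs, ys):
    """Ordinary least squares on a single feature; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbors regressor on a single feature."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def voting_predict(models, x):
    """Voting regression: unweighted average of the base-model predictions."""
    return mean(model(x) for model in models)


def flag_cities(names, xs, ys, models, z=1.5):
    """Label cities whose actual score is more than z residual standard
    deviations above or below the ensemble prediction (assumed cutoff)."""
    residuals = [y - voting_predict(models, x) for x, y in zip(xs, ys)]
    cutoff = z * stdev(residuals)
    flags = {}
    for name, r in zip(names, residuals):
        if r > cutoff:
            flags[name] = "overperformer"
        elif r < -cutoff:
            flags[name] = "underperformer"
    return flags


# Hypothetical data: median household income (thousands of dollars) and
# percentage of residents reporting trust in local government.
cities = ["A", "B", "C", "D", "E", "F", "G", "H"]
income = [35, 42, 48, 55, 61, 70, 82, 95]
trust = [41, 44, 47, 66, 52, 55, 38, 63]

models = [fit_linear(income, trust), fit_knn(income, trust, k=3)]
flags = flag_cities(cities, income, trust, models)
```

This sketch scores each city with models fitted on that same city's data, which understates the residuals. A real analysis would more likely compare actual scores against held-out or cross-validated predictions.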

Recommended Citation

Miller, Allen; Norelli, Nicole M.; Slater, Robert; and Yu, Mingyang N. (2022) "Adjusting Community Survey Data Benchmarks for External Factors," SMU Data Science Review: Vol. 6: No. 1, Article 2.
Available at: https://scholar.smu.edu/datasciencereview/vol6/iss1/2