SMU Data Science Review


Vacant lots have long been associated with community violence. Researchers have confirmed that vacant lots and vacant buildings correlate positively with increased violence in both urban and rural areas. However, identifying vacant lots has been a challenge, and existing modeling methods have been largely manual and time-intensive. This prevented cities and non-profit organizations from acting on the information: developing remediation programs was expensive and high-risk without a clear understanding of where vacant lots existed or how many there were.

The primary objective of this study was to provide a predictive model that classifies land faster and more accurately than prior methods. Labels for 2019 vacant lots from the Child Poverty Action Lab (CPAL) were used as the source of truth for model development. Public data from the City of Dallas and the Dallas Central Appraisal District (DCAD) were used to derive land value, crime incidents, code violations, and certificates of occupancy. These features were mapped onto lot locations using Geographic Information System (GIS) modeling techniques.
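As a rough illustration of mapping point-based features onto lot locations, the sketch below counts how many incident points fall inside each lot polygon using a plain ray-casting test. It is a minimal example under stated assumptions, not the study's GIS workflow: the lot identifiers, polygon coordinates, and incident points are all hypothetical, and coordinates are assumed to be in a planar (projected) system. A production pipeline would more likely use a GIS library with a spatial index.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in planar coordinates.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y-coordinate can be crossed
        # by a horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_lot(lots, points):
    """Count how many points fall inside each lot polygon."""
    counts = {lot_id: 0 for lot_id in lots}
    for x, y in points:
        for lot_id, polygon in lots.items():
            if point_in_polygon(x, y, polygon):
                counts[lot_id] += 1
                break  # assume lots do not overlap
    return counts


# Hypothetical toy data: two adjacent square lots and four incident points.
lots = {
    "lot_a": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "lot_b": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
crime_points = [(2, 3), (5, 5), (15, 4), (30, 30)]

crime_counts = count_points_per_lot(lots, crime_points)
print(crime_counts)
```

The same counting step could be repeated for code violations or certificates of occupancy, giving one per-lot feature column per data source.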

The study concluded that an XGBoost model built on five simple factors (land value, improvement or building value, total value, land size, and division code) offers the best balance of performance and simplicity.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Included in

Data Science Commons