SMU Data Science Review

Abstract

This paper introduces a novel approach to improving the imputation of missing data, using Chicago crime records with arrest as the target feature. As datasets grow, robust imputation becomes essential for generating reliable insights. Our core objective is to present a method that improves on existing imputation techniques, raising model performance and the reliability of analytical outcomes. Using numeric crime data, we establish a Gradient Boosting (GBM) baseline model, then introduce tree-based methods, including Random Forest and Decision Trees, for further refinement. By systematically exploring multiple imputation processes, we establish a baseline for comparative analysis, so that the efficacy of each process can be measured against it. Informed by existing literature, our imputation process improves performance metrics and provides actionable insights. The study also addresses broader challenges in data imputation, particularly for crime data analysis in urban settings such as Chicago. Throughout, we document our methodology, experimentation, and findings, highlighting the effectiveness of ensemble techniques coupled with GBM in addressing data imputation challenges. Our research aims to give practitioners and researchers better support for decision-making and analysis in data-rich environments.
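The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, substitutes synthetic numeric data for the Chicago crime records, and uses `IterativeImputer` with tree-based estimators as one plausible way to impute with Decision Trees and Random Forest. The names `score_imputers`, `X_missing`, and the 20% missingness rate are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for numeric crime features and a binary "arrest" target
# (assumption: the real study uses Chicago crime records instead).
n_samples, n_features = 400, 5
X_full = rng.normal(size=(n_samples, n_features))
noise = rng.normal(scale=0.5, size=n_samples)
y = (X_full[:, 0] + 0.5 * X_full[:, 1] + noise > 0).astype(int)

# Remove roughly 20% of the entries at random to simulate missing data.
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.2] = np.nan

# Candidate imputation processes: a simple mean baseline and two
# model-based imputers driven by tree estimators.
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "decision_tree": IterativeImputer(
        estimator=DecisionTreeRegressor(max_depth=5, random_state=0),
        max_iter=5,
        random_state=0,
    ),
    "random_forest": IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=0),
        max_iter=5,
        random_state=0,
    ),
}


def score_imputers(X, y, imputers, cv=3):
    """Return mean cross-validated accuracy of a GBM classifier per imputer."""
    scores = {}
    for name, imputer in imputers.items():
        # The imputer sits inside the pipeline so it is fit on training folds only.
        model = make_pipeline(
            imputer,
            GradientBoostingClassifier(n_estimators=50, random_state=0),
        )
        scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    return scores


scores = score_imputers(X_missing, y, imputers)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Holding the downstream GBM classifier fixed means that differences in the scores reflect the imputation step alone, which is the kind of baseline comparison the abstract describes. On real data the ranking of imputers may differ from what this synthetic example produces.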

Included in

Data Science Commons
