SMU Data Science Review
Abstract
Colorectal cancer (CRC) remains a significant public health concern, affecting millions in the United States and worldwide. This study investigates the risk factors associated with CRC using data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and aims to develop predictive models to identify high-risk individuals for targeted screening and increased awareness. The dataset integrates CRC incidence data from the National Cancer Institute with socioeconomic indicators from the U.S. Census Bureau, linked by ZIP code. We employ Logistic Regression and Neural Network models to predict CRC risk, incorporating health, demographic, and socioeconomic features. While the results suggest that factors such as age and address history are significant contributors to CRC risk, the inclusion of Census data had only a marginal impact on model performance, likely due to limited geographic diversity. The study further explores the use of a user-facing risk calculator designed to raise awareness and encourage screening, with a focus on accessibility and simplicity. These findings emphasize the importance of incorporating a broad range of factors in CRC risk prediction and the potential for improving outreach through user-friendly tools.
Recommended Citation
Bhandari, Anish; Deng, Shawn; Olheiser, Michael; and Slater, Robert (2025) "Predictive Modeling of Colorectal Cancer Risk: Leveraging Health, Demographic, and Socioeconomic Factors for Targeted Screening," SMU Data Science Review: Vol. 9: No. 1, Article 5.
Available at: https://scholar.smu.edu/datasciencereview/vol9/iss1/5
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
