SMU Data Science Review

Machine Learning Approach to Distinguish Ulcerative Colitis and Crohn’s Disease Using SMOTE (Synthetic Minority Oversampling Technique) Methods

Kris Ghimire, Southern Methodist University
Walter Lai, Southern Methodist University
Yasser Omar, Columbus University
Thad Schwebke, Southern Methodist University
Jamie Vo, Southern Methodist University

Abstract

Inflammatory Bowel Disease (IBD) affects a sizable portion of the US population, causing symptoms such as vomiting, abdominal pain, and diarrhea. Despite the disease’s prevalence, its precise cause is not fully understood. This study uses endoscopic and histological data from patients diagnosed with IBD, together with a control population for reference. The machine learning models aim to classify patients by IBD type. Several models were analyzed, including decision trees, logistic regression, and k-nearest neighbors. In addition, several SMOTE variants were applied to balance the dataset and to determine the most effective transformation. The best-performing model was a random forest, which predicts the patient’s disease stage with an accuracy of 98.8%. The sensitivity and specificity of 72.9% indicate slight bias. In conclusion, the random forest model achieved the highest accuracy while balancing true and false positives.
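The core idea behind SMOTE, creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a minimal NumPy illustration of the basic technique only, not the authors' pipeline: the helper names `smote` and `balance_binary`, the neighbour count `k=5`, and the toy data are hypothetical, and the study's specific SMOTE variants, features, and parameters are not reproduced here. In practice a library implementation such as the `SMOTE`, `BorderlineSMOTE`, or `ADASYN` classes in imbalanced-learn would normally be used.

```python
import numpy as np

# Hypothetical sketch of basic SMOTE; not the study's actual implementation.
def smote(X_min, n_synthetic, k=5, rng=None):
    """Generate n_synthetic points from the minority-class rows X_min.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Pairwise Euclidean distances among minority samples only.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # interpolation weight in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

def balance_binary(X, y, minority_label, k=5, rng=None):
    """Oversample the minority class until both classes have equal counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0:
        return X, y
    X_new = smote(X_min, n_needed, k=k, rng=rng)
    y_new = np.full(n_needed, minority_label)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

if __name__ == "__main__":
    # Toy data (hypothetical): 4 minority rows (label 1), 8 majority rows (label 0).
    X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                  [5, 5], [6, 5], [5, 6], [6, 6],
                  [5.5, 5.5], [6.5, 6.5], [7, 7], [7, 6]], dtype=float)
    y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
    Xb, yb = balance_binary(X, y, minority_label=1, rng=0)
    print(np.bincount(yb))  # → [8 8]
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class rather than duplicating existing rows. Oversampling of this kind is normally applied only to the training portion of the data, so that synthetic points do not leak into the evaluation set used to report accuracy, sensitivity, and specificity.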

Recommended Citation

Ghimire, Kris; Lai, Walter; Omar, Yasser; Schwebke, Thad; and Vo, Jamie (2021) "Machine Learning Approach to Distinguish Ulcerative Colitis and Crohn’s Disease Using SMOTE (Synthetic Minority Oversampling Technique) Methods," SMU Data Science Review: Vol. 5: No. 2, Article 9.
Available at: https://scholar.smu.edu/datasciencereview/vol5/iss2/9