SMU Data Science Review


Inflammatory Bowel Disease (IBD) affects a sizable portion of the US population, causing symptoms such as vomiting, abdominal pain, and diarrhea. Despite the disease’s prevalence, its precise cause is not fully understood. This study uses endoscopic and histological data from patients diagnosed with IBD, along with a control population for reference. The goal of the machine learning models is to classify patients into IBD types. Several models were analyzed, including decision trees, logistic regression, and k-nearest neighbors. In addition, several variants of SMOTE (Synthetic Minority Over-sampling Technique) were applied to determine the most effective transformation and to ensure that the dataset is balanced. The best-scoring model was the random forest, which predicts the patient’s disease stage with an accuracy of 98.8%. The sensitivity and specificity of 72.9% indicate slight bias. In conclusion, the random forest model achieved the highest accuracy while balancing true and false positives.
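The gap between a high accuracy and a lower sensitivity or specificity is typical of imbalanced multi-class data. The following is a minimal sketch of how these metrics can be derived from a confusion matrix. The function name `per_class_metrics` and the counts are hypothetical, chosen only for illustration. They are not the study's data, and the paper may aggregate its metrics differently.

```python
def per_class_metrics(confusion):
    """Compute overall accuracy and one-vs-rest sensitivity/specificity.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Assumes every class has at least
    one positive and one negative sample, so no denominator is zero.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    metrics = {"accuracy": correct / total, "sensitivity": [], "specificity": []}
    for k in range(n_classes):
        tp = confusion[k][k]                                      # true positives
        fn = sum(confusion[k]) - tp                               # missed cases of class k
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp  # wrongly labeled as k
        tn = total - tp - fn - fp                                 # everything else
        metrics["sensitivity"].append(tp / (tp + fn))
        metrics["specificity"].append(tn / (tn + fp))
    return metrics


# Hypothetical, deliberately imbalanced counts (rows: true class, columns: predicted).
confusion = [
    [950, 5, 5],   # majority class
    [10, 15, 5],   # minority class A
    [2, 3, 5],     # minority class B
]

result = per_class_metrics(confusion)
print("accuracy:", round(result["accuracy"], 3))
print("sensitivity per class:", [round(s, 3) for s in result["sensitivity"]])
print("specificity per class:", [round(s, 3) for s in result["specificity"]])
```

In this made-up example the majority class dominates the accuracy, while the minority classes are detected only half of the time. Oversampling methods such as SMOTE are intended to reduce this kind of imbalance.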

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License