Abstract

Breast cancer is prevalent among women in the United States. Breast cancer screening is standard practice, but it requires a radiologist to review the screening images to make a diagnosis. Diagnosis through the traditional screening method, mammography, currently has an accuracy of about 78% for women of all ages and demographics. A more recent and precise technique, Digital Breast Tomosynthesis (DBT), has been shown to be more promising but is less well studied. A machine learning model trained on DBT images has the potential to improve the identification of breast cancer and reduce the time needed to diagnose a patient, leading to faster treatment. In this study, a Convolutional Neural Network (CNN) was trained on an open-source dataset from Duke of DBT images from patients with no tumor, a benign tumor, or a malignant tumor. The model was designed to identify the presence of a tumor (either malignant or benign) or its absence. Robust open-source datasets of medical images are scarce due to the nature of medicine: de-identifying medical images is very time-intensive, and labeling a dataset requires the expertise of a medical professional, in this case a radiologist. The open-source dataset was small and imbalanced, so transfer learning, under-sampling of the more prevalent healthy-patient class, and image augmentation were used to improve prediction accuracy. Training a CNN is computationally expensive, so a high-compute VM environment with extensive RAM was created to facilitate learning the weights of the CNN.
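As an illustration of the under-sampling step mentioned above, the following is a minimal sketch and not the code used in the study. It assumes binary labels (0 = no tumor, 1 = tumor, benign or malignant), a made-up 90/10 class split, and under-sampling down to exact class balance; the function name `undersample` is hypothetical, and the abstract does not state the ratio actually used.

```python
import random
from collections import defaultdict


def undersample(labels, seed=0):
    """Return sorted indices that cut every class down to the size of the rarest class.

    Assumption: exact balance is the goal; the study may have used another ratio.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # The rarest class sets the number of samples kept per class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        # Sample without replacement; the rarest class is kept in full.
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


# Hypothetical, made-up labels: 0 = no tumor, 1 = tumor (benign or malignant).
labels = [0] * 90 + [1] * 10
kept = undersample(labels)
balanced = [labels[i] for i in kept]
```

The returned indices could then be used to select the matching image files before training, so that the healthy-patient class no longer dominates each batch.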

Recommended Citation

Fogleman, Spencer; Otsap, Jeremy; and Cho, Sangrae (2021) "Clinical Diagnosis Support with Convolutional Neural Network by Transfer Learning," SMU Data Science Review: Vol. 5: No. 3, Article 2.
Available at: https://scholar.smu.edu/datasciencereview/vol5/iss3/2