SMU Data Science Review


In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). The challenge this imposes on healthcare providers is that they rely on unstructured data to inform their decision-making. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers are still challenged with searching for information and answers contained within unstructured data. Prior NLP and Deep Learning research has shown that these methods can improve information extraction on unstructured medical documents. This research expands upon those studies by developing a Question Answering system using distilled BERT models. Healthcare providers can use this system on their local computers to search for and receive answers to specific questions about patients. This paper’s best TinyBERT and TinyBioBERT models had Mean Reciprocal Rank (MRRs) of 0.522 and 0.284 respectively. Based on these findings this paper concludes that TinyBERT performed better than TinyBioBERT on BioASQ task 9b data.

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Included in

Data Science Commons