SMU Data Science Review
Abstract
In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). This poses a challenge for healthcare providers, who rely on that unstructured data to inform their decisions. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers still struggle to find information and answers contained within unstructured data. Prior natural language processing (NLP) and deep learning research has shown that these methods can improve information extraction from unstructured medical documents. This research builds on those studies by developing a Question Answering system based on distilled BERT models. Healthcare providers can run this system on their local computers to search for, and receive answers to, specific questions about patients. This paper's best TinyBERT and TinyBioBERT models achieved Mean Reciprocal Rank (MRR) scores of 0.522 and 0.284, respectively. Based on these findings, this paper concludes that TinyBERT outperformed TinyBioBERT on the BioASQ Task 9b data.
Recommended Citation
Lewandowski, Brittany; Morris, Rayon; Paul, Pearly Merin; and Slater, Robert (2023) "Question Answering with distilled BERT models: A case study for Biomedical Data," SMU Data Science Review: Vol. 7: No. 1, Article 9.
Available at: https://scholar.smu.edu/datasciencereview/vol7/iss1/9
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License