SMU Data Science Review
Abstract
Large Language Models (LLMs) are increasingly used in the healthcare industry to summarize complex clinical information, but their outputs often reflect biases inherited from their training data. In healthcare, these biases are not merely technical flaws: they can spread distorted or false information about vaccine safety, compromise patient trust, and lead to potentially harmful outcomes. This study investigates bias in LLM-generated responses to question-answer pairs inspired by adverse vaccine reaction reports, drawing on COVID-19 data from the Vaccine Adverse Event Reporting System (VAERS) for 2020–2024. We examined whether training the LLMs on a known Bias Benchmark for Question Answering (BBQ) could reduce some of the models' inherent biases. To address this issue, we propose several Natural Language Processing (NLP) techniques to mitigate bias, evaluated on Meta's open-source Llama 3.1 8B model. This research aims to reduce known bias in healthcare LLMs and strengthen their reliability and robustness.
Recommended Citation
Dulude, Nolan; Karthikeyan, Renu; Sadler, Bivin; and Javed, Faizan (2025) "Bias Evaluation of Healthcare Data with the use of VBQA - a VAERS Inspired Bias Question Answer Dataset," SMU Data Science Review: Vol. 9: No. 3, Article 10.
Available at: https://scholar.smu.edu/datasciencereview/vol9/iss3/10
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
