SMU Data Science Review

NLP Bias and African American English

Abstract

African American English (AAE) is widely used on social media, but most sentiment analysis tools are trained only on Standard American English (SAE). This mismatch can cause models to misclassify dialectal expressions, especially by labeling neutral or positive AAE as negative or toxic. These errors matter because Natural Language Processing (NLP) systems are now central to content moderation and brand monitoring. This research will evaluate how VADER, RoBERTa, GPT-OSS, and Gemma handle AAE compared with SAE, using the TwitterAAE corpus, a public dataset of tweets with estimated AAE usage. The goal is to determine whether these models consistently misread AAE in sentiment detection tasks. Performance will be compared using metrics such as accuracy, false negative rates, and sentiment score differences between AAE and SAE tweets.
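
The comparison metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code. The record layout, the `group_metrics` helper, and the toy tweets are all hypothetical. It also assumes that "positive" is the positive class, so a false negative is a positive tweet that the model labels as anything else.

```python
from statistics import mean

def group_metrics(records, group):
    """Accuracy, false negative rate, and mean sentiment score for one dialect group."""
    rows = [r for r in records if r["group"] == group]
    correct = sum(r["gold"] == r["pred"] for r in rows)
    # Assumption: "positive" is the positive class for the false negative rate.
    positives = [r for r in rows if r["gold"] == "positive"]
    missed = sum(r["pred"] != "positive" for r in positives)
    return {
        "accuracy": correct / len(rows),
        "false_negative_rate": missed / len(positives) if positives else float("nan"),
        "mean_score": mean(r["score"] for r in rows),
    }

# Toy records invented for illustration only; they are not results from the study.
records = [
    {"group": "AAE", "gold": "positive", "pred": "negative", "score": -0.4},
    {"group": "AAE", "gold": "positive", "pred": "positive", "score": 0.6},
    {"group": "AAE", "gold": "neutral", "pred": "negative", "score": -0.2},
    {"group": "AAE", "gold": "negative", "pred": "negative", "score": -0.8},
    {"group": "SAE", "gold": "positive", "pred": "positive", "score": 0.7},
    {"group": "SAE", "gold": "positive", "pred": "positive", "score": 0.5},
    {"group": "SAE", "gold": "neutral", "pred": "neutral", "score": 0.0},
    {"group": "SAE", "gold": "negative", "pred": "negative", "score": -0.6},
]

aae = group_metrics(records, "AAE")
sae = group_metrics(records, "SAE")
# Difference in mean sentiment score between AAE and SAE tweets.
score_gap = aae["mean_score"] - sae["mean_score"]
print(aae, sae, score_gap)
```

A lower accuracy, a higher false negative rate, or a negative score gap for the AAE group relative to the SAE group would be the kind of disparity the study sets out to measure.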

Recommended Citation

Roy, Kenya and Javed, Faizan (2025) "NLP Bias and African American English," SMU Data Science Review: Vol. 9: No. 3, Article 9.
Available at: https://scholar.smu.edu/datasciencereview/vol9/iss3/9