SMU Data Science Review


Major corporations use data from online platforms to recommend products and services to users. Companies like Netflix, Amazon, Yelp, and Spotify rely on purchasing trends, user reviews, and helpfulness votes to make content recommendations, a strategy that can increase user engagement on a company's platform. However, misleading or spam reviews significantly hinder the success of these recommendation strategies. The rise of social media has made it increasingly difficult to distinguish authentic content from advertising, leading to a surge of deceptive reviews across the marketplace. The helpfulness of a review is determined by a subjective voting system. This study therefore aims to predict which product reviews are helpful and to enable strategies for moderating user review posts so that review quality improves. Review helpfulness is predicted by applying NLP methods to Amazon product review data. Multiple machine learning models of differing complexity (e.g., Naïve Bayes and BERT) are implemented to compare their results and ease of implementation in predicting a product review's helpfulness. The study concludes that review helpfulness can be predicted effectively through the use of model features. Removing duplicate reviews, imputing review helpfulness based on word count, and including lexical elements are recommended steps in review analysis. The results indicate that deploying these features yields a high F1-score of 0.83 for predicting helpful Amazon product reviews.
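As a rough illustration of two steps named in the abstract (duplicate removal and the F1-score used to report results), the following minimal Python sketch may help. It is not the authors' implementation: the function names `drop_duplicate_reviews`, `word_count`, and `f1_score`, the normalization rule for detecting duplicates, and the toy labels are all assumptions made for this example.

```python
from typing import Iterable, List, Sequence


def drop_duplicate_reviews(reviews: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each review.

    Assumption: two reviews count as duplicates when they match after
    lowercasing and collapsing whitespace. The study may use another rule.
    """
    seen = set()
    unique = []
    for text in reviews:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def word_count(text: str) -> int:
    """Whitespace-delimited word count, a simple length feature."""
    return len(text.split())


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = helpful, 0 = not helpful)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not drawn from the Amazon dataset.
reviews = [
    "Great product, works well",
    "great product,  works well",  # duplicate after normalization
    "Broke after a week",
]
unique_reviews = drop_duplicate_reviews(reviews)
lengths = [word_count(r) for r in unique_reviews]
score = f1_score([1, 1, 0, 1], [1, 0, 0, 1])
```

In the toy example, precision is 1.0 and recall is 2/3, so the F1-score is 0.8. The same formula underlies the 0.83 figure reported above, although the study's data and models differ from this sketch.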

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
