SMU Data Science Review


In this paper, we present a method for identifying the characteristics of a restaurant from its reviews using machine learning algorithms. We begin by building models that predict the ratings of individual reviews from text and categorical features, in order to examine how well the algorithms suit the task; both XGBoost and logistic regression are examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experiences. Our analysis uses review data made publicly available by Yelp. The key bigrams extracted were not specific to the restaurants examined, but the key trigrams were, and they included menu items. While the models were successful in predicting high-rated reviews, they struggled to identify negative reviews with acceptable accuracy. The method outlined in this paper proved successful in extracting positive trigrams that are highly specific to the restaurants examined, and we propose that these phrases be emphasized on Yelp pages so that users can quickly learn which items at a restaurant are of the highest quality.
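The abstract does not spell out how key phrases are pulled from the fitted models. One plausible approach, shown here as a minimal sketch rather than the authors' pipeline, is to fit a logistic regression on n-gram presence features and rank n-grams by their learned coefficients: large positive weights mark phrases associated with high ratings, and large negative weights mark phrases associated with low ratings. Assumptions: ratings are binarized into high (1) and low (0); the helper names `ngrams`, `train_logistic`, and `top_ngrams`, the hyperparameters, and the four toy reviews are all hypothetical; and the model is written in plain Python with batch gradient descent rather than a library solver so it runs without dependencies. XGBoost is not shown.

```python
import math
import re
from collections import defaultdict


def ngrams(text, n):
    """Lowercase the text, keep runs of letters/apostrophes as tokens, and return its n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(docs, labels, n=3, epochs=200, lr=0.5, l2=0.01):
    """Binary logistic regression on n-gram presence features, fit by batch gradient descent.

    labels: 1 for a high-rated review, 0 for a low-rated one (a hypothetical binarization).
    """
    feats = [set(ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*feats))
    w = {g: 0.0 for g in vocab}
    b = 0.0
    m = len(docs)
    for _ in range(epochs):
        grad_w = defaultdict(float)
        grad_b = 0.0
        for fs, y in zip(feats, labels):
            # Derivative of the log-loss with respect to the linear score z.
            err = sigmoid(b + sum(w[g] for g in fs)) - y
            grad_b += err
            for g in fs:
                grad_w[g] += err
        b -= lr * grad_b / m
        for g in vocab:
            # Averaged log-loss gradient plus an L2 penalty on each weight.
            w[g] -= lr * (grad_w[g] / m + l2 * w[g])
    return w, b


def top_ngrams(w, k=3):
    """Return the k most positive and the k most negative n-grams by coefficient."""
    ranked = sorted(w.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]


# Toy, made-up reviews purely for illustration (not Yelp data).
reviews = [
    "the garlic noodles were amazing and the service was friendly",
    "loved their garlic noodles, will come back",
    "food was cold and the waiter was rude",
    "waited an hour and the food was cold",
]
labels = [1, 1, 0, 0]

w, b = train_logistic(reviews, labels, n=2)
positive, negative = top_ngrams(w)
print("positive:", [g for g, _ in positive])
print("negative:", [g for g, _ in negative])
```

On the toy reviews, "garlic noodles" should receive the largest positive weight, since it is the only bigram shared by both positive reviews and absent from the negative ones. Passing `n=3` ranks trigrams instead, the feature type the abstract reports as more restaurant-specific; on a real review corpus the vocabulary would be far larger, and a sparse feature matrix with a library solver would be the more practical choice.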

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License