SMU Data Science Review

Abstract

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated by text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model are interpreted as quantifying the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews were most likely to be recommended when they conveyed an overall positive message in a few moderately complex sentences expressing substantive detail with an informative range of varied sentiment. Other factors relating to patterns and frequency of platform use also bear strongly on review recommendations. Though not without important ethical implications, the findings are logically consistent with Yelp's efforts to facilitate, inform, and empower consumer decisions.
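
The following is a minimal sketch of the modeling step summarized above: fitting a logistic regression on engineered review features and reading standardized coefficients as a rough measure of relative feature importance. The feature names, synthetic data, and preprocessing are hypothetical placeholders, not the paper's actual pipeline.

```python
# Sketch only: hypothetical feature columns and random labels stand in for the
# paper's scraped Yelp data and engineered text/sentiment features.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "mean_sentiment": rng.normal(0.2, 0.5, n),    # overall positivity of the review text
    "sentiment_range": rng.uniform(0.0, 1.0, n),  # spread between most and least positive sentences
    "sentence_complexity": rng.normal(12, 3, n),  # e.g., average words per sentence
    "review_count": rng.poisson(20, n),           # reviewer's platform activity
})
y = rng.integers(0, 2, n)  # placeholder labels: 1 = recommended, 0 = non-recommended

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

model = LogisticRegression().fit(scaler.transform(X_train), y_train)
print("held-out accuracy:", model.score(scaler.transform(X_test), y_test))

# Because features are standardized, coefficient magnitudes can be compared
# across features as a relative-importance quantification.
for name, coef in sorted(zip(X.columns, model.coef_[0]), key=lambda t: -abs(t[1])):
    print(f"{name:22s} {coef:+.3f}")
```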

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.
