SMU Data Science Review
Abstract
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involved systematically sampling and scraping Yelp restaurant reviews. Features were extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. We interpreted the coefficients of a multivariate logistic regression model as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews were most likely to be recommended when they conveyed an overall positive message, written in a few moderately complex sentences that expressed substantive detail with an informative range of varied sentiment. Other factors relating to patterns and frequency of platform use also bore strongly on review recommendations. Though not without important ethical implications, the findings are logically consistent with Yelp’s efforts to facilitate, inform, and empower consumer decisions.
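As an illustration of the approach the abstract summarizes, the following minimal sketch fits a logistic regression and ranks features by the magnitude of their coefficients. It is not the authors' code and does not reproduce the reported 78% accuracy: the feature names (`sentiment_score`, `word_count`, `unrelated_noise`), the synthetic data, and the plain gradient-descent fit are all assumptions made for the example.

```python
import math
import random

# Hypothetical feature names; the paper's actual feature set is not listed here.
FEATURES = ["sentiment_score", "word_count", "unrelated_noise"]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def predict_proba(w, b, x):
    """Probability that a review with features x is 'recommended'."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic reviews (assumed data): the label depends on sentiment and
# length, and not at all on the noise column.
rng = random.Random(0)
raw, y = [], []
for _ in range(400):
    sentiment = rng.gauss(0, 1)
    word_count = rng.gauss(100, 40)
    noise = rng.gauss(0, 1)
    logit = 2.0 * sentiment - 1.0 * (word_count - 100) / 40
    raw.append([sentiment, word_count, noise])
    y.append(1 if rng.random() < sigmoid(logit) else 0)

X = standardize(raw)
weights, bias = fit_logistic(X, y)

# Rank features by absolute coefficient size, largest first.
ranked = sorted(zip(FEATURES, weights), key=lambda p: abs(p[1]), reverse=True)
accuracy = sum((predict_proba(weights, bias, x) >= 0.5) == bool(t)
               for x, t in zip(X, y)) / len(y)

for name, coef in ranked:
    print(f"{name}: {coef:+.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

Standardizing the features first puts the coefficients on a common scale, which is what makes comparing their magnitudes meaningful. The sign of each coefficient indicates whether a larger feature value raises or lowers the odds of a review being recommended.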
Recommended Citation
Yao, Yao; Angelov, Ivelin; Rasmus-Vorrath, Jack; Lee, Mooyoung; and Engels, Daniel W. (2018) "Yelp’s Review Filtering Algorithm," SMU Data Science Review: Vol. 1: No. 3, Article 3.
Available at: https://scholar.smu.edu/datasciencereview/vol1/iss3/3
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Included in
Analysis Commons, Applied Statistics Commons, Business Analytics Commons, Business and Corporate Communications Commons, Business Intelligence Commons, Computer Law Commons, Engineering Education Commons, Multivariate Analysis Commons, Numerical Analysis and Computation Commons, Other Legal Studies Commons, Other Statistics and Probability Commons, Probability Commons, Science and Technology Studies Commons, Social Statistics Commons, Statistical Methodology Commons, Statistical Models Commons, Technology and Innovation Commons