SMU Data Science Review

Finding Truth in Fake News: Reverse Plagiarism and other Models of Classification

Matthew Przybyla, Southern Methodist University
David Tran, Southern Methodist University
Amber Whelpley, Southern Methodist University
Daniel W. Engels, Southern Methodist University

Abstract

As the digital age creates new ways of spreading news, fake stories are propagated to wider audiences. A majority of people consume both fake and truthful news without knowing which is which. There is currently no reliable and efficient method for identifying “fake news”. Several detection methods have been proposed, but the algorithms have low detection accuracy, and the definition of what makes a news item ‘fake’ remains unclear. In this paper, we propose a new method of detecting fake news by comparing a news item with other news items on the same topic, and we also apply logistic regression and multinomial naïve Bayes classification. From these techniques and methodologies, we found that fake news can be classified in the simplest terms as fact-based or non-fact-based. Our model, built upon reverse plagiarism and natural language processing, produces positive results but is not as effective as logistic regression and multinomial naïve Bayes. These models classify fake news more accurately and efficiently than a human could and show that fake news is readily identifiable. The traditional classification models outperform the reverse plagiarism method, but the method leaves room for improvement and refinement.

Recommended Citation

Przybyla, Matthew; Tran, David; Whelpley, Amber; and Engels, Daniel W. (2018) "Finding Truth in Fake News: Reverse Plagiarism and other Models of Classification," SMU Data Science Review: Vol. 1: No. 4, Article 13.
Available at: https://scholar.smu.edu/datasciencereview/vol1/iss4/13