
SMU Data Science Review

Abstract

As the digital age creates new ways of spreading news, fake stories are propagated to ever wider audiences. A majority of people consume both fake and truthful news without knowing which is which. There is currently no reliable and efficient method to identify “fake news”. Several ways of detecting fake news have been proposed, but the various algorithms have low detection accuracy, and the definition of what makes a news item ‘fake’ remains unclear. In this paper, we propose a new method of detecting fake news through comparison to other news items on the same topic, and we also apply logistic regression and multinomial naïve Bayes classification. From the techniques and methodologies, we found that fake news can be classified in the simplest terms as fact-based or non-fact-based. Our model, built upon reverse plagiarism and natural language processing, produces positive results but is not as effective as logistic regression and multinomial naïve Bayes. These models classify fake news more accurately and efficiently than a human could and show that fake news is easily identifiable. The traditional classification models outperform the reverse plagiarism method, but that method can still be improved and refined.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.
