In this paper, we present a new model to predict the probability that a personal computer will become infected with malware. The dataset comes from a Kaggle competition supported by Microsoft. The data include computer configuration, owner information, installed software, and configuration information. In our research, several classification models are used to assign each machine a probability of being infected with malware. Among these, the LightGBM classifier performed best in this research, running faster and with higher efficiency and lower memory usage. The LightGBM algorithm obtained a cross-validation ROC-AUC score of 74%. Leading factors and feature importance are also identified using LightGBM. Our research revealed that variables related to location, firmware version, operating system, and anti-virus software carry the highest weight in predicting malware detection.
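To make the reported metric concrete, the following is a minimal sketch of how a ROC-AUC score can be computed from predicted probabilities. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the pairwise formulation shown here is one standard way to define the metric, assumed for illustration only.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (infected, clean) pairs ranked correctly.

    labels: 1 for an infected machine, 0 for a clean one (hypothetical data).
    scores: predicted infection probabilities, one per machine.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not taken from the paper's dataset).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.7, 0.8]
example_auc = roc_auc(labels, scores)
print(round(example_auc, 4))
```

Under this definition, a score of 74% would mean that a randomly chosen infected machine receives a higher predicted probability than a randomly chosen clean machine about 74% of the time. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable on a dataset of realistic size.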
Shahini, Maryam; Farhanian, Ramin; and Ellis, Marcus
"Machine Learning to Predict the Likelihood of a Personal Computer to Be Infected with Malware,"
SMU Data Science Review: Vol. 2: No. 2, Article 9.
Available at: https://scholar.smu.edu/datasciencereview/vol2/iss2/9
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License