SMU Data Science Review


Abstract. Over 87% of streamed music is owned by four major record labels (Jones, 2018). Yet the songs those labels own account for less than 1% of the music created each year. These labels have historically been better at identifying talent, although doing so is becoming more difficult. Even though Spotify holds 36% of the streaming market share (T4, 2021), it has not been profitable because of the large licensing costs it pays to the major labels. If Spotify could identify hit songs and artists before the major labels do, it could sign those artists and dramatically reduce its licensing costs. Using data retrieved through the Spotify API on over 400K songs from the last three years, this paper conducts exploratory data analysis, provides descriptive statistics, performs feature selection, and develops models using LASSO and XGBoost classification. The research identified multiple key features and predicted with over 60% accuracy which songs would become hits (defined as popularity above 90%).
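As a minimal sketch of the hit definition and the accuracy metric mentioned in the abstract, the snippet below labels songs as hits when their popularity exceeds 90 and scores a set of predictions against those labels. It assumes popularity is reported on a 0 to 100 scale; the function names, the sample scores, and the predictions are hypothetical and are not taken from the paper's code or data.

```python
from typing import List, Sequence

# Assumption: popularity is a score on a 0-100 scale, and a "hit" is any
# song whose popularity is strictly greater than 90.
HIT_THRESHOLD = 90


def label_hits(popularities: Sequence[float], threshold: float = HIT_THRESHOLD) -> List[int]:
    """Return 1 for each song whose popularity exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in popularities]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up popularity scores, for illustration only.
sample_popularity = [12, 95, 47, 91, 90, 3]
labels = label_hits(sample_popularity)

# Hypothetical classifier output for the same six songs.
predictions = [0, 1, 0, 0, 0, 0]
score = accuracy(labels, predictions)
print(labels, round(score, 3))
```

Because a threshold this high leaves very few positive examples, accuracy computed this way is best read alongside the share of hits in the data.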

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License