SMU Data Science Review

A Machine Learning Model for Clustering Securities

Vanessa Torres, Southern Methodist University
Travis Deason, Southern Methodist University
Michael Landrum, Southern Methodist University
Nibhrat Lohria, Southern Methodist University

Abstract

In this paper, we evaluate the self-declared industry classifications and industry relationships between companies listed on either the Nasdaq or the New York Stock Exchange (NYSE) markets. Large corporations typically operate in multiple industries simultaneously; however, for investment purposes they are classified as belonging to a single industry. This simple classification obscures the actual industries within which a company operates, and, therefore, the investment risks of that company.
By using Natural Language Processing (NLP) techniques on Securities and Exchange Commission (SEC) filings, we obtained self-defined industry classifications for each company. Using clustering techniques such as hierarchical agglomerative and k-means clustering, we were able to identify companies operating in similar industries. We found that the use of NLP to extract features from the text was more important to model performance than model selection or optimization.
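
As a rough illustration of the approach summarized above, the following Python sketch turns short text descriptions into TF-IDF vectors and groups them with agglomerative clustering. It is not the authors' implementation: the company names and descriptions are invented stand-ins for SEC filing text, and the choices of TF-IDF weighting, cosine similarity, and average linkage are assumptions made only for this example. It uses the standard library alone so that it runs without extra dependencies.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency, so no term gets zero weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine similarity."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the highest average similarity.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Hypothetical business descriptions standing in for SEC filing text.
filings = {
    "BANK_A": "commercial banking loans deposits and mortgage lending",
    "BANK_B": "consumer banking deposits credit cards and mortgage loans",
    "OIL_A": "exploration and production of crude oil and natural gas",
    "OIL_B": "refining of crude oil and natural gas pipelines",
}
labels = agglomerative(tfidf_vectors(list(filings.values())), n_clusters=2)
clusters = dict(zip(filings, labels))
print(clusters)
```

In this toy setting the grouping is driven entirely by the vocabulary each description shares with the others, which is one way to see why the quality of the text features can matter more than the choice of clustering algorithm.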

Recommended Citation

Torres, Vanessa; Deason, Travis; Landrum, Michael; and Lohria, Nibhrat (2019) "A Machine Learning Model for Clustering Securities," SMU Data Science Review: Vol. 2: No. 2, Article 18.
Available at: https://scholar.smu.edu/datasciencereview/vol2/iss2/18