SMU Data Science Review


As part of its oversight of capital markets, the Securities and Exchange Commission (SEC) requires firms with publicly traded shares to issue periodic reports to shareholders. These filings are stored in the SEC's Electronic Data Gathering, Analysis, and Retrieval system (EDGAR), a large online database. The financial services and banking industries employ armies of analysts dedicated to poring over, analyzing, and attempting to quantify the qualitative data in this SEC-mandated reporting. We sought to prototype a predictive model that renders consistent judgments on a company's prospects, based on the written textual sections of public earnings releases extracted from SEC 8-K financial reports and on actual stock market performance. In this project, we leverage data from EDGAR to assess the viability of predicting stock prices through natural language processing (NLP) and deep learning methods. The model predicts stock movement in the near future (the next few days after the release of a report) by incorporating relevant financial information, such as recent stock price movement and whether earnings came in above or below expectations, together with other textual information from these financial reports. Our results demonstrate how a deep learning model trained on the text of SEC filings could provide a valuable signal to an investment decision maker. The results are most significant for the day immediately following the financial event, but they persist (either constant or trending up or down) for up to five days.
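The abstract does not say how "stock movement" over the next one to five days is turned into a prediction target. The sketch below shows one common way to build such labels. It is an assumption and not the authors' method. The input is a hypothetical list of daily closing prices, with `event_index` pointing at the last close before the 8-K release. For each horizon h from 1 to 5 trading days, the forward return is labeled "up", "down", or "flat" using an illustrative threshold.

```python
from typing import Dict, List


def forward_return_labels(
    closes: List[float],
    event_index: int,
    max_horizon: int = 5,
    threshold: float = 0.0,
) -> Dict[int, str]:
    """Label price movement h trading days after an event (hypothetical helper).

    closes: daily closing prices in chronological order (assumed input format).
    event_index: index of the last close before the report release (assumed convention).
    threshold: minimum absolute return to count as "up" or "down" (illustrative).
    """
    base = closes[event_index]
    labels: Dict[int, str] = {}
    for h in range(1, max_horizon + 1):
        if event_index + h >= len(closes):
            break  # not enough future prices for this horizon
        ret = closes[event_index + h] / base - 1.0  # simple forward return
        if ret > threshold:
            labels[h] = "up"
        elif ret < -threshold:
            labels[h] = "down"
        else:
            labels[h] = "flat"
    return labels


if __name__ == "__main__":
    prices = [100.0, 101.0, 99.0, 102.0, 102.0, 98.0, 105.0]
    print(forward_return_labels(prices, event_index=1))
    print(forward_return_labels(prices, event_index=1, threshold=0.015))
```

A nonzero threshold labels small moves as "flat", so the model is not asked to separate noise from real reactions. The right threshold for this kind of data would have to be chosen empirically.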

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License