SMU Data Science Review

Stack Overflow Question Retrieval System

Vishi Cline, Southern Methodist University
Abhishek Dharwadkar, Southern Methodist University
Rajni Goyal, Southern Methodist University
Daniel Engels, Southern Methodist University
Raghuram Srinivas, Southern Methodist University
Sohail Rafiqi, Southern Methodist University

Abstract

In this paper, we present several approaches for matching a user's query to the most similar question. The process has two steps. First, the tags/topics of the questions are identified using k-means clustering and topic modeling, respectively. Second, the user's query is matched with the most similar question in the corpus using k-means, topic modeling, and an ensemble of the two. Our motivation is to improve developer productivity by presenting the 10 questions most relevant to the user's query. Our study focuses on answering technical programming questions specific to Python (Windows) using the Stack Overflow dataset. We chose these three approaches (k-means clustering, topic modeling, and their ensemble) because the tags provided in the dataset were too generic to contextualize a question, which could lead to irrelevant answers for future queries. Recall is the metric used to evaluate the models. Based on the results, we conclude that the topic model based on non-negative matrix factorization (NMF) and the ensemble method outperformed k-means: NMF and the ensemble each achieved a recall of 67%, compared with 50% for k-means.
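
The query-matching step can be illustrated with a minimal sketch of the k-means branch alone. It assumes TF-IDF features, cosine (spherical) k-means seeded with the first k questions, and ranking of only the query's own cluster by cosine similarity. All helper names are hypothetical, and the paper's actual features, initialization, NMF model, and ensemble rule are not described in the abstract and are not reproduced here. Recall is assumed to mean the fraction of test queries whose known matching question appears in the returned top-k list.

```python
# Hypothetical sketch: TF-IDF + spherical k-means question retrieval.
# Not the paper's implementation; features, seeding and recall definition are assumptions.
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def vectorize(tokens, idf):
    """Unit-length TF-IDF vector for a token list; terms missing from idf are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def build_tfidf(docs):
    """Return one TF-IDF vector per document plus a smoothed IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def assign(vectors, centroids):
    """Index of the most similar centroid for each vector (ties go to the lowest index)."""
    return [max(range(len(centroids)), key=lambda j: cosine(v, centroids[j])) for v in vectors]


def spherical_kmeans(vectors, k, max_iter=20):
    """k-means using cosine similarity, deterministically seeded with the first k vectors."""
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(vectors, centroids)
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        updated = []
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)  # adds float weights term by term
            # Keep the previous centroid if the cluster ends up empty.
            updated.append(normalize(dict(total)) or centroids[j])
        centroids = updated
    # Final assignment so the labels always match the returned centroids.
    return centroids, assign(vectors, centroids)


def retrieve(query, vectors, idf, centroids, labels, top_k=10):
    """Indices of the top_k questions in the query's cluster, most similar first."""
    q = vectorize(tokenize(query), idf)
    cluster = assign([q], centroids)[0]
    candidates = [i for i, lab in enumerate(labels) if lab == cluster]
    candidates.sort(key=lambda i: cosine(q, vectors[i]), reverse=True)
    return candidates[:top_k]


def recall_at_k(retrieved_lists, relevant_ids):
    """Fraction of queries whose known relevant question appears in the retrieved list."""
    hits = sum(1 for got, rel in zip(retrieved_lists, relevant_ids) if rel in got)
    return hits / len(relevant_ids)


if __name__ == "__main__":
    questions = [
        "How do I install pip on Windows",
        "pip install fails on Windows with permission error",
        "How to read a CSV file with pandas",
        "pandas read_csv encoding error",
    ]
    vectors, idf = build_tfidf(questions)
    centroids, labels = spherical_kmeans(vectors, k=2)
    top = retrieve("pip permission error on windows", vectors, idf, centroids, labels)
    print([questions[i] for i in top])
```

Restricting the ranking to the query's cluster keeps the candidate set small, but a relevant question that landed in a neighboring cluster can never be returned, which is one plausible reason to combine cluster-based and topic-based scores in an ensemble.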

Recommended Citation

Cline, Vishi; Dharwadkar, Abhishek; Goyal, Rajni; Engels, Daniel; Srinivas, Raghuram; and Rafiqi, Sohail (2018) "Stack Overflow Question Retrieval System," SMU Data Science Review: Vol. 1: No. 2, Article 13.
Available at: https://scholar.smu.edu/datasciencereview/vol1/iss2/13

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
