In this paper, we present a common definition and list of skills for a data scientist, derived from online job postings. The problem is motivated by the overlap and ambiguity among roles such as data scientist, data engineer, data analyst, software engineer, database administrator, and statistician. To arrive at a single definition, we collect over 8,000 job postings from Indeed.com for these six job titles. Each corpus contains text on job qualifications, skills, responsibilities, educational preferences, and requirements. Our methodology and analysis yield the following definition: a data scientist codes, collaborates, and communicates, transforming data into insights using techniques in statistics, analytics, and machine learning. A secondary finding confirms the hypothesis that the features of data scientist job postings overlap with those of the other five job titles, which explains the absence of a common definition. We conclude that applying data science algorithms and techniques to the job postings identifies the roles most similar to a data scientist, provides a single definition of a data scientist, and generates the top features of the role.
Ho, Andy; Nguyen, An; Pafford, Jodi L.; and Slater, Robert
"A Data Science Approach to Defining a Data Scientist,"
SMU Data Science Review: Vol. 2: No. 3, Article 4.
Available at: https://scholar.smu.edu/datasciencereview/vol2/iss3/4
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License