SMU Data Science Review


In this paper, we analyze how well machine learning can predict student success in college courses at a California Community College. The California Legislature passed Assembly Bill 705 to place students in non-remedial coursework on the basis of their high school transcripts, with the goal of increasing college completion. We apply machine learning methods to de-identified high school transcript data to build models that predict whether a student will succeed in college-level English and Mathematics coursework. To satisfy the bill's requirements, we first perform exploratory data analysis on the applicable transcript variables. We then use industry knowledge to select the variables to carry forward. Finally, we use these variables as inputs to supervised machine learning algorithms and build predictive models. The results indicate that a student's overall GPA is the best predictor, and that accuracy improves when student demographic information is included. In conclusion, we were able to predict student success in a California Community College course with an average accuracy of 70%.
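As a rough illustration of the final step described above (a supervised model that takes a transcript variable as input and predicts course success), the following minimal sketch fits a single-feature logistic regression on overall GPA. It is not the method used in this paper: the choice of logistic regression, the function names, the learning rate, and the synthetic student records are all assumptions made for illustration, and no real student data is involved.

```python
import math
import random


def make_synthetic_students(n, seed=0):
    """Generate hypothetical (gpa, passed) pairs. This is NOT real student data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gpa = rng.uniform(1.0, 4.0)
        # Assumed relationship for the toy data: higher GPA raises the pass probability.
        p_pass = 1.0 / (1.0 + math.exp(-2.0 * (gpa - 2.5)))
        rows.append((gpa, 1 if rng.random() < p_pass else 0))
    return rows


def fit_logistic(rows, center, lr=0.5, epochs=500):
    """Fit P(pass) = sigmoid(w * (gpa - center) + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for gpa, y in rows:
            x = gpa - center
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(rows, center, w, b):
    """Share of students whose predicted label (threshold 0.5) matches the outcome."""
    correct = 0
    for gpa, y in rows:
        predicted = 1 if w * (gpa - center) + b > 0 else 0
        correct += int(predicted == y)
    return correct / len(rows)


students = make_synthetic_students(400)
train, test = students[:300], students[300:]

# Center GPA on the training mean so gradient descent converges quickly.
mean_gpa = sum(gpa for gpa, _ in train) / len(train)
w, b = fit_logistic(train, mean_gpa)
test_accuracy = accuracy(test, mean_gpa, w, b)
print(f"held-out accuracy on synthetic data: {test_accuracy:.2f}")
```

Accuracy is measured on records held out from training, which mirrors how a reported figure such as the 70% average would normally be estimated. Additional inputs, such as the demographic variables mentioned above, would enter as further features with their own weights.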

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License