SMU Data Science Review

Machine Learning Pipeline for Exoplanet Classification

George Clayton Sturrock, Southern Methodist University
Brychan Manry, Southern Methodist University
Sohail Rafiqi, Southern Methodist University

Abstract

Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites that capture a vast array of data on celestial objects of interest to support research on them. The availability of this satellite data has opened the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are used to assign each observation a probability of being an exoplanet. A Random Forest Classifier was selected as the optimal machine learning model for classifying objects of interest in the Cumulative Kepler Object of Interest table. The Random Forest Classifier obtained a cross-validated accuracy score of 98%. A total of 968 candidate observations have a greater than 95% probability of being an exoplanet. Finally, the Random Forest Classifier was made publicly accessible through an application programming interface (API) and an Azure Container Instance web service in the Microsoft Azure cloud.

Recommended Citation

Sturrock, George Clayton; Manry, Brychan; and Rafiqi, Sohail (2019) "Machine Learning Pipeline for Exoplanet Classification," SMU Data Science Review: Vol. 2: No. 1, Article 9.
Available at: https://scholar.smu.edu/datasciencereview/vol2/iss1/9