SMU Data Science Review
Abstract
Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data on celestial objects of interest to support research on them. The availability of satellite data has opened the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are used to assign each observation a probability of being an exoplanet. A Random Forest Classifier was selected as the optimum machine learning model to classify objects of interest in the Cumulative Kepler Object of Interest (KOI) table. The Random Forest Classifier obtained a cross-validated accuracy score of 98%, and 968 candidate observations were assigned a greater than 95% probability of being an exoplanet. Finally, the Random Forest Classifier was made publicly accessible through an application programming interface (API) and an Azure Container Instance web service in the Microsoft Azure cloud.
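The workflow summarized in the abstract (cross-validate a Random Forest, then assign each candidate a probability and keep those above 95%) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic from scikit-learn's `make_classification` rather than the KOI table, and the number of trees, the five folds, the split sizes, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for KOI rows (assumption: no real Kepler data is used).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# Treat the first 500 rows as labeled observations and the last 100 as
# unlabeled candidates to be scored (an illustrative split).
X_labeled, y_labeled = X[:500], y[:500]
X_candidates = X[500:]

# Hyperparameters are illustrative, not those reported in the paper.
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validated accuracy on the labeled rows (5 folds assumed).
cv_scores = cross_val_score(clf, X_labeled, y_labeled, cv=5, scoring="accuracy")
mean_accuracy = cv_scores.mean()

# Fit on all labeled rows, then score the candidates.
clf.fit(X_labeled, y_labeled)

# Column 1 of predict_proba holds the probability of the positive class (label 1).
exoplanet_probability = clf.predict_proba(X_candidates)[:, 1]

# Keep candidates whose predicted probability exceeds 95%.
THRESHOLD = 0.95
high_confidence = np.flatnonzero(exoplanet_probability > THRESHOLD)

print(f"mean cross-validated accuracy: {mean_accuracy:.3f}")
print(f"candidates above {THRESHOLD:.0%}: {high_confidence.size}")
```

With real data, the feature matrix would be built from KOI table columns and the labels from the confirmed and false-positive dispositions; the counts printed here have no relation to the 98% accuracy or the 968 candidates reported in the study.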
Recommended Citation
Sturrock, George Clayton; Manry, Brychan; and Rafiqi, Sohail (2019) "Machine Learning Pipeline for Exoplanet Classification," SMU Data Science Review: Vol. 2: No. 1, Article 9.
Available at: https://scholar.smu.edu/datasciencereview/vol2/iss1/9
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Included in
Applied Statistics Commons, Artificial Intelligence and Robotics Commons, Other Astrophysics and Astronomy Commons, Probability Commons, Stars, Interstellar Medium and the Galaxy Commons, Theory and Algorithms Commons