SMU Data Science Review


In this paper, we present a machine learning-based architecture for regression tasks on image data collected by the International Ocean Discovery Program (IODP), funded in part by the U.S. National Science Foundation (NSF), for scientific use during seagoing operations aboard the JOIDES Resolution scientific drillship as well as onshore at the Texas A&M University IODP Headquarters. Our data science-driven approach integrates modern programming techniques in Python, computer vision, machine learning, and deep learning applications with the traditional geoscience linear regression architecture, so that models combining high-resolution spectral data with the vast amounts of petrophysical and geochemical data can be trained end-to-end with open source applications to predict an independent global proxy, the O18/O16 isotope ratio (δ18O), for the geologic age of ocean sediments through time. First, we show that computer vision applications like OpenCV can be employed to extract, transform, and load spectral data into a continuous data array in an automated function that web-scrapes the online IODP database. Next, we present generalizable machine learning regression modeling of IODP core data driven by the Support Vector Machine (SVM) regression (SVR) algorithm, with hyperparameters tuned by a K-fold cross-validation (CV) grid search using popular scikit-learn packages and functional programming. Finally, we demonstrate a K-fold cross-validated deep learning prediction of the δ18O variations with a deep neural network using Keras and TensorFlow 2.0, with generalizability to the vast amounts of ocean sediment data maintained by IODP. We find that the machine learning and deep learning approaches both extrapolate well beyond the training data, and that the use of these methods paves the way for future data science applications in scientific exploration and discovery at IODP.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License