Subject Area

Computer Science

Abstract

In recent years, progress in the interdisciplinary application of machine learning and artificial intelligence (ML/AI) has transformed various fields, from weather forecasting and drug development to medical diagnostics, energy, and sustainability. Computational chemistry uses computational tools to model, predict, analyze, and explain chemical phenomena, while quantum chemistry specifically uses techniques based on quantum mechanics (as opposed to classical mechanics or empirical models). Over the past decade, both fields have seen growing momentum in the application of ML techniques, which significantly accelerate results and provide valuable insights into vast datasets, often surpassing traditional methods.

This dissertation explores the integration of machine learning to enhance the efficiency and accuracy of computational chemistry methods. The primary focus is minimizing the errors introduced when the complex tensor hyper-contraction (THC) approximation technique is applied to third-order Møller-Plesset perturbation theory (MP3).

We began by comparing different levels of THC approximation based on the dataset and the amount of correction. The value of the THC parameter $\delta$ indicates the level of approximation. We then attempted to reduce the errors in both molecular and reaction energies.

The research then systematically applies machine learning methods, from linear models to more complex neural networks. Part of this research has been published in the Journal of Computational Chemistry (JCC).

Multiple Linear Regression (MLR): Although a simpler technique, MLR yielded good results, with up to 84\% improvement in the calculated energy levels over the MP3b baseline values.

Kernel Ridge Regression (KRR): This model yielded even better results than MLR, with up to 89\% improvement in the calculated energy level values. This strongly suggests non-linearity in the datasets.
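
The contrast between MLR and KRR can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are a synthetic one-dimensional stand-in (not the THC-MP3 dataset), the hyperparameters `gamma` and `lam` are arbitrary, and "improvement" is taken to mean the percent reduction in mean absolute error relative to an uncorrected baseline, which the abstract does not define explicitly.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Squared Euclidean distance between every row of a and every row of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ coef

def fit_krr(X, y, gamma=1.0, lam=1e-3):
    # Closed-form kernel ridge regression: alpha = (K + lam * I)^-1 y.
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha

def percent_improvement(baseline_err, corrected_err):
    # Assumed metric: percent reduction in error relative to the baseline.
    return 100.0 * (1.0 - corrected_err / baseline_err)

# Synthetic stand-in: a non-linear "approximation error" as a function of
# a single hypothetical descriptor x.
x = np.linspace(-3.0, 3.0, 60).reshape(-1, 1)
err = np.sin(x).ravel()
X_tr, y_tr = x[::2], err[::2]      # even-indexed points for training
X_te, y_te = x[1::2], err[1::2]    # odd-indexed points for testing

baseline_mae = np.mean(np.abs(y_te))  # error left when no correction is applied
mae_lin = np.mean(np.abs(y_te - fit_linear(X_tr, y_tr)(X_te)))
mae_krr = np.mean(np.abs(y_te - fit_krr(X_tr, y_tr)(X_te)))

print("linear improvement (%):", percent_improvement(baseline_mae, mae_lin))
print("KRR improvement (%):   ", percent_improvement(baseline_mae, mae_krr))
```

On a target that is non-linear in its descriptor, the kernel model removes more of the residual error than the linear fit, which is the kind of gap that points to non-linearity in the data.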

Multi-Layer Perceptron Artificial Neural Network Model (MLP-ANN): The research evaluated the MLP architecture as a viable candidate for the THC-MP3 dataset. While the models offered high learning capacity, they were more sensitive to training procedures and data splits than KRR.

Tabular Prior-data Fitted Network (TabPFN): Since the contextual data of THC-MP3 is in tabular form, a pre-trained TabPFN was also implemented to evaluate the performance of transformer-based models. TabPFN also produced a wide range of results depending on the $\delta$ of the THC-MP3 dataset.

Hybrid Stacking Technique: A major contribution of the research was the development of a two-stage stacking framework that appended KRR predictions to the input features of a secondary MLP or TabPFN model. This successfully combined regression stability with neural network capacity to reach improvements of up to 90\%.
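
The two-stage idea can be sketched with scikit-learn as follows. This is a rough illustration, not the dissertation's implementation: the features and target are synthetic placeholders, the hyperparameters are arbitrary, and the use of out-of-fold KRR predictions for the training rows (to limit leakage into the second stage) is an assumption about how such a framework might be built.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                     # hypothetical descriptors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]     # synthetic stand-in target
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# Stage 1: kernel ridge regression.
krr = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1)
# Out-of-fold predictions for the training rows, so the second stage never
# sees a KRR prediction made by a model that was fit on that same row.
oof_pred = cross_val_predict(krr, X_tr, y_tr, cv=5)
krr.fit(X_tr, y_tr)

# Append the KRR prediction as one extra input feature.
X_tr_aug = np.column_stack([X_tr, oof_pred])
X_te_aug = np.column_stack([X_te, krr.predict(X_te)])

# Stage 2: an MLP trained on the augmented features.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
)
mlp.fit(X_tr_aug, y_tr)
stacked_pred = mlp.predict(X_te_aug)

print("KRR MAE:    ", np.mean(np.abs(y_te - krr.predict(X_te))))
print("stacked MAE:", np.mean(np.abs(y_te - stacked_pred)))
```

In this layout the second-stage model is interchangeable: any regressor that accepts the augmented feature matrix, such as a TabPFN-style model, could take the place of the MLP.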

The results also indicate that while molecular energy corrections were highly successful, improving reaction energies remained more challenging due to the limited ability of statistical models to exploit the physical error cancellation inherent in reactions.

By focusing on improving speed and accuracy, this dissertation contributes to making quantum chemistry processes more efficient and cost-effective. This work can deepen our understanding of molecular ground states and reaction dynamics, paving the way for advancements in drug design, vaccine development, polymer processing techniques, climate modeling, artificial photosynthesis, and sustainable energy solutions.

Degree Date

Spring 5-16-2026

Document Type

Dissertation

Degree Name

Ph.D.

Department

Lyle School of Engineering

Advisor

Dr. Devin Matthews

Second Advisor

Dr. Eric Larson

Number of Pages

162

Format

PDF

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
