Subject Area
Computer Science
Abstract
Deep learning has had remarkable success in a variety of fields. However, architectures often rely on hyperparameter searches and heuristics for improved model performance. Performing hyperparameter searches is an arduous task: it is time-consuming and potentially expensive to run on accelerated hardware. As a result, practitioners often rely on heuristics, which may lead to sub-optimal results. In deep learning, hyperparameters are set by the user prior to training and remain fixed thereafter. Deep learning uses gradient descent to learn complex feature representations from data, limiting the need for human intervention. Yet while the weights of an architecture are learned directly from data through gradient descent, hyperparameters are not. Several design components, including convolutional kernel size, decimation rates, and aggregation methods, are not differentiable and must be adjusted manually as hyperparameters.
This dissertation proposes several novel approaches to the challenges of hyperparameter tuning by reformulating many hyperparameters as differentiable operations that can be optimized through gradient descent. These methods reduce the reliance on manual hyperparameter tuning and allow networks to learn architectural design elements directly from data. This can improve model performance, since the rigid structures imposed by fixed hyperparameters are avoided. Furthermore, learning in a continuous, differentiable space often yields more efficient and effective optimization while increasing model expressiveness. Overall, this work aims to reduce the need for human intervention and expensive hyperparameter searches while improving the performance and flexibility of deep neural networks.
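To make the reparameterization idea concrete, the following is a minimal sketch, not code from the dissertation: PyTorch, the class name LearnableMixedPool2d, and the sigmoid mixing scheme are all illustrative assumptions. It turns the aggregation method, ordinarily a fixed choice between average and max pooling, into a differentiable quantity trained alongside the network weights.

import torch
import torch.nn as nn

class LearnableMixedPool2d(nn.Module):
    """Blend average and max pooling with a learnable mixing weight.

    The aggregation method, normally a fixed hyperparameter, becomes a
    differentiable parameter optimized jointly with the network weights.
    (Illustrative sketch; not the dissertation's implementation.)
    """

    def __init__(self, kernel_size: int = 2):
        super().__init__()
        self.avg = nn.AvgPool2d(kernel_size)
        self.max = nn.MaxPool2d(kernel_size)
        # Unconstrained scalar; a sigmoid keeps the mix in (0, 1).
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.alpha)
        return a * self.max(x) + (1 - a) * self.avg(x)

# Usage: a drop-in replacement for a fixed pooling layer.
pool = LearnableMixedPool2d(kernel_size=2)
y = pool(torch.randn(1, 8, 32, 32))  # output shape: (1, 8, 16, 16)

Because the mixing weight lives in a continuous space, gradient descent can settle anywhere between the two discrete aggregation choices rather than committing to one before training begins.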
Degree Date
Summer 8-2024
Document Type
Dissertation
Degree Name
Ph.D.
Department
Computer Science
Advisor
Eric Larson
Second Advisor
Mitchell Thornton
Number of Pages
212
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Harper, Clayton, "Improving Expressive Capacity of Deep Neural Networks" (2024). Computer Science and Engineering Theses and Dissertations. 42.
https://scholar.smu.edu/engineering_compsci_etds/42