Subject Area

Computer Science

Abstract

Deep learning has achieved remarkable success across a variety of fields. However, architectures often rely on hyperparameter searches and heuristics to improve model performance. Performing a hyperparameter search is an arduous task: it is time-consuming and potentially expensive to run on accelerated hardware. As a result, practitioners often fall back on heuristics, which can lead to sub-optimal results. In deep learning, hyperparameters are set by the user before training begins and remain fixed throughout. This stands in contrast to the weights of a network, which are learned directly from data through gradient descent with little human intervention; hyperparameters do not enjoy the same treatment. Several design components, including convolutional kernel size, decimation rates, and aggregation methods, are not differentiable and must be adjusted manually as hyperparameters.

This dissertation proposes several novel approaches that address the challenges of hyperparameter tuning by reformulating many hyperparameters as differentiable operations that can be optimized through gradient descent. These methods reduce the reliance on manual hyperparameter tuning and allow networks to learn architectural design elements directly from data. Because the network learns these elements itself, the rigid structures imposed by fixed hyperparameters can be avoided, improving model performance. Furthermore, learning in a continuous, differentiable space often yields more efficient and effective optimization while increasing model expressiveness. Overall, this work aims to reduce the need for human intervention and expensive hyperparameter searches while improving the performance and flexibility of deep neural networks.
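The core idea of relaxing a discrete hyperparameter into a differentiable operation can be illustrated with a small sketch. This is not the dissertation's actual method, only a minimal toy: a discrete choice among candidate kernel sizes is replaced by a softmax-weighted mixture, so the choice becomes a continuous architecture parameter that plain gradient descent can optimize. The candidate sizes and the per-choice loss below are invented for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the architecture parameters.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical candidate values for a discrete hyperparameter
# (e.g. convolutional kernel sizes).
candidates = np.array([1.0, 3.0, 5.0, 7.0])
# Toy per-choice loss: pretend size 5 is the best choice.
per_choice_loss = (candidates - 5.0) ** 2

alpha = np.zeros_like(candidates)  # learnable architecture parameters
lr = 0.5
for _ in range(500):
    w = softmax(alpha)
    expected_loss = w @ per_choice_loss  # differentiable surrogate objective
    # Analytic gradient: d(expected_loss)/d(alpha_j) = w_j * (L_j - expected_loss)
    grad = w * (per_choice_loss - expected_loss)
    alpha -= lr * grad

w = softmax(alpha)
print(candidates[np.argmax(w)])  # mass concentrates on the lowest-loss size, 5.0
```

Because the surrogate objective is an expectation under the softmax distribution, gradient descent shifts probability mass toward the candidate with the lowest loss, recovering the discrete choice at convergence without ever searching over it explicitly.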

Degree Date

Summer 8-2024

Document Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

Advisor

Eric Larson

Second Advisor

Mitchell Thornton

Number of Pages

212

Format

.pdf

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.
