Subject Area
Computer Science
Abstract
Deep learning has had remarkable success in a variety of fields. However, architectures often rely on hyperparameter searches and heuristics for improved model performance. Performing hyperparameter searches is an arduous task: it is time-consuming and potentially expensive to run on accelerated hardware. As a result, practitioners often rely on heuristics, which may lead to sub-optimal results. In deep learning, hyperparameters are set by the user prior to training and remain fixed thereafter. Deep learning uses gradient descent to learn complex feature representations from data, limiting the need for human intervention. Yet while the weights of an architecture are learned directly from data through gradient descent, hyperparameters are not. Several design components, including convolutional kernel size, decimation rates, and aggregation methods, are not differentiable and must be adjusted manually as hyperparameters.
This dissertation proposes several novel approaches to the challenges of hyperparameter tuning by reformulating many hyperparameters as differentiable operations that can be optimized through gradient descent. These methods reduce the reliance on manual hyperparameter tuning and allow networks to learn architectural design elements directly from data. This can improve model performance, since the rigid structures imposed by fixed hyperparameters are avoided. Furthermore, learning in a continuous, differentiable space often yields more efficient and effective optimization while increasing model expressiveness. Overall, this work aims to reduce the need for human intervention and expensive hyperparameter searches while improving the performance and flexibility of deep neural networks.
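To make the reparameterization idea concrete, the following is a minimal sketch, not code from the dissertation: PyTorch, the class name LearnableMixedPool2d, and the sigmoid mixing scheme are all illustrative assumptions. It turns the aggregation method, ordinarily a fixed choice between average and max pooling, into a differentiable quantity trained alongside the network weights.

import torch
import torch.nn as nn

class LearnableMixedPool2d(nn.Module):
    """Blend average and max pooling with a learnable mixing weight.

    The aggregation method, normally a fixed hyperparameter, becomes a
    differentiable parameter optimized jointly with the network weights.
    (Illustrative sketch; not the dissertation's implementation.)
    """

    def __init__(self, kernel_size: int = 2):
        super().__init__()
        self.avg = nn.AvgPool2d(kernel_size)
        self.max = nn.MaxPool2d(kernel_size)
        # Unconstrained scalar; a sigmoid keeps the mix in (0, 1).
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.alpha)
        return a * self.max(x) + (1 - a) * self.avg(x)

# Usage: a drop-in replacement for a fixed pooling layer.
pool = LearnableMixedPool2d(kernel_size=2)
y = pool(torch.randn(1, 8, 32, 32))  # output shape: (1, 8, 16, 16)

Because the mixing weight lives in a continuous space, gradient descent can settle anywhere between the two discrete aggregation choices rather than committing to one before training begins.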
Degree Date
Summer 8-2024
Document Type
Dissertation
Degree Name
Ph.D.
Department
Computer Science
Advisor
Eric Larson
Second Advisor
Mitchell Thornton
Number of Pages
212
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Harper, Clayton, "Improving Expressive Capacity of Deep Neural Networks" (2024). Computer Science and Engineering Theses and Dissertations. 42.
https://scholar.smu.edu/engineering_compsci_etds/42