The challenges of learning a new language can be reduced with real-time feedback on pronunciation and language usage. Readily available technologies already provide such feedback for spoken languages by transcribing the learner's speech into written text. For someone seeking to learn American Sign Language (ASL), however, no such feedback application is available. A learner of ASL might consult websites or books to find an image of the hand sign for a word. This process is like looking up a word in a dictionary: a person who wants to know whether they are making a sign correctly, or what a given sign means, has no way of checking. Because of this, the automated translation of ASL has been an active area of research since the early 2000s. Researchers have investigated numerous ways of capturing hand signs, as well as numerous methods for recognizing or categorizing the captured data. In this work, we demonstrate an ASL translation system based on Convolutional Neural Networks (CNNs) that provides a practical application for the real-time translation of 29 ASL signs: the 26 alphabet signs and three additional signs (‘space’, ‘delete’, and ‘nothing’). This application can be used to translate a hand sign for a person learning ASL, as well as to facilitate communication between an ASL signer and a non-signer. The CNN model used in this study is based on the Keras VGG16 pre-trained model and pre-processed images. It achieves 100% accuracy when predicting on a hold-out/cross-validation test dataset. The keys to achieving this high accuracy in automated sign translation are 1) good input images, 2) starting from a pre-trained model, and 3) fine-tuning of the model.
This paper discusses the use of contrast limited adaptive histogram equalization (CLAHE) image pre-processing to enhance the input images, provides a high-level overview of convolutional neural networks (CNNs), discusses the use of the pre-trained VGG16 model as a starting point for the CNN and the fine-tuning of the resulting model, and provides an overview of the web application implemented for real-time ASL translation. The results of experiments used to assess the strength and generalization capabilities of the model are also detailed.
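
To make the pre-processing idea concrete, the following is a minimal sketch of the contrast-limiting step at the core of CLAHE, applied to a single image tile. It is not the pipeline used in this work: full CLAHE divides the image into tiles, equalizes each one in this way, and interpolates between neighboring tiles. The function name `clipped_equalize`, the absolute per-bin `clip_limit`, and the rule for redistributing clipped counts are assumptions made for illustration.

```python
def clipped_equalize(tile, clip_limit=4, levels=256):
    """Contrast-limited histogram equalization of one grayscale tile.

    `tile` is a list of rows of integer intensities in [0, levels).
    `clip_limit` is an absolute per-bin count (an assumption of this sketch).
    """
    pixels = [p for row in tile for p in row]
    n = len(pixels)

    # Build the intensity histogram of the tile.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip each bin at clip_limit and spread the excess over all bins,
    # which limits how strongly any one intensity can be stretched.
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) for h in hist]
    bonus, remainder = divmod(excess, levels)
    hist = [h + bonus + (1 if i < remainder else 0) for i, h in enumerate(hist)]

    # Map each intensity through the scaled cumulative distribution.
    lut, total = [], 0
    for h in hist:
        total += h
        lut.append(round((levels - 1) * total / n))

    return [[lut[p] for p in row] for row in tile]


# A low-contrast 3x3 tile: intensities span only 50..53.
tile = [[50, 50, 52], [51, 50, 53], [50, 52, 50]]
enhanced = clipped_equalize(tile, clip_limit=2)
```

The mapping preserves the ordering of intensities while spreading them over a wider range, which is the effect the pre-processing step relies on to make hand shapes easier to distinguish.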

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.