SMU Data Science Review

Spoken Language Recognition on Open-Source Datasets

Brady Arendale, Southern Methodist UniversityFollow
Samira Zarandioon, Southern Methodist UniversityFollow
Ryan Goodwin, Southern Methodist UniversityFollow
Douglas Reynolds, MIT Lincoln LaboratoryFollow

Abstract

The field of speaker and language recognition is constantly being researched and developed, but much of this research is done on private or expensive datasets, making the field more inaccessible than many other areas of machine learning. In addition, many papers make performance claims without comparing their models to other recent research. With the recent development of public multilingual speech corpora such as Mozilla's Common Voice as well as several single-language corpora, we now have the resources to attempt to address both of these problems. We construct an eight-language dataset from Common Voice and a Google Bengali corpus as well as a five-language holdout test set from Audio Lingua. We then compare one filterbank-based model and two waveform-based models found in recent literature, all based on convolutional neural networks. We find that our filterbank-based model achieves the strongest results, with a 90.5% test accuracy on our eight-language test set and a 74.8% test accuracy on our five-language Audio Lingua test set. We conclude that some models originally trained on private datasets are also applicable to our public datasets and make suggestions on how this performance can be improved further.

Recommended Citation

Arendale, Brady; Zarandioon, Samira; Goodwin, Ryan; and Reynolds, Douglas (2020) "Spoken Language Recognition on Open-Source Datasets," SMU Data Science Review: Vol. 3: No. 2, Article 3.
Available at: https://scholar.smu.edu/datasciencereview/vol3/iss2/3