SMU Data Science Review

Real-Time Voice Biometric Speaker Verification

Inderbir Dhillon, Southern Methodist University
Jason Rupp, Southern Methodist University
Aniketh Vankina, Southern Methodist University
Robert Slater, Southern Methodist University

Abstract

Automated speaker verification has drawn increasing research attention in recent years, particularly metric learning approaches that compute distances between speaker voiceprints. In this paper, three metric learning systems are built and compared on a one-shot speaker verification task, each trained with a different loss: contrastive max-margin loss, triplet loss, and quadruplet loss. For all models, spectrograms are created from speaker audio, and Convolutional Neural Network embedding layers are trained to produce compact voiceprints that allow users to be distinguished by distance calculations. The three models performed similarly; in this experiment, the model trained with triplet loss achieved the best equal error rate (EER).
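To make the three losses named in the abstract concrete, here is a minimal NumPy sketch of their commonly cited textbook forms, applied to embedding vectors (voiceprints) that are assumed to have already been produced by some embedding network. The function names, margin values, and use of Euclidean distance are illustrative assumptions, not details taken from the paper, and the authors' exact formulations may differ.

```python
import numpy as np


def l2(a, b):
    """Euclidean distance between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def contrastive_loss(x1, x2, same_speaker, margin=1.0):
    """Contrastive max-margin loss for one pair (margin is an assumed value).

    Same-speaker pairs are pulled together; different-speaker pairs are
    pushed apart until they are at least `margin` away from each other.
    """
    d = l2(x1, x2)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the anchor should be closer to the positive than to
    the negative by at least `margin` (in squared distance)."""
    return max(0.0, l2(anchor, positive) ** 2 - l2(anchor, negative) ** 2 + margin)


def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Quadruplet loss: the triplet term plus a second term that compares
    the anchor-positive distance with the distance between two negatives
    from different speakers."""
    d_ap = l2(anchor, positive) ** 2
    d_an = l2(anchor, negative1) ** 2
    d_nn = l2(negative1, negative2) ** 2
    return max(0.0, d_ap - d_an + margin1) + max(0.0, d_ap - d_nn + margin2)


def same_speaker_decision(enrolled, probe, threshold):
    """One-shot verification: accept when the probe voiceprint lies within
    `threshold` of the single enrolled voiceprint (threshold is hypothetical)."""
    return l2(enrolled, probe) <= threshold
```

In a sketch like this, the verification threshold would typically be tuned on held-out data, for example at the operating point where the false accept and false reject rates are equal, which is the EER reported in the abstract.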

Recommended Citation

Dhillon, Inderbir; Rupp, Jason; Vankina, Aniketh; and Slater, Robert (2021) "Real-Time Voice Biometric Speaker Verification," SMU Data Science Review: Vol. 5: No. 2, Article 11.
Available at: https://scholar.smu.edu/datasciencereview/vol5/iss2/11