|Title:||Improving Noise Robustness in Automatic Speaker Recognition Systems|
|Abstract:||Most automatic speaker recognition systems perform poorly on data with low signal-to-noise ratios (SNRs). In this paper, we analyze the performance of Spear, an open-source, comprehensive speaker recognition toolkit, on speech utterances from the TIMIT corpus injected with background noise. We suggest and implement changes for the voice activity detection (VAD) and feature extraction steps of the Spear toolchain to improve its overall noise robustness. Specifically, we propose replacing Spear's simple VAD, which classifies frame-level energy into two groups, with a new VAD, which uses a posteriori signal-to-noise weighted energy distance. For feature extraction, we consider the effectiveness of using gammatone frequency cepstral coefficients (GFCCs) instead of traditional mel-scale frequency cepstral coefficients (MFCCs). We prove the superiority of GFCCs for data with low SNRs by incorporating GFCC feature extraction in the Spear toolchain and then testing it on the noisy TIMIT data. Then, we further propose a new, modified version of MFCCs that is even more noise-robust than GFCCs.|
|Type of Material:||Princeton University Senior Theses|
|Appears in Collections:||Computer Science, 1988-2017|
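The abstract refers to utterances injected with background noise at low SNRs. As a rough illustration of what that can mean in practice, the sketch below scales a noise signal so that the mixture reaches a requested SNR in decibels. It is not taken from the thesis or from Spear: the function names `mix_at_snr` and `measure_snr_db` are hypothetical, and looping the noise to cover the utterance is an assumption made here for simplicity.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the thesis's procedure.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.size == 0 or noise.size == 0:
        raise ValueError("speech and noise must be non-empty")

    # Assumption: loop the noise, then trim it to the utterance length.
    reps = int(np.ceil(speech.size / noise.size))
    noise = np.tile(noise, reps)[: speech.size]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal has zero power")

    # SNR_dB = 10 * log10(P_speech / (g^2 * P_noise)), solved for the gain g.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measure_snr_db(speech, noisy):
    """Compute the SNR (dB) of a mixture, given the clean speech it contains."""
    speech = np.asarray(speech, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(1600) / 16000.0           # 0.1 s at 16 kHz
    clean = np.sin(2.0 * np.pi * 220.0 * t)  # stand-in for a speech utterance
    babble = rng.standard_normal(500)        # stand-in for background noise
    noisy = mix_at_snr(clean, babble, snr_db=5.0)
    print(round(measure_snr_db(clean, noisy), 6))
```

Because the gain is computed from the powers of the exact samples that get mixed, the measured SNR of the result matches the requested value up to floating-point error. Real evaluations often measure speech power over active frames only, which this sketch does not attempt.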
Files in This Item:
|PUTheses2015-Smith_Jamie.pdf||413.62 kB||Adobe PDF||Request a copy|
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.