Evaluation of the effects of speech enhancement algorithms on the detection of fundamental frequency of speech

Abstract

The estimation of the fundamental frequency (F0) of speech is an important task that has been addressed by many researchers. F0 estimation can be used to separate the frames of an utterance into two classes: those in which the vocal folds vibrate (voiced sounds) and those in which they do not (unvoiced sounds). F0 estimation methods are affected by additive noise in recordings made in uncontrolled environments. However, several techniques can mitigate the effect of such noise, and Speech Enhancement (SE) has proven to be one of the most effective. This article presents an evaluation of the effects of noise and of SE algorithms on F0 detection and on the segmentation of the signal into voiced/unvoiced segments. We performed experiments with signals artificially contaminated with two kinds of noise: White Gaussian Noise (WGN) and background noise recorded in a cafeteria (Cafeteria babble). The noisy signals were then processed with SE algorithms from four classes: Wiener filtering, spectral subtraction, statistical-model-based and subspace algorithms. Two error metrics are considered: Gross Pitch Error (GPE) and Voicing Determination Error (VDE). The results show that only the subspace approach improves both F0 detection and voiced/unvoiced segmentation.
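As an illustration of the two error metrics, here is a minimal sketch. It assumes the definitions commonly used in the pitch-detection literature, not necessarily the exact ones used in this article. GPE is computed as the fraction of frames that are voiced in both the reference and the estimate and whose estimated F0 deviates from the reference by more than 20%. VDE is the fraction of all frames whose voiced/unvoiced decision disagrees with the reference. The 20% threshold, the convention that an F0 value of 0 Hz marks an unvoiced frame, and the function names are assumptions made for this example.

```python
def gross_pitch_error(f0_ref, f0_est, threshold=0.2):
    """Fraction of frames voiced in both tracks whose relative F0 deviation
    exceeds `threshold` (assumed 20%). An F0 of 0 marks an unvoiced frame."""
    if len(f0_ref) != len(f0_est):
        raise ValueError("F0 tracks must have the same number of frames")
    # Keep only frames where both reference and estimate are voiced.
    both_voiced = [(r, e) for r, e in zip(f0_ref, f0_est) if r > 0 and e > 0]
    if not both_voiced:
        return 0.0
    gross = sum(1 for r, e in both_voiced if abs(e - r) / r > threshold)
    return gross / len(both_voiced)


def voicing_decision_error(f0_ref, f0_est):
    """Fraction of all frames whose voiced/unvoiced decision differs from
    the reference (voiced-as-unvoiced plus unvoiced-as-voiced errors)."""
    if len(f0_ref) != len(f0_est):
        raise ValueError("F0 tracks must have the same number of frames")
    errors = sum(1 for r, e in zip(f0_ref, f0_est) if (r > 0) != (e > 0))
    return errors / len(f0_ref)


if __name__ == "__main__":
    # Hypothetical frame-level F0 tracks in Hz (0 = unvoiced).
    ref = [0, 100, 110, 120, 0, 200]
    est = [0, 102, 150, 0, 90, 205]
    print(gross_pitch_error(ref, est))       # 1 of 3 jointly voiced frames
    print(voicing_decision_error(ref, est))  # 2 of 6 frames misclassified
```

In this toy example, frame 2 (110 Hz estimated as 150 Hz, a 36% deviation) is the only gross pitch error. Frames 3 and 4 are voicing errors, so both metrics equal 1/3. The two metrics are complementary: GPE ignores frames with voicing mistakes, and VDE ignores how accurate the F0 value is.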

Publication
2014 XIX Symposium on Image, Signal Processing and Artificial Vision