Emotion recognition from speech with acoustic, non-linear and wavelet-based features extracted in different acoustic conditions

Abstract

In recent years, there has been great progress in automatic speech recognition. The challenge now is to recognize not only the semantic content of speech but also its so-called ‘paralinguistic’ aspects, including the emotions and the personality of the speaker. This research work aims to develop a methodology for automatic emotion recognition from speech signals under non-controlled noise conditions. To that end, different sets of acoustic, non-linear, and wavelet-based features are used to characterize emotions in different databases created for that purpose. The acoustic analysis considers a standard feature set developed for emotion recognition from speech, called OpenEAR, and a set of spectral and noise-derived measures. The non-linear analysis is based on non-linear dynamics measures and includes the correlation dimension, the largest Lyapunov exponent, the Hurst exponent, and the Lempel-Ziv complexity. A set of measures derived from parametric non-stationary analysis using time-dependent ARMA models is also proposed. The wavelet-based measures consider features derived from the wavelet packet transform and from different wavelet time-frequency representations, such as the bionic wavelet transform and the synchro-squeezed wavelet transform. Non-controlled noise conditions are tested in four different scenarios: (1) the original recordings, (2) the signals degraded by two kinds of additive environmental noise: street noise and cafeteria babble, (3) the signals re-captured in two natural noisy environments: street and office, and (4) the recordings compressed by seven different codecs used for transmission through mobile, VoIP, and web-based telephone channels. In addition, two different speech enhancement algorithms are tested to evaluate whether they improve the classification of emotions in noisy speech signals.
A classification scheme based on the combination of Gaussian mixture models and support vector machines is used for the analysis.
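
As an illustration of scenario (2), the following minimal sketch shows one common way to degrade a clean recording with additive noise at a chosen signal-to-noise ratio. It is not taken from the thesis: the function names `power` and `mix_at_snr`, the tiling of short noise clips, and the SNR value used in the example are all assumptions made for illustration.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper: the abstract does not state how the additive
    noise was mixed or at which SNR levels.
    """
    if len(speech) == 0 or len(noise) == 0:
        raise ValueError("speech and noise must be non-empty")

    # Repeat the noise clip if it is shorter than the speech, then trim it.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[:len(speech)]

    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")

    # Choose the gain so that 10*log10(p_speech / (gain^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Example with synthetic signals (assumed values, not thesis data).
clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(200)]
babble = [((t * 37) % 11 - 5) / 5 for t in range(50)]
noisy = mix_at_snr(clean, babble, snr_db=10.0)
```

Scaling the noise rather than the speech keeps the level of the clean signal unchanged, so features computed on the clean and degraded versions can be compared directly.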

Publication
Master's thesis, Faculty of Engineering, University of Antioquia