Automatic emotion recognition in compressed speech using acoustic and non-linear features

Abstract

Automatic recognition of emotions in speech has attracted the attention of the research community in recent years. Among its most relevant proposed applications are call centers, scenarios in which the speech is distorted by compression algorithms. The effects of such distortion on the performance of automatic emotion recognition systems must therefore be assessed. In this study, these effects are evaluated independently of any other distortion introduced by the communication channel. Several state-of-the-art codecs are used to compress the speech signals of two emotional speech databases: the Berlin Database of Emotional Speech and eNTERFACE'05. The methodology considers voiced and unvoiced segments of the speech separately. Spectral, cepstral, noise, and Non-Linear Dynamics (NLD) measures are used to characterize the segments. Finally, a classifier based on a Gaussian Mixture Model (GMM) is used to identify the emotion. The results indicate that voiced segments are less affected by compression than unvoiced ones in terms of classification accuracy. They also show that the bandwidth of the analyzed signals is an important factor in the classification results.
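
The paper does not include code; the sketch below is a minimal illustration of the classification stage described above, fitting one GMM per emotion on per-frame cepstral features and labeling an utterance by maximum log-likelihood. The use of librosa (pYIN voicing flags for the voiced/unvoiced split, MFCCs as a stand-in for the full spectral/cepstral/noise/NLD feature set) and scikit-learn's GaussianMixture is an assumption for illustration, not the authors' implementation.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def voiced_flags(y, sr, hop_length=512):
    """Per-frame voiced/unvoiced decision via pYIN (assumed stand-in
    for the paper's voiced/unvoiced segmentation)."""
    _, flags, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"),
        sr=sr, frame_length=2048, hop_length=hop_length)
    return flags  # boolean array, one entry per frame

def frame_features(y, sr, hop_length=512):
    """13 MFCCs per frame; the paper's full feature set also includes
    spectral, noise, and NLD measures not computed here."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                hop_length=hop_length).T

def train_gmms(features_by_emotion, n_components=8):
    """Fit one diagonal-covariance GMM per emotion on pooled frames."""
    return {emotion: GaussianMixture(n_components=n_components,
                                     covariance_type="diag",
                                     random_state=0).fit(np.vstack(frames))
            for emotion, frames in features_by_emotion.items()}

def classify(gmms, features):
    """Return the emotion whose GMM gives the highest mean log-likelihood."""
    return max(gmms, key=lambda emotion: gmms[emotion].score(features))

# Hypothetical usage on one codec-compressed utterance:
# y, sr = librosa.load("utterance_compressed.wav", sr=None)
# feats, voiced = frame_features(y, sr), voiced_flags(y, sr)
# n = min(len(feats), len(voiced))        # align frame counts
# voiced_feats = feats[:n][voiced[:n]]    # keep voiced frames only
# label = classify(trained_gmms, voiced_feats)
```

Training separate models on voiced and unvoiced frames in this manner would permit the per-segment-type accuracy comparison reported in the abstract.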

Publication
2015 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA)