Interest in emotion recognition from speech has increased over the last decade. Emotion recognition can improve the quality of services and people's quality of life. One of the main problems in emotion recognition from speech is finding suitable features to represent the phenomenon. This paper proposes new features based on the energy content of wavelet-based time-frequency (TF) representations to model emotional speech. Three TF representations are considered: (1) the continuous wavelet transform, (2) the bionic wavelet transform, and (3) the synchro-squeezed wavelet transform. The classification is performed using Gaussian mixture model (GMM) supervectors. Different classification problems are addressed, including high vs. low arousal, positive vs. negative valence, and multiple emotions. The results indicate that the proposed features are useful for classifying high vs. low arousal emotions, and that the features derived from the synchro-squeezed wavelet transform are more suitable for modeling emotional speech than those derived from the other two representations.
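To illustrate the general idea of energy-based features from a wavelet TF representation, the following minimal Python sketch computes frame-level log energies per scale band from a continuous wavelet transform. It is not the authors' implementation: the choice of PyWavelets (`pywt.cwt`), the Morlet wavelet, the 64 scales, and the 25 ms/10 ms framing are illustrative assumptions, and the GMM supervector stage described in the paper (pooling such frame-level features via an adapted background GMM) is not shown.

```python
# Illustrative sketch (not the authors' code): log-energy features from a
# continuous wavelet transform, computed per scale band over short frames.
import numpy as np
import pywt


def cwt_band_energies(signal, fs, scales=None, wavelet="morl",
                      frame_len=0.025, hop_len=0.010):
    """Return a (num_frames, num_scales) matrix of log band energies."""
    if scales is None:
        scales = np.arange(1, 65)                 # 64 scales: illustrative choice
    coeffs, _ = pywt.cwt(signal, scales, wavelet, sampling_period=1.0 / fs)
    power = np.abs(coeffs) ** 2                   # shape: (num_scales, num_samples)

    frame = int(frame_len * fs)
    hop = int(hop_len * fs)
    feats = []
    for start in range(0, power.shape[1] - frame + 1, hop):
        band_energy = power[:, start:start + frame].sum(axis=1)
        feats.append(np.log(band_energy + 1e-10))  # log energy per scale band
    return np.asarray(feats)


if __name__ == "__main__":
    fs = 16000
    t = np.linspace(0, 1, fs, endpoint=False)
    x = np.sin(2 * np.pi * 220 * t)               # toy signal standing in for speech
    print(cwt_band_energies(x, fs).shape)         # (num_frames, num_scales)
```

In a full system such frame-level energy features would typically be aggregated per utterance, for instance into GMM supervectors, before classification.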