Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech

Abstract

There are a lot of features that can be extracted from speech signals for different applications such as automatic speech recognition or speaker verification. However, for pathological speech processing there is a need to extract features about the presence of the disease or the state of the patients that are comprehensible for clinical experts. Phonological posteriors are a group of features that can be interpretable by the clinicians and at the same time carry suitable information about the patient’s speech. This paper presents a tool to extract phonological posteriors directly from speech signals. The proposed method consists of a bank of parallel bidirectional recurrent neural networks to estimate the posterior probabilities of the occurrence of different phonological classes. The proposed models are able to detect the phonological classes with accuracies over 90%. In addition, the trained models are available to be used by the research community interested in the topic.

Publication
Twentieth Annual Conference of the International Speech Communication Association