Modified-VQ Features for Speech Emotion Recognition
Hemanta Kumar Palo
Mihir Narayan Mohanty
Objective: Signal features play a major role in recognition, classification, and detection tasks. Achieving effective recognition with fewer features is the challenge that motivated the authors to pursue this work. In this study, a modified Vector Quantized (VQ) feature for emotional speech recognition is proposed. Methodology: The proposed feature is based on statistical VQ and differential VQ statistics of frame-level prosodic features, derived at the utterance level. Further, combinations of frame-level baseline features, VQ-based frame-level prosodic features, and the modified VQ prosodic features at the utterance level are compared and analyzed. Neural-network classifiers, namely the multilayer perceptron (MLP) and the Radial Basis Function Network (RBFN), have been tested with the proposed combinations. The standard Berlin emotional speech database (EMO-DB) and a locally collected emotional speech database have been used to validate the methods. Results: The modified VQ feature combinations outperformed all other feature combinations in terms of classification accuracy and Mean Square Error (MSE). Conclusion: The modified VQ-based feature combination achieved the highest accuracies of 91.08% with the RBFN and 89.93% with the MLP classifier on the EMO-DB database. In comparison, recognition was 90.38% and 88.05% with the VQ-based prosodic feature combination, and 85.79% and 84.04% with the frame-level prosodic feature combination, respectively.
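The statistical and differential VQ idea described in the methodology can be sketched as follows. This is a minimal interpretation under stated assumptions, not the authors' exact pipeline: the plain k-means codebook training, the choice of codebook size, and the mean/standard-deviation statistics are illustrative, and the function names are hypothetical.

```python
import numpy as np

def vq_codebook(frames, k=8, iters=20, seed=0):
    # Train a VQ codebook over frame-level prosodic feature vectors
    # with plain k-means (a hypothetical stand-in for the paper's VQ).
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword
        d = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned frames
        for j in range(k):
            if np.any(labels == j):
                centers[j] = frames[labels == j].mean(axis=0)
    return centers

def modified_vq_features(frames, centers):
    # Quantize the frame sequence against the codebook, then reduce it
    # to an utterance-level vector: statistics of the quantized sequence
    # (statistical VQ) and of its frame-to-frame differences
    # (differential VQ statistics).
    d = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
    q = centers[d.argmin(axis=1)]   # quantized frame sequence
    dq = np.diff(q, axis=0)         # differential VQ sequence
    stats = lambda x: np.concatenate([x.mean(axis=0), x.std(axis=0)])
    return np.concatenate([stats(q), stats(dq)])

# Example: 200 frames of 3 prosodic features (e.g., pitch, energy, duration)
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 3))
centers = vq_codebook(frames, k=4)
utterance_vector = modified_vq_features(frames, centers)
```

With 3 prosodic features per frame, this yields a fixed-length 12-dimensional utterance-level vector (mean and standard deviation of both the quantized and differential sequences), illustrating how a variable-length frame sequence is compressed into the small, fixed feature set the abstract argues for.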