Audio Features Analysis for Speech Emotion Recognition
Keywords:
speech emotion recognition, feature selection, info gain ratio, chi square, classification, support vector machine

Abstract
In this paper, audio feature analysis is performed using two emotional speech databases: SAVEE in English and EMO-DB in German. A diverse set of more than 6000 acoustic features was extracted. The extracted features were normalized using z-score and min-max techniques, followed by feature selection using correlation, chi-square, gain ratio, and info gain ratio methods. Finally, classification was performed using several classifiers: support vector machine, Bayes net, and meta and tree-based classifiers. The best classification result of 78.5% for seven emotion classes on the Surrey Audio-Visual Expressed Emotion (SAVEE) database was achieved using a support vector machine classifier with 3500 features. The best result of 87.1% for the Berlin emotional speech database (EMO-DB) was obtained using a support vector machine classifier with 4000 features. Classification performance comparable to human performance was obtained for both databases. The Mel-spectrum, cepstral, and spectral features were found to be the most discriminative for audio emotion classification.
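As a rough illustration of the pipeline described above (normalization, feature selection, then SVM classification), the following minimal sketch uses scikit-learn. It is not the authors' exact method: the feature matrix X and labels y are random placeholders standing in for extracted acoustic features, min-max scaling is shown (chi-square selection requires non-negative inputs), and the number of selected features and SVM settings are illustrative assumptions.

```python
# Minimal sketch: min-max normalization, chi-square feature selection, SVM classification.
# X and y below are placeholder data, not the SAVEE or EMO-DB features.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 600))      # placeholder acoustic feature matrix
y = rng.integers(0, 7, size=120)     # placeholder labels for 7 emotion classes

pipeline = Pipeline([
    ("scale", MinMaxScaler()),             # min-max normalization (chi2 needs non-negative values)
    ("select", SelectKBest(chi2, k=300)),  # keep the k highest-scoring features
    ("clf", SVC(kernel="rbf")),            # support vector machine classifier
])

scores = cross_val_score(pipeline, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```

In practice, the scaler, the selection criterion (correlation, chi-square, gain ratio, or info gain ratio), and the number of retained features would be varied, as in the comparison the abstract describes.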
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.