Audio Features Analysis for Speech Emotion Recognition

Authors

  • Saeeda Begum Department of Electronics, University of Peshawar, Peshawar 25120, Pakistan
  • Sana Ul Haq Department of Electronics, University of Peshawar
  • Muhammad Saeed Shah Department of Electronics, University of Peshawar, Peshawar 25120, Pakistan
  • Muhammad Kamran Department of Electronics, University of Peshawar, Peshawar 25120, Pakistan
  • Imtiaz Rasool Department of Electronics, University of Peshawar, Peshawar 25120, Pakistan

Keywords:

speech emotion recognition, feature selection, information gain ratio, chi-square, classification, support vector machine

Abstract

In this paper, audio feature analysis is performed using two emotional speech databases: SAVEE (English) and EMO-DB (German). A diverse set of more than 6000 acoustic features was extracted. The extracted features were normalized using z-score and min-max techniques, followed by feature selection using correlation, chi-square, gain ratio, and information gain ratio methods. Finally, classification was performed using several classifiers: support vector machine, Bayes net, and meta and tree-based classifiers. The best classification result of 78.5% for seven emotion classes on the Surrey Audio-Visual Expressed Emotion (SAVEE) database was achieved using a support vector machine classifier with 3500 features. The best result of 87.1% for the Berlin emotional speech database (EMO-DB) was obtained using a support vector machine classifier with 4000 features. Classification performance comparable to human performance was obtained for both databases. The Mel-spectrum, cepstral, and spectral features were found to be the most discriminative for audio emotion classification.
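The pipeline described in the abstract (normalization, feature selection, then SVM classification) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the feature counts, data, and parameter choices are assumptions for demonstration, not the paper's actual configuration.

```python
# Hypothetical sketch of the abstract's pipeline: min-max normalization,
# chi-square feature selection, and SVM classification on synthetic data.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 60))          # 200 utterances x 60 acoustic features (toy data)
y = rng.integers(0, 7, size=200)   # labels for 7 emotion classes

# Min-max normalization maps features to [0, 1]; chi-square selection
# requires non-negative inputs, so it pairs naturally with this scaler.
X_norm = MinMaxScaler().fit_transform(X)

# Chi-square feature selection: keep the k most discriminative features.
X_sel = SelectKBest(chi2, k=20).fit_transform(X_norm, y)

# SVM classifier evaluated with 5-fold cross-validation.
scores = cross_val_score(SVC(kernel="rbf"), X_sel, y, cv=5)
print(X_sel.shape, round(scores.mean(), 3))
```

In practice the scaler and selector would be fit on training folds only (e.g. inside a `Pipeline`) to avoid leaking test statistics into feature selection.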

Published

2024-01-25