Deep Residual Network with D-S Evidence Theory for Bimodal Emotion Recognition
Article 2021 en
Authors
Yulong Liu
Luefeng Chen
Min Li
Abstract
This paper presents a Deep Residual Network (ResNet) combined with Dempster-Shafer (D-S) evidence theory for bimodal emotion recognition from facial expression and speech emotion information. By acquiring discriminative emotion features and fusing the two modalities, the method overcomes the limitations of single-modal emotion recognition and achieves higher recognition accuracy. First, the key areas of emotional features and spectrograms are used to acquire low-level emotion characteristics. Next, two ResNets are designed to extract high-level emotion semantic features. Finally, within the framework of D-S evidence theory, the output probability values are fused to improve the effectiveness of bimodal emotion recognition. Experiments on the eNTERFACE’05 database demonstrate a recognition accuracy of 88.67%, an improvement of 23.11% and 9.32% over using facial expressions alone and speech alone, respectively.
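The fusion step can be illustrated with a minimal sketch of Dempster's rule of combination. It assumes that each network's output probabilities are treated as a mass function that places mass only on single emotion classes, in which case the rule reduces to a normalized element-wise product. The abstract does not specify how the masses are constructed, so this is an illustration of the general technique rather than the authors' exact procedure. The function name `dempster_combine`, the class ordering, and the probability values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Assumption: both mass functions assign mass only to singleton
    classes (e.g. softmax outputs), so only matching classes intersect.
    """
    if len(m1) != len(m2):
        raise ValueError("mass functions must cover the same classes")

    # Product of masses for each class that both sources agree on.
    joint = [a * b for a, b in zip(m1, m2)]

    # Total agreement equals 1 - K, where K is the conflict mass.
    agreement = sum(joint)
    if agreement == 0.0:
        raise ValueError("sources are in total conflict; rule is undefined")

    # Normalize so the fused masses sum to 1.
    return [j / agreement for j in joint]


# Hypothetical class order and network outputs, for illustration only.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
face_probs = [0.50, 0.10, 0.10, 0.10, 0.10, 0.10]
speech_probs = [0.30, 0.40, 0.10, 0.10, 0.05, 0.05]

fused = dempster_combine(face_probs, speech_probs)
predicted = EMOTIONS[fused.index(max(fused))]
print(predicted, [round(p, 4) for p in fused])
```

In this made-up example the speech branch alone would favor "disgust", but the fused result favors "anger" because the facial branch supports it strongly and the speech branch does not rule it out. This is the sense in which combining evidence from two modalities can correct a single-modality decision.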