Speech Emotion Recognition for Indonesian Language Using Long Short-Term Memory

Author(s):  
Jeremia Jason Lasiman
Dessi Puji Lestari

2019
Vol 2019
pp. 1-9
Author(s):
Linqin Cai
Yaxin Hu
Jiangong Dong
Sitong Zhou

With the rapid development of social media, single-modal emotion recognition can hardly satisfy the demands of current emotion recognition systems. To optimize system performance, this paper proposes a multimodal emotion recognition model that draws on both speech and text. Considering the complementarity between the two modalities, a CNN (convolutional neural network) and an LSTM (long short-term memory) network were combined as dual channels to learn acoustic emotion features, while a Bi-LSTM (bidirectional long short-term memory) network was employed to capture the textual features. A deep neural network was then applied to learn and classify the fused features, and the final emotional state was determined by the combined output of the speech and text analyses. Multimodal fusion experiments were carried out to validate the proposed model on the IEMOCAP database. Compared with the single modalities, the overall recognition accuracy improved by 6.70% over text alone and by 13.85% over speech alone. Experimental results show that the multimodal model achieves higher recognition accuracy than either single modality and outperforms other published multimodal models on the test datasets.
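Since the abstract describes the architecture only at a high level, the following is a minimal PyTorch sketch of that design: a CNN channel and an LSTM channel over the acoustic features, a Bi-LSTM over the text, and a DNN classifying the concatenated features. All layer sizes, the 40-mel input, and the four-class output are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultimodalSER(nn.Module):
    """Sketch of the speech+text fusion model described above.
    All dimensions are assumptions, not the paper's values."""
    def __init__(self, n_mels=40, vocab_size=10000, embed_dim=128,
                 hidden=128, n_classes=4):
        super().__init__()
        # Acoustic channel 1: CNN over the spectrogram treated as an image
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Acoustic channel 2: LSTM over the frame sequence
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        # Text branch: embedding followed by a bidirectional LSTM
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Fusion DNN over the concatenated speech and text features
        fused_dim = 32 * 4 * 4 + hidden + 2 * hidden
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, n_classes),
        )

    def forward(self, spec, tokens):
        # spec: (batch, frames, n_mels); tokens: (batch, seq_len)
        cnn_feat = self.cnn(spec.unsqueeze(1)).flatten(1)
        _, (h_a, _) = self.lstm(spec)                    # last acoustic state
        _, (h_t, _) = self.bilstm(self.embed(tokens))    # both text directions
        text_feat = torch.cat([h_t[0], h_t[1]], dim=1)
        return self.classifier(
            torch.cat([cnn_feat, h_a[-1], text_feat], dim=1))

# Toy usage: a 300-frame spectrogram and a 50-token transcript
model = MultimodalSER()
print(model(torch.randn(2, 300, 40),
            torch.randint(0, 10000, (2, 50))).shape)  # torch.Size([2, 4])
```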


IEEE Access
2020
Vol 8
pp. 124928-124938
Author(s):
Simin Wang
Junhuai Li
Ting Cao
Huaijun Wang
Pengjia Tu
...

IEEE Access
2018
Vol 6
pp. 49325-49338
Author(s):
Bahareh Nakisa
Mohammad Naim Rastgoo
Andry Rakotonirainy
Frederic Maire
Vinod Chandran

2021
pp. 103153
Author(s):
Tian Chen
Hongfang Yin
Xiaohui Yuan
Yu Gu
Fuji Ren
...

Electronics
2020
Vol 9 (5)
pp. 713
Author(s):
Yeonguk Yu
Yoon-Joong Kim

We propose a speech emotion recognition (SER) model with an "attention-Long Short-Term Memory (LSTM)-attention" component that combines IS09, a feature set commonly used for SER, with the mel spectrogram, and we analyze the reliability problem of the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. The model's attention mechanism focuses on the emotion-related elements of the IS09 and mel-spectrogram features and on the emotion-related time segments of the utterance, allowing the model to extract emotion information from a given speech signal. The proposed baseline model achieved a weighted accuracy (WA) of 68% on the improvised portion of IEMOCAP. However, neither the main model nor its modified variants could exceed 68% WA on the improvised dataset, which we attribute to the reliability limits of IEMOCAP: a more reliable dataset is required for a more accurate evaluation of model performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided with IEMOCAP. On this reconstructed dataset, the model achieved a WA of 73%.
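As a rough sketch of the attention-LSTM-attention idea under stated assumptions: a first attention pass weights the elements of each frame-level feature vector, an LSTM encodes the reweighted sequence, and a second attention pass pools the LSTM outputs over time; the utterance-level IS09 vector (the 384-dimensional openSMILE INTERSPEECH 2009 emotion set) is concatenated before the classifier. The fusion point, the layer sizes, and the four-class output are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class AttnLSTMAttn(nn.Module):
    """Illustrative attention-LSTM-attention sketch; sizes are assumptions."""
    def __init__(self, n_mels=40, is09_dim=384, hidden=128, n_classes=4):
        super().__init__()
        # First attention: weight the elements of each input frame vector
        self.feat_attn = nn.Linear(n_mels, n_mels)
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        # Second attention: weight the LSTM outputs over time
        self.time_attn = nn.Linear(hidden, 1)
        # IS09 is utterance-level; fuse it just before the classifier
        self.classifier = nn.Linear(hidden + is09_dim, n_classes)

    def forward(self, mel, is09):
        # mel: (batch, frames, n_mels); is09: (batch, is09_dim)
        a = torch.softmax(self.feat_attn(mel), dim=-1)   # element attention
        out, _ = self.lstm(a * mel)                      # (batch, frames, hidden)
        w = torch.softmax(self.time_attn(out), dim=1)    # temporal attention
        context = (w * out).sum(dim=1)                   # attention-pooled state
        return self.classifier(torch.cat([context, is09], dim=1))

# Toy usage: a 300-frame mel sequence plus one IS09 vector per utterance
model = AttnLSTMAttn()
print(model(torch.randn(2, 300, 40),
            torch.randn(2, 384)).shape)  # torch.Size([2, 4])
```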

