
SPEECH EMOTION RECOGNITION VIA TWO-STREAM POOLING ATTENTION WITH DISCRIMINATIVE CHANNEL WEIGHTING

Ke Liu (Northwest University); Dekui Wang (Northwest University); Dongya Wu (Northwest University); Jun Feng (Northwest University)

06 Jun 2023

Multi-view Speech Emotion Recognition (SER) based on pre-trained models has achieved success in speaker-independent scenarios. However, existing SER methods rely on an excessive number of feature views and use complicated feature fusion strategies. In this paper, we propose a novel method that learns effective emotion-related information from two feature views. First, we present a Discriminative Channel Weighting (DCW) module that weights the channel dimension of the features produced by a set of multi-scale convolution layers, so that the complex channel dimension is weighted discriminatively. Second, we propose a concise Two-stream Pooling Attention (TsPA) strategy that generates two groups of fusion features from channel-level embeddings with different emphases. Finally, three consecutive fully connected layers perform the SER task. Experiments under two speaker-independent validation strategies demonstrate the effectiveness of the proposed method, which outperforms other state-of-the-art approaches.
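The abstract does not specify how the channel weighting or the two pooling streams are computed, so the following is only a generic illustration of the two ideas it names: scaling each feature channel by a per-channel weight, then summarizing the weighted channels with two different pooling operations. Everything in it is an assumption for illustration, including the sigmoid gate (a stand-in for a learned gating network), the choice of mean and max pooling as the two streams, and the function names. It should not be read as the authors' DCW or TsPA.

```python
import math

def channel_weights(features):
    """features: list of channels, each a list of per-frame values.
    Returns one weight in (0, 1) per channel."""
    # Average each channel over time to get a channel-level descriptor.
    descriptors = [sum(ch) / len(ch) for ch in features]
    # Assumption: a plain sigmoid stands in for a learned gating network.
    return [1.0 / (1.0 + math.exp(-d)) for d in descriptors]

def weight_channels(features):
    """Scale every frame of a channel by that channel's weight."""
    weights = channel_weights(features)
    return [[w * x for x in ch] for w, ch in zip(weights, features)]

def two_stream_pool(features):
    """Hypothetical pair of streams: mean pooling and max pooling over time,
    concatenated into one channel-level embedding."""
    mean_stream = [sum(ch) / len(ch) for ch in features]
    max_stream = [max(ch) for ch in features]
    return mean_stream + max_stream

# Toy input: 2 channels, 3 frames each.
features = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
weighted = weight_channels(features)
embedding = two_stream_pool(weighted)  # length = 2 * number of channels
```

In a real model the weights would come from trained parameters and the embedding would feed the fully connected classifier; here the point is only the data flow from per-channel weighting to two pooled views.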
