To improve the recognition rate of speaker recognition systems, a model that fuses a Convolutional Neural Network with a Gated Recurrent Unit and is trained with the Additive Margin Softmax loss function is proposed, approaching the problem from the perspective of model discriminability. The loss reduces the distance between features of samples from the same class while increasing the distance between features of samples from different classes, and layer normalization constrains the distribution of the high-dimensional features. To address the poor robustness of speaker recognition systems in real-world scenes, the SpecAugment data augmentation method is used when training the speaker model so that it better withstands external environmental interference. The recognition performance of the proposed and traditional methods is analyzed on experimental data. The experimental results show that, compared with other models, the Additive Margin-Convolutional Neural Network-Gated Recurrent Unit method reaches an equal error rate of 4.48% and a recognition rate of 99.18%. Adding layer normalization to the model speeds up training to some extent and gives the speaker model better robustness.
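The abstract does not spell out the loss, so the following is only a minimal sketch of the standard Additive Margin Softmax formulation: embeddings and class weights are L2-normalized, a margin `m` is subtracted from the target-class cosine, and the result is scaled by `s` before the usual softmax cross-entropy. The function names, the toy data, and the values `s=30.0` and `m=0.35` are assumptions chosen for illustration (they are common defaults in the literature), not settings taken from the paper.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def am_softmax_loss(embeddings, class_weights, labels, s=30.0, m=0.35):
    """Mean Additive Margin Softmax loss over a batch.

    embeddings:    list of feature vectors, one per utterance
    class_weights: list of weight vectors, one per speaker class
    labels:        list of target class indices
    s, m:          scale and margin (illustrative defaults, assumed)
    """
    unit_weights = [l2_normalize(w) for w in class_weights]
    total = 0.0
    for embedding, label in zip(embeddings, labels):
        x = l2_normalize(embedding)
        cosines = [sum(a * b for a, b in zip(x, w)) for w in unit_weights]
        # Subtract the margin from the target-class cosine only.
        logits = [
            s * (c - m) if j == label else s * c
            for j, c in enumerate(cosines)
        ]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[label]
    return total / len(labels)


# Toy two-speaker example with hypothetical 2-D embeddings.
embeddings = [[1.0, 0.2], [0.1, 1.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]

plain = am_softmax_loss(embeddings, class_weights, labels, m=0.0)
with_margin = am_softmax_loss(embeddings, class_weights, labels, m=0.35)
print(plain, with_margin)
```

With `m=0.0` the function reduces to a normalized softmax loss. A positive margin raises the loss for the same inputs, which is what pushes same-class features closer together and different-class features further apart during training.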
Author(s): Lan, Chaofeng; Wang, Yuqiao; Zhang, Lei; Zhao, Hongyun
Affiliation:
College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; Beidahuang Industry Group General Hospital, Harbin, China; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China
(See document for exact affiliation information.)
Publication Date:
2022-07-06
Permalink: https://aes2.org/publications/elibrary-page/?id=21827
Lan, Chaofeng; Wang, Yuqiao; Zhang, Lei; Zhao, Hongyun; 2022; Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks [PDF]; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; Beidahuang Industry Group General Hospital, Harbin, China; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; Paper ; Available from: https://aes2.org/publications/elibrary-page/?id=21827
@article{lan2022research,
author={Lan, Chaofeng and Wang, Yuqiao and Zhang, Lei and Zhao, Hongyun},
journal={Journal of the Audio Engineering Society},
title={Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks},
year={2022},
volume={70},
issue={7/8},
pages={611-620},
month={july},}
TY – paper
TI – Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks
SP – 611 EP – 620
AU – Lan, Chaofeng
AU – Wang, Yuqiao
AU – Zhang, Lei
AU – Zhao, Hongyun
PY – 2022
JO – Journal of the Audio Engineering Society
VO – 70
IS – 7/8
Y1 – July 2022