Semi-natural and spontaneous speech recognition using deep neural networks with hybrid features unification

Ammar Amjad, Lal Khan, Hsien Tsung Chang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Identifying speech emotions in spontaneous databases has recently become a complex and demanding area of study. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions using multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) algorithm is utilized to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed identification accuracies of 76% and 59%, respectively, in speaker-independent experiments with SVM. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the proposed framework outperformed current state-of-the-art techniques on the semi-natural and spontaneous datasets.

Original language: English
Article number: 2286
Journal: Processes
Volume: 9
Issue number: 12
Publication status: Published - Dec 2021

Bibliographical note

Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
