Using attention networks and adversarial augmentation for Styrian dialect, continuous sleepiness and baby sound recognition

Sung Lin Yeh, Gao Yi Chao, Bo Hao Su, Yu Lin Huang, Meng Han Lin, Yin Chun Tsai, Yu Wen Tai, Zheng Chi Lu, Chieh Yu Chen, Tsung Ming Tai, Chiu Wang Tseng, Cheng Kuang Lee, Chi Chun Lee

Research output: Contribution to journal › Conference article › peer-review

14 Scopus citations

Abstract

In this study, we present extensive attention-based networks with data augmentation methods for our participation in the INTERSPEECH 2019 ComParE Challenge, specifically in three Sub-challenges: Styrian Dialect Recognition, Continuous Sleepiness Regression, and Baby Sound Classification. For the Styrian Dialect Sub-challenge, the dialects are classified into Northern Styrian (NorthernS), Urban Styrian (UrbanS), and Eastern Styrian (EasternS). Our proposed model achieves an unweighted average recall (UAR) of 49.5% on the test set, which is 2.5% higher than the baseline. The Continuous Sleepiness Sub-challenge is defined as a regression task with scores ranging from 1 (extremely alert) to 9 (very sleepy). Our proposed architecture achieves a Spearman correlation of 0.369 on the test set, which surpasses the baseline model by 0.026. For the Baby Sound Sub-challenge, infant sounds are classified into canonical babbling, non-canonical babbling, crying, laughing, and junk/other; our proposed augmentation framework achieves a UAR of 62.39% on the test set, which outperforms the baseline by about 3.7%. Overall, our analyses demonstrate that fusing attention network models with a conventional support vector machine benefits test-set robustness, and that the recognition rates of these paralinguistic attributes generally improve when data augmentation is performed.

Original language: English
Pages (from-to): 2398-2402
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
State: Published - 2019
Externally published: Yes
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019

Bibliographical note

Publisher Copyright:
Copyright © 2019 ISCA

Keywords

  • Adversarial learning
  • Attention networks
  • Augmentation
  • Computational paralinguistics
