Abstract
In this study, we present extensive attention-based networks with data augmentation methods to participate in the INTERSPEECH 2019 ComPareE Challenge, specifically the three Sub-challenges: Styrian Dialect Recognition, Continuous Sleepiness Regression, and Baby Sound Classification. For Styrian Dialect Sub-challenge, these dialects are classified into Northern Styrian (NorthernS), Urban Sytrian (UrbanS), and Eastern Styrian (EasternS). Our proposed model achieves an UAR 49.5% on the test set, which is 2.5% higher than the baseline. For Continuous Sleepiness Sub-challenge, it is defined as a regression task with score range from 1 (extremely alert) to 9 (very sleepy). In this work, our proposed architecture achieves a Spearman correlation 0.369 on the test set, which surpasses the baseline model by 0.026. For Baby Sound Sub-challenge, the infant sounds are classified into canonical babbling, non-canonical babbling, crying, laughing and junk/other, and our proposed augmentation framework achieves an UAR of 62.39% on the test set, which outperforms the baseline by about 3.7%. Overall, our analyses demonstrate that by fusing attention network models with conventional support vector machine benefits the test set robustness, and the recognition rates of these paralinguistic attributes generally improve when performing data augmentation.
Original language | English |
---|---|
Pages (from-to) | 2398-2402 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2019-September |
DOIs | |
State | Published - 2019 |
Externally published | Yes |
Event | 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria Duration: 15 09 2019 → 19 09 2019 |
Bibliographical note
Publisher Copyright:Copyright © 2019 ISCA
Keywords
- Adversarial learning
- Attention networks
- Augmentation
- Computational paralinguistics