Abstract
Singing techniques are important skills in professional vocal performance, which usually involves dedicated fluctuations of timbre, pitch, duration, and loudness. Recognizing types of singing techniques can be quite challenging because 1) the time-frequency features in singing are highly dynamic and may span a long range of the audio signal; 2) different singing techniques, such as vibrato and trill, tend to have similar local features; 3) the distribution of singing technique datasets is long-tailed. To address these problems, we proposed a novel Radial Attention Transformer (RAT) with a Radial Attention (RA) Module that can capture fine-grained local features as well as the long-range inter-dependency of audio features. The experimental results showed that the proposed method, RAT with Adaptive Logit Adjustment (ALA) Loss, significantly outperformed previous state-of-the-art models (Convolutional Neural Networks and Deformable CNN) on the recognition of singing technique categories.
| Original language | English |
|---|---|
| Title of host publication | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781728163277 |
| DOIs | |
| State | Published - 2023 |
| Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece. Duration: 04/06/2023 → 10/06/2023 |
Publication series
| Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
|---|---|
| Volume | 2023-June |
| ISSN (Print) | 1520-6149 |
Conference
| Conference | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 |
|---|---|
| Country/Territory | Greece |
| City | Rhodes Island |
| Period | 04/06/23 → 10/06/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- local attention
- logit adjustment loss
- singing technique
- sparse global attention
- transformer