Abstract
The scarcity of large-scale datasets for singing quality assessment makes applying complex deep learning methods a considerable challenge. This research presents a method to improve singing quality prediction, learned from subjective human perceptual opinions, by applying transfer learning from self-supervised learning (SSL) speech models. Using the CRNN_PH model as the baseline, we integrate the SSL models into two distinct architectures: one draws features directly from the pre-trained SSL model (CRNN_PH+SSL), and the other uses the weighted sum (WS) of the output features from the different transformer blocks of the SSL model (CRNN_PH+SSL_WS). We conducted comparative experiments on seven pre-trained SSL models, five based on wav2vec 2.0 (W2V2) and two on HuBERT, which were pre-trained on various datasets. CRNN_PH+W2V2_base_WS yields the largest improvement in singing quality score prediction, with predictions that align closely with subjective human perception as measured by correlation coefficients and mean squared error (MSE) against the ground truth.
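The weighted-sum (WS) variant can be illustrated with a minimal NumPy sketch. It assumes a SUPERB-style formulation in which each layer's output receives one learnable scalar, the scalars are softmax-normalized, and the per-layer features are summed frame by frame. The function names, the softmax normalization, and the tensor shapes are illustrative assumptions, not the exact CRNN_PH+SSL_WS implementation.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def weighted_sum_features(hidden_states: np.ndarray, layer_logits: np.ndarray) -> np.ndarray:
    """Fuse per-layer SSL features into a single feature sequence.

    hidden_states: shape (num_layers, num_frames, feature_dim), the stacked
        outputs of the SSL model's layers (assumed layout).
    layer_logits: shape (num_layers,), one learnable scalar per layer
        (hypothetical parameterization, normalized with softmax here).
    Returns: shape (num_frames, feature_dim).
    """
    if hidden_states.shape[0] != layer_logits.shape[0]:
        raise ValueError("exactly one weight per layer is required")
    weights = softmax(layer_logits)  # non-negative weights that sum to 1
    # Contract the layer axis: sum_l weights[l] * hidden_states[l]
    return np.tensordot(weights, hidden_states, axes=(0, 0))


# Illustrative shapes (assumed): 13 hidden-state layers (input embedding plus
# 12 transformer blocks, as in a base-size model), 50 frames, 768-dim features.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((13, 50, 768))
layer_logits = np.zeros(13)  # equal weights, a hypothetical initialization
fused = weighted_sum_features(hidden_states, layer_logits)
print(fused.shape)  # (50, 768)
```

With all logits equal, the fused sequence is simply the mean over layers. During training, the logits would be updated by the downstream predictor's loss, so blocks that carry more information about singing quality can receive larger weights. In contrast, the CRNN_PH+SSL variant would use a single layer's output directly.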
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 |
| Publisher | Association for Computing Machinery, Inc |
| ISBN (Electronic) | 9798400702051 |
| State | Published - 6 Dec 2023 |
| Event | 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 - Hybrid, Tainan, Taiwan; Duration: 6 Dec 2023 → 8 Dec 2023 |
Publication series
| Name | Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 |
|---|---|
Conference
| Conference | 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 |
|---|---|
| Country/Territory | Taiwan |
| City | Hybrid, Tainan |
| Period | 06/12/23 → 08/12/23 |
Bibliographical note
Publisher Copyright: © 2023 Copyright held by the owner/author(s).
Keywords
- Automatic singing quality evaluation
- Self-Supervised Learning
- Transfer learning