Improve Singing Quality Prediction Using Self-supervised Transfer Learning and Human Perception Feedback

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

The scarcity of large datasets for singing quality assessment makes applying complex deep learning methods a considerable challenge. This research presents a method that improves singing quality prediction by using feedback from subjective human perception, learned through transfer learning from self-supervised learning (SSL) speech models. Using the CRNN_PH model as the baseline, the SSL models are integrated into two distinct architectures: one draws features directly from the pre-trained SSL model (CRNN_PH+SSL), and the other uses a weighted sum (WS) of the output features from the different transformer blocks of the SSL model (CRNN_PH+SSL_WS). We conducted comparative experiments on seven pre-trained SSL models, five based on wav2vec 2.0 (W2V2) and two on HuBERT, trained on various datasets. CRNN_PH+W2V2_base_WS yields the largest improvement in singing quality score prediction, aligning most closely with subjective human perception in terms of correlation coefficients and mean squared error (MSE) with respect to the ground truth.
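The weighted-sum (WS) variant can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the common scheme of one learnable scalar per transformer block, normalized with a softmax and used to mix the blocks' frame-level outputs into a single feature sequence. In a real pipeline the per-layer inputs would be hidden states such as those a Hugging Face `transformers` wav2vec 2.0 model returns when called with `output_hidden_states=True`; here they are plain nested lists so the example runs without dependencies, and the function names are hypothetical.

```python
import math
from typing import List

# One layer's output: num_frames rows, each a feature vector of length dim.
Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_outputs: List[Matrix], layer_logits: List[float]) -> Matrix:
    """Mix per-block SSL features into one (num_frames x dim) sequence.

    layer_outputs[l][t][d] is the output of transformer block l at frame t.
    layer_logits holds one unnormalized weight per block; in a trained model
    these would be learnable parameters (hypothetical here).
    """
    if len(layer_outputs) != len(layer_logits):
        raise ValueError("expected exactly one logit per layer")
    weights = softmax(layer_logits)  # non-negative, sums to 1
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    combined = [[0.0] * dim for _ in range(num_frames)]
    for weight, layer in zip(weights, layer_outputs):
        for t in range(num_frames):
            for d in range(dim):
                combined[t][d] += weight * layer[t][d]
    return combined


if __name__ == "__main__":
    # Three hypothetical blocks, two frames, two feature dimensions.
    layers = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[3.0, 2.0], [1.0, 0.0]],
        [[2.0, 2.0], [2.0, 2.0]],
    ]
    print(weighted_layer_sum(layers, [0.0, 0.0, 0.0]))  # equal weights: layer average
```

With equal logits the result is the plain average of the layers; as one logit grows, the output approaches that single layer, which is what lets training emphasize the most useful blocks instead of relying only on the final one.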
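The evaluation criteria named in the abstract (correlation coefficients and MSE against the ground truth) can be computed as sketched below. The abstract does not say which correlation coefficients were used; this sketch assumes the Pearson linear correlation (PLCC) and Spearman rank correlation (SRCC) widely used for quality-score prediction, and the function names and example scores are hypothetical.

```python
import math
from typing import List, Sequence


def mse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean squared error between predicted and ground-truth scores."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    truth = [1.0, 2.0, 3.0, 4.0]  # hypothetical human ratings
    pred = [1.5, 2.5, 2.9, 4.2]   # hypothetical model predictions
    print(mse(pred, truth), pearson(pred, truth), spearman(pred, truth))
```

Pearson measures linear agreement with the human scores, Spearman only their ordering, and MSE the absolute error; a model can rank singers correctly yet still be offset in scale, so reporting the metrics together gives a fuller picture.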

Original language: English
Title of host publication: Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400702051
State: Published - 06 12 2023
Event: 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 - Hybrid, Tainan, Taiwan
Duration: 06 12 2023 to 08 12 2023

Publication series

Name: Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023

Conference

Conference: 5th ACM International Conference on Multimedia in Asia, MMAsia 2023
Country/Territory: Taiwan
City: Hybrid, Tainan
Period: 06/12/23 to 08/12/23

Bibliographical note

Publisher Copyright:
© 2023 Copyright held by the owner/author(s).

Keywords

  • Automatic singing quality evaluation
  • Self-Supervised Learning
  • Transfer learning

