A Few Shot Learning of Singing Technique Conversion Based on Cycle Consistency Generative Adversarial Networks

Po Wei Chen, Von Wun Soo

Research output: Contribution to journal › Conference article › peer-review

Abstract

We adopt the recent cycle-consistent generative adversarial network MaskCycleGAN-VC, which allows a specific singing technique to be converted using only a few articulations of singing voice as examples. Because distortion and noise introduced during conversion often cause it to lose the content information of the singing voice, we propose a self-supervised learning module as the basic framework for enforcing content consistency without additional annotations. We evaluate the proposed methods on three datasets covering singing techniques commonly used in pop songs: breathy voice, vibrato, and vocal fry. Experiments showed that the proposed methods outperform the baseline in audio quality and content preservation, including melody and the singer's timbral identity, without affecting the perception of the singing techniques.
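The abstract relies on two consistency ideas: the cycle-consistency constraint of CycleGAN-style models, and a content-consistency constraint that the keywords associate with triplet learning. The toy sketch below illustrates both in plain Python. It is not the authors' implementation: it assumes features are plain lists of floats, treats the two generators as arbitrary callables, and uses a standard triplet margin loss as one plausible form of the content constraint. All names (`cycle_consistency_loss`, `triplet_margin_loss`, and the helpers) are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical aliases: a feature vector and a technique-conversion generator.
Vector = List[float]
Generator = Callable[[Vector], Vector]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cycle_consistency_loss(x_a: Vector, x_b: Vector,
                           g_ab: Generator, g_ba: Generator) -> float:
    """CycleGAN-style L1 cycle loss: A->B->A and B->A->B should reproduce the inputs."""
    forward = l1_distance(g_ba(g_ab(x_a)), x_a)   # technique A -> B -> back to A
    backward = l1_distance(g_ab(g_ba(x_b)), x_b)  # technique B -> A -> back to B
    return forward + backward


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Standard triplet loss: zero once the positive is closer to the anchor
    than the negative by at least `margin`, positive otherwise."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a content-consistency setting, one might take the anchor as a content embedding of the source singing, the positive as the embedding of its converted version, and the negative as an embedding of unrelated content; this pairing is an illustrative assumption, not a detail confirmed by the abstract.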

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • cycle consistency
  • few shot learning
  • generative adversarial networks
  • singing technique conversion
  • triplet learning
