Abstract
Models in the burgeoning field of text-to-music generation have shown great promise in generating high-quality music aligned with users' textual descriptions. These models effectively capture abstract, global musical features such as style and mood. However, they often fail to render precisely the attributes that matter most in a music loop, including melody, rhythm, and instrumentation, which are essential for modern music loop production. To overcome this limitation, this paper proposes a Loops Transformer and a Multi-Stage Cross Attention mechanism that together enable a cohesive integration of textual and MIDI input specifications. Additionally, a novel Instrument-Aware Reinforcement Learning technique is introduced to ensure that the specified instrumentation is adopted correctly. We demonstrate that the proposed model can generate music loops that simultaneously satisfy the conditions specified by both natural language text and MIDI input, ensuring coherence between the two modalities. We also show that our model outperforms the state-of-the-art baseline model, MusicGen, in both objective metrics (lowering the FAD score by 1.3, where lower scores indicate higher quality, and improving the Normalized Dynamic Time Warping Distance to the given melodies by 12%) and subjective metrics (+2.56% in OVL, +5.42% in REL, and +7.74% in Loop Consistency). These improvements highlight our model's capability to produce musically coherent loops that satisfy the complex requirements of contemporary music production, representing a notable advancement in the field. Generated music loop samples can be explored at: https://loopstransformer.netlify.app/.
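For readers unfamiliar with the melody metric named in the abstract, the following is a minimal sketch of a normalized Dynamic Time Warping (DTW) distance between two pitch sequences. The abstract does not state how the distance is normalized or how melodies are extracted, so everything here is an assumption for illustration: melodies are lists of MIDI pitch numbers, the local cost is the absolute pitch difference, and the accumulated cost is divided by the sum of the two sequence lengths. The function names `dtw_cost` and `normalized_dtw` are hypothetical and are not taken from the paper.

```python
from math import inf


def dtw_cost(a, b):
    """Accumulated cost of the best DTW alignment between sequences a and b.

    Assumption: local cost is the absolute difference between elements.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def normalized_dtw(a, b):
    """DTW cost divided by len(a) + len(b) (an assumed normalization)."""
    return dtw_cost(a, b) / (len(a) + len(b))


if __name__ == "__main__":
    reference = [60, 62, 64, 65]             # hypothetical input melody
    generated = [60, 60, 62, 62, 64, 64, 65]  # same contour, different timing
    print(normalized_dtw(reference, generated))
```

Because DTW allows one element to align with several, a generated melody that follows the reference contour at a different tempo scores close to zero, which is the property that makes this family of distances suitable for comparing a conditioning melody with generated audio.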
Original language | English |
---|---|
Title of host publication | MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia |
Publisher | Association for Computing Machinery, Inc |
Pages | 6851-6859 |
Number of pages | 9 |
ISBN (Electronic) | 9798400706868 |
State | Published - 28 Oct 2024 |
Event | 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia. Duration: 28 Oct 2024 → 1 Nov 2024 |
Publication series
Name | MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia |
---|---|
Conference
Conference | 32nd ACM International Conference on Multimedia, MM 2024 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 28/10/24 → 01/11/24 |
Bibliographical note
Publisher Copyright: © 2024 ACM.
Keywords
- controllable music generation
- loop generation
- reinforcement learning
- residual vector quantization
- text-to-music generation
- transformer