Abstract
In this paper, we fine-tuned OpenAI's Whisper for Taiwanese, enabling it to produce both Mandarin and Taiwanese text output. We used the official Whisper Medium and Large-v2 models available through Hugging Face, together with Hugging Face's fine-tuning methodology. We also used the Taiwanese dataset from CommonVoice and collected about 800 hours of Taiwanese drama videos, along with their subtitle files, from the internet. The resulting Character Error Rate (CER) was approximately 50.7%. We will release our fine-tuning code in a subsequent update.
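For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Character Error Rate is commonly computed: the character-level edit distance between the reference and the recognized text, divided by the number of reference characters. The function names `edit_distance` and `cer` are illustrative, and stripping whitespace before scoring is an assumption; the abstract does not state which text normalization the authors applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance divided by reference length.

    Assumption: whitespace is removed before scoring. The paper's own
    normalization is not described in the abstract.
    """
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(f"{cer('今天天氣很好', '今天天氣真好'):.3f}")  # prints 0.167
```

Because insertions count as errors, a CER computed this way can exceed 100% when the hypothesis is much longer than the reference.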
Translated title of the contribution | Taiwanese/Mandarin Speech Recognition using OpenAI's Whisper Multilingual Speech Recognition Engine Based on Generative Pretrained Transformer Architecture |
---|---|
Original language | Chinese (Traditional) |
Title of host publication | ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing |
Editors | Jheng-Long Wu, Ming-Hsiang Su, Hen-Hsen Huang, Yu Tsao, Hou-Chiang Tseng, Chia-Hui Chang, Lung-Hao Lee, Yuan-Fu Liao, Wei-Yun Ma |
Publisher | The Association for Computational Linguistics and Chinese Language Processing (ACLCLP) |
Pages | 210-214 |
Number of pages | 5 |
ISBN (Electronic) | 9789869576963 |
State | Published - 2023 |
Event | 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023 - Taipei City, Taiwan Duration: 20 Oct 2023 → 21 Oct 2023 |
Publication series
Name | ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing |
---|---|
Conference
Conference | 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023 |
---|---|
Country/Territory | Taiwan |
City | Taipei City |
Period | 20/10/23 → 21/10/23 |
Bibliographical note
Publisher Copyright: © 2023 ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing. All rights reserved.