運用基於生成預訓練轉換器架構的 OpenAI Whisper 多語言語音辨識引擎之台語及華語語音辨識之實作

Translated title of the contribution: Taiwanese/Mandarin Speech Recognition using OpenAI's Whisper Multilingual Speech Recognition Engine Based on Generative Pretrained Transformer Architecture

Yueh Che Hsieh, Ke Ming Lyu, Ren Yuan Lyu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we fine-tuned OpenAI's Whisper for Taiwanese, enabling Whisper to produce both Mandarin and Taiwanese text output. We used the official Whisper Medium and Large-v2 checkpoints distributed through Hugging Face, together with Hugging Face's fine-tuning methodology. We also used the Taiwanese dataset from Common Voice and collected about 800 hours of Taiwanese drama videos, along with their subtitle files, from the internet. The resulting Character Error Rate (CER) was approximately 50.7%. We will release our fine-tuning code in a future update.
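The abstract reports results as Character Error Rate (CER). As a reference point, here is a minimal Python sketch of the common CER definition: the character-level Levenshtein distance between reference and hypothesis, summed over the test set and divided by the total number of reference characters. The whitespace-stripping `normalize` step and all function names are illustrative assumptions; the paper's exact text normalization (for example, punctuation handling) is not stated here.

```python
def normalize(text: str) -> str:
    """Assumed normalization: drop all whitespace before scoring."""
    return "".join(text.split())


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    that turn ref into hyp (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edit distance / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be the same length")
    refs = [normalize(r) for r in references]
    hyps = [normalize(h) for h in hypotheses]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["今天天氣很好", "我愛台灣"]
    hyps = ["今天天氣不好", "我台灣"]  # one substitution, one deletion
    print(cer(refs, hyps))  # → 0.2
```

Read this way, a CER of about 50.7% means roughly one character edit for every two reference characters. Because insertions are counted, CER is not capped at 100%.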

Original language: Chinese (Traditional)
Title of host publication: ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing
Editors: Jheng-Long Wu, Ming-Hsiang Su, Hen-Hsen Huang, Yu Tsao, Hou-Chiang Tseng, Chia-Hui Chang, Lung-Hao Lee, Yuan-Fu Liao, Wei-Yun Ma
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 210-214
Number of pages: 5
ISBN (Electronic): 9789869576963
State: Published - 2023
Event: 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023 - Taipei City, Taiwan
Duration: 20 Oct 2023 to 21 Oct 2023

Publication series

Name: ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing

Conference

Conference: 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023
Country/Territory: Taiwan
City: Taipei City
Period: 20/10/23 to 21/10/23

Bibliographical note

Publisher Copyright:
© 2023 ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing. All rights reserved.

