An efficient algorithm to select phonetically balanced scripts for constructing a speech corpus

Min Siong Liang, Ren Yuan Lyu, Yuang Chin Chiang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

8 Citations (Scopus)

Abstract

In this paper, we describe an efficient algorithm to select phonetically balanced scripts for collecting a large-scale multilingual speech corpus. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. To achieve this objective, the first step is to construct a multilingual phonetic alphabet, namely the Formosa Phonetic Alphabet (ForPA). In addition, the multilingual lexicons (Formosa Lexicons) are important parts for building the corpus. To date, the portion of the corpus containing speech from 600 speakers of Taiwanese (Min-nan) and Mandarin Chinese has been finished and is ready for release. This release contains about 40 hours of speech in 247 thousand utterances.

Original language: English
Title of host publication: NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings
Editors: Chengqing Zong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 433-437
Number of pages: 5
ISBN (Electronic): 0780379020, 9780780379022
Publication status: Published - 2003
Event: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003 - Beijing, China
Duration: 26 Oct 2003 to 29 Oct 2003

Publication series

Name: NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings

Conference

Conference: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003
Country/Territory: China
City: Beijing
Period: 26/10/03 to 29/10/03

Bibliographical note

Publisher Copyright:
© 2003 IEEE.
