Construct a multi-lingual speech corpus in Taiwan with extracting phonetically balanced articles

  • Min Siong Liang
  • Dau Cheng Lyu
  • Yuang Chin Chiang
  • Ren Yuan Lyu

Research output: Contribution to conference › Conference Paper › peer-review

1 Scopus citation

Abstract

In this paper, we describe the initial stage of constructing a multilingual speech corpus in Taiwan by selecting phonetically balanced scripts. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. The first step toward this objective is to construct a multilingual phonetic alphabet, the Formosa Phonetic Alphabet (ForPA). The multilingual lexicons (Formosa Lexicons) are also important components for building the corpus. The corpus, a speech database of 2,300 speakers, has recently been completed and is ready for release. It contains about 200 hours of speech and 404,000 utterances.

Original language: English
Pages: 2737-2740
Number of pages: 4
State: Published - 2004
Externally published: Yes
Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
Duration: 4 Oct 2004 → 8 Oct 2004

Conference

Conference: 8th International Conference on Spoken Language Processing, ICSLP 2004
Country/Territory: Korea, Republic of
City: Jeju, Jeju Island
Period: 04/10/04 → 08/10/04
