Abstract
In this paper, we describe the initial stage of constructing a multilingual speech corpus in Taiwan by selecting phonetically balanced scripts. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. The first step toward this objective is to construct a multilingual phonetic alphabet, the Formosa Phonetic Alphabet (ForPA). The multilingual lexicons (Formosa Lexicons) are also an important part of building the corpus. A speech database from 2,300 speakers has recently been completed and is ready for release. It contains about 200 hours of speech and 404,000 utterances.
| Original language | English |
|---|---|
| Pages | 2737-2740 |
| Number of pages | 4 |
| State | Published - 2004 |
| Externally published | Yes |
| Event | 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of. Duration: 4 Oct 2004 → 8 Oct 2004 |
Conference
| Conference | 8th International Conference on Spoken Language Processing, ICSLP 2004 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Jeju, Jeju Island |
| Period | 04/10/04 → 08/10/04 |
Title: Construct a multi-lingual speech corpus in Taiwan with extracting phonetically balanced articles