Biphone-rich versus triphone-rich: A comparison of speech corpora in automatic speech recognition

Yong Chang Yio*, Min Siong Liang, Yuang Chin Chiang, Ren Yuan Lyu

*Corresponding author for this work

Research output: Contribution to conference, Conference Paper, peer-review

1 Scopus citation

Abstract

In this paper, we compare the performance of a speech recognition system trained on two speech corpora. We select two sets of words that cover all the cross-syllable bi-phones and all the cross-syllable tri-phones; these sets are called phonetically bi-phone-rich and tri-phone-rich, respectively. Covering all the cross-syllable tri-phones requires about 10 times as many words as covering all the cross-syllable bi-phones. To allow a fair comparison, the bi-phone-rich corpus therefore consists of ten sets of words, each of which covers all the cross-syllable bi-phones. Using those words as data sheets, a male Taiwanese speaker recorded all the words as microphone speech. The resulting speech corpora, about 100 minutes for each set, are used to train the acoustic models. Although both perform quite well in tasks with linear-net and free-syllable-net recognition networks, the tri-phone-rich corpus does not show much advantage over the bi-phone-rich corpus.
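The abstract does not state how the covering word sets were chosen. One common way to pick a small word list that covers every unit is greedy set cover, sketched below for cross-syllable bi-phones. This is only an illustration under stated assumptions, not the authors' procedure: a cross-syllable bi-phone is taken to be the pair formed by the last phone of one syllable and the first phone of the next, and the names `cross_syllable_biphones`, `greedy_cover`, and `TOY_LEXICON` (with its made-up words and phones) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Syllable = Tuple[str, ...]   # phones of one syllable (assumed non-empty)
Word = Tuple[Syllable, ...]  # syllables of one word


def cross_syllable_biphones(word: Word) -> Set[Tuple[str, str]]:
    """Pairs (last phone of a syllable, first phone of the next syllable)."""
    return {(left[-1], right[0]) for left, right in zip(word, word[1:])}


def greedy_cover(lexicon: Dict[str, Word]) -> List[str]:
    """Pick words until every bi-phone occurring in the lexicon is covered."""
    units = {name: cross_syllable_biphones(w) for name, w in lexicon.items()}
    uncovered = set().union(*units.values())
    chosen: List[str] = []
    while uncovered:
        # Take the word covering the most still-uncovered bi-phones;
        # ties go to the name that sorts first.
        best = max(sorted(units), key=lambda n: len(units[n] & uncovered))
        chosen.append(best)
        uncovered -= units[best]
    return chosen


# Hypothetical toy lexicon: word label -> syllables -> phones.
TOY_LEXICON: Dict[str, Word] = {
    "w1": (("k", "a"), ("t", "i")),
    "w2": (("t", "i"), ("k", "a"), ("t", "i")),
    "w3": (("s", "u"), ("k", "a")),
}

selected = greedy_cover(TOY_LEXICON)
```

Greedy selection does not guarantee the smallest possible word list, but it is simple and usually close. The same loop would apply to tri-phones by swapping in a different unit extractor, and the larger number of distinct tri-phones is consistent with the abstract's observation that far more words are needed to cover them.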

Original language: English
Pages: 194-197
Number of pages: 4
State: Published - 2005
Event: 9th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA - Hsinchu, Taiwan
Duration: 28 May 2005 to 30 May 2005

Conference

Conference: 9th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA
Country/Territory: Taiwan
City: Hsinchu
Period: 28/05/05 to 30/05/05
