Abstract
In this paper, we compare the performance of a speech recognition system trained with two speech corpora. We select two sets of words that cover all the cross-syllable bi-phones and all the cross-syllable tri-phones, and call them phonetically bi-phone-rich and tri-phone-rich, respectively. Covering all the cross-syllable tri-phones requires about 10 times more words than covering all the cross-syllable bi-phones. To facilitate a fair comparison, the bi-phone-rich corpus therefore consists of ten sets of words, each of which covers all the cross-syllable bi-phones. Using these words as data sheets, a male Taiwanese speaker recorded all the words as microphone speech. The resulting speech corpora, about 100 minutes for each set, are used to train the acoustic models. Although both perform quite well in tasks with linear-net and free-syllable-net recognition networks, the tri-phone-rich corpus does not show much advantage over the bi-phone-rich corpus.
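The abstract does not say how the covering word sets were chosen. As an illustration only, the following minimal sketch uses a greedy set-cover heuristic, which is an assumption and not necessarily the authors' method. It also assumes that a cross-syllable bi-phone is the pair formed by the last phone of one syllable and the first phone of the next; the function names, word labels, and phone symbols are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: a word is a list of syllables,
# and each syllable is a list of phone symbols.
Word = List[List[str]]


def cross_syllable_biphones(word: Word) -> Set[Tuple[str, str]]:
    """Return phone pairs that straddle a syllable boundary (assumed definition)."""
    return {(left[-1], right[0]) for left, right in zip(word, word[1:])}


def greedy_cover(lexicon: Dict[str, Word]) -> List[str]:
    """Greedily pick words until every bi-phone found in the lexicon is covered."""
    units = {name: cross_syllable_biphones(w) for name, w in lexicon.items()}
    uncovered = set().union(*units.values())
    chosen: List[str] = []
    while uncovered:
        # Take the word covering the most still-uncovered bi-phones;
        # ties are broken by alphabetical order of the word label.
        best = max(sorted(units), key=lambda n: len(units[n] & uncovered))
        chosen.append(best)
        uncovered -= units[best]
    return chosen


# Hypothetical toy lexicon, not taken from the paper.
toy_lexicon = {
    "w1": [["t", "a"], ["k", "i"]],
    "w2": [["t", "a"], ["k", "i"], ["s", "u"]],
    "w3": [["s", "u"], ["t", "a"]],
}
print(greedy_cover(toy_lexicon))
```

The same loop would apply to tri-phones by swapping in a function that extracts cross-syllable phone triples. Greedy selection does not guarantee the smallest possible word set, only a reasonably compact one.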
Original language | English |
---|---|
Pages | 194-197 |
Number of pages | 4 |
State | Published - 2005 |
Event | 9th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA - Hsinchu, Taiwan. Duration: 28 May 2005 → 30 May 2005 |
Conference
Conference | 9th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA |
---|---|
Country/Territory | Taiwan |
City | Hsinchu |
Period | 28/05/05 → 30/05/05 |