Language identification in code-switching speech using word-based lexical model

Dau Cheng Lyu*, Cing Lei Zhu, Ren Yuan Lyu, Ming Tat Ko

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

This paper describes a language identification (LID) task on Mandarin/Taiwanese code-switching utterances. The proposed LID system, built around a word-based lexical model, integrates acoustic, phonetic and lexical cues. The acoustic and phonetic cues are obtained from a large vocabulary continuous speech recognition (LVCSR) system, while the lexical cue comes from a trained word-based lexical model. Given the sequence of words recognized by the LVCSR system, the lexical model identifies languages according to the frequency and context of each word. Because the switching unit in code-switching speech is the word, a word-based lexical model suits the task: in experiments it achieved a 16% relative reduction in classification errors compared with LVCSR-based LID systems.
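The abstract does not give the lexical model's equations. As a rough illustration only, the sketch below assumes the model can be approximated as a hidden Markov model over language labels: per-language word counts stand in for the frequency cue, language-to-language transition counts stand in for the context cue, and Viterbi decoding assigns a language to each word of the LVCSR output. The function names (`train`, `identify`), the add-one smoothing, the labels `MAN`/`TAI` and the romanized toy corpus are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count word frequencies per language (frequency cue) and language-to-language
    transitions (context cue) from sentences given as lists of (word, lang) pairs."""
    emit = defaultdict(Counter)   # emit[lang][word] = count
    trans = defaultdict(Counter)  # trans[prev_lang][lang] = count; "<s>" = sentence start
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, lang in sent:
            emit[lang][word] += 1
            trans[prev][lang] += 1
            vocab.add(word)
            prev = lang
    return {"emit": emit, "trans": trans, "vocab": vocab, "langs": sorted(emit)}


def log_emit(model, lang, word, alpha=1.0):
    """Add-alpha smoothed log P(word | lang), with one extra slot for unseen words."""
    counts = model["emit"][lang]
    denom = sum(counts.values()) + alpha * (len(model["vocab"]) + 1)
    return math.log((counts[word] + alpha) / denom)


def log_trans(model, prev, lang, alpha=1.0):
    """Add-alpha smoothed log P(lang | prev_lang)."""
    counts = model["trans"][prev]
    denom = sum(counts.values()) + alpha * len(model["langs"])
    return math.log((counts[lang] + alpha) / denom)


def identify(model, words):
    """Viterbi decoding: the most likely language label for each recognized word."""
    if not words:
        return []
    langs = model["langs"]
    # best[lang] = (log score, label path) of the best path ending in lang
    best = {l: (log_trans(model, "<s>", l) + log_emit(model, l, words[0]), [l])
            for l in langs}
    for word in words[1:]:
        step = {}
        for l in langs:
            # Pick the best predecessor language for label l at this word
            score, path = max((best[p][0] + log_trans(model, p, l), best[p][1])
                              for p in langs)
            step[l] = (score + log_emit(model, l, word), path + [l])
        best = step
    return max(best.values())[1]


# Hypothetical toy corpus: Mandarin pinyin (MAN) and Taiwanese Tai-lo (TAI) tokens
toy = [
    [("wo3", "MAN"), ("xiang3", "MAN"), ("chi1", "MAN")],
    [("gua2", "TAI"), ("beh4", "TAI"), ("tsiah8", "TAI")],
    [("wo3", "MAN"), ("xiang3", "MAN"), ("tsiah8", "TAI")],
]
model = train(toy)
print(identify(model, ["wo3", "xiang3", "tsiah8"]))  # → ['MAN', 'MAN', 'TAI']
```

Decoding whole word sequences rather than labelling each word in isolation lets the context cue discourage implausible rapid switching while still allowing a switch at a word boundary, which matches the abstract's point that the word is the switching unit.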

Original language: English
Title of host publication: 2010 7th International Symposium on Chinese Spoken Language Processing, ISCSLP 2010 - Proceedings
Pages: 460-464
Number of pages: 5
State: Published - 2010
Event: 2010 7th International Symposium on Chinese Spoken Language Processing, ISCSLP 2010 - Tainan, Taiwan
Duration: 29 Nov 2010 to 3 Dec 2010

Publication series

Name: 2010 7th International Symposium on Chinese Spoken Language Processing, ISCSLP 2010 - Proceedings

Conference

Conference: 2010 7th International Symposium on Chinese Spoken Language Processing, ISCSLP 2010
Country/Territory: Taiwan
City: Tainan
Period: 29/11/10 to 3/12/10

Keywords

  • Code-switching
  • Mandarin
  • Speech recognition
  • Taiwanese
