Improving ASR in Reverberant Environments

  • Yen Lun Liao*
  • Chi Han Lin
  • Ren Yuan Lyu
  • Jyh Shing Roger Jang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Automatic speech recognition (ASR) significantly reduces the effort needed to create audio transcripts. Despite its convenience, ASR performance is unstable in adverse acoustic environments; for instance, indoor signals are usually corrupted by reverberation (reverb), which degrades recognition accuracy. One solution is to build an acoustic dereverberation (dereverb) model that pre-processes the original signals before they are submitted to ASR. However, the acoustic properties of the dereverb model's output differ from those of the ASR training data, and this mismatch in turn degrades performance. This paper optimizes the aforementioned structure in four respects: signal classification, reverberation removal, data mismatch removal in ASR, and ensemble algorithms. With the proposed sentence-level fusion (SLF) and word-level fusion (WLF) ensemble algorithms, a character error rate (CER) of 7.23% was reached on the mixed test set of reverberated and clean Aishell1, a 20.72% reduction in CER compared with the single model.
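For readers unfamiliar with the metric quoted in the abstract, CER is conventionally defined as the character-level edit distance between the recognized text and the reference transcript, divided by the number of reference characters. The following is a minimal sketch of that conventional definition, not the authors' evaluation code; the function names `edit_distance` and `cer` are illustrative assumptions, and the paper may apply text normalization that is not shown here.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # character of ref missing in hyp
            insertion = curr[j - 1] + 1  # extra character in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, a hypothesis with one wrong character against a six-character reference scores a CER of 1/6. Pooling edits over the whole test set, rather than averaging per-utterance rates, is the assumption made in this sketch.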

Original language: English
Title of host publication: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors: Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 165-169
Number of pages: 5
ISBN (Electronic): 9798350397963
State: Published - 2022
Event: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore
Duration: 11 Dec 2022 to 14 Dec 2022

Publication series

Name: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory: Singapore
City: Singapore
Period: 11/12/22 to 14/12/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • automatic speech recognition
  • dereverberation
  • model ensemble
  • new structure
  • string confusion network
