Abstract
Automatic Speech Recognition (ASR) significantly reduces the effort needed to create audio transcripts. Despite this convenience, ASR performance is unstable in adverse acoustic environments; for instance, indoor signals are usually corrupted by reverberation (reverb), which degrades recognition accuracy. One solution is to build an acoustic dereverberation (dereverb) model that pre-processes the original signals before they are submitted to the ASR system. However, the acoustic properties of the dereverb model's output differ from those of the ASR training data, and this mismatch causes performance to decline. This paper optimizes this dereverb-then-ASR structure in four respects: signal classification, reverberation removal, removal of the data mismatch in ASR, and ensemble algorithms. With the proposed sentence-level fusion (SLF) and word-level fusion (WLF) ensemble algorithms, the system reaches a character error rate (CER) of 7.23% on a mixed test set of reverberated and clean Aishell1 data, a 20.72% reduction in CER compared with the single model.
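The CER quoted above is conventionally defined as the character-level edit distance between the recognized text and the reference transcript, divided by the number of reference characters. The sketch below illustrates that standard definition only; it is not the authors' scoring script, the function names `edit_distance` and `cer` are hypothetical, and the sample sentences are invented for illustration rather than taken from Aishell1.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Invented example sentences (assumption, not from the paper's test set).
refs = ["今天天气很好", "我喜欢听音乐"]
hyps = ["今天天汽很好", "我喜欢音乐"]  # one substitution, one deletion
print(f"CER = {cer(refs, hyps):.2%}")  # prints "CER = 16.67%"
```

Here two edits over twelve reference characters give 2/12, or about 16.67%. Scoring at the corpus level, as in this sketch, weights long utterances more heavily than averaging per-utterance rates would; which convention the paper uses is not stated in the abstract.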
| Original language | English |
|---|---|
| Title of host publication | 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 |
| Editors | Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 165-169 |
| Number of pages | 5 |
| ISBN (Electronic) | 9798350397963 |
| State | Published - 2022 |
| Event | 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore. Duration: 11 Dec 2022 → 14 Dec 2022 |
Publication series
| Name | 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 |
|---|---|
Conference
| Conference | 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 11/12/22 → 14/12/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Keywords
- automatic speech recognition
- dereverberation
- model ensemble
- new structure
- string confusion network