Compact CNNs for End-to-End Keyword Spotting on Resource-Constrained Edge AI Devices

Joseph Lin, Ren Yuan Lyu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we explore compact convolutional neural networks (CNNs) for end-to-end keyword spotting, going from raw audio to final recognition results without traditional spectrogram-based feature extraction. On the Speech Commands dataset, these fully CNN-based models reach 90.5% accuracy, an improvement of 12.15 percentage points over traditional methods with similar structures, which achieve only 78.35%. This indicates that learned CNN features outperform predefined FFT-based transforms on this task. The results show that compact end-to-end CNNs enable efficient, accurate small-vocabulary keyword spotting that is well suited to resource-constrained edge devices.
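To illustrate what replacing a spectrogram front end with learned convolutional features can look like, here is a minimal sketch in plain Python. It is not the authors' architecture, which the abstract does not specify. The 400-sample kernel, the 160-sample stride, the four filters, and the names `conv1d` and `learned_frontend` are all assumptions chosen for illustration; the kernel and stride mirror a common 25 ms window and 10 ms hop at 16 kHz, so the output has the same frame count a typical STFT would produce for a one-second clip.

```python
import math
import random

SAMPLE_RATE = 16_000  # Speech Commands clips are about 1 s at 16 kHz
WINDOW = 400          # assumed: 25 ms kernel, mirrors a common STFT window
HOP = 160             # assumed: 10 ms stride, mirrors a common STFT hop


def conv1d(signal, kernel, stride):
    """Valid (unpadded) strided 1-D cross-correlation of a waveform with one filter."""
    width = len(kernel)
    n_frames = (len(signal) - width) // stride + 1
    return [
        sum(signal[t * stride + k] * kernel[k] for k in range(width))
        for t in range(n_frames)
    ]


def learned_frontend(signal, filters, stride=HOP):
    """Apply a bank of filters followed by ReLU: one feature row per filter."""
    return [[max(0.0, v) for v in conv1d(signal, f, stride)] for f in filters]


random.seed(0)
# One second of a 440 Hz tone as a stand-in for a raw audio clip.
waveform = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
# Randomly initialised filters; in a trained model these weights would be learned.
filters = [[random.gauss(0.0, 0.05) for _ in range(WINDOW)] for _ in range(4)]

features = learned_frontend(waveform, filters)
print(len(features), len(features[0]))  # prints "4 98"
```

The resulting 4 x 98 feature map plays the role a spectrogram would in a conventional pipeline, except that the filter weights are trainable parameters rather than fixed FFT basis functions. A real model would stack further convolutional and pooling layers on top and end in a classifier over the keyword vocabulary.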

Original language: English
Title of host publication: ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing
Editors: Jheng-Long Wu, Ming-Hsiang Su, Hen-Hsen Huang, Yu Tsao, Hou-Chiang Tseng, Chia-Hui Chang, Lung-Hao Lee, Yuan-Fu Liao, Wei-Yun Ma
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 222-226
Number of pages: 5
ISBN (Electronic): 9789869576963
State: Published - 2023
Event: 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023 - Taipei City, Taiwan
Duration: 20 Oct 2023 to 21 Oct 2023

Publication series

Name: ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing

Conference

Conference: 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023
Country/Territory: Taiwan
City: Taipei City
Period: 20/10/23 to 21/10/23

Bibliographical note

Publisher Copyright:
© 2023 ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing. All rights reserved.

Keywords

  • End-to-end models
  • keyword spotting
  • raw audio processing
