TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation

Krishna Basak, Nilamadhab Mishra, Hsien Tsung Chang*

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-review

3 Scopus citations

Abstract

Stuttering, a prevalent neurodevelopmental disorder, profoundly affects fluent speech, causing involuntary interruptions and recurrent sound patterns. This study addresses the critical need for the accurate classification of stuttering types. The researchers introduce “TranStutter”, a pioneering Convolution-free Transformer-based DL model, designed to excel in speech disfluency classification. Unlike conventional methods, TranStutter leverages Multi-Head Self-Attention and Positional Encoding to capture intricate temporal patterns, yielding superior accuracy. In this study, the researchers employed two benchmark datasets: the Stuttering Events in Podcasts Dataset (SEP-28k) and the FluencyBank Interview Subset. SEP-28k comprises 28,177 audio clips from podcasts, meticulously annotated into distinct dysfluent and non-dysfluent labels, including Block (BL), Prolongation (PR), Sound Repetition (SR), Word Repetition (WR), and Interjection (IJ). The FluencyBank subset encompasses 4144 audio clips from 32 People Who Stutter (PWS), providing a diverse set of speech samples. TranStutter’s performance was assessed rigorously. On SEP-28k, the model achieved an impressive accuracy of 88.1%. Furthermore, on the FluencyBank dataset, TranStutter demonstrated its efficacy with an accuracy of 80.6%. These results highlight TranStutter’s significant potential in revolutionizing the diagnosis and treatment of stuttering, thereby contributing to the evolving landscape of speech pathology and neurodevelopmental research. The innovative integration of Multi-Head Self-Attention and Positional Encoding distinguishes TranStutter, enabling it to discern nuanced disfluencies with unparalleled precision. This novel approach represents a substantial leap forward in the field of speech pathology, promising more accurate diagnostics and targeted interventions for individuals with stuttering disorders.
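
The abstract names two mechanisms, Positional Encoding and Multi-Head Self-Attention, without showing how they combine. The following is a minimal NumPy sketch of that combination and is not the authors' implementation. Several things are assumptions made only for illustration: the sinusoidal form of the encoding, the random matrix standing in for Mel-spectrogram frames, the random projection weights, and all sizes (50 frames, 64 features, 4 heads).

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sine/cosine encoding of shape (seq_len, d_model); d_model must be even."""
    positions = np.arange(seq_len)[:, None]             # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]            # even feature indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                        # even columns
    pe[:, 1::2] = np.cos(angles)                        # odd columns
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)             # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over a (seq_len, d_model) sequence."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    context = weights @ v                                # (heads, seq, d_head)
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights

# Hypothetical input: random values standing in for 50 Mel-spectrogram frames.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 50, 64, 4
frames = rng.standard_normal((seq_len, d_model))
x = frames + sinusoidal_positional_encoding(seq_len, d_model)

# Random projection weights; a trained model would learn these.
w_q, w_k, w_v, w_o = (
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
)
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
```

Here `out` keeps the input shape of (50, 64), and `attn` holds one 50 × 50 weight matrix per head, so every frame can attend to every other frame regardless of how far apart they are in time.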

Original language: English
Article number: 8033
Journal: Sensors
Volume: 23
Issue number: 19
State: Published - Oct 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Keywords

  • Mel-Spectrogram
  • multi-head self-attention
  • speech disfluency
  • stuttered speech
  • transformer
