Medical Devices and Materials Engineering Section
Human Centric Information Processing
Frame-level stuttering detection method for a reading support system for people who stutter
Stuttering is a speech disorder characterized by disfluent speech despite the absence of anatomical or functional abnormalities in the peripheral organs related to speech production. In Japan, an estimated 1% of the population stutters, corresponding to around 1.2 million individuals. This study aims to develop a reading-aloud support system that enables shadowing training for stuttering therapy even when no speech-language pathologist is present. Such a system requires a method that automatically detects the segments of a user’s speech in which stuttering occurs, but this problem has not been sufficiently studied in previous research. We therefore investigate a frame-level stuttering detection method that divides speech into temporal frames of several tens of milliseconds and determines whether stuttering is present in each frame. So far, we have compared a threshold-based method using log-power features with a classification model based on recurrent neural networks, and confirmed that the latter achieves substantially higher detection accuracy than the former. In future work, we plan to investigate methods that incorporate Transformers and features extracted from speech foundation models, in order to capture the long-term and complex temporal dependencies of stuttering and further improve detection accuracy.
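As a rough illustration of the frame-level, threshold-based baseline mentioned above, the following Python sketch splits a signal into short frames, computes the log-power of each frame, and flags frames against a fixed threshold. It is a minimal sketch under stated assumptions, not the method used in this study: the 25 ms frame length, 10 ms hop, -40 dB threshold, the "below threshold" decision rule, and the function names `frame_log_power` and `threshold_frames` are all hypothetical choices made for the example.

```python
import math


def frame_log_power(samples, sample_rate, frame_ms=25.0, hop_ms=10.0, eps=1e-10):
    """Return the log-power (in dB) of each overlapping frame of a mono signal.

    frame_ms and hop_ms are assumed values; the study only states that
    frames span several tens of milliseconds.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must each cover at least one sample")

    log_powers = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        log_powers.append(10.0 * math.log10(power + eps))  # eps avoids log(0)
    return log_powers


def threshold_frames(log_powers, threshold_db):
    """Flag each frame whose log-power is below threshold_db.

    Assumption: low-energy frames are treated as candidate disfluent
    segments (for example, silent blocks). The actual rule is not
    specified in the source.
    """
    return [lp < threshold_db for lp in log_powers]


# Synthetic test signal: 0.5 s tone, 0.5 s silence, 0.5 s tone at 16 kHz.
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(8000)]
signal = tone + [0.0] * 8000 + tone

log_powers = frame_log_power(signal, SAMPLE_RATE)
flags = threshold_frames(log_powers, threshold_db=-40.0)
```

Here `flags` holds one boolean per frame, so consecutive flagged frames can be merged into time segments using the hop length. A learned model such as the recurrent network described above would replace the fixed rule in `threshold_frames` with a per-frame classifier that also takes neighboring frames into account.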
Faculty
Medical Devices and Materials Engineering Section
Human Centric Information Processing
Senior Assistant Professor AIDA Toshiaki

