Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition
Marini M.; Mulfari D.; Vanello N.; Fanucci L.
2021-01-01
Abstract
Within automatic speech recognition, dysarthric speech remains a challenge because standard approaches are ineffective in the presence of dysarthria. This paper presents preliminary evidence that the performance of speaker-dependent speech recognition systems trained for speakers with dysarthria may be substantially improved by tuning the size and shift of the spectral analysis window that many speech front ends use to compute the initial short-time Fourier transform. The evidence comes from a set of experiments on a small collection of Italian speech (isolated words) from five speakers with different degrees of dysarthria. The experimental framework builds speaker-dependent GMM-HMM speech recognition models using the Kaldi triphone recipe with varying choices of spectral analysis window size and shift. Results show an improvement that varies with the speaker, ranging from 31% to 81%.
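To make the two tuned parameters concrete, the following is a minimal sketch of how the analysis window size and shift determine the short-time Fourier transform that a front end sees. It is not the Kaldi implementation and not the authors' code: the function names `frame_signal` and `stft_magnitude`, the Hamming window, the synthetic 440 Hz tone, and the (50 ms, 20 ms) setting are all assumptions chosen for illustration. The (25 ms, 10 ms) setting corresponds to the common Kaldi default frame length and frame shift.

```python
import numpy as np

def frame_signal(signal, sample_rate, window_ms, shift_ms):
    """Split a 1-D signal into overlapping frames.

    No padding is applied; trailing samples that do not fill a
    whole window are dropped.
    """
    win = int(round(sample_rate * window_ms / 1000.0))   # window size in samples
    hop = int(round(sample_rate * shift_ms / 1000.0))    # window shift in samples
    if len(signal) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(signal) - win) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def stft_magnitude(signal, sample_rate, window_ms, shift_ms):
    """Magnitude STFT with a Hamming window (an illustrative choice)."""
    frames = frame_signal(signal, sample_rate, window_ms, shift_ms)
    window = np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

# Compare a conventional setting with a longer window and shift.
for window_ms, shift_ms in [(25, 10), (50, 20)]:
    spec = stft_magnitude(signal, sample_rate, window_ms, shift_ms)
    print(f"window={window_ms} ms, shift={shift_ms} ms -> "
          f"{spec.shape[0]} frames x {spec.shape[1]} frequency bins")
```

A longer window yields finer frequency resolution per frame, and a longer shift yields fewer frames per second of speech, so both choices change the sequence of feature vectors that the acoustic model is trained on. In a Kaldi setup these two values are typically set through the `--frame-length` and `--frame-shift` options of the feature configuration.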