Abstract:
The conversion of speech to text is essential for communication between speech-impaired and visually impaired people. The focus of this study was to develop and evaluate a baseline automatic speech recognition (ASR) system designed for normal speech and applied to the correction of disordered speech.
Normal and disordered speech data were sourced from the Lwazi project and UCLASS, respectively. The normal speech data were used to train the ASR system, and the disordered speech data were used to evaluate its performance. Features were extracted in the processing stage using the Mel-frequency cepstral coefficient (MFCC) method. Cepstral mean and variance normalisation (CMVN) was applied to
normalise the features. A third-order language model was trained using the SRI
Language Modelling (SRILM) toolkit. A recognition accuracy of 65.58% was
obtained. A refinement approach was then applied to the recognised utterances to remove the repetitions found in stuttered speech. This approach showed that 86% of the repeated words in stuttered speech could be removed, yielding an improved hypothesised text output. Further refinement of the ASR post-processing module is likely to achieve near-100% correction of stuttered speech.
Keywords: Automatic speech recognition (ASR), speech disorder, stuttering
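As a rough illustration of the front end summarised above (MFCC extraction followed by CMVN), the following is a minimal Python sketch. The librosa library, the placeholder file path, the 16 kHz sampling rate, and the choice of 13 coefficients are assumptions for illustration only; none of these specifics are reported in the abstract.

```python
import librosa
import numpy as np

def extract_normalised_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Extract MFCC features and apply per-utterance cepstral mean and
    variance normalisation (CMVN).

    wav_path, the 16 kHz sampling rate and n_mfcc are illustrative
    placeholders, not values taken from the study.
    """
    audio, sr = librosa.load(wav_path, sr=16000)                # load speech at 16 kHz
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    mean = mfcc.mean(axis=1, keepdims=True)                     # per-coefficient mean over time
    std = mfcc.std(axis=1, keepdims=True) + 1e-10               # per-coefficient std, guard against zero
    return (mfcc - mean) / std                                  # CMVN-normalised features
```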
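The repetition-removal refinement can be pictured as collapsing consecutive repeated words in the recognised hypothesis. The sketch below is a simplified, assumed interpretation of that post-processing step, handling only exact single-word repetitions; the study's actual refinement approach is not detailed in the abstract.

```python
def remove_repetitions(hypothesis: str) -> str:
    """Collapse consecutive repeated words in a recognised utterance,
    e.g. 'please please call call stella' -> 'please call stella'.

    A simplified, assumed form of the repetition-removal refinement.
    """
    cleaned = []
    for word in hypothesis.split():
        # keep a word only if it differs from the previously kept word
        if not cleaned or word.lower() != cleaned[-1].lower():
            cleaned.append(word)
    return " ".join(cleaned)


print(remove_repetitions("i i i want want to go home"))  # -> "i want to go home"
```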