Development of an end-to-end automatic speech recognition system using connectionist temporal classification for the Tshivenda language

dc.contributor.advisor: Modipa, T. I.
dc.contributor.author: Mehlape, Jonas Mosweu
dc.date.accessioned: 2026-03-12T10:28:39Z
dc.date.available: 2026-03-12T10:28:39Z
dc.date.issued: 2025
dc.description: Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2025
dc.description.abstract: This study develops an end-to-end (E2E) automatic speech recognition (ASR) system for Tshivenda, one of South Africa's under-resourced languages, using the Connectionist Temporal Classification (CTC) framework and the NCHLT speech corpus. The primary objective is to design, train, and evaluate a CTC-based ASR model: optimizing performance through hyperparameter tuning (e.g., learning rate, dropout rate) and assessing accuracy with essential metrics, including word error rate (WER), training loss, and validation loss, over 30 epochs. The research also identifies key challenges in recognizing Tshivenda speech and proposes improvements for future work in this area. Several delimitations to the scope of the study should be considered. First, the research relies on the NCHLT speech corpus, which, although valuable, has limited dialectal diversity and does not fully represent all regional variations of Tshivenda. Second, the model was trained primarily on clean speech data and therefore does not extensively address the challenges of noisy environments or spontaneous speech. Third, while the study focuses on a CTC-based deep learning model, it does not explore the integration of external language models, such as transformer-based models, which could further enhance performance. Finally, due to hardware limitations, the model was trained for only 30 epochs, which may have constrained its ability to reach optimal performance, potentially impacting the accuracy of the final system.
The top-performing model achieved a final WER of 0.3934, a notable advance for Tshivenda speech recognition. This research demonstrates the promise of deep learning models for building ASR systems for under-resourced languages, while pointing out critical directions for future exploration: expanding the dataset, integrating language models, and improving the model's robustness to noisy conditions and spontaneous speech. These steps are essential for enhancing accuracy and practical usability. The study contributes to the broader mission of promoting language preservation and accessibility through technological innovation.
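The abstract reports results as word error rate (WER), the standard ASR metric: the word-level edit distance between the reference transcript and the system's hypothesis, divided by the number of reference words. The following is a minimal, dependency-free sketch of how WER is conventionally computed; it is an illustration of the metric, not the evaluation code used in the thesis (the example sentence is an arbitrary Tshivenda phrase chosen for illustration).

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)


# One deleted word out of three reference words -> WER of 1/3 (about 0.333).
print(wer("ndi a livhuwa", "ndi livhuwa"))
```

A reported WER of 0.3934 therefore means that roughly 39% of reference words required a substitution, deletion, or insertion to match the system output.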
dc.format.extent: x, 64 leaves
dc.identifier.uri: http://hdl.handle.net/10386/5376
dc.language.iso: en
dc.relation.requires: PDF
dc.subject: Automatic speech recognition
dc.subject: Tshivenda
dc.subject: Under-resourced languages
dc.subject: Connectionist temporal classification
dc.subject: Convolutional neural networks
dc.subject: Recurrent neural networks
dc.subject.lcsh: Automatic speech recognition
dc.subject.lcsh: Translating and interpreting -- Technological innovations
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Tshivenda language
dc.title: Development of an end-to-end automatic speech recognition system using connectionist temporal classification for the Tshivenda language
dc.type: Thesis

Files

Original bundle

Name:
mehlape_jm_2025.pdf
Size:
1.47 MB
Format:
Adobe Portable Document Format
Description:
Thesis

License bundle

Name:
license.txt
Size:
1.61 KB
Description:
Item-specific license agreed upon to submission