
dc.contributor.advisor Modipa, T. I.
dc.contributor.author Mehlape, Jonas Mosweu
dc.date.accessioned 2026-03-12T10:28:39Z
dc.date.available 2026-03-12T10:28:39Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/10386/5376
dc.description Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2025 en_US
dc.description.abstract This study centers on creating an automatic speech recognition (ASR) system for Tshivenda, one of South Africa's under-resourced languages, using the Connectionist Temporal Classification (CTC) framework and the NCHLT speech corpus. The primary objective is to develop and evaluate an end-to-end (E2E) CTC-based ASR system for Tshivenda. This involves designing and training an ASR model on the NCHLT speech corpus, optimizing model performance through hyperparameter tuning (e.g., learning rate, dropout rate), and evaluating the system's accuracy using metrics such as word error rate (WER) and training loss. The research also identifies key challenges in recognizing Tshivenda speech and proposes improvements for future work in this area. Several delimitations to the scope of the study should be considered. First, the research relies on the NCHLT speech corpus, which, although valuable, has limited dialectal diversity and does not fully represent all regional variations of Tshivenda. Additionally, the model was trained primarily on clean speech data and therefore does not extensively address the challenges of noisy environments or spontaneous speech. Furthermore, while the study focuses on a CTC-based deep learning model, it does not explore the integration of external language models, such as transformer-based models, which could further enhance performance. Finally, due to hardware limitations, the model was trained for only 30 epochs, which may have constrained its ability to reach optimal performance, potentially impacting the accuracy of the final system. The model's performance was assessed over these 30 epochs using Word Error Rate (WER), training loss, and validation loss. 
The top-performing model achieved a final WER of 0.3934, marking notable progress in Tshivenda speech recognition. This research highlights the promise of deep learning models for building ASR systems for under-resourced languages, while also pointing out critical directions for future exploration. Key future improvements include expanding the dataset, integrating language models, and improving the model's resilience to noisy conditions and spontaneous speech. These steps are essential for enhancing accuracy and practical usability. The study contributes to the broader mission of promoting language preservation and accessibility through technological innovation. en_US
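To make the two core ideas in the abstract concrete, the sketch below illustrates (a) the CTC decoding rule of collapsing repeated labels and removing the blank symbol, and (b) the Word Error Rate metric used to report the 0.3934 result. This is a minimal, self-contained illustration with hypothetical example data, not code from the thesis:

```python
def ctc_collapse(path, blank=0):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != blank and label != prev:
            out.append(label)
        prev = label
    return out


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Per-frame argmax labels (0 = blank) collapse to the label sequence [1, 1, 2]:
print(ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0]))  # → [1, 1, 2]
# One substitution in a four-word reference gives WER 0.25:
print(wer("a b c d", "a x c d"))  # → 0.25
```

In a full CTC pipeline the per-frame labels would come from the argmax over a neural network's softmax outputs, and WER would be averaged over a held-out test set; both details are omitted here for brevity.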
dc.format.extent x, 64 leaves en_US
dc.language.iso en en_US
dc.relation.requires PDF en_US
dc.subject Automatic speech recognition en_US
dc.subject Tshivenda en_US
dc.subject Under-resourced languages en_US
dc.subject Connectionist temporal classification en_US
dc.subject Convolutional neural networks en_US
dc.subject Recurrent neural networks en_US
dc.subject.lcsh Automatic speech recognition en_US
dc.subject.lcsh Translating and interpreting -- Technological innovations en_US
dc.subject.lcsh Neural networks (Computer science) en_US
dc.subject.lcsh Tshivenda language en_US
dc.title Development of an end-to-end automatic speech recognition system using connectionist temporal classification for the Tshivenda language en_US
dc.type Thesis en_US

