
dc.contributor.advisor Modipa, T. I.
dc.contributor.author Mehlape, Jonas Mosweu
dc.date.accessioned 2026-03-12T10:28:39Z
dc.date.available 2026-03-12T10:28:39Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/10386/5376
dc.description Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2025 en_US
dc.description.abstract This study centers on creating an automatic speech recognition (ASR) system for Tshivenda, one of South Africa's under-resourced languages, using the Connectionist Temporal Classification (CTC) framework and the NCHLT speech corpus. The primary objective is to develop and evaluate an end-to-end (E2E) CTC-based ASR system for Tshivenda. This involves designing and training an ASR model on the NCHLT speech corpus, optimizing model performance through hyperparameter tuning (e.g., learning rate, dropout rate), and evaluating the system's accuracy using metrics such as word error rate (WER) and training loss. The research also identifies key challenges in recognizing Tshivenda speech and proposes improvements for future work in this area. Several delimitations to the scope of the study should be considered. First, the research relies on the NCHLT speech corpus, which, although valuable, has limited dialectal diversity and does not fully represent all regional variations of Tshivenda. Additionally, the model was trained primarily on clean speech data and therefore does not extensively address the challenges of noisy environments or spontaneous speech. Furthermore, while the study focuses on a CTC-based deep learning model, it does not explore the integration of external language models, such as transformer-based models, which could further enhance performance. Finally, due to hardware limitations, the model was trained for only 30 epochs, which may have constrained its ability to reach optimal performance, potentially impacting the accuracy of the final system. The model's performance was assessed over these 30 epochs using Word Error Rate (WER), training loss, and validation loss. 
The top-performing model achieved a final WER of 0.3934, marking notable progress in Tshivenda speech recognition. This research highlights the promise of deep learning models for building ASR systems for under-resourced languages, while also pointing out critical directions for future exploration. Key future improvements include expanding the dataset, integrating language models, and improving the model's resilience to noisy conditions and spontaneous speech. These steps are essential for enhancing accuracy and practical usability. The study contributes to the broader mission of promoting language preservation and accessibility through technological innovation. en_US
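To make the two core ideas in the abstract concrete, the sketch below illustrates (a) the CTC decoding rule of collapsing repeated labels and removing the blank symbol, and (b) the Word Error Rate metric used to report the 0.3934 result. This is a minimal, self-contained illustration with hypothetical example data, not code from the thesis:

```python
def ctc_collapse(path, blank=0):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != blank and label != prev:
            out.append(label)
        prev = label
    return out


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Per-frame argmax labels (0 = blank) collapse to the label sequence [1, 1, 2]:
print(ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0]))  # → [1, 1, 2]
# One substitution in a four-word reference gives WER 0.25:
print(wer("a b c d", "a x c d"))  # → 0.25
```

In a full CTC pipeline the per-frame labels would come from the argmax over a neural network's softmax outputs, and WER would be averaged over a held-out test set; both details are omitted here for brevity.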
dc.format.extent x, 64 leaves en_US
dc.language.iso en en_US
dc.relation.requires PDF en_US
dc.subject Automatic speech recognition en_US
dc.subject Tshivenda en_US
dc.subject Under-resourced languages en_US
dc.subject Connectionist temporal classification en_US
dc.subject Convolutional neural networks en_US
dc.subject Recurrent neural networks en_US
dc.subject.lcsh Automatic speech recognition en_US
dc.subject.lcsh Translating and interpreting -- Technological innovations en_US
dc.subject.lcsh Neural networks (Computer science) en_US
dc.subject.lcsh Tshivenda language en_US
dc.title Development of an end-to-end automatic speech recognition system using connectionist temporal classification for the Tshivenda language en_US
dc.type Thesis en_US

