Development of a Sepedi-English code-switching automatic speech recognition system using connectionist temporal classification

Phaladi, Amanda

URI: http://hdl.handle.net/10386/5380

Date: 2025

Abstract:

Speech technology comprises several approaches that allow machines to engage with spoken language, including spoken dialog systems and automatic speech recognition (ASR). End-to-end (E2E) techniques, such as Connectionist Temporal Classification (CTC) and attention-based methods, dominate ASR system development. However, these methods have advanced primarily through research on high-resource languages with extensive speech datasets, leaving low-resource languages relatively underserved. The efficacy of the CTC method specifically for Sepedi, a low-resource language, remains uncertain. This study addresses this gap by developing and evaluating an ASR system for Sepedi-English code-switched speech. Using the Sepedi Prompted Code Switching (SPCS) corpus and applying the CTC approach, we implemented an E2E ASR system. We evaluated the system's performance across various parameters using both the National Centre for Human Language Technology (NCHLT) Sepedi test corpus and the SPCS corpus. Our findings demonstrate promising results overall. However, the system faced challenges in accurately recognizing speech from the NCHLT Sepedi test corpus. This study shows the importance of adapting advanced ASR techniques to suit the linguistic characteristics and data limitations of low-resource languages. Addressing these challenges is crucial for extending speech technology to diverse linguistic contexts, ultimately making ASR systems more accessible and usable worldwide.