
dc.contributor.advisor Manamela, M. J. D.
dc.contributor.author Moila, Mahlodi Mercy
dc.contributor.other Modipa, T. I.
dc.date.accessioned 2025-10-20T10:44:40Z
dc.date.available 2025-10-20T10:44:40Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/10386/5131
dc.description Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2025 en_US
dc.description.abstract The transformer is a deep learning model that processes sequential input data through an encoder-decoder architecture. Transformers process the input in parallel while attending to every token in the sequence through an attention mechanism. Transformer-based models are known to achieve state-of-the-art performance on natural language processing (NLP) tasks, outperforming recurrent neural networks (RNNs) such as the Long Short-Term Memory (LSTM) network, which suffer from vanishing and exploding gradients during training. The GPT-Sepedi transformer-based model has shown great success in text generation for the Sepedi language; nevertheless, few text generation systems using transformer-based models have been developed for under-resourced African languages such as Sepedi. This research project aimed to develop a text generation model for the Sepedi language using transformer-based machine learning techniques. An attention-based LSTM-Sepedi model and a GPT-Sepedi transformer-based model were developed and trained on the National Centre for Human Language Technology (NCHLT) Sepedi text corpus, and the two models were compared on the text they generated. The GPT-Sepedi transformer-based model was used to generate text, which was then checked against the Sepedi vocabulary to determine its validity; 61% of the generated tokens were found in the Sepedi vocabulary. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score was used to compare the model-generated text to human-written text and indicates that the GPT-Sepedi model generated words matching human-written text with a precision of 83%. Although precision was high, the generated text is not comprehensible, with a recall of 0.05% and an F1-score of 0.1%. en_US
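
As an informal illustration of the evaluation described in the abstract, the Python sketch below computes vocabulary coverage and ROUGE-1 precision, recall and F1 for a generated string against a human-written reference. It is not the thesis code: the whitespace tokenisation, the toy Sepedi vocabulary and the example sentences are hypothetical stand-ins for the NCHLT-based setup used in the study.

# Illustrative sketch only (not the thesis code): evaluating generated Sepedi
# text by (1) vocabulary coverage and (2) ROUGE-1 precision/recall/F1 against
# a human-written reference. Tokenisation is naive whitespace splitting.
from collections import Counter

def vocabulary_coverage(generated: str, vocabulary: set) -> float:
    """Fraction of generated tokens that appear in the reference vocabulary."""
    tokens = generated.lower().split()
    if not tokens:
        return 0.0
    in_vocab = sum(1 for t in tokens if t in vocabulary)
    return in_vocab / len(tokens)

def rouge1(reference: str, candidate: str):
    """Unigram-overlap (ROUGE-1) precision, recall and F1."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped unigram matches
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical toy data; the thesis used the NCHLT Sepedi corpus instead.
    sepedi_vocab = {"ke", "rata", "go", "bala", "puku"}
    generated_text = "ke rata go bala puku xyz"
    human_reference = "ke rata go bala puku ye botse"

    print(f"vocabulary coverage: {vocabulary_coverage(generated_text, sepedi_vocab):.2%}")
    p, r, f = rouge1(human_reference, generated_text)
    print(f"ROUGE-1 precision={p:.2%} recall={r:.2%} f1={f:.2%}")
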
dc.description.sponsorship NRF (National Research Foundation) en_US
dc.format.extent xii, 84 leaves en_US
dc.language.iso en en_US
dc.relation.requires PDF en_US
dc.subject Machine learning en_US
dc.subject Transformer en_US
dc.subject Sepedi en_US
dc.subject Text generation en_US
dc.subject GPT en_US
dc.subject.lcsh Deep learning (Machine learning) en_US
dc.subject.lcsh Northern Sotho language en_US
dc.subject.lcsh Machine learning en_US
dc.subject.lcsh Natural language generation (Computer science) en_US
dc.title The development of a text generation model for Sepedi language using transformer-based machine-learning techniques en_US
dc.type Thesis en_US

