dc.contributor.advisor |
Manamela, M. J. D. |
|
dc.contributor.author |
Moila, Mahlodi Mercy |
|
dc.contributor.other |
Modipa, T. I. |
|
dc.date.accessioned |
2025-10-20T10:44:40Z |
|
dc.date.available |
2025-10-20T10:44:40Z |
|
dc.date.issued |
2025 |
|
dc.identifier.uri |
http://hdl.handle.net/10386/5131 |
|
dc.description |
Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2025 |
en_US |
dc.description.abstract |
A transformer is a deep learning model that processes sequential input data using an encoder-decoder architecture. Transformers process the input in parallel while attending to each token at a time, applying an attention mechanism to every unit of text being processed. Transformer-based models are known to deliver state-of-the-art performance on natural language processing (NLP) tasks compared with recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) networks, which suffer from vanishing and exploding gradients during training. Although GPT-style transformer models have shown great success in text generation, few text generation systems have been developed using transformer-based models for under-resourced African languages, namely the Sepedi language. This research project aimed to develop a text generation model for the Sepedi language using transformer-based machine learning techniques. An LSTM-Sepedi attention-based model and a GPT-Sepedi transformer-based model were developed and trained on the National Centre for Human Language Technology (NCHLT) Sepedi text corpus, and the two models were compared on the text they generated. The GPT-Sepedi transformer-based model was used to generate text, which was then compared against the Sepedi language vocabulary to determine its validity; 61% of the words in the generated texts were found in the Sepedi vocabulary. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score was used to compare the model-generated text with human-written text. The ROUGE results indicate that the GPT-Sepedi transformer-based text generation model generated words that humans would write with 83% precision. Although the precision was high, the generated text was not comprehensible, as reflected in the recall of 0.05% and the F1-score of 0.1%. |
en_US |
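The abstract describes two evaluation steps: checking what fraction of generated tokens appear in a Sepedi vocabulary, and scoring generated text against human-written text with ROUGE. The Python sketch below (not code from the thesis) illustrates both under simple assumptions, using unigram (ROUGE-1) overlap; all names and inputs (generated, reference, sepedi_vocab) are hypothetical placeholders.

    # Minimal sketch of the two evaluation steps described in the abstract.
    # All inputs below are hypothetical placeholders, not the thesis data.
    from collections import Counter

    def vocab_validity(generated_tokens, sepedi_vocab):
        """Fraction of generated tokens found in the Sepedi vocabulary."""
        if not generated_tokens:
            return 0.0
        hits = sum(1 for tok in generated_tokens if tok in sepedi_vocab)
        return hits / len(generated_tokens)

    def rouge1(generated_tokens, reference_tokens):
        """Unigram ROUGE: precision, recall and F1 over token overlap."""
        overlap = sum((Counter(generated_tokens) & Counter(reference_tokens)).values())
        precision = overlap / max(len(generated_tokens), 1)
        recall = overlap / max(len(reference_tokens), 1)
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return precision, recall, f1

    # Hypothetical usage:
    generated = "ke a leboga".split()
    reference = "ke a leboga kudu".split()
    sepedi_vocab = {"ke", "a", "leboga", "kudu"}
    print(vocab_validity(generated, sepedi_vocab))  # 1.0
    print(rouge1(generated, reference))             # (1.0, 0.75, ~0.857)

As in the abstract's findings, high precision with low recall arises when the few tokens a model emits do appear in human text, but the model fails to reproduce most of the reference content.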
dc.description.sponsorship |
NRF (National Research Foundation) |
en_US |
dc.format.extent |
xii, 84 leaves |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.requires |
PDF |
en_US |
dc.subject |
Machine learning |
en_US |
dc.subject |
Transformer |
en_US |
dc.subject |
Sepedi |
en_US |
dc.subject |
Text generation |
en_US |
dc.subject |
GPT |
en_US |
dc.subject.lcsh |
Deep learning (Machine learning) |
en_US |
dc.subject.lcsh |
Northern Sotho language |
en_US |
dc.subject.lcsh |
Machine learning |
en_US |
dc.subject.lcsh |
Natural language generation (Computer science) |
en_US |
dc.title |
The development of a text generation model for Sepedi language using transformer-based machine-learning techniques |
en_US |
dc.type |
Thesis |
en_US |