dc.contributor.advisor Modipa, T. I.
dc.contributor.author Nkuna, Blessing
dc.contributor.other Ramalepe, P. S.
dc.date.accessioned 2026-03-06T12:23:43Z
dc.date.available 2026-03-06T12:23:43Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/10386/5363
dc.description Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2025 en_US
dc.description.abstract Sentiment analysis is an essential natural language processing technique for monitoring online discussions about brands, products, and services. Although traditionally focused on monolingual data, sentiment analysis has expanded to include code-mixed texts, reflecting the growing use of multiple languages within single sentences on social media. This dissertation addresses the gap in sentiment analysis for code-mixed data by developing a Long Short-Term Memory (LSTM) classifier for Xitsonga-English comments extracted from YouTube music reviews. The research aims to design and implement a sentiment analysis model tailored to Xitsonga-English code-mixed texts and to evaluate its performance against traditional monolingual sentiment analysis methods. This involved collecting a substantial dataset of Xitsonga-English comments, determining their polarity, developing an LSTM classifier, and assessing its accuracy, precision, recall, and F1-score. Data collection involved scraping 1 998 Xitsonga-English comments from a Xitsonga YouTube channel, which were then cleaned and tokenized for analysis. Sentiments were defined and categorized into positive, negative, and neutral classes based on specific criteria, and sentiment dictionaries (lexicons) were developed for both Xitsonga and English. These lexicons were used to label the comments, producing the training data for the LSTM model. A word embedding matrix was also built using Word2Vec to capture semantic similarities between words. The LSTM classifier's architecture included embedding layers initialized with pre-trained word embeddings, two LSTM layers for sequence processing, and a dense output layer for sentiment classification. Despite efforts to address overfitting through regularization and model adjustments, the final LSTM model did not perform as expected on the validation and test datasets, highlighting the difficulty of generalizing sentiment classification on the collected dataset.
To address this, a stacking classifier combining Random Forest, Support Vector Machine, Gradient Boosting, and Logistic Regression was developed and compared with the LSTM model. The stacking classifier generalized better to unseen data, suggesting that it is more robust for sentiment analysis in code-mixed contexts. The results highlight both the challenges of developing robust sentiment analysis models for code-mixed languages and potential solutions to them, contributing insights to the field of natural language processing. en_US
dc.format.extent ix, 85 leaves en_US
dc.language.iso en en_US
dc.relation.requires PDF en_US
dc.subject Sentiment analysis en_US
dc.subject Code-mixed en_US
dc.subject Polarity en_US
dc.subject Annotated en_US
dc.subject Long short-term memory classifier en_US
dc.subject Stacking classifier en_US
dc.subject.lcsh Sentiment analysis en_US
dc.subject.lcsh Deep learning (Machine learning) en_US
dc.subject.lcsh Code switching (Linguistics) en_US
dc.title Developing a code-mixed sentiment analysis model for Xitsonga-English music review en_US
dc.type Thesis en_US
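
The abstract describes labelling code-mixed comments with separate Xitsonga and English sentiment lexicons. The sketch below shows one simple way such lexicon-based labelling could work. It is an illustration under stated assumptions, not the thesis's method: the lexicon entries and their polarities are hypothetical (the record does not list the actual lexicons), and the scoring rule (sum of word scores; above zero is positive, below zero is negative, otherwise neutral) is assumed, since the record only says comments were classified "based on specific criteria".

```python
import re

# Hypothetical, illustrative lexicons with assumed polarity scores (+1 / -1).
# These are NOT the thesis lexicons, which are not given in this record.
XITSONGA_LEXICON = {"rhandza": 1, "tsaka": 1, "saseka": 1, "biha": -1}
ENGLISH_LEXICON = {"love": 1, "great": 1, "nice": 1, "bad": -1, "boring": -1}


def tokenize(comment: str) -> list[str]:
    """Lowercase the comment and keep only alphabetic tokens (simple cleaning)."""
    return re.findall(r"[a-z]+", comment.lower())


def label_comment(comment: str) -> str:
    """Assign a polarity by summing scores from both lexicons.

    Assumed rule: total > 0 -> positive, total < 0 -> negative,
    otherwise (including comments with no lexicon hits) -> neutral.
    """
    score = 0
    for token in tokenize(comment):
        # Check the Xitsonga lexicon first, then English; unknown tokens score 0.
        score += XITSONGA_LEXICON.get(token, ENGLISH_LEXICON.get(token, 0))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical example comments, not drawn from the thesis dataset.
    for comment in ["Ndzi rhandza this song, great beat!", "Boring, ku biha", "Posted in 2025"]:
        print(label_comment(comment), "|", comment)
```

Summing scores across both lexicons lets a single comment mix Xitsonga and English words without first identifying the language of each token; a real lexicon would also need to handle negation and intensifiers, which this sketch ignores.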

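The abstract reports evaluating the classifiers by accuracy, precision, recall, and F1-score over three classes (positive, negative, neutral). The record does not state how per-class scores were averaged, so the following minimal sketch assumes macro averaging (an unweighted mean over the three classes); the function name `evaluate` and the toy labels are hypothetical.

```python
# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def evaluate(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Return accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in LABELS:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c missed by the model
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(LABELS)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not results from the thesis.
    print(evaluate(["positive", "positive", "negative", "neutral"],
                   ["positive", "negative", "negative", "neutral"]))
```

Macro averaging weights each class equally, which matters when the positive, negative, and neutral classes are imbalanced; a weighted average would instead weight each class by its number of true examples.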
