Developing a code-mixed sentiment analysis model for Xitsonga-English music review

dc.contributor.advisor: Modipa, T. I.
dc.contributor.author: Nkuna, Blessing
dc.contributor.other: Ramalepe, P. S.
dc.date.accessioned: 2026-03-06T12:23:43Z
dc.date.available: 2026-03-06T12:23:43Z
dc.date.issued: 2025
dc.description: Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2025 [en_US]
dc.description.abstract: Sentiment analysis is an essential natural language processing technique for monitoring online discussions about brands, products, and services. Traditionally focused on monolingual data, sentiment analysis has expanded to include code-mixed texts, reflecting the growing use of multiple languages within single sentences on social media. This dissertation addresses the gap in sentiment analysis for code-mixed data by developing a Long Short-Term Memory (LSTM) classifier for Xitsonga-English comments extracted from YouTube music reviews. The research aims to design and implement a sentiment analysis model tailored to Xitsonga-English code-mixed texts and to evaluate its performance against traditional monolingual sentiment analysis methods. This involves collecting a substantial dataset of Xitsonga-English comments, determining their polarity, developing an LSTM classifier, and assessing its accuracy, precision, recall, and F1-score. Data collection involved scraping 1 998 Xitsonga-English comments from a Xitsonga YouTube channel; the comments were then cleaned and tokenized for analysis. Sentiments were categorized into positive, negative, and neutral classes based on specific criteria, and dictionaries were developed for both the Xitsonga and English lexicons. These lexicons were used to label the comments, producing training data for the LSTM model. Additionally, a word embedding matrix was developed using Word2Vec to capture semantic similarities between words. The LSTM classifier's architecture comprised an embedding layer initialized with the pre-trained word embeddings, two LSTM layers for sequence processing, and a dense output layer for sentiment classification. Despite efforts to address overfitting through regularization and model adjustments, the final LSTM model did not perform as expected on the validation and test datasets, highlighting the difficulty of generalizing sentiment classification from the collected dataset.
To address this, a stacking classifier combining Random Forest, Support Vector Machine, Gradient Boosting, and Logistic Regression was developed and compared with the LSTM model. The stacking classifier generalized better to unseen data, indicating its robustness for sentiment analysis tasks in code-mixed contexts. The results highlight the challenges and potential solutions in developing robust sentiment analysis models for code-mixed languages, contributing valuable insights to the domain of natural language processing. [en_US]
dc.format.extent: ix, 85 leaves [en_US]
dc.identifier.uri: http://hdl.handle.net/10386/5363
dc.language.iso: en [en_US]
dc.relation.requires: PDF [en_US]
dc.subject: Sentiment analysis [en_US]
dc.subject: Code-mixed [en_US]
dc.subject: Polarity [en_US]
dc.subject: Annotated [en_US]
dc.subject: Long short-term memory classifier [en_US]
dc.subject: Stacking classifier [en_US]
dc.subject.lcsh: Sentiment analysis [en_US]
dc.subject.lcsh: Deep learning (Machine learning) [en_US]
dc.subject.lcsh: Code switching (Linguistics) [en_US]
dc.title: Developing a code-mixed sentiment analysis model for Xitsonga-English music review [en_US]
dc.type: Thesis [en_US]
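
The abstract describes labelling comments with Xitsonga and English sentiment lexicons before training. A minimal sketch of that kind of lexicon-based labelling is given below. It is not the thesis's implementation: the lexicon entries, the tokenization rule, the scoring rule, and the names `POSITIVE`, `NEGATIVE`, `tokenize`, and `label_comment` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicons mixing English and Xitsonga entries.
# These words are chosen for this sketch and are NOT taken from the thesis.
POSITIVE = {"love", "great", "nice", "saseka"}
NEGATIVE = {"bad", "boring", "hate", "biha"}


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def label_comment(comment):
    """Assign a polarity label by counting lexicon hits (assumed rule)."""
    tokens = tokenize(comment)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no hits, or positive and negative hits cancel out


if __name__ == "__main__":
    for text in ["I love this song, ya saseka",
                 "This beat is boring",
                 "Who produced this track?"]:
        print(text, "->", label_comment(text))
```

A simple hit count like this treats both languages identically once the lexicons are merged, which is one plausible way to handle code-mixed text; the criteria actually used in the thesis may differ.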

Files

Original bundle

Name: nkuna_b_2025.pdf
Size: 1.82 MB
Format: Adobe Portable Document Format
Description: Thesis
