The automatic recognition of emotions in speech

Manamela, Phuti, John


dc.contributor.advisor	Manamela, M. J. D.
dc.contributor.author	Manamela, Phuti, John
dc.contributor.other	Modipa, T. I.
dc.date.accessioned	2021-06-18T07:39:12Z
dc.date.available	2021-06-18T07:39:12Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/10386/3347
dc.description	Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2020	en_US
dc.description.abstract	Speech emotion recognition (SER) refers to technology that enables machines to detect and recognise human emotions from spoken phrases. Numerous attempts have been made in the literature to develop systems that recognise human emotions from the voice; however, little work has been done in the context of South African indigenous languages. The aim of this study was to develop an SER system that can classify and recognise six basic human emotions (i.e., sadness, fear, anger, disgust, happiness, and neutral) from speech spoken in the Sepedi language (one of South Africa’s official languages). One of the major challenges encountered in this study was the lack of a proper corpus of emotional speech. Therefore, three Sepedi emotional speech corpora consisting of acted speech data were developed: a Recorded-Sepedi corpus collected from recruited native speakers (9 participants), a TV broadcast corpus collected from professional Sepedi actors, and an Extended-Sepedi corpus that combines the Recorded-Sepedi and TV broadcast emotional speech corpora. Features were extracted from the speech corpora and a data file was constructed. This file was used to train four machine learning (ML) algorithms (i.e., SVM, KNN, MLP and Auto-WEKA) using 10-fold cross-validation. Three experiments were then performed on the developed speech corpora and the performance of the algorithms was compared. In all the experiments, the best results were achieved with Auto-WEKA. Good results might have been expected for the TV broadcast speech corpus, since it was collected from professional actors; however, the results showed otherwise. From the findings of this study, one can conclude that there are no precise or exact techniques for the development of SER systems; it is a matter of experimenting and finding the best technique for the study at hand.
The study has also highlighted the scarcity of SER resources for South African indigenous languages. The quality of the dataset plays a vital role in the performance of SER systems.	en_US
dc.description.sponsorship	National Research Foundation (NRF) and Telkom Center of Excellence (CoE)	en_US
dc.format.extent	xii, 76 leaves	en_US
dc.language.iso	en	en_US
dc.relation.requires	PDF	en_US
dc.subject	Speech emotion recognition	en_US
dc.subject	Machine learning	en_US
dc.subject	Feature extraction	en_US
dc.subject	Classification	en_US
dc.subject	Emotional speech database	en_US
dc.subject.lcsh	Automatic speech recognition	en_US
dc.subject.lcsh	Machine learning	en_US
dc.title	The automatic recognition of emotions in speech	en_US
dc.type	Thesis	en_US