Development of a text-independent automatic speaker recognition system

Mokgonyane, Tumisho Billson

dc.contributor.advisor	Manamela, M. J. D.
dc.contributor.author	Mokgonyane, Tumisho Billson
dc.contributor.other	Modipa, T. I.
dc.date.accessioned	2022-06-15T07:45:32Z
dc.date.available	2022-06-15T07:45:32Z
dc.date.issued	2021
dc.description	Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2021	en_US
dc.description.abstract	The task of automatic speaker recognition, wherein a system verifies or identifies speakers from a recording of their voices, has been researched for several decades. However, research in this area has been carried out largely on freely accessible speaker datasets built on well-resourced languages such as English. This study undertakes automatic speaker recognition research focused on a low-resourced language, Sepedi. As one of the 11 official languages in South Africa, Sepedi is spoken by at least 2.8 million people. Pre-recorded voices were acquired from a national speech and language repository, the National Centre for Human Language Technology (NCHLT), from which we selected the Sepedi NCHLT Speech Corpus. The open-source pyAudioAnalysis Python library was used to extract three types of acoustic features of speech, namely time-, frequency- and cepstral-domain features, from the acquired speech data. The effects and compatibility of these acoustic features were investigated. It was observed that combining the three types of acoustic features improved speaker recognition accuracy more than using any individual feature type. The study also investigated the performance of machine learning algorithms on low-resourced languages such as Sepedi. Five machine learning (ML) algorithms implemented in scikit-learn, namely K-nearest neighbours (KNN), support vector machines (SVM), random forest (RF), logistic regression (LR), and multi-layer perceptrons (MLP), were used to train different classifier models. GridSearchCV, also implemented in scikit-learn, was used to select hyper-parameters for each of the five ML algorithms. The classifier models were evaluated on recognition accuracy, and the results show that the MLP classifier, with a recognition accuracy of 98%, outperforms the KNN, RF, LR and SVM classifiers. 
A graphical user interface (GUI) was developed, and the best-performing classifier model, MLP, was deployed in it for real-time speaker identification and verification tasks. Participants were recruited to evaluate the GUI's performance, and acceptable results were obtained.	en_US
dc.format.extent	xiii, 60 leaves	en_US
dc.identifier.uri	http://hdl.handle.net/10386/3829
dc.language.iso	en	en_US
dc.relation.requires	PDF	en_US
dc.subject	Automatic speaker recognition	en_US
dc.subject	Recording of voices	en_US
dc.subject	Graphical user interface	en_US
dc.subject.lcsh	Automatic speech recognition	en_US
dc.subject.lcsh	Speech processing systems	en_US
dc.subject.lcsh	Icons (Computer graphics)	en_US
dc.title	Development of a text-independent automatic speaker recognition system	en_US
dc.type	Thesis	en_US
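
The model-selection step described in the abstract (five scikit-learn classifiers, each tuned with GridSearchCV and compared on recognition accuracy) can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the feature matrix is synthetic data from `make_classification` standing in for acoustic feature vectors, the 34-dimensional feature size and the four "speakers" are arbitrary choices, and the parameter grids, the `CANDIDATES` table and the `tune_and_score` helper are all hypothetical names and values introduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for per-utterance acoustic feature vectors.
# Each row plays the role of one utterance; each class plays the role of one speaker.
X, y = make_classification(
    n_samples=400, n_features=34, n_informative=12,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# ASSUMPTION: small illustrative grids, not the grids searched in the thesis.
CANDIDATES = {
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "SVM": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
    "RF": (RandomForestClassifier(random_state=0), {"clf__n_estimators": [50, 100]}),
    "LR": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "MLP": (MLPClassifier(max_iter=1000, random_state=0),
            {"clf__hidden_layer_sizes": [(32,), (64,)]}),
}


def tune_and_score(X, y, cv=3):
    """Grid-search each candidate on a training split, then score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        # Scaling lives inside the pipeline so it is refit on every CV fold.
        pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
        search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
        search.fit(X_train, y_train)
        # score() uses the best estimator refit on the full training split.
        results[name] = (search.best_params_, search.score(X_test, y_test))
    return results


results = tune_and_score(X, y)
for name, (params, acc) in results.items():
    print(f"{name}: held-out accuracy {acc:.3f} with {params}")
```

The accuracies printed by this sketch describe the synthetic data only and say nothing about the results reported in the thesis. To apply the same pattern to real speech, `X` would be replaced by feature vectors extracted from the corpus recordings and `y` by the corresponding speaker labels.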

Files

Original bundle

Name: mokgonyane_tb_2021.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format
Description: Thesis

Collections

Theses and Dissertations (Computer Science)
Theses and Dissertations