The development of accented English synthetic voices

Malatji, Promise Tshepiso

dc.contributor.advisor	Manamela, M. J. D.
dc.contributor.advisor	Modipa, T. I.
dc.contributor.author	Malatji, Promise Tshepiso
dc.date.accessioned	2019-11-27T05:35:41Z
dc.date.available	2019-11-27T05:35:41Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/10386/2917
dc.description	Thesis (M. Sc. (Computer Science)) --University of Limpopo, 2019	en_US
dc.description.abstract	A text-to-speech (TTS) synthesis system is a software system that receives text as input and produces speech as output. A TTS synthesis system can be used by native and non-native speakers of the target language for, among other things, language learning and reading text aloud for people living with disabilities, e.g., physical or visual impairments. Most people relate easily to a second language spoken by a non-native speaker with whom they share a native language. Most online English TTS synthesis systems are developed using native speakers of English. This research study focuses on developing accented English synthetic voices as spoken by non-native speakers in the Limpopo province of South Africa. The Modular Architecture for Research on speech sYnthesis (MARY) TTS engine is used to develop the synthetic voices, and the Hidden Markov Model (HMM) method is used to train them. A secondary text corpus is used to develop the training speech corpus by recording six speakers reading the text. The quality of the developed synthetic voices is measured in terms of their intelligibility, similarity and naturalness using a listening test. The results are reported by evaluators’ occupation and gender, as well as overall. The subjective listening test indicates that the developed synthetic voices have a high level of acceptance in terms of similarity and intelligibility. Speech analysis software is used to compare the recorded synthesised speech with the human recordings. There is no significant difference in voice pitch between the speakers and the synthetic voices, except for one synthetic voice.	en_US
dc.format.extent	xv, 107 leaves	en_US
dc.language.iso	en	en_US
dc.publisher	University of Limpopo	en_US
dc.relation.requires	PDF	en_US
dc.subject	Text-to-speech synthesis system	en_US
dc.subject	Language learning	en_US
dc.subject.lcsh	Text data mining	en_US
dc.subject.lcsh	Data compression (computer science)	en_US
dc.subject.lcsh	Informal language learning	en_US
dc.title	The development of accented English synthetic voices	en_US
dc.type	Thesis	en_US