Grapheme-based continuous speech recognition for some of the under-resourced languages of Limpopo Province

dc.contributor.advisor: Manamela, M.J.D
dc.contributor.author: Manaileng, Mabu Johannes
dc.contributor.other: Velempini, M.
dc.date.accessioned: 2017-01-25T07:30:49Z
dc.date.available: 2017-01-25T07:30:49Z
dc.date.issued: 2015
dc.description: Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2015
dc.description.abstract: This study investigates the potential of using graphemes, instead of phonemes, as acoustic sub-word units for monolingual and cross-lingual speech recognition for some of the under-resourced languages of the Limpopo Province, namely IsiNdebele, Sepedi and Tshivenda. The performance of a grapheme-based recognition system is compared to that of a phoneme-based recognition system. For each selected under-resourced language, an automatic speech recognition (ASR) system based on hidden Markov models (HMMs) was developed using both graphemes and phonemes as acoustic sub-word units. The ASR framework used models emission distributions with 16-mixture Gaussian mixture models (GMMs), using increments of 2 mixtures. A third-order n-gram language model was used in all experiments. Identical speech datasets were used for each experiment per language. The LWAZI speech corpora and the National Centre for Human Language Technologies (NCHLT) speech corpora were used for training and testing the tied-state context-dependent acoustic models. The performance of all systems was evaluated at the word level using word error rate (WER). The results show that grapheme-based continuous speech recognition, which copes with the problem of low-quality or unavailable pronunciation dictionaries, is comparable to phoneme-based recognition for the selected under-resourced languages in both the monolingual and cross-lingual speech recognition tasks. The study demonstrates that context-dependent grapheme-based sub-word units can be reliable for small and medium-large vocabulary speech recognition tasks for these languages.
dc.description.sponsorship: Telkom SA
dc.format.extent: xv, 105 leaves
dc.identifier.uri: http://hdl.handle.net/10386/1615
dc.language.iso: en
dc.publisher: University of Limpopo
dc.relation.requires: PDF
dc.subject: Grapheme-based
dc.subject: Speech recognition
dc.subject: Under-resourced languages
dc.subject.lcsh: Automatic speech recognition.
dc.subject.lcsh: Speech perception -- South Africa -- Limpopo
dc.title: Grapheme-based continuous speech recognition for some of the under-resourced languages of Limpopo Province
dc.type: Thesis
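
Two ideas from the abstract can be illustrated with a short sketch: a grapheme-based lexicon, in which each word is spelled out letter by letter rather than looked up in a phoneme dictionary, and the word error rate (WER), computed as the word-level edit distance divided by the number of reference words. This is not the thesis's implementation. The function names `grapheme_lexicon` and `word_error_rate` are hypothetical, the one-letter-per-unit mapping is an assumption (the thesis may treat digraphs or diacritics differently), and the sample strings are illustrative inputs, not drawn from the LWAZI or NCHLT corpora.

```python
def grapheme_lexicon(words):
    """Map each word to its grapheme sequence.

    Assumption: one sub-word unit per alphabetic character, lower-cased.
    Non-letters (hyphens, apostrophes, digits) are dropped.
    """
    lexicon = {}
    for word in words:
        units = [ch for ch in word.lower() if ch.isalpha()]
        if units:  # skip entries with no letters at all
            lexicon[word] = units
    return lexicon


def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Uses the standard Levenshtein distance over whitespace-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(grapheme_lexicon(["dumela"]))
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because the lexicon is derived directly from spelling, it needs no hand-built pronunciation dictionary, which is the property the abstract highlights for under-resourced languages.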

Files

Original bundle

Name: Manaileng_mj_2016.pdf
Size: 1.58 MB
Format: Adobe Portable Document Format
Description: thesis

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission
Description: