Speech Recognition

Speech recognition technology allows a computer system to recognize words spoken by a person in order to convert the sound into text. This does not mean that the speech recognition system will necessarily be able to identify the meaning of every word.

The following speech recognition resources are now available through the Language Technologies Portal:

Training Data

Language data is required to train speech recognition engines for a new language.

Welsh Pronunciation Lexicon

A pronunciation lexicon consists of mappings from words to their pronunciation descriptions, suitable for use by speech recognition and text-to-speech engines.
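To illustrate, here is a minimal Python sketch of loading such a lexicon, assuming a simple plain-text format in which each line holds a word followed by its space-separated phoneme symbols (the actual file format and phoneme set of the Welsh Pronunciation Lexicon may differ):

# Minimal sketch of loading a pronunciation lexicon.
# Assumed format: one entry per line, word followed by space-separated
# phonemes, e.g. "bore b o r e" -- the real lexicon may differ.

def load_lexicon(path):
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                word, phonemes = parts[0], parts[1:]
                # A word may have more than one pronunciation variant.
                lexicon.setdefault(word, []).append(phonemes)
    return lexicon

# Example usage (hypothetical file name):
# lexicon = load_lexicon("welsh_pronunciation_lexicon.txt")
# print(lexicon.get("bore"))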

Download the 'Welsh Pronunciation Lexicon' resource from metashare.techiaith.cymru

The lexicon’s pronunciation descriptions were produced with an implementation of Welsh letter-to-sound rules coded in Python. The code can be accessed on GitHub:

techiaith/welsh-lts

The lexicon, which is based on all words that are recognized by the Cysill spellchecker, contains over half a million entries, including mutated, conjugated and other inflected forms.
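As a rough illustration of the letter-to-sound approach (this is not the actual rule set in techiaith/welsh-lts), many Welsh graphemes map regularly to phonemes, which a rule-based converter can exploit by matching digraphs before single letters:

# Illustrative letter-to-sound sketch with a deliberately small rule set;
# the phoneme symbols are simplified IPA and coverage is far from complete.
RULES = [
    ("ch", "χ"), ("dd", "ð"), ("ff", "f"), ("ll", "ɬ"),
    ("ng", "ŋ"), ("rh", "r̥"), ("th", "θ"),
    ("f", "v"), ("c", "k"),
]

def letters_to_sounds(word):
    phonemes = []
    i = 0
    word = word.lower()
    while i < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, i):
                phonemes.append(phoneme)
                i += len(grapheme)
                break
        else:
            # No rule matched: fall back to the letter itself.
            phonemes.append(word[i])
            i += 1
    return phonemes

# letters_to_sounds("llong") -> ['ɬ', 'o', 'ŋ']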

Speech Corpus

A speech corpus is a large collection of speech audio files with text transcriptions. Such a corpus is used to create acoustic models. The corpus should contain audio covering all the required phonemes, spoken by the largest possible number of speakers.
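Below is a minimal Python sketch of checking that a set of recording prompts covers a target phoneme inventory, assuming a lexicon structured like the load_lexicon() sketch above (all names and data here are hypothetical):

# Illustrative coverage check: which target phonemes appear in the prompts?
def phoneme_coverage(prompts, lexicon, target_phonemes):
    seen = set()
    for prompt in prompts:
        for word in prompt.lower().split():
            for pronunciation in lexicon.get(word, []):
                seen.update(pronunciation)
    missing = set(target_phonemes) - seen
    return seen, missing

# Example usage (hypothetical data):
# seen, missing = phoneme_coverage(["bore da", "nos da"], lexicon,
#                                  {"b", "o", "r", "e", "d", "a", "ɬ"})
# print("missing phonemes:", missing)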

Such a speech corpus has been crowdsourced through a bespoke app called Paldaruo. Further information can be found here:

Paldaruo Speech Corpus

 

Text Corpus

A text corpus is necessary for creating language models: models of the likelihood that a word follows a given previous word within a sentence. These model the sentences that the speech recognition engine is expected to be able to produce.
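For example, a minimal word-bigram model estimated from a text corpus could be sketched as follows (a real system would normally use a dedicated language modelling toolkit and apply smoothing for unseen word pairs):

from collections import Counter, defaultdict

# Count how often each word follows each previous word in the corpus.
def train_bigrams(sentences):
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            pair_counts[prev][word] += 1
    return pair_counts

# Estimate P(word | prev) from the counts (no smoothing).
def bigram_probability(pair_counts, prev, word):
    total = sum(pair_counts[prev].values())
    return pair_counts[prev][word] / total if total else 0.0

# Example usage (toy corpus):
# counts = train_bigrams(["bore da i chi", "bore da pawb"])
# print(bigram_probability(counts, "bore", "da"))  # 1.0
# print(bigram_probability(counts, "da", "pawb"))  # 0.5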

 

Speech Recognition Kits

Fortunately, a number of open source speech recognition kits are available from the academic community, enabling anyone to learn about, research, develop and bring to market new speech recognition engines.

These kits simplify the process of producing, testing and then applying acoustic models within various decoders.

These kits are not user-friendly applications with polished GUI front-ends, but rather collections of source code files (usually in C and C++), together with scripts for compiling and running them as part of the wider tasks involved in training, testing and decoding.

The Language Technologies Portal provides its resources for using these kits for Welsh language speech recognition as Docker-based environments that are easy to install and use.

Kaldi Cymraeg

Kaldi-ASR (http://kaldi-asr.org) has grown in popularity in recent years in academia and industry due to its more permissive open source license and its support for neural network based acoustic models.

This resource allows you to train your own Kaldi-based Welsh language speech recognition engine with the resources mentioned above.

techiaith/kaldi-cy

 

HTK Cymraeg

HTK (the Hidden Markov Model Toolkit) from Cambridge University has been a foundation for speech recognition research since the 1990s. It has been successfully applied to implement Welsh language speech recognition with the following resources:

techiaith/seilwaith

 

Julius Cymraeg

Julius is a large vocabulary continuous speech recognition decoder. It is used here to put the HTK acoustic models to use:

techiaith/julius-cy

 


Other Resources

 

Further Speech Recognition Development

Gwaith Adnabod Lleferydd Uwch (GALLU)

GALLU project outputs

The latest on Speech Recognition from the blog:

http://techiaith.cymru/category/speechrecognition/