Forced Aligner

A forced aligner uses speech recognition software, a transcription and a pronunciation dictionary to locate every word and phoneme within a sound file, marking where each one starts and ends.
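To make the idea of "locating" words and phonemes concrete, here is a minimal sketch of how an alignment could be represented once it has been produced. It is not the output format of any particular aligner: the `Interval` class, the `phonemes_in` helper, the timings and the phoneme labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One aligned unit (a word or a phoneme) with its time span in seconds."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


# Made-up timings for the phrase "bore da"; not taken from a real recording.
words = [
    Interval("bore", 0.10, 0.45),
    Interval("da", 0.45, 0.80),
]

# Made-up phoneme labels; a real pronunciation dictionary defines its own symbols.
phonemes = [
    Interval("b", 0.10, 0.18),
    Interval("o", 0.18, 0.30),
    Interval("r", 0.30, 0.37),
    Interval("e", 0.37, 0.45),
    Interval("d", 0.45, 0.55),
    Interval("a", 0.55, 0.80),
]


def phonemes_in(word, phoneme_tier):
    """Return the phoneme intervals that fall entirely inside the word's span."""
    return [p for p in phoneme_tier if p.start >= word.start and p.end <= word.end]


for w in words:
    inside = " ".join(p.label for p in phonemes_in(w, phonemes))
    print(f"{w.label}: {w.start:.2f}-{w.end:.2f}s [{inside}]")
```

Keeping words and phonemes as two parallel lists of time intervals mirrors the tiered layout commonly used for speech annotation, which makes it straightforward to ask which phonemes belong to which word.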

Once the transcriptions have been force-aligned with an entire speech corpus, it is possible to gain a better understanding of the corpus's quality and to improve the training of the acoustic models.
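One way alignments might help in judging corpus quality is by flagging words whose aligned duration looks implausible, which often points to a mismatch between the transcription and the recording. The sketch below assumes word alignments are available as `(label, start, end)` tuples in seconds; the function name `flag_suspect_words`, the thresholds and the sample data are hypothetical and would need tuning for a real corpus.

```python
def flag_suspect_words(words, min_dur=0.03, max_dur=2.0):
    """Return the (label, start, end) tuples whose duration is outside [min_dur, max_dur].

    The default thresholds are illustrative assumptions, not recommended values.
    """
    return [
        (label, start, end)
        for label, start, end in words
        if not (min_dur <= end - start <= max_dur)
    ]


# Made-up alignment for illustration only.
aligned = [
    ("bore", 0.10, 0.45),
    ("da", 0.45, 0.46),    # suspiciously short
    ("iawn", 0.46, 3.00),  # suspiciously long
]

for label, start, end in flag_suspect_words(aligned):
    print(f"check '{label}': {end - start:.2f}s")
```

Recordings with many flagged words are candidates for manual review before they are used to train acoustic models.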

Background

Language-specific speech technologies such as speech recognition and text-to-speech depend on speech corpora: recordings of speakers, transcribed and annotated with the phonetics and stress pattern of each word.

Alignment can be done by manually matching the text to the sound recordings. However, some speech corpora are so large that automatic methods are needed to align the text with the speech.

Information for developers

Here you will find details of a forced aligner to facilitate the creation of Welsh speech corpora.

techiaith/Prosodylab-Aligner

If you would like an easy way to run the forced aligner in a Docker environment, the following resource is also available:

techiaith/docker-ProsodylabAligner-cy

Welsh National Language Technologies Portal

Canolfan Bedwyr