Forced Aligner

Prosodylab aligner

A forced aligner uses speech recognition software, transcriptions and pronunciation dictionaries to align a transcript with its audio and pinpoint the location of every word and phoneme within the sound file.
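To make the idea of "locating" words and phonemes concrete, the following is a minimal Python sketch of how an alignment result might be represented once it has been read into memory. It is an illustration only: the `Interval` class, the `phones_in_word` helper and all of the timings are hypothetical and are not part of Prosodylab-Aligner or its output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One aligned unit: a label with start and end times in seconds."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def phones_in_word(word, phones):
    """Return the phone intervals that fall entirely inside a word interval."""
    return [p for p in phones if p.start >= word.start and p.end <= word.end]


# Hypothetical alignment for the Welsh phrase "bore da"; every time is invented.
words = [Interval("bore", 0.10, 0.45), Interval("da", 0.45, 0.80)]
phones = [
    Interval("b", 0.10, 0.18),
    Interval("o", 0.18, 0.30),
    Interval("r", 0.30, 0.36),
    Interval("e", 0.36, 0.45),
    Interval("d", 0.45, 0.55),
    Interval("a", 0.55, 0.80),
]

for word in words:
    labels = [p.label for p in phones_in_word(word, phones)]
    print(f"{word.label}: {word.start:.2f}-{word.end:.2f}s {labels}")
```

Keeping words and phonemes as two parallel lists of time-stamped intervals mirrors the tiered layout that alignment tools commonly write out, but the exact file format depends on the tool.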

Once a text has been force-aligned with an entire speech corpus, the alignments make it possible to assess the quality of the corpus and to improve the training of the acoustic models.
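As one possible illustration of such a quality check, the sketch below flags aligned segments that are implausibly short, which can hint at a mismatch between a transcript and its recording. The function name `flag_short_intervals`, the 0.03 second threshold and the sample data are all assumptions made for this example; they are not taken from the aligner itself.

```python
def flag_short_intervals(intervals, min_duration=0.03):
    """Return the (label, start, end) tuples shorter than min_duration seconds."""
    return [
        (label, start, end)
        for label, start, end in intervals
        if end - start < min_duration
    ]


# Hypothetical phone alignments as (label, start, end) in seconds.
phones = [("b", 0.10, 0.18), ("o", 0.18, 0.19), ("r", 0.19, 0.36)]

# Segments returned here would be candidates for manual inspection.
print(flag_short_intervals(phones))
```

A duration threshold is only a rough heuristic; a suitable value would need to be chosen for the language and recordings in question.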

Background

Language-specific speech technologies such as speech recognition and text-to-speech depend on speech corpora: recordings of speakers, transcribed and annotated with the phonetics and stress pattern of each word.

Alignment can be done manually by matching the text to the sound recordings. However, because some speech corpora are enormous, automatic methods are needed to align text with speech.

Information for developers

Here you will find details of a forced aligner to facilitate the creation of Welsh speech corpora.
 
techiaith/Prosodylab-Aligner
Python interface for forced audio alignment using HTK and SoX for Welsh

If you would like an easy way to use the forced aligner in a Docker environment, the following resource is also available:
 
techiaith/docker-ProsodylabAligner-cy
Use Prosodylab-Aligner with Welsh support easily via Docker