Corpus of CC0 Sentences to be Used as Welsh Speech Recognition Prompts

This is a collection of 14,857 sentences released under a CC0 licence. They were collected by members of the Language Technologies Unit, Bangor University, expressly to serve as prompts for Welsh Speech Recognition. The sentences come from various CC0 sources and include:

* Original sentences
* Sentences from novels, essays and other out-of-copyright material
* Sentences from the Welsh Wicipedia where authors gave us permission to release them under a CC0 licence
* Tweets, emails, and other electronic material gifted to the project to be used as prompts

In a number of cases, the language was adapted and the sentences heavily edited to make them suitable for reading aloud by volunteers.

The corpus was also given to the Mozilla Common Voice project, where the sentences have been used as prompts for recording volunteers.

We wish to thank everyone who helped us collect these sentences, including those who gave us their materials under a CC0 licence, and Mozilla for their help and leadership with the Common Voice project.

Download the ‘Brawddegau Cymraeg’ resource from:

techiaith/brawddegau-adnabod-lleferydd