We are developing Welsh language speech recognition as part of our Welsh Language Communications Infrastructure, sharing it here on the Welsh National Language Technologies Portal with other developers of Welsh language software and apps.
Today we are pleased to share the first version of a Welsh language speech recognition system:
Julius Cymraeg (julius-cy)
This project is based on Julius, an open source large vocabulary continuous speech recognition (LVCSR) system, and provides the files and scripts required to adapt it to recognize Welsh language speech rather than English or Japanese.
The first release allows julius-cy to recognize very simple questions and commands in Welsh concerning the weather, news, time and music, as well as requests for a joke or a proverb. This means that julius-cy is limited to recognizing specific sentences and vocabulary:
- “BETH YDY’R TYWYDD HEDDIW?” ( “What’s today’s weather?” )
- “BETH YW TYWYDD YFORY?” ( “What’s tomorrow’s weather?” )
- “BETH YW’R NEWYDDION?” ( “What’s the news?” )
- “FAINT O’R GLOCH YDY HI?” ( “What time is it?” )
- “CHWARAEA GERDDORIAETH CYMRAEG” ( “Play Welsh music” )
Future versions of julius-cy will attempt to support recognising dictation and more varied speech.
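Because the first release only recognizes a fixed set of sentences, an app that uses it could map each recognized sentence straight to an action. The following is a minimal sketch of that idea in Python, not part of julius-cy itself: the intent names, the `normalize` helper and the `lookup_intent` function are all hypothetical, and it assumes the recognizer's output is already available as a text string.

```python
# Hypothetical intent names; the Welsh sentences are the ones listed above.
COMMANDS = {
    "BETH YDY'R TYWYDD HEDDIW": "weather_today",
    "BETH YW TYWYDD YFORY": "weather_tomorrow",
    "BETH YW'R NEWYDDION": "news",
    "FAINT O'R GLOCH YDY HI": "time",
    "CHWARAEA GERDDORIAETH CYMRAEG": "play_music",
}


def normalize(text):
    """Uppercase, unify apostrophes, drop other punctuation, collapse spaces."""
    text = text.replace("\u2019", "'").replace("\u2018", "'").upper()
    kept = "".join(ch for ch in text if ch.isalnum() or ch in " '")
    return " ".join(kept.split())


def lookup_intent(recognized):
    """Return the hypothetical intent for a recognized sentence, or None."""
    return COMMANDS.get(normalize(recognized))


if __name__ == "__main__":
    print(lookup_intent("Beth ydy\u2019r tywydd heddiw?"))
    print(lookup_intent("Agor y drws"))
```

A plain dictionary lookup is enough here only because the vocabulary is closed; dictation or more varied speech would need a different approach.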
Everything you need to easily get started is available with very liberal licensing on GitHub.
This is amazing! How does it work?
The background page explains more about the internals of the first release:
You can try adding your own texts and questions for julius-cy to recognize after reading this!
Hmm. It doesn’t work very well for me. How can I help?
julius-cy currently uses very early acoustic models, so it may not recognize everyone’s speech successfully.
If this is the case, and you have not already contributed your voice to our Paldaruo Speech Corpus, then please use our Paldaruo app (http://techiaith.bangor.ac.uk/paldaruo) on any iOS or Android device so that we can improve the acoustic models with your voice.