One of the constant experiences of being a Welsh speaker is having to fight to be allowed to use your language outside the circle of family and friends who know it. This ranges from having to specifically ask for a Welsh language form at the dentist (which is usually lurking in a cupboard in the back store) to choosing the Welsh option on a telephone line only to be put through to an apologetic English speaker. On the computer and within software, the story is often the same. Normally, a Welsh language pack must be installed to get the interface and other resources in Welsh, and before that you need to know that the language pack exists and where to find it.
One of the aspects that make the latest developments in Artificial Intelligence exciting is that Welsh is included in the major language models from the start. There is no need to install additional features or dig into the settings. You can type Welsh into Claude or Copilot and get a surprisingly fluent answer.
In our work here in the Unit, a major priority has been ensuring that our Speech Recognition and Text-to-Speech models are likewise multilingual, able to handle both Welsh and English. Although it is technically more challenging, creating a single model that knows both languages and can switch between them without difficulty, even in the middle of a sentence, is key.
That is because, in speech, we Welsh speakers mix English into our Welsh. Even when sticking to Welsh vocabulary only, models need to cope with the names of people, films, organizations and products whose spelling does not follow Welsh pronunciation. If we want to transcribe the way Welsh speakers speak from day to day, then our models have to cope with this linguistic mixing within Welsh, and with the switch between Welsh and English from one sentence to another.
That is exactly what our Speech Recognition and Text-to-Speech models (available from https://www.techiaith.cymru ) are trying to do, and you can see them in action in the Macsen digital assistant ( https://macsen.techiaith.cymru ) and on the Trawsgrifiwr (Transcriber) transcription and subtitling website ( https://trawsgrifiwr.techiaith.cymru ). You can also listen to the bilingual synthetic voices.
If we want to support the use of Welsh as a living, modern language, we need to ensure that it is as easy to use in Wales as English is. Thanks to investment by the Welsh Government over the years, it is possible to ensure that Welsh is treated no less favorably than English within the most advanced technical environments. If you are developing in this area, take a look at the resources available from https://huggingface.co/techiaith and https://github.com/techiaith and contact us via techiaith@bangor.ac.uk for more details.
