
Introducing Lleisiwr – Welsh Open Source Voice Banking and Text to Speech

In November 2017, the Language Technologies Unit received a small grant from the Welsh Government’s Technology and the Welsh Language Fund to work with the NHS as partners on a project that allows patients on the brink of losing their voice to bank it and then generate a personal digital synthetic voice. This had never before been available for Welsh speakers, and it is a great step forward for Welsh-speaking patients.

More information about this service can be found here, including details about the package’s source code for software developers.

Here is a short video that shows you how to register for the service:

The initial response on social media has been quite favourable:

New Speech Resources

We have just published new speech resources under the Macsen project, which is funded by the Welsh Government. Details are below. Enjoy!

HTK Acoustic Model

http://techiaith.cymru/htk/paldaruo-16kHz-2017-12-08.tar.gz

Lexicon

http://techiaith.cymru/htk/lexicon-2017-12-08.tar.gz
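If you want to use the lexicon outside HTK, a small loader can be handy. The sketch below assumes an HTK-style dictionary layout with one entry per line: the word followed by its phones, separated by whitespace, and a word with several pronunciations appearing on several lines. We have not checked the exact file name or phone set inside the archive, so treat `load_lexicon`, `load_lexicon_file` and the example phones as hypothetical. Entries that carry an HTK output symbol in square brackets are not handled specially.

```python
from collections import defaultdict


def load_lexicon(lines):
    """Map each word to its list of pronunciations (each a list of phones).

    Assumption: HTK-style entries of the form 'WORD PHONE PHONE ...'.
    Words with several pronunciations appear on several lines.
    """
    lexicon = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            # Skip blank lines and entries that have no phones.
            continue
        word, *phones = fields
        lexicon[word].append(phones)
    return dict(lexicon)


def load_lexicon_file(path):
    """Read a lexicon file (hypothetical path) as UTF-8, since Welsh
    spellings use characters such as 'ŵ' and 'ŷ'."""
    with open(path, encoding="utf-8") as f:
        return load_lexicon(f)


if __name__ == "__main__":
    # Hypothetical entries; the real phone set may differ.
    sample = ["bore b o r e", "", "bore b o: r e"]
    print(load_lexicon(sample))
```

Keeping every pronunciation as a separate list means alternative pronunciations of the same word are preserved rather than overwritten.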

Prosodylab Aligner

There are also new HTK acoustic models included in the Welsh Prosodylab Aligner:

https://github.com/techiaith/Prosodylab-Aligner/tree/v2.0_paldaruo_4

Kaldi Acoustic Model

http://techiaith.cymru/kaldi/decoders/paldaruo_macsen/tri3-2017-12-18.tar.gz

Training code in GitHub

https://github.com/techiaith/kaldi-cy
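The acoustic models and the lexicon above are distributed as gzipped tarballs. Once an archive has been downloaded, unpacking it might look like the following minimal Python sketch. `unpack_model` is a hypothetical helper, and it makes no assumption about the directory layout inside each archive: it simply extracts everything and returns the member names so you can inspect them.

```python
import tarfile
from pathlib import Path


def unpack_model(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names.

    Raises ValueError if any member would be written outside dest_dir.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the standard 'data' extraction filter where this Python has it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=members, **extra)
    return sorted(member.name for member in members)


if __name__ == "__main__":
    # Assumes the Kaldi archive has already been downloaded locally.
    for name in unpack_model("tri3-2017-12-18.tar.gz", "models/kaldi"):
        print(name)
```

The path check rejects entries that would land outside the destination directory; on Python versions that support it, the standard library's `data` filter adds further protection, for example against unsafe links.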

Towards a Welsh ‘Siri’…

It is increasingly possible for you to speak with devices such as your phone or computer in order to command and control applications and devices as well as to receive intelligent and relevant answers to questions voiced in natural language.

These capabilities are the result of recent advances in speech recognition, machine translation, and natural language processing and understanding. As such, they are the prime enablers of a disruptive change: a fundamental shift in how users and consumers engage with their devices and, more widely, in how they use technology.

Seen in its wider historical context, this is only the next step in the evolution of human-computer interaction: from keyboard, to mouse, to touch, to voice and language.

Four main commercial platforms are driving this change, namely Siri, OK Google, Microsoft Cortana and Amazon Alexa, along with some lesser-known open platforms.

To date, these provide their powerful capabilities in English and some other major languages, with little evidence that they are likely to extend their choice of languages to the ‘long tail’ of smaller languages, including Welsh, in the near future.

The Language Technologies Unit has been sponsored by the Welsh Government, through its Welsh Language Technology and Digital Media Fund, and by S4C to carry out the ‘Welsh Language Communications Infrastructure’ project, ensuring that users whose preferred language is Welsh are not left behind by such developments.

Our first deliverable for the project is a brief report on how we can achieve this. It concludes that the commercial offerings from the large companies currently provide no technical means of realising a Welsh language digital assistant. Only open alternatives, such as finer-grained online APIs and open source software, allow us to make progress.

It is hoped that the project will lay the foundations for a range of Welsh language technologies to be used in such environments, including improvements to the work done to date on Welsh language speech recognition, as well as machine translation for leveraging some of the capabilities provided by English language technologies.

All of the software and resources developed by the project will be available here from the Welsh National Language Technologies Portal. The project will stimulate the development of new Welsh language software and services that could contribute to the mainstreaming of Welsh in the next phase of human-computer interaction.

In the meantime, we need your help! Please contribute your voice to our speech corpus via our Paldaruo app:

Paldaruo is available on iTunes and Google Play.

Language Technologies Portal Blog

Over the next few weeks and months, leading up to our ‘Through Technological Means’ conference, we will be publishing a number of language technology resources through Twitter (@techiaith) and this blog.

We hope to share stories about other developers and coders who use these new resources, so please contact us if any of them have been useful in your activities or projects.

There’s an exciting collection of new stuff on the way, giving a serious boost to coders and developers of new Welsh software.

We would like to thank the Welsh Government and its Welsh-language Technology and Digital Media Fund for sponsoring this work, which forms part of the Welsh National Language Technologies Portal.

Follow our blog for all our latest news!