RoboLlywydd

Or the ability to create your own natural-sounding Welsh language synthetic voices…

As part of our work on the Macsen project, we’ve created tools that enable you to create natural-sounding Welsh language synthetic voices. The tools make it easy for you to prepare recording scripts, record an individual’s voice and, using their knowledge of Welsh language pronunciation, build a Welsh language synthetic voice that sounds very similar to the recorded individual.

Here are examples of voices synthesized with the new tools from recordings of two members of the techiaith team:

Male:

Female:

The team had the opportunity to demonstrate these tools at the recent SeneddLab 2017 event, where a new voice was created within one hour, named ‘RoboLlywydd’, and used to speak the answers to questions about the National Assembly for Wales. Although the ‘RoboLlywydd’ name was just for fun, it showed that it’s possible to create and use many different individual voices within your own personal digital assistants. The following video talks more about this (especially from around five and a half minutes in):

We used an existing open source system called MaryTTS, which can now be used to create Welsh voices using the resources at the following GitHub repository:

techiaith/docker-marytts

Introducing Macsen

During 2015-2016 we have been developing new resources that enable you to talk in Welsh with computers. See Start Speaking Welsh to your Computer and Towards a Welsh ‘Siri’.

This is a technology which is becoming increasingly prevalent as the human voice is used more and more for question and answer systems on mobile phones and tablets, and voice control for such things as television sets, robots and dictation systems. If Welsh cannot be used in these environments, then the language will be excluded from the digital world and Welsh speakers will have no choice but to speak English with these devices.

In order to pave the way for new Welsh medium technologies we have produced a Welsh question and answer prototype, where a personal assistant called “Macsen” is able to answer questions such as what is the news or weather.

Here is a video that introduces Macsen and demonstrates it at work on a small Raspberry Pi computer:

All of Macsen’s code and resources are available on GitHub so that anyone can expand its capabilities and develop their own Macsen. Macsen’s homepage, which is the best place to begin, is:

http://techiaith.cymru/macsen

We will continue to work on speech recognition and other open resources for Macsen. Get in touch with us if you’re a software company, coding club, school or a hacker with an interest in incorporating Macsen into your own software projects.

‘Macsen’ was developed within the ‘Welsh Language Communications Infrastructure’ project, which was funded by the Welsh Government and S4C.

Start speaking Welsh to your computer

We are developing Welsh language speech recognition as part of our Welsh Language Communications Infrastructure, sharing it here on the Welsh National Language Technologies Portal with other developers of Welsh language software and apps.

Today we are pleased to share the first version of a Welsh language speech recognition system:

Julius Cymraeg (julius-cy)

This project is based on Julius, an open source large vocabulary continuous speech recognition (LVCSR) system, and provides the files and scripts required to adapt it to recognize Welsh language speech rather than English or Japanese.

http://julius.osdn.jp/en_index.php

The first release allows julius-cy to recognize very simple questions and commands in Welsh concerning the weather, news, time, music as well as asking for a joke or a proverb. This means that julius-cy is limited to recognising specific sentences and vocabulary:

  • “BETH YDY’R TYWYDD HEDDIW?” ( “What’s today’s weather?” )
  • “BETH YW TYWYDD YFORY?” ( “What’s tomorrow’s weather?” )
  • “BETH YW’R NEWYDDION?” ( “What’s the news?” )
  • “FAINT O’R GLOCH YDY HI?” ( “What time is it?” )
  • “CHWARAEA GERDDORIAETH CYMRAEG” ( “Play Welsh music” )

Future versions of julius-cy will attempt to support recognising dictation and more varied speech.

Everything you need to easily get started is available with very liberal licensing on GitHub.

Go to:

https://github.com/techiaith/julius-cy

This is amazing! How does it work?

The background page explains more about the internals of the first release:

https://github.com/techiaith/julius-cy/blob/master/CEFNDIR.md

You can try adding your own texts and questions for julius-cy to recognize after reading this!

Hmm. It doesn’t work very well for me. How can I help?

The acoustic models in julius-cy are at a very early stage, so julius-cy may not be able to recognize everyone’s speech successfully.

If this is the case, and you have not already contributed your voice to our Paldaruo Speech Corpus, then please use our Paldaruo app (http://techiaith.bangor.ac.uk/paldaruo) on any iOS or Android device so that we can improve the acoustic models with your voice.

New Cloud based Welsh Machine Translation

Part of our Welsh Language Communications Infrastructure project is to improve our machine translation resources, so that Welsh can leverage some of the capabilities provided by English language based technologies.

As a result, the Moses-SMT machine translation capabilities of the Welsh National Language Technologies Portal are now available from the API Centre, making them easy to integrate into your software, including translation memory systems such as Trados, WordFast and CyfieithuCymru (TranslateWales).

Welsh<>English Moses-SMT joins a wide range of other language technologies API services such as Cysill (Welsh spelling and grammar checker), text-to-speech, parts of speech tagger, language detection, lemmatizer and Vocab to enhance Welsh support of your website, app and software.

As with these other API services, you can get started by obtaining your API key (How to register for an API key) and following the documentation and code examples we’ve prepared on GitHub. Please see: https://github.com/PorthTechnolegauIaith/moses-smt/blob/master/docs/APIArlein.md

Before you go ahead, however, we’d like to emphasize once more the importance of quality control: it is your responsibility to ensure that this machine translation software is used appropriately, including the use of careful post-editing (see Quality Issues).

Demo

We have prepared a demo so that you can evaluate the machine translation engines.

Please see:  http://techiaith.cymru/translation/demo

Towards a Welsh ‘Siri’…

It is increasingly possible for you to speak with devices such as your phone or computer in order to command and control applications and devices as well as to receive intelligent and relevant answers to questions voiced in natural language.

Such capabilities are possible as a consequence of recent advancements in speech recognition, machine translation and natural language processing and understanding. As such they are the prime enablers for a disruptive change and a fundamental shift in how users and consumers engage with their devices and how they more widely use technology.

If looked at in its wider historical context, this is only the next step in the evolution of human computer interaction; from keyboard, to mouse, to touch, to voice and language.

There are four main commercial platforms driving this change, namely Siri, Ok Google, Microsoft Cortana and Amazon Alexa, as well as some lesser known open platforms.

To date, these provide their powerful capabilities in English and some other major languages, with little evidence that they are likely to extend their choice of languages to the ‘long tail’ of smaller languages, including Welsh, in the near future.

The Language Technologies Unit has therefore been sponsored by the Welsh Government, through its Welsh Language Technology and Digital Media Fund, and by S4C to deliver the ‘Welsh Language Communications Infrastructure’ project, ensuring that users with a preferred language of Welsh are not left behind in such developments.

Our first deliverable as part of the project is a brief report on how we can achieve this. It concludes that the commercial offerings by the large companies do not provide any technical means at the moment for realising a Welsh language digital assistant. Thus only open alternatives such as finer grained online APIs and various open source software allow us to progress.

It is hoped that the project will lay the foundations for a range of Welsh language technologies to be used in such environments, including improving the work done to date on Welsh language speech recognition, as well as machine translation for leveraging some of the capabilities provided via English language based technologies.

All of the software and resources developed by the project will be available here from the Welsh National Language Technologies Portal. The project will stimulate the development of new Welsh language software and services that could contribute to the mainstreaming of Welsh in the next phase of human-computer interaction.

In the meantime, we need your help! Please contribute your voice to our speech corpus via our Paldaruo app:

Paldaruo is available from iTunes and Google Play.

More Welsh text-to-speech resources on GitHub

Since its launch in March, a few coders and companies have been using the cloud based Welsh language text-to-speech API service.

Very often, however, developers (from companies in particular) want Welsh language text-to-speech that works offline and in Microsoft Windows based environments. From time to time we also get e-mails from text-to-speech developers working on other lesser resourced languages, asking for help with using their own voices in Microsoft Windows.

Our Welsh language text-to-speech voice is possible thanks to the superb Festival Speech Synthesis System. However, Festival, as its developers openly admit, does not support Microsoft Windows very well at all.

We think that Festival and its Welsh voice should be usable in Microsoft Windows. Therefore, we’ve published on GitHub the speech data that makes Festival talk Welsh, as well as a side project: a Visual Studio solution that makes Festival run natively on Windows with a very basic COM and .NET interface.

The voice data can be found here: https://github.com/PorthTechnolegauIaith/llais_festival

Our attempt to get our Welsh text-to-speech voice running on Windows, and our contribution to improving Festival on Microsoft Windows, can be found here: https://github.com/techiaith/Festival_Windows

Without these resources there are very few, if any, options for Welsh or any Festival voice to be usable on Windows. We hope that these contributions are of great help and can be improved upon with the aid of Welsh language and international open source communities.

Coding a Welsh language robot

As part of our mission to promote the acquisition of computing skills amongst Welsh speakers, the Language Technologies Unit has been developing a series of computer science lessons aimed at primary school children.

The basis of these resources is the Raspberry Pi Foundation’s collection of Turing Test lessons. The resources were originally created in English and placed on the Foundation’s website under an open license, allowing for free distribution and sharing.

Our contribution has been to translate the whole course into Welsh, and to place it on GitHub, so that it can be made accessible to the public to use or adapt towards any purpose that they wish. We’ve also created a brand new lesson for the course that is specifically geared towards Welsh speaking children. This special lesson introduces children to some of the resources of the Language Resources Portal, including Welsh language text-to-speech, Cysill Ar-lein (online spelling and grammar checker for Welsh), language detection and parts of speech tagger, all in a fun and easy format.

Children from Garndolbenmaen Primary school enjoying their coding lesson with Dewi Bryn Jones, Patrick Robertson and Rapiro the Robot.

The lesson was trialled by Dewi Bryn Jones and Patrick Robertson at Garndolbenmaen Primary School in March, and was considered a resounding success. See this previous blog post for a video created by the children to learn more about the day’s events.

All of the resources are available on GitHub under an open license here. These include the three original lessons that were translated, the special lesson on adding Welsh features to the robot and also instructions for setting up for teachers and students.

Here is the lesson structure:

Lessons

And you can find the special Welsh lesson here:

Moses SMT update

When we released our machine translation resources earlier this month, we were using the first version of Moses, version 1.0. We have now updated the script to the latest version: Moses 3.0.

Everything is available from either GitHub at http://github.com/PorthTechnolegauIaith/moses-smt or from Docker.com https://registry.hub.docker.com/u/techiaith/moses-smt/.

Moses 3.0 offers a number of improvements for translators. According to the release notes (which you can read here) these updates include features which make the decoding process quicker, release more memory, and make Moses more effective in the process of matching sentences correctly.

We will also be taking advantage of this update in order to improve the translation engine CofnodYCynulliad (which we’ve previously talked about here) with additional data which we will be collecting from the Welsh Assembly.

Additionally, we plan to create a new domain specific engine for translating software, with the help of data provided by Rhoslyn Prys from meddal.com.

These are good examples of the iterative nature of translation engines, where it’s possible to keep adding data in order to develop and improve them continuously. Keep an eye out for more developments on this soon.

Machine Translation on Mac OS X

Since we’ve already released our machine translation system in Docker, it’s easy enough to get it running on an OS X system!

First, you will need to install one or two pieces of software on your computer. This tutorial uses Homebrew to install the packages.
(You can look again at the original tutorial if you like).

Installing VirtualBox

  • Docker needs VirtualBox on OS X (and Windows) to run its Linux virtual machine. Download VirtualBox from the VirtualBox website.

Installing boot2docker and docker

We will be using Homebrew to install these. Open Terminal and enter the following commands:

  • ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

    This will install Homebrew on your computer.

  • Next, install boot2docker and docker with the following commands:
    brew install boot2docker
    brew install docker

  • Initialise boot2docker (so that it can download the virtual machine) like this:
    boot2docker init
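
Before moving on, it can be worth confirming that the installs above succeeded. The following is only a minimal sketch: `missing_tools` is a helper name made up for this example, and it assumes that VirtualBox has put its `VBoxManage` command-line tool on your PATH.

```shell
# Print the name of every tool that is not on PATH.
# Prints nothing when all of the named tools are found.
# (missing_tools is a hypothetical helper, not part of boot2docker or Docker.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check for the three programs this tutorial relies on.
missing=$(missing_tools VBoxManage boot2docker docker)
if [ -n "$missing" ]; then
  echo "Not installed yet:" $missing
else
  echo "All tools found."
fi
```

If anything is reported as missing, repeat the matching install step before continuing.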

Increasing VirtualBox’s disk space

VirtualBox’s virtual disk will be created with a size limit of 20GB. The machine translation system (Moses SMT), including the language model file, needs more disk space than this, so the disk size will obviously need to be increased. This is unfortunately quite a long process, but the good news is that Docker have written a very simple tutorial on how to do it!

We recommend that you increase the disk size to 30GB (although the machine translation system only needs around 21GB).

Downloading and installing the translation system

Once you’ve increased the disk size in VirtualBox, you will need to start the boot2docker engine. Go back to Terminal, and write:

boot2docker up

Make a note of what is printed on the screen at the end of this command. This is important because you will need it to communicate with Docker. It should look something like this:

Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/ca.pem
Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/cert.pem
Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/key.pem
    export DOCKER_CERT_PATH=/Users/patrick/.boot2docker/certs/boot2docker-vm
    export DOCKER_TLS_VERIFY=1
    export DOCKER_HOST=tcp://192.168.59.103:2376

The last three lines are particularly important. Copy them, and then paste them into your Terminal window so that you can run the export commands.
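
The exports only apply to the Terminal window in which they were run, so it is easy to end up in a new window without them. As a rough sketch (the function name `docker_env_ready` is made up for this example), you could check that all three variables are set before using Docker:

```shell
# Succeed only when the three variables printed by 'boot2docker up' are set.
# (docker_env_ready is a hypothetical helper name.)
docker_env_ready() {
  [ -n "${DOCKER_HOST:-}" ] &&
  [ -n "${DOCKER_CERT_PATH:-}" ] &&
  [ -n "${DOCKER_TLS_VERIFY:-}" ]
}

if docker_env_ready; then
  echo "Docker environment is set: $DOCKER_HOST"
else
  echo "Run the three export lines from 'boot2docker up' first."
fi
```

This only checks that the variables exist. It does not verify that the address or certificate path is correct for your machine.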

Docker is ready

Now, after all this work, Docker should be ready!
Download the machine translation image using the following command:

docker pull techiaith/moses-smt

And then start the engine with:

docker run --name moses-smt-cofnodycynulliad-en-cy -p 8008:8008 -p 8080:8080 techiaith/moses-smt start -e CofnodYCynulliad -s en -t cy

Note: this command downloads a translation model which is based on the Proceedings of the National Assembly for Wales corpus. You can change the name ‘CofnodYCynulliad’ after the ‘start’ command to any one of the three below:

  • CofnodYCynulliad (en-cy and cy-en) – two large models which are based on the Proceedings of the National Assembly for Wales. One is specifically for translation from English to Welsh (en-cy), and the other is for translation from Welsh to English (cy-en). Size: ~3.7GB each.
  • CofnodBachYCynulliad – a much smaller model of the proceedings corpus which is based on a sub-set of the data (we recommend this if you just want to experiment quickly). Size: ~65MB
  • Deddfwriaeth – this engine was trained with data from the Legislation corpus. Size: ~900MB
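
To illustrate how the engine name and language pair slot into the command, here is a small sketch that prints (but does not run) the matching ‘docker run’ line. The flags and ports are copied from the command above. The helper name `moses_run_command` is made up, and the container naming pattern is an assumption based on the single example given. Which language directions the smaller engines support is not stated here, so check before relying on a particular pair.

```shell
# Print the 'docker run' command for a given engine and language pair.
# Assumed naming pattern: moses-smt-<engine in lower case>-<source>-<target>.
# (moses_run_command is a hypothetical helper; it prints the command only.)
moses_run_command() {
  engine=$1 source_lang=$2 target_lang=$3
  lower=$(printf '%s' "$engine" | tr '[:upper:]' '[:lower:]')
  printf 'docker run --name moses-smt-%s-%s-%s -p 8008:8008 -p 8080:8080 techiaith/moses-smt start -e %s -s %s -t %s\n' \
    "$lower" "$source_lang" "$target_lang" "$engine" "$source_lang" "$target_lang"
}

# Example: the small experimental engine, English to Welsh.
moses_run_command CofnodBachYCynulliad en cy
```

Copy the printed line into Terminal to start that engine.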

These three language models are also available for download from techiaith.org. See http://techiaith.org/moses/

It’s also important to note that you can use your own language model for this step (if you’ve already trained one)! Remember that the data we provide is a basis only, and it’s fairly simple to train your own language model. See the docs for more information on how to do this.

See Moses working

The final ‘docker run’ command creates a server on your local computer on port 8008. To connect to this port, you will need to open ports in VirtualBox. Open the ‘VirtualBox.app’ program (in your ‘Applications’ folder), click on ‘Settings’, and then on the ‘Network’ tab. There is a button at the bottom of the screen called ‘port forwarding’. Add rules as you can see below:

[Screenshot: port forwarding rules in VirtualBox]
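
If you prefer the command line, VirtualBox’s `VBoxManage controlvm` command can add the same kind of rule to a running machine. The sketch below only prints the commands so that you can review them first. It assumes the virtual machine is called `boot2docker-vm` (the name seen in the certificate paths earlier), that its NAT adapter is the first network adapter, and that the two ports to forward are the ones published by ‘docker run’. The rule names and the helper `port_forward_command` are made up for this example.

```shell
# Print a VBoxManage command that forwards a host port to the same guest port.
# Rule format: "name,protocol,host ip,host port,guest ip,guest port".
# (port_forward_command is a hypothetical helper; it prints the command only.)
port_forward_command() {
  port=$1
  printf 'VBoxManage controlvm boot2docker-vm natpf1 "moses-%s,tcp,127.0.0.1,%s,,%s"\n' \
    "$port" "$port" "$port"
}

# The two ports published by the 'docker run' command.
port_forward_command 8008
port_forward_command 8080
```

Run the printed commands in Terminal while the virtual machine is running.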

That’s it!

Go to http://127.0.0.1:8008 in your browser and start translating!

Thanks!!!

We would like to thank everyone who attended the Through Technological Means conference, and all those who gave presentations and contributed their time and energy towards making it a great day.

But most of all, we’d like to pass our special thanks on to the children of Garndolbenmaen primary school. They came to talk about their experiences using our synthetic voice resources in recent lessons they received on coding with the Raspberry Pi, which were provided by the Unit. They had prepared a video for the conference, but unfortunately there were technical problems when it was played. So now at last (and with apologies for those difficulties), here is the full video that was made by the children of Garndolbenmaen primary school:

The children described to the audience their experience during the lessons, where they were taught core coding skills using the Language Technologies Unit’s Welsh medium Turing Test resources. The children also had the opportunity to meet one very special guest: the Vice-chancellor of Bangor University!

The children explained to the Vice-chancellor, Professor John Hughes, that they had thoroughly enjoyed working on the project, and that they had learnt a variety of very useful skills. One or two even said that they would like to be professional coders in the future! The children were also able to meet with some of the guest speakers who had travelled from far and wide to attend the conference. Below, from left to right, are John Judge from Ireland, Dwayne Bailey from South Africa (but who is currently working in London) and Kepa Sarasola from the Basque Country.

[Photo: the guest speakers]

Here are the children meeting the guest speakers, as well as those members of the Language Technologies Unit who worked on the Language Technologies Portal project, not forgetting Rapiro, the little robot who speaks Welsh:

[Photo: the children with the guest speakers, members of the Language Technologies Unit and Rapiro]

The children also shared their story with Radio Cymru:

Post Cyntaf: http://www.bbc.co.uk/programmes/b053hsb6 (at 1:16:25).

And the BBC News programme on S4C:

http://www.bbc.co.uk/cymrufyw/31833000

And there were many positive comments on Twitter: