Author Archives: Myfyr

Machine Translation on Mac OS X

Since we’ve already released our machine translation system in Docker, it’s easy enough to get it running on an OS X system!

First, you will need to install a few pieces of software on your computer. This tutorial uses Homebrew to install the packages.
(You can look again at the original tutorial if you like).

Installing VirtualBox

  • Docker needs VirtualBox on OS X (and Windows) to run its Linux virtual machine. Download VirtualBox from the VirtualBox website.

Installing boot2docker and docker

We will use Homebrew to install these. Open Terminal and type the following commands:

  • ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

    This will install Homebrew on your computer.

  • Next, install boot2docker and docker with the following commands:
    brew install boot2docker
    brew install docker

     

  • Initialise boot2docker (this downloads and creates the virtual machine) like this:
    boot2docker init

     

Increasing VirtualBox’s disk space

VirtualBox’s virtual disk will be created with a size limit of 20GB. The machine translation system (Moses SMT), including the language model file, needs more disk space than this, so the disk size will need to be increased. This is unfortunately quite a long process, but the good news is that Docker have written a very simple tutorial on how to do it!

We recommend that you increase the disk size to 30GB (although the machine translation system only needs around 21GB).

Downloading and installing the translation system

Once you’ve increased the disk size in VirtualBox, you will need to start the boot2docker virtual machine. Go back to Terminal, and type:

boot2docker up

Make a note of what is printed on the screen at the end of this command. This is important because you will need it to communicate with Docker. It should look something like this:

Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/ca.pem
Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/cert.pem
Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/key.pem
    export DOCKER_CERT_PATH=/Users/patrick/.boot2docker/certs/boot2docker-vm
    export DOCKER_TLS_VERIFY=1
    export DOCKER_HOST=tcp://192.168.59.103:2376

The last three lines are particularly important. Copy them, and then paste them into your Terminal window so that you can run the export commands.
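Copying the lines by hand works fine, but the same step can be scripted. The sketch below is a hypothetical helper (the name `extract_exports` is ours, not part of boot2docker) that keeps only the `export` lines from text shaped like the output shown above. It assumes the export lines appear one per line, possibly indented.

```shell
# Hypothetical helper, not part of boot2docker: read text on stdin and print
# only the `export NAME=value` lines, with leading whitespace removed.
extract_exports() {
  sed -n 's/^[[:space:]]*\(export [A-Za-z_][A-Za-z0-9_]*=.*\)$/\1/p'
}

# Usage sketch (assumes the output format shown above):
#   boot2docker up 2>&1 | tee /tmp/boot2docker-up.log
#   eval "$(extract_exports < /tmp/boot2docker-up.log)"
```

Saving the output to a file first means you can still read the full messages if something goes wrong.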

Docker is ready

Now, after all this work, Docker should be ready!
Download the machine translation image using the following command:

docker pull techiaith/moses-smt

And then start the engine with:

docker run --name moses-smt-cofnodycynulliad-en-cy -p 8008:8008 -p 8080:8080 techiaith/moses-smt start -e CofnodYCynulliad -s en -t cy

Note: this command downloads a translation model which is based on the Proceedings of the National Assembly for Wales corpus. You can change the name ‘CofnodYCynulliad’ after the ‘start’ command to any one of the three below:

  • CofnodYCynulliad (en-cy and cy-en) – two large models which are based on the Proceedings of the National Assembly for Wales. One is specifically for translation from English to Welsh (en-cy), and the other is for translation from Welsh to English (cy-en). Size: ~3.7GB each.
  • CofnodBachYCynulliad – a much smaller model of the proceedings corpus which is based on a sub-set of the data (we recommend this if you just want to experiment quickly). Size: ~65MB
  • Deddfwriaeth – this engine was trained with data from the Legislation corpus. Size: ~900MB

These three language models are also available for download from techiaith.org. See http://techiaith.org/moses/
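To make the substitution concrete, here is a minimal sketch that prints (but does not run) the `docker run` command for a given engine name and language pair. The helper name `moses_run_cmd` is hypothetical, and it assumes the container name follows the pattern of the example above: the lower-cased engine name followed by the language pair. It also assumes the smaller engines are available in the direction you ask for, which you should check before running the printed command.

```shell
# Hypothetical helper: print (not run) a `docker run` command for an engine.
# Assumption: the container name is the lower-cased engine name plus the
# language pair, as in the example command earlier in this post.
moses_run_cmd() {
  engine="$1"; src="$2"; tgt="$3"
  name="moses-smt-$(printf '%s' "$engine" | tr '[:upper:]' '[:lower:]')-$src-$tgt"
  printf 'docker run --name %s -p 8008:8008 -p 8080:8080 techiaith/moses-smt start -e %s -s %s -t %s\n' \
    "$name" "$engine" "$src" "$tgt"
}

# Print the command for the small experimental engine, English to Welsh.
moses_run_cmd CofnodBachYCynulliad en cy
```

Printing the command first lets you check it before pasting it into Terminal.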

It’s also important to note that you can use your own language model for this step (if you’ve already trained one)! Remember that the data we provide is a basis only, and it’s fairly simple to train your own language model. See the docs for more information on how to do this.

See Moses working

The final ‘docker run’ command creates a server on your local computer on port 8008. To connect to this port, you will need to open ports in VirtualBox. Open the ‘VirtualBox.app’ program (in your ‘Applications’ folder), then click on ‘Settings’, and then on the ‘Network’ tab. There is a button at the bottom of the screen called ‘Port Forwarding’. Add rules as shown below:

[Image: VirtualBox port forwarding rules]
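The same kind of rule can also be added from the command line with VirtualBox’s `VBoxManage` tool. The sketch below only prints the commands so that you can review them first. It rests on a few assumptions: that the virtual machine is called `boot2docker-vm` (inferred from the certificate path printed earlier), that it is running, and that ports 8008 and 8080 should each be forwarded to the same port number inside the virtual machine. The rule names (`tcp-port8008` and so on) are arbitrary labels of our own.

```shell
# Build a VirtualBox NAT port-forwarding rule in the form
# name,protocol,host-ip,host-port,guest-ip,guest-port (empty IPs mean "any").
natpf_rule() {
  printf 'tcp-port%s,tcp,,%s,,%s\n' "$1" "$1" "$1"
}

# Print the commands rather than running them, so they can be reviewed first.
# Assumption: the VM is named boot2docker-vm and is currently running.
for port in 8008 8080; do
  printf 'VBoxManage controlvm boot2docker-vm natpf1 "%s"\n' "$(natpf_rule "$port")"
done
```

If the virtual machine is stopped, `VBoxManage modifyvm` with the `--natpf1` option takes the same rule format.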

That’s it!

Go to http://127.0.0.1:8008 in your browser and start translating!

Thanks!!!

We would like to thank everyone who attended the Through Technological Means conference, and all those who gave presentations and contributed their time and energy towards making it a great day.

But most of all, we’d like to pass our special thanks on to the children of Garndolbenmaen primary school. They came to talk about their experiences using our synthetic voice resources in recent lessons they received on coding with the Raspberry Pi, which were provided by the Unit. They had prepared a video for the conference, but unfortunately there were technical problems when it was played. So now at last (and with apologies for those difficulties), here is the full video that was made by the children of Garndolbenmaen primary school:

The children described to the audience their experience during the lessons, where they were taught core coding skills using the Language Technologies Unit’s Welsh-medium Turing Test resources. The children also had the opportunity to meet one very special guest – the Vice-chancellor of Bangor University!

[Photo]

The children explained to the Vice-chancellor, Professor John Hughes, that they had thoroughly enjoyed working on the project, and that they had learnt a variety of very useful skills. One or two even said that they would like to be professional coders in the future! The children were also able to meet some of the guest speakers who had travelled from far and wide to attend the conference. Below, from left to right, are John Judge from Ireland, Dwayne Bailey from South Africa (but who is currently working in London) and Kepa Sarasola from the Basque Country.

[Photo: the guest speakers]

Here are the children meeting the guest speakers, as well as those members of the Language Technologies Unit who worked on the Language Technologies Portal project, not forgetting Rapiro, the little robot who speaks Welsh:

[Photo: the children with the guest speakers and members of the Language Technologies Unit]

The children also shared their story with Radio Cymru:

Post Cyntaf: http://www.bbc.co.uk/programmes/b053hsb6 – at 1:16:25.

And the BBC News programme on S4C:

http://www.bbc.co.uk/cymrufyw/31833000

And there were many positive comments on Twitter.

 

 

Creating domain-specific Translation Engines

Many translators assume that their translation infrastructure contains only one translation engine. But some translators use many engines: domain-specific translation engines.

Domain-specific translation engines are engines created in order to translate for particular topics, styles or registers. For many translators, domain-specific engines offer superior translation compared to normal machine translation systems.

Domain-specific engines are particularly effective in situations where translation memories are already being used successfully to save time and money. If you use domain-specific translation memories, a translation engine can use the same division by domain, together with a post-editing routine, to increase the efficacy and productivity of translation beyond that produced by normal translation memory systems.

Today we are releasing resources in the Language Technologies Portal and on GitHub which allow you to create, using Moses-SMT, your own domain-specific translation engines.

Be advised: you will need a computer running Linux, such as Ubuntu, with at least 4GB of RAM, and a significant amount of parallel Welsh-English text. Our method produces domain-specific translation engines which don’t need much memory to run, but do take some GBs of space on your hard disk.

To get started, you will need to install Moses-SMT using the instructions on the following page: Installing Moses-SMT on Linux. The installation scripts include adaptations that we’ve made that make it easier for you to train Moses-SMT with your own parallel Welsh-English text.

The page Create Moses-SMT engines provides detailed instructions on how to get started. But in short, if you had a parallel text taken from your own work translating marketing (or ‘marchnata’ in Welsh) documents, you would need to do the following.

First, place the Welsh text in a file named ‘Marchnata.cy’ and the English text in ‘Marchnata.en’, and then keep these files in the ‘corpus’ sub-folder inside your engine’s folder, ‘Marchnata’, like this:

moses@ubuntu:~/moses-smt$ cd ~/moses-models/Marchnata/corpus
moses@ubuntu:~/moses-models/Marchnata/corpus$ ls
Marchnata.cy  Marchnata.en
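If your text currently lives elsewhere under different file names, the layout above can be created with a few standard commands. The following is a minimal sketch: `prepare_corpus` is a hypothetical helper of our own, not part of the Moses-SMT scripts, and the source file names in the usage line are placeholders.

```shell
# Hypothetical helper: copy a Welsh/English file pair into the layout shown
# above. Arguments: engine name, Welsh file, English file, models directory.
prepare_corpus() {
  engine="$1"; cy_file="$2"; en_file="$3"; models_dir="$4"
  corpus_dir="$models_dir/$engine/corpus"
  # Create <models_dir>/<engine>/corpus and copy both sides of the text in,
  # naming them <engine>.cy and <engine>.en.
  mkdir -p "$corpus_dir" &&
    cp "$cy_file" "$corpus_dir/$engine.cy" &&
    cp "$en_file" "$corpus_dir/$engine.en"
}

# Usage sketch (the two source file names are placeholders):
#   prepare_corpus Marchnata welsh.txt english.txt ~/moses-models
```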

The data is now ready for training. You will only need a single command, which gives the name of the engine and the direction of the translation (i.e. Welsh to English, or English to Welsh).

So, if you’d like to create an engine for the marketing data that translates from English to Welsh, you would type the following command:

moses@ubuntu:~/moses-smt$ python moses.py train -e Marchnata -s en -t cy

This will cause a lot of data to appear on the screen. The command, depending on the size of your original dataset, will probably take hours to complete. There is no need to follow the progress reports particularly closely, but you will need to keep an eye out for any serious error messages to check whether or not the training has succeeded.
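Because the run can take hours, it may be easier to save the output to a file and search it afterwards than to watch the screen. The wrapper below is a sketch only: `run_logged` is a hypothetical helper, and searching for the word "error" is a rough heuristic of ours rather than anything the training script defines. We also have not confirmed that `moses.py` returns a non-zero exit status on failure, so treat the returned status as a hint.

```shell
# Hypothetical wrapper: run a command, save everything it prints to a log
# file, report lines mentioning "error", and return the command's exit status.
run_logged() {
  log="$1"; shift
  status=0
  "$@" > "$log" 2>&1 || status=$?
  # Rough heuristic: show any log line containing "error" (case-insensitive).
  grep -i 'error' "$log" || echo "no lines mentioning 'error' in $log"
  return "$status"
}

# Usage sketch:
#   run_logged train.log python moses.py train -e Marchnata -s en -t cy
```

While the command is running, `tail -f train.log` in a second Terminal window shows the progress reports as they are written.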

If it is successful, follow the prompts to edit the files of your new engine.

To start the new engine, you will need the following command:

moses@ubuntu:~/moses-smt$ python moses.py start -e Marchnata -s en -t cy