Machine Translation on Mac OS X

Since we’ve already released our machine translation system in Docker, it’s easy enough to get it running on an OS X system!

First, you will need to install one or two pieces of software on your computer. This tutorial uses a homebrew to install the packages.
(You can look again at the original tutorial if you like).

Installing VirtualBox

  • Docker needs VirtualBox on OS X (and Windows) to run the Linux virtual engineering. Download VirtualBox from the VirtualBox website.

Installing boot2docker and docker

We will be using a Homebrew in order to install these. Open Terminal and write the following commands:

  • ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

    This will install the homebrew on your computer.

  • Next, install boot2docker and docker with the following commands:
    brew install boot2docker
    brew install docker

     

  • Start boot2docker (so that you can download the virtual engine) like this:
    boot2docker init

     

Increasing Virtual Box’s disk space

VirtualBox’s virtual disk will be created with a size limit of 20GB. The machine translation system (Moses SMT), including the language model file, needs more disk space than this, so the disk size will obviously need to be increased. This is unfortunately quite a long process, but the good news is that Docker have written a very simple tutorial on how to do it!

We recommend that you increase the disk size to 30GB (although the machine translation system only needs around 21GB).

Downloading and installing the translation system

Once you’ve increased the disk size in VirtualBox, you will need to start the boot2docker engine. Go back to Terminal, and write:

boot2docker up

Make a note of what is printed on the screen at the end of this command. This is important because you will need it to communicate with Docker. It should look something like this:

Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/ca.pem
Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/cert.pem
Writing /Users/patrick/.boot2docker/certs/boot2docker-vm/key.pem
    export DOCKER_CERT_PATH=/Users/patrick/.boot2docker/certs/boot2docker-vm
    export DOCKER_TLS_VERIFY=1
    export DOCKER_HOST=tcp://192.168.59.103:2376

The last three lines are particularly important. Copy them, and then paste them into your Terminal window so that you can run the export commands.

Docker is ready

Now, after all this work, Docker should be ready!
Download the machine translation file using the following command:

docker pull techiaith/moses-smt

And then start the engine with:

docker run --name moses-smt-cofnodycynulliad-en-cy -p 8008:8008 -p 8080:8080 techiaith/moses-smt start -e CofnodYCynulliad -s en -t cy

Note: this command downloads a translation model which is based on the Proceedings of the National Assembly for Wales corpus. You can change the name ‘CofnodYCynulliad’ after the ‘start’ command to any one of the three below:

  • CofnodYCynulliad (en-cy a cy-en) – two large models which are based on the Proceedings of the National Assembly for Wales. One is specifically for translation from English to Welsh (en-cy), and the other is for translation from Welsh to English (cy-en). Size: ~3.7GB each.
  • CofnodBachYCynulliad – a much smaller model of the proceedings corpus which is based on a sub-set of the data (we recommend this if you just want to experiment quickly). Size: ~65MB
  • Deddfwriaeth – this engine was trained with data from the Legislation corpus. Size: ~900MB

These three language models are also available for download from techiaith.org. See http://techiaith.org/moses/

It’s also important to note that you can use your own language model for this step (if you’ve already trained one)! Remember that the data we provide is a basis only, and it’s fairly simple to train your own language model. See the docs for more information on how to do this here.

See Moses working

The final ‘docker run’ command creates a server on your local computer on the port 8008. To connect with this port, you will need to open ports in the VirtualBox. Open the  ‘VirtualBox.app’ program (in your ‘Applications’ folder, and then click on Settings’, and then on the ‘Network’ tab. There is a button at the bottom of the screen called ‘port forwarding’. Add rules as you can see below:

virtualbox

That’s it!

Go to http://127.0.0.1:8008 in your browser and start translating!

diolch