Creating Domain-Specific Translation Engines

Many translators assume that their translation infrastructure contains only a single translation engine. Some translators, however, use several: domain-specific translation engines.

Domain-specific translation engines are engines created to translate particular topics, styles or registers. For many translators, domain-specific engines produce better translations than general-purpose machine translation systems.

Domain-specific engines are particularly effective where translation memories are already being used successfully to save time and money. If you use domain-specific translation memories, a translation engine can follow the same division by domain and, combined with a post-editing routine, increase the efficacy and productivity of translation beyond what translation memory systems achieve on their own.

Today we are releasing resources on the Language Technologies Portal and on GitHub that allow you to create your own domain-specific translation engines using Moses-SMT.

Be advised: you will need a computer running Linux, such as Ubuntu, with at least 4 GB of RAM, and a significant amount of parallel Welsh-English text. Our method produces domain-specific translation engines that don’t need much memory to run, but they do take up a few GB of space on your hard disk.

To get started, you will need to install Moses-SMT using the instructions on the following page: Installing Moses-SMT on Linux. The installation scripts include adaptations we’ve made to make it easier for you to train Moses-SMT with your own parallel Welsh-English text.

The page Create Moses-SMT engines provides detailed instructions on how to get started. In short, if you had a parallel text taken from your own work translating marketing (or ‘marchnata’ in Welsh) documents, you would need to do the following.

First, place the Welsh text in a file named ‘Marchnata.cy’ and the English text in ‘Marchnata.en’, then keep these files in the sub-folder ‘corpus’ inside your engine’s folder ‘Marchnata’, like this:

moses@ubuntu:~/moses-smt$ cd ~/moses-models/Marchnata/corpus
moses@ubuntu:~/moses-models/Marchnata/corpus$ ls
Marchnata.cy  Marchnata.en
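
Moses training expects the two files to be sentence-aligned, so that line N of one file is the translation of line N of the other. Before starting a long training run, it may be worth checking this. The following is a minimal sketch, not part of the released scripts; the function name check_parallel_corpus is hypothetical, and it assumes both files are UTF-8 encoded.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def check_parallel_corpus(cy_path, en_path):
    """Return a list of problems found in a pair of sentence-aligned files.

    Hypothetical helper for illustration: it only checks that the line
    counts match and that no aligned pair has an empty side.
    """
    cy_lines = read_lines(cy_path)
    en_lines = read_lines(en_path)
    problems = []

    # The two files must have exactly the same number of lines.
    if len(cy_lines) != len(en_lines):
        problems.append(
            f"line counts differ: {len(cy_lines)} vs {len(en_lines)}"
        )

    # An empty segment on either side usually signals a misalignment.
    for number, (cy, en) in enumerate(zip(cy_lines, en_lines), start=1):
        if not cy.strip() or not en.strip():
            problems.append(f"line {number}: empty segment on one side")

    return problems
```

Calling check_parallel_corpus("Marchnata.cy", "Marchnata.en") from inside the ‘corpus’ folder returns an empty list when neither kind of problem is found. It does not verify that the sentences really are translations of each other.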

The data is now ready for training. You will only need a single command, specifying the name of the engine and the direction of the translation (i.e. Welsh to English, or English to Welsh).

So, if you’d like to create an engine for the marketing data that translates from English to Welsh, you would type the following command:

moses@ubuntu:~/moses-smt$ python moses.py train -e Marchnata -s en -t cy

This will print a lot of output to the screen. Depending on the size of your original dataset, the command will probably take hours to complete. There is no need to follow the progress reports particularly closely, but you should keep an eye out for any serious error messages to check whether or not the training has succeeded.
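
Because training can run for hours, one option is to save everything it prints to a log file and review it afterwards. The following is a rough sketch under stated assumptions: the helper run_and_log is hypothetical and not part of the released scripts, it requires Python 3, and flagging lines that contain the word "error" is only a guess at what a serious message looks like, since the exact output format of the training script is not described here.

```python
import subprocess


def run_and_log(command, log_path):
    """Run a command, copy its output to a log file, and flag suspect lines.

    Hypothetical helper for illustration. Returns (exit_code, suspicious),
    where suspicious holds every output line containing "error"
    (case-insensitive), which is an assumed heuristic.
    """
    suspicious = []
    with open(log_path, "w", encoding="utf-8") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error output into the same stream
            encoding="utf-8",
            errors="replace",  # do not crash on unexpected bytes
        )
        for line in process.stdout:
            log.write(line)
            if "error" in line.lower():
                suspicious.append(line.rstrip("\n"))
        exit_code = process.wait()
    return exit_code, suspicious
```

Run from the ~/moses-smt folder, the training command above could be passed as run_and_log(["python", "moses.py", "train", "-e", "Marchnata", "-s", "en", "-t", "cy"], "train.log"). A non-zero exit code or a non-empty list of flagged lines is a cue to read the log more carefully, not proof that training failed.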

If training succeeds, follow the prompt to edit the files of your new engine.

To start the new engine, use the following command:

moses@ubuntu:~/moses-smt$ python moses.py start -e Marchnata -s en -t cy