Evaluating Translations Produced by Amazon Mechanical Turk


Abstract

We investigate the use of Amazon Mechanical Turk for the creation of translations from English to Haitian Creole. The intention is to produce a bilingual corpus for Statistical Machine Translation. In several experiments we offer varying amounts of money for the translation tasks. The current results show no clear correlation between pay and translation quality. Almost all translations show a significant overlap with the output of online translation tools, which indicates that the workers often did not translate the sentences themselves.

1 Introduction

Our group is currently involved in the development of an English↔Haitian Creole translation system for use in the earthquake region of Haiti. One of the current tasks is the rapid production of a bilingual English↔Haitian Creole in-domain medical dialogue corpus for training a Statistical Machine Translation system. Some native Haitian Creole speakers volunteered to help with translations, and we also intend to use professional translators to support the effort.

Amazon’s Mechanical Turk (AMT) is an interesting alternative here, as it would be cheaper than using professional translators. This is especially relevant for an English↔Haitian Creole translation system, as its commercial potential is probably limited.

One of the main concerns with using AMT for NLP tasks, especially translation, is the quality of the resulting data and the availability of workers with Haitian Creole knowledge.

The experiments in this paper address these concerns and evaluate the translations produced by Amazon Mechanical Turk against those of professionals and unpaid volunteers. We investigate the overall quality of the produced translations and compare the translations done at different pay levels.
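As a rough illustration of the overlap check mentioned in the abstract, the following sketch (a hypothetical reimplementation, not the authors' actual tooling) computes the fraction of a worker's word n-grams that also appear in an online translator's output; the example strings are invented placeholders.

from collections import Counter

def ngram_overlap(candidate: str, reference: str, n: int = 3) -> float:
    """Fraction of the candidate's word n-grams that also occur in the reference."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

# Placeholder example: a submission identical to the online tool's output
# scores 1.0; an independently produced translation scores much lower.
worker_translation = "placeholder creole sentence from a worker submission here"
online_mt_output = "placeholder creole sentence from a worker submission here"
print(ngram_overlap(worker_translation, online_mt_output))  # -> 1.0 (suspicious)

A high overlap score flags a submission as likely copied from an online tool rather than translated by the worker.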

... middle of paper ...

... Haitian Creole seems to be reasonably well represented.

It will be necessary to confirm the experiments with further translations to obtain a larger test set. A professional translation will be used as the gold standard, providing a more reliable reference for the automatic evaluations.
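A minimal sketch of such an automatic evaluation, assuming the sacrebleu package (which postdates the original work) and the professional translation as the single reference stream; the sentences are placeholders.

import sacrebleu

# Hypothetical data: one Turker translation and one professional
# (gold-standard) reference per source sentence.
hypotheses = [
    "turker translation of sentence one",
    "turker translation of sentence two",
]
references = [
    "professional translation of sentence one",
    "professional translation of sentence two",
]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")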

It would also be interesting to run similar experiments with other language pairs, both more common and more uncommon, for further comparison.

