Monday, September 12, 2005

Information Sciences Institute - Grammar Lost Translation Machine In Researchers Fix Will

Daniel Marcu and Kevin Knight at USC/ISI "propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI's state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
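The re-ranking step described in that quote can be sketched roughly as follows. This is only an illustration of n-best re-ranking in general, not ISI's system: the `rerank` function, the `lm_weight` parameter, the sample scores, and especially `toy_lm_score` (a bigram counter standing in for a real tree-based language model) are all assumptions made up for the example.

```python
from typing import Callable, List, Tuple

# A candidate is (English string, baseline log-score from the MT system).
Candidate = Tuple[str, float]


def rerank(
    nbest: List[Candidate],
    lm_score: Callable[[str], float],
    lm_weight: float = 1.0,
) -> List[Candidate]:
    """Sort candidates by baseline score plus a weighted language-model score."""

    def combined(candidate: Candidate) -> float:
        text, base_score = candidate
        return base_score + lm_weight * lm_score(text)

    # Highest combined score first.
    return sorted(nbest, key=combined, reverse=True)


def toy_lm_score(sentence: str) -> float:
    """Hypothetical stand-in for a tree-based LM: counts 'fluent' bigrams."""
    good_bigrams = {
        ("researchers", "will"),
        ("will", "fix"),
        ("fix", "machine"),
        ("machine", "translation"),
    }
    words = sentence.lower().split()
    return float(sum(1 for pair in zip(words, words[1:]) if pair in good_bigrams))


# Made-up n-best list: the baseline system prefers the scrambled output.
nbest = [
    ("translation machine fix will researchers", -1.0),
    ("researchers will fix machine translation", -1.5),
    ("will researchers machine fix translation", -1.2),
]

reranked = rerank(nbest, toy_lm_score)
for text, base_score in reranked:
    print(f"{base_score:6.2f}  {text}")
```

In this toy run the fluent word order moves to the top even though its baseline score was the lowest, which is the effect the proposal hopes a tree-based model will have on real 25,000-candidate lists.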

In this vein, David Chiang of the University of Maryland Institute for Advanced Computer Studies will present on Friday, September 16, at the monthly NYCNLP meeting organized by NYU's Dan Melamed. Below is David's abstract:

The introduction of data-driven methods into machine translation (MT) in the 1990s created a whole new way of doing MT, and the recent move from the word-based models developed at IBM to the phrase-based models developed by Och and others has led to a breakthrough in MT performance. The next breakthrough, the move to syntax-based models that deal with the full hierarchical structures of sentences, is still waiting to happen. Several approaches have been tried, making considerable progress but not yet surpassing the performance level of simpler phrase-based models. Hiero is a step towards that breakthrough from the other side: it starts with a phrase-based model and incorporates formal characteristics of syntax-based models to improve on both. Like the latter, it deals with hierarchical structures, but it takes after the former in that it is unconstrained by syntactic theories, and can be trained from parallel bilingual text without any syntactic annotation, manual or automatic. In the recent NIST MT Evaluation, it outperformed several state-of-the-art systems, both phrase-based and syntax-based, on both Chinese-English and Arabic-English translation. I will present Hiero's underlying model, its implementation, and experimental results.

D. Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL-05.
