Wednesday, November 23, 2005

OpenLogos

Members of the MT community may be interested to know, if they do not already, that the German Research Center for Artificial Intelligence (DFKI) is offering the Logos Machine Translation System in an open-source derivative known as OpenLogos. OpenLogos runs on the Linux platform with PostgreSQL and may be downloaded from http://logos-os.dfki.de/

This open-source offering is being made to individuals, universities and public institutions free of charge, with a view to its exploitation in both current and new language combinations.

OpenLogos is based upon the long-standing commercial, rule-driven Logos System owned by GlobalWare AG (Eisenach):
http://allpr.de/20096/GlobalWare-AG-und-DFKI-praesentieren-LOGOS-Open-Source.html

For those interested in the underlying linguistic technology of OpenLogos, the article by Bernard (Bud) Scott, "The Logos Model: An Historical Perspective", in Machine Translation 18 (2003), pp. 1-72, provides a comprehensive overview of the Logos approach to machine translation.

An earlier on-line description of the linguistic and computational motivations for the Logos Model is available at http://iai.iai.uni-sb.de/iaien/iaiwp/p11/index.html

Bud Scott
Parse International, Inc.
bud.scott@verizon.net

[NLP around NYC] free toolkit for syntax-driven SMT

The 2005 JHU Language Engineering Workshop has released a free toolkit for syntax-driven statistical machine translation (a.k.a. "translation by parsing"). The "GenPar" Toolkit is intended to serve as a springboard for research. Its modular design also makes it useful for educational purposes.

GenPar features:
* User, system, and design documentation.
* Flexibility -- it is dynamically configurable via nested config files.
* Intuitive, object-oriented design, making it easy to modify and extend.
* Complete validation suite.
* Fully integrated prototype SMT systems for 3 language pairs. These prototypes are certainly not state-of-the-art (so far). However, they are complete, in the sense that no additional software is required to build an MT system, apply it to new input, and automatically evaluate the results. These prototypes can also serve as blueprints/templates for other language pairs.

GenPar is downloadable from here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/GenPar.html

The accompanying "MTV" tool for visualizing tree-structured alignments is downloadable from here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/mtv.html

A report outlining the context in which these tools were created is at
http://www.clsp.jhu.edu/ws2005/groups/statistical/documents/finalreport.pdf

Researchers at several institutions are actively developing GenPar and MTV. We welcome inquiries from potential contributors and collaborators. Of course, we also welcome feedback from users.

Contact:
Dan Melamed
New York University
lastname AT cs DOT nyu DOT edu

Monday, October 31, 2005

Friday, October 28, 2005

Let's talk! The computer can translate

Let's talk! The computer can translate...announced he would take questions from reporters in Germany and America, the computer heard it as "so we glycogen it alternating questions between Germany and America."

I would be curious to see how this got translated into German and then hear it synthesized by some speech generator for German listeners :) ...and, as always, it's only five years away from working perfectly.

Thursday, September 15, 2005

Slashdot | A Useful Grammar Checker?

Yesterday was a very "linguistic" day at Slashdot. There was a post titled "A Useful Grammar Checker?":
Programming
Posted by Cliff on Wednesday September 14, @06:02PM
from the what-set-of-rules-do-you-use dept.
burtdub asks: 'With the amount of raw text data available, there seems to be no shortage of ambitious language projects on the horizon, from Universal Language Translators to Junk Email Filtering. However, the mess that is the English language still seems to elude commercial attempts while being relatively ignored by the open source community. What would it take to make a useful, functional grammar checker?'

And, of course, as we are used to seeing on Slashdot, responses flew in from all over and in every direction...

A graduate student, apparently from OSU, gave a blunt answer in his post to the often-repeated claim that English is messy while other languages are tidy:
"Most of the comments about grammar here have been incredibly stupid, by the way. Here's an important thing you learn in an intro to ling class: all languages are equally complicated. It's not going to be easier to write a grammar checker for any language above any other. e.g. You might have to worry more about morphology in one language and word order in another."

Monday, September 12, 2005

Information Sciences Institute - Grammar Lost Translation Machine In Researchers Fix Will


Daniel Marcu and Kevin Knight at USC/ISI "propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI's state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
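To make the re-ranking idea concrete, here is a minimal Python sketch of rescoring an n-best list with an extra language-model score. It is only an illustration, not the ISI system: the `rerank` function, the `lm_weight` parameter, the example scores, and especially `toy_lm_score` (a bigram lookup standing in for a real tree-based language model) are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def rerank(nbest: List[Tuple[str, float]],
           lm_score: Callable[[str], float],
           lm_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Sort candidates by base score plus a weighted language-model score."""
    rescored = [(hyp, base + lm_weight * lm_score(hyp)) for hyp, base in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_score(sentence: str) -> float:
    """Hypothetical stand-in for a tree-based LM: count 'well-formed' bigrams."""
    good_bigrams = {("will", "fix"), ("machine", "translation")}
    words = sentence.lower().split()
    return float(sum((a, b) in good_bigrams for a, b in zip(words, words[1:])))


# Two made-up candidates with made-up base scores from the translation system.
nbest = [
    ("researchers fix will machine translation", -1.0),
    ("researchers will fix machine translation", -1.2),
]
best, score = rerank(nbest, toy_lm_score)[0]
print(best)
```

In this toy list the second candidate starts with the lower base score, but the extra score for the well-ordered "will fix" lifts it to the top, which is the effect the proposal hopes for on a much larger scale.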

In this vein, David Chiang of the University of Maryland, Institute for Advanced Computer Studies, will present Friday, September 16, at the monthly NYCNLP meetings organized by NYU's Dan Melamed. Below is David's abstract:

The introduction of data-driven methods into machine translation (MT) in the 1990s created a whole new way of doing MT, and the recent move from the word-based models developed at IBM to the phrase-based models developed by Och and others has led to a breakthrough in MT performance. The next breakthrough, the move to syntax-based models that deal with the full hierarchical structures of sentences, is still waiting to happen. Several approaches have been tried, making considerable progress but not yet surpassing the performance level of simpler phrase-based models. Hiero is a step towards that breakthrough from the other side: it starts with a phrase-based model and incorporates formal characteristics of syntax-based models to improve on both. Like the latter, it deals with hierarchical structures, but it takes after the former in that it is unconstrained by syntactic theories, and can be trained from parallel bilingual text without any syntactic annotation, manual or automatic. In the recent NIST MT Evaluation, it outperformed several state-of-the-art systems, both phrase-based and syntax-based, on both Chinese-English and Arabic-English translation. I will present Hiero's underlying model, its implementation, and experimental results.

PAPER:
D. Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL-05.
http://www.umiacs.umd.edu/~dchiang/papers/chiang-acl05.pdf

Wednesday, August 24, 2005

NIST 2005 Machine Translation Evaluation Results

Finally something real from Google... and this time even beating many of the old-timers. The table below shows the BLEU results for Arabic to English. While this is good advertising for Google, it lacks a comparison to the present leaders in Arabic translation: Language Weaver and the well-established Apptek. It's great, though, to see competition heating up. I wonder when Fluent Machines will show off their high BLEU scores.

Site        BLEU-4 Score
GOOGLE      0.5131
ISI         0.4657
IBM         0.4646
UMD         0.4497
JHU-CU      0.4348
EDINBURGH   0.3970
SYSTRAN     0.1079
MITRE       0.0772
FSC         0.0037
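For readers wondering what a BLEU-4 number measures, here is a rough Python sketch: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is a simplification under stated assumptions: it scores one sentence against a single reference with no smoothing, whereas the official evaluation works over a whole test set with multiple references, so it will not reproduce the figures above. The names `ngrams` and `bleu4` are made up for this example.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: List[str], reference: List[str]) -> float:
    """Single-reference, unsmoothed BLEU-4 for one sentence (illustrative only)."""
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / 4)


reference = "the cat sat on a mat".split()
candidate = "the cat sat on the mat".split()
print(round(bleu4(candidate, reference), 4))
```

A score of 1.0 means the candidate matches the reference exactly; any divergence in wording or word order pulls the n-gram precisions, and therefore the score, down.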

The participants were:
U.S. Army Research Laboratory, Advanced Telecommunications Research Institute International Spoken Language Translation Research Laboratories - Japan, University of Edinburgh - UK, Fitchburg State College, Google, Harbin Institute of Technology Machine Intelligence & Translation Laboratory - China, IBM, Chinese Academy of Sciences Institute of Computing Technology - China, University of Southern California Information Sciences Institute, ITC-IRST - Italy, Johns Hopkins University & University of Cambridge, Linear B - UK, MITRE Corporation, National Research Council of Canada, NTT Communication Science Laboratories - Japan, RWTH Aachen University - Germany, Saarland University - Germany, Sakhr Software, SYSTRAN Language Translation Technologies, University of Maryland