Wednesday, August 24, 2005

NIST 2005 Machine Translation Evaluation Results

Finally, something real from Google, and this time even beating many of the old-timers. The table below shows the BLEU results for Arabic to English. While this is good advertising for Google, it lacks a comparison with the present leaders in Arabic translation: Language Weaver and the well-established AppTek. It's great, though, to see competition heating up. I wonder when Fluent Machines will show off their high BLEU scores.

Site BLEU-4 Score
GOOGLE 0.5131
ISI 0.4657
IBM 0.4646
UMD 0.4497
JHU-CU 0.4348
SYSTRAN 0.1079
MITRE 0.0772
FSC 0.0037
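
For readers unfamiliar with the metric, here is a minimal sketch of how a BLEU-4 score can be computed for a single sentence: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. This is not NIST's scoring script. The official numbers above are computed over the whole test set, and the whitespace tokenization, the closest-length rule for the brevity penalty, and the function names `ngrams` and `bleu4` are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 of one candidate string against reference strings."""
    cand = candidate.split()  # assumption: plain whitespace tokenization
    refs = [r.split() for r in references]
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        # No smoothing: any zero precision makes the whole score zero.
        if total == 0 or clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty, using the reference length closest to the candidate's.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(log_precision_sum / 4)


if __name__ == "__main__":
    print(bleu4("the cat sat on the mat", ["the cat sat on the rug"]))
```

Scores fall between 0 and 1, the same scale as the table: a candidate identical to a reference scores 1, and a candidate sharing no words with any reference scores 0.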

The participants were:
U.S. Army Research Laboratory, Advanced Telecommunications Research Institute International Spoken Language Translation Research Laboratories - Japan, University of Edinburgh - UK, Fitchburg State College, Google, Harbin Institute of Technology Machine Intelligence & Translation Laboratory - China, IBM, Chinese Academy of Sciences Institute of Computing Technology - China, University of Southern California Information Sciences Institute, ITC-IRST - Italy, Johns Hopkins University & University of Cambridge, Linear B - UK, MITRE Corporation, National Research Council of Canada, NTT Communication Science Laboratories - Japan, RWTH Aachen University - Germany, Saarland University - Germany, Sakhr Software, SYSTRAN Language Translation Technologies, University of Maryland

Saturday, August 06, 2005

Can Google Stay Google?

Can Google Stay Google?: "'We're in a target-rich environment of interesting problems,' says Alan Eustace, one of Google's handful of vice presidents of engineering and its head of research. Take the technology for 'machine translation' of human language. Right now, Google can automatically translate Web pages from English into a bunch of major languages and vice versa -- German, Spanish, French, Italian, Portuguese, Japanese, Chinese, and Korean. The list will get longer in the next year or two. But that's just the beginning, Eustace says: 'The goal is to make the Internet language-independent.' Ultimately, all search results will come back instantly in your own language, regardless of what tongue you speak -- and what dialect the pages are written in. Every Google user will be like a delegate in the General Assembly of the United Nations putting on headphones to hear translations of the speaker up front. At the UN, it doesn't matter whether you speak only French and the orator is waxing eloquent in Chinese. The Web will be the same way.
Automated universal translation is the kind of long-range vision that inspires people like Eustace. It fascinates them because it's a technical Mount Everest that they can climb, but also because it's an idealistic goal that's potentially enriching to global society. 'In the long term, if you can create technology that can unify information around the world and remove the language barrier, that would be very special,' he says."

Are we there yet? I would love to finally be able to translate something with this engine. From what can be seen at Google, the quality of translation isn't far from what the Systran, Logos, AppTek and Barcelona systems have been delivering since the '80s.

Here is what Google writes about its translation in its Language Tools FAQ:
The translation isn't as good as I'd like it to be. Can you make it more accurate?

The translation you are seeing was produced automatically by state-of-the-art technology. Unfortunately, today's most sophisticated software doesn't approach the fluency of a native speaker or possess the skill of a professional translator. Automatic translation is very difficult, as the meaning of words depends upon the context in which they are used. Because of this, accurate translation requires an understanding of context, as well as an understanding of the structure and rules of a language. While many engineers and linguists are working on the problem, it will be some time before anyone can offer a quick and seamless translation experience. In the interim, we hope the service we provide is useful for most purposes.