Friday, December 02, 2005
Language Weaver Offers New Language Translation Module for Persian
Bidirectional language pairs available include: Arabic/English, Chinese/English, Persian/English, French/English, and Spanish/English; unidirectional languages include Somali to English and Hindi to English.
Wednesday, November 23, 2005
OpenLogos
This open-software offering is being made to individuals, universities and public institutions free-of-charge, with a view to its exploitation in both current and new language combinations.
OpenLogos is based upon the long-standing commercial, rule-driven Logos System owned by GlobalWare AG (Eisenach).
http://allpr.de/20096/GlobalWare-AG-und-DFKI-praesentieren-LOGOS-Open-Source.html
For those interested in the underlying linguistic technology of OpenLogos, Bernard (Bud) Scott's article "The Logos Model: An Historical Perspective" (Machine Translation 18 (2003), pp. 1-72) provides a comprehensive overview of the Logos approach to machine translation.
An earlier on-line description of the linguistic and computational motivations for the Logos Model is available at http://iai.iai.uni-sb.de/iaien/iaiwp/p11/index.html
Bud Scott
Parse International, Inc.
bud.scott@verizon.net
[NLP around NYC] free toolkit for syntax-driven SMT
GenPar features:
* User, system, and design documentation.
* Flexibility -- it is dynamically configurable via nested config files.
* Intuitive, object-oriented design, making it easy to modify and extend.
* Complete validation suite.
* Fully integrated prototype SMT systems for 3 language pairs. These prototypes are certainly not state-of-the-art (so far). However, they are complete, in the sense that no additional software is required to build an MT system, apply it to new input, and automatically evaluate the results. These prototypes can also serve as blueprints/templates for other language pairs.
GenPar is downloadable from here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/GenPar.html
The accompanying "MTV" tool for visualizing tree-structured alignments is downloadable from here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/mtv.html
A report outlining the context in which these tools were created is at
http://www.clsp.jhu.edu/ws2005/groups/statistical/documents/finalreport.pdf
Researchers at several institutions are actively developing GenPar and MTV. We welcome inquiries from potential contributors and collaborators. Of course, we also welcome feedback from users.
Contact:
Dan Melamed
New York University
lastname AT cs DOT nyu DOT edu
Monday, October 31, 2005
Slashdot | Can Your Mouth Become Multilingual?
Friday, October 28, 2005
Let's talk! The computer can translate
I would be curious to see how this got translated into German and then hear it synthesized by some speech generator for German listeners :) ...and, as always, it's only five years away from working perfectly.
Thursday, September 15, 2005
Slashdot | A Useful Grammar Checker?
Programming
Posted by Cliff on Wednesday September 14, @06:02PM
from the what-set-of-rules-do-you-use dept.
burtdub asks: 'With the amount of raw text data available, there seems to be no shortage of ambitious language projects on the horizon, from Universal Language Translators to Junk Email Filtering. However, the mess that is the English language still seems to elude commercial attempts while being relatively ignored by the open source community. What would it take to make a useful, functional grammar checker?'
And, of course, as we are used to seeing on Slashdot, responses flew in from all over and in every direction...
A graduate student, apparently from OSU, responded bluntly in his post to the usual claim that English is messy while other languages are perfect:
"Most of the comments about grammar here have been incredibly stupid, by the way. Here's an important thing you learn in an intro to ling class: all languages are equally complicated. It's not going to be easier to write a grammar checker for any language above any other. e.g. You might have to worry more about morphology in one language and word order in another."
Monday, September 12, 2005
Information Sciences Institute - Grammar Lost Translation Machine In Researchers Fix Will
Daniel Marcu and Kevin Knight at USC/ISI "propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI's state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
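To make the re-ranking idea concrete, here is a minimal sketch of re-sorting an n-best list with a second scoring model. Everything in it is an assumption for illustration: the function names, the additive combination of scores, the weight, and especially `toy_lm_score`, which is a trivial stand-in and not the tree-based language model the researchers describe.

```python
from typing import Callable, List, Tuple


def rerank(candidates: List[Tuple[str, float]],
           lm_score: Callable[[str], float],
           lm_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Re-sort an n-best list by base score plus a weighted language-model score."""
    rescored = [(text, base + lm_weight * lm_score(text))
                for text, base in candidates]
    # Higher combined score is better, so sort in descending order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_score(sentence: str) -> float:
    """Hypothetical stand-in for a real language model.

    It only checks whether the second word looks like a verb from a tiny
    hand-picked set, and penalizes the sentence otherwise.
    """
    words = sentence.split()
    return 0.0 if len(words) > 1 and words[1] in {"confirmed", "fixed"} else -5.0


# Each candidate carries the (made-up) log score from the base MT system.
nbest = [
    ("tape the confirmed House", -10.0),
    ("House confirmed the tape", -11.0),
]
best = rerank(nbest, toy_lm_score)[0][0]
print(best)
```

The base system prefers the first candidate, but once the second model's score is added, the better-formed candidate moves to the top, which is the effect the proposal is after.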
In this vein, David Chiang of the University of Maryland, Institute for Advanced Computer Studies, will present Friday, September 16, at the monthly NYCNLP meetings organized by NYU's Dan Melamed. Below is David's abstract:
The introduction of data-driven methods into machine translation (MT) in the 1990s created a whole new way of doing MT, and the recent move from the word-based models developed at IBM to the phrase-based models developed by Och and others has led to a breakthrough in MT performance. The next breakthrough, the move to syntax-based models that deal with the full hierarchical structures of sentences, is still waiting to happen. Several approaches have been tried, making considerable progress but not yet surpassing the performance level of simpler phrase-based models. Hiero is a step towards that breakthrough from the other side: it starts with a phrase-based model and incorporates formal characteristics of syntax-based models to improve on both. Like the latter, it deals with hierarchical structures, but it takes after the former in that it is unconstrained by syntactic theories, and can be trained from parallel bilingual text without any syntactic annotation, manual or automatic. In the recent NIST MT Evaluation, it outperformed several state-of-the-art systems, both phrase-based and syntax-based, on both Chinese-English and Arabic-English translation. I will present Hiero's underlying model, its implementation, and experimental results.
PAPER:
D. Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL-05.
http://www.umiacs.umd.edu/~dchiang/papers/chiang-acl05.pdf
Wednesday, August 24, 2005
NIST 2005 Machine Translation Evaluation Results
Site BLEU-4 Score
GOOGLE 0.5131
ISI 0.4657
IBM 0.4646
UMD 0.4497
JHU-CU 0.4348
EDINBURGH 0.3970
SYSTRAN 0.1079
MITRE 0.0772
FSC 0.0037
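For readers wondering what a BLEU-4 score measures, below is a simplified sketch: the geometric mean of modified 1- to 4-gram precisions, multiplied by a brevity penalty. It is not the official NIST scoring script. It assumes a single reference, whitespace tokenization, and one sentence at a time, whereas the evaluation scores a whole test set against multiple references. The function names are mine.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: str, reference: str) -> float:
    """Simplified single-reference BLEU-4 for one sentence pair."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # "Modified" precision: each candidate n-gram is credited at most
        # as many times as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matched == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / 4)


print(bleu4("the cat sat on", "the cat sat on the mat"))
```

A candidate identical to the reference scores 1.0, one sharing no 4-gram with it scores 0 here, and a correct but truncated candidate is pulled down by the brevity penalty alone.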
The participants were:
U.S. Army Research Laboratory, Advanced Telecommunications Research Institute International Spoken Language Translation Research Laboratories - Japan, University of Edinburgh - UK, Fitchburg State College, Google, Harbin Institute of Technology Machine Intelligence & Translation Laboratory - China, IBM, Chinese Academy of Sciences Institute of Computing Technology - China, University of Southern California Information Sciences Institute, ITC-IRST - Italy, Johns Hopkins University & University of Cambridge, Linear B - UK, MITRE Corporation, National Research Council of Canada, NTT Communication Science Laboratories - Japan, RWTH Aachen University - Germany, Saarland University - Germany, Sakhr Software, SYSTRAN Language Translation Technologies, University of Maryland
Saturday, August 06, 2005
Can Google Stay Google?
Automated universal translation is the kind of long-range vision that inspires people like Eustace. It fascinates them because it's a technical Mount Everest that they can climb, but also because it's an idealistic goal that's potentially enriching to global society. 'In the long term, if you can create technology that can unify information around the world and remove the language barrier, that would be very special,' he says.
Are we there yet? I would love to finally be able to translate something with this engine. From what can be seen at Google, the quality of translation isn't far from what the Systran, Logos, AppTek and Barcelona systems have been delivering since the '80s.
Here is what Google writes about its translation in their language tools FAQ:
The translation isn't as good as I'd like it to be. Can you make it more accurate?
The translation you are seeing was produced automatically by state-of-the-art technology. Unfortunately, today's most sophisticated software doesn't approach the fluency of a native speaker or possess the skill of a professional translator. Automatic translation is very difficult, as the meaning of words depends upon the context in which they are used. Because of this, accurate translation requires an understanding of context, as well as an understanding of the structure and rules of a language. While many engineers and linguists are working on the problem, it will be some time before anyone can offer a quick and seamless translation experience. In the interim, we hope the service we provide is useful for most purposes.
Saturday, June 25, 2005
Google and Albanian
Pass the hát, Google appears to also change "ë" to "e". Fortunately for Albanophiles, Yahoo maintains the difference by not folding the accented characters.
Read also Language Log and Technologies du Langage.
[Just found out that if the word containing diacritics is surrounded by quotes, Google will limit the search to diacritic-marked words only.]
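A minimal sketch of what "folding" accented characters can look like, using Python's standard `unicodedata` module. I do not know how Google or Yahoo actually implement this. The functions `fold_diacritics` and `matches`, and the `exact` flag that mimics the quoted-search behaviour, are my own illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks so that e.g. "ë" becomes "e"."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, word: str, exact: bool = False) -> bool:
    """Compare a query to an indexed word, with or without folding."""
    if exact:
        # Hypothetical "quoted" mode: diacritics must match.
        return unicodedata.normalize("NFC", query) == unicodedata.normalize("NFC", word)
    return fold_diacritics(query).lower() == fold_diacritics(word).lower()


print(fold_diacritics("Shqipëri"))
print(matches("hát", "hat"), matches("hát", "hat", exact=True))
```

With folding on, "hát" and "hat" are the same search term. With `exact=True` they are kept apart, which is the distinction Albanophiles would want preserved.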
Thursday, June 02, 2005
The Google Translator
Friday, May 20, 2005
Still waiting for that first translation...
“Historically, the approach to building machine translation systems is to have expert machine linguists write down dictionaries and rules on how to translate, say, from Chinese to English,” said researcher Franz Och. “Trying to write down all the rules on how to translate from Chinese to English is very hard.”
Instead, Google is fine-tuning a translation program that can automatically translate back and forth between documents in different languages — a sort of virtual Rosetta Stone.
Current machine translations are inconsistent at best, Och said. One current translation program translated “The White House confirmed the existence of a new bin Laden tape” in Arabic to “Alpine white new presence tape registered for coffee confirms Laden,” in English.
Tuesday, May 17, 2005
Washington Post uses Teragram
Sunday, May 08, 2005
Computers Grading Students' Writing?
Other programs compare a student's paper with a database of already-scored papers, seeking to assign it a score based on what other similar-quality assignments have received.
Educational Testing Service sells Criterion, which includes the 'e-Rater' used to score GMAT essays. Vantage Learning has IntelliMetric, Maplesoft sells Maple T.A., and numerous other programs are used on a smaller scale.
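The "compare with a database of already-scored papers" approach can be sketched as a nearest-neighbour lookup. This is only my guess at the general idea, not how e-Rater, IntelliMetric or any other product named above works. The bag-of-words representation, cosine similarity, the value of `k`, and the sample essays are all assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict_score(essay: str, scored_essays, k: int = 2) -> float:
    """Average the scores of the k most similar already-scored essays."""
    vec = Counter(essay.lower().split())
    ranked = sorted(
        scored_essays,
        key=lambda item: cosine(vec, Counter(item[0].lower().split())),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Made-up database of (essay text, human-assigned score) pairs.
scored = [
    ("the war began in 1914", 5),
    ("the war ended in 1918", 4),
    ("i like pizza", 1),
]
print(predict_score("the war began", scored, k=1))
```

Word overlap is a crude proxy for quality, which is presumably why real systems combine many more features than this sketch does.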
Thursday, May 05, 2005
Les Européens unissent leurs forces pour créer une bibliothèque virtuelle (Europeans join forces to create a virtual library)
So Quaero is the name of the EU library project, covered in more linguistically relevant detail by Mark Liberman at Language Log.
Wednesday, May 04, 2005
Microsoft IP Ventures - Natural Language Processing for Educational Courseware
Microsoft IP Ventures - Natural Language Processing for Educational Courseware: "Natural Language Processing for Educational Courseware creates dynamic learning programs from any static educational content consisting of questions from the material that continuously adapt to a student based on previous answers."
Based on NLPWin, which processes English, Spanish, German, French, and Japanese, it could become a very effective means of creating new and interesting learning tools. I wonder whether it could break the monopoly of classrooms in language teaching/learning.
Monday, May 02, 2005
Teragram Adds Hungarian to its Linguistic Suite
Saturday, April 30, 2005
Microsoft Looks to Yukon for Data Mining Gold - but text mining remains just a "capability"
Sunday, February 20, 2005
Why chattering classes have nothing to say
The Observer | UK News | Why chattering classes have nothing to say: The art of conversation is dead but the artistry of chatter is thriving, with Britons overwhelmingly admitting they rarely talk about anything more serious than traffic and television.
According to a survey of more than 2,000 adults, almost two-thirds of us admit to indulging in shallow chit-chat at the expense of weighty dialogue - even though we secretly long for more meaningful exchanges.
...
"The survey also found that more than two -thirds of people believe the telephone is the best way to have intelligent conversations, although Ned Sherrin, presenter of Loose Ends , the Radio 4 comedy show, a lexicographer and author of 20 books, admits hating the telephone. 'I would rather see the contours of their face, the clouds and the flicker of their tears. I find the telephone irritating and unsatisfactory, and like to get them over with as quickly as possible,' he said."
Monday, February 07, 2005
Cindy Adams of PageSix: Natural Language Dialogue
Yahoo! Movies: Entertainment News & Gossip: "LAGUARDIA Airport ladies room. A voice from another stall says, 'Hi, how are you?' The other lady, not one to chat up restroom strangers, sputters, 'Oh . . . fine . . . .' The Voice: 'So what're you up to?' The Embarrassed Sputterer, 'Ohhh, just traveling . . . .' The Voice: 'Can I come over?' Not quite knowing how to handle this bizarre turn, the Embarrassed Sputterer sputters: 'N-n-n-no. I'm a little busy right now.' The next sound is The Voice saying nervously: 'Listen, I'll have to call you back. There's an idiot in the other stall who keeps answering all my questions.'"
Friday, February 04, 2005
Slashdot | DARPA Contracts For AI Technology
2B enhances Factiva's reputation
'Whilst we were re-assessing the market this looked like the best way to accelerate our re-entry into the market,' Hart said of the decision to acquire 2B after dropping IBM.
Factiva announced in December 2004 that the IBM WebFountain web analysis platform was being dropped as the core technology for Factiva Insight for Reputation. WebFountain failed to provide timely content for analysis according to Factiva insiders. Hart denied that the IBM chapter had put Factiva behind in its reputation management plans.
Corpora launches 'language-savvy' knowledge discovery tool
Saturday, January 08, 2005
Live OpenSource Dictionary Project
Monday, January 03, 2005
CBS News | Defining Google | January 2, 2005 20:01:07
...Google engineer Alan Eustace explains, "One of the ideas that we’re working on is machine translation. We strongly believe that there’s enough data on the Web and in the world right now to allow us to automatically translate from one language to another." ...