Friday, May 20, 2005

Still waiting for that first translation...

Will Google be the first to realize this long-standing dream? Maybe Fluent Machines still has an advantage... Nothing new in this article from The Stanford Daily Online Edition: "Machine translations, for instance, have come a long way at Google.

“Historically, the approach to building machine translation systems is to have expert machine linguists write down dictionaries and rules on how to translate, say, from Chinese to English,” said researcher Franz Och. “Trying to write down all the rules on how to translate from Chinese to English is very hard.”

Instead, Google is fine-tuning a translation program that can automatically translate back and forth between documents in different languages — a sort of virtual Rosetta Stone.

Current machine translations are inconsistent at best, Och said. One current translation program translated “The White House confirmed the existence of a new bin Laden tape” in Arabic to “Alpine white new presence tape registered for coffee confirms Laden,” in English."

Tuesday, May 17, 2005

Washington Post uses Teragram

Teragram is doing well: shortly after I read that the NY Times uses their categorization engine, the Washington Post is mentioned in Teragram News: "'Our paper covers all facets of news and is continually receiving updates, information and electronic content,' said John Whall, director of applications development for Washingtonpost.Newsweek Interactive (WPNI). 'The taxonomy management abilities of Teragram TK240 enable us to manage, process and present this constant flow of information in a way that is useful to our editors and, more importantly, our readers.'"

Sunday, May 08, 2005

Computers Grading Students' Writing?

News: "SAGrader analyzes sentences and paragraphs, looking for keywords as well as the relationship between terms.

Other programs compare a student's paper with a database of already-scored papers, seeking to assign it a score based on what other similar-quality assignments have received.

Educational Testing Service sells Criterion, which includes the 'e-Rater' used to score GMAT essays. Vantage Learning has IntelliMetric, Maplesoft sells Maple T.A., and numerous other programs are used on a smaller scale."
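To make the second approach in the quote a bit more concrete (scoring a paper by comparing it with already-scored papers), here is a minimal sketch. It is not how SAGrader, e-Rater or any of the products above actually work; the bag-of-words representation, the cosine similarity and the names `bag_of_words`, `cosine` and `predict_score` are all my own assumptions, chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens (a deliberately crude representation)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_score(essay, scored_essays, k=1):
    """Average the scores of the k most similar already-scored essays.

    scored_essays is a list of (text, score) pairs; this data layout is
    an assumption made for the sketch.
    """
    if not scored_essays:
        raise ValueError("need at least one scored essay")
    vec = bag_of_words(essay)
    ranked = sorted(
        scored_essays,
        key=lambda item: cosine(vec, bag_of_words(item[0])),
        reverse=True,
    )
    top = ranked[:k]
    return sum(score for _, score in top) / len(top)
```

Given a small list of `(text, score)` pairs, `predict_score(new_essay, scored, k=3)` would return the mean score of the three closest matches. A real grader would need far richer features than word counts, which is presumably where the "relationship between terms" analysis comes in.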

Thursday, May 05, 2005

Europeans join forces to create a virtual library

Europeans join forces to create a virtual library: "Last week, the President of the Republic therefore launched, together with German Chancellor Gerhard Schröder, a program to develop a new Franco-German Internet search engine. Quaero (that is its name) will be dedicated to images, sound and video."

So Quaero is the name of the EU library project, which Mark Liberman covers in more linguistically relevant detail at Language Log.

Wednesday, May 04, 2005

Microsoft IP Ventures - Natural Language Processing for Educational Courseware

For years, Microsoft's research labs and development teams have produced technologies that were out of reach for outside entrepreneurs. Finally, they are making their IP available through Microsoft IP Ventures. Among other technologies, some of their NLP stuff is available as well:

Microsoft IP Ventures - Natural Language Processing for Educational Courseware: "Natural Language Processing for Educational Courseware creates dynamic learning programs from any static educational content consisting of questions from the material that continuously adapt to a student based on previous answers."

Based on NLPWin, which processes English, Spanish, German, French, and Japanese, it could become a very effective basis for creating new and interesting learning tools. I wonder whether it could break the monopoly of classrooms in language teaching/learning.

Monday, May 02, 2005

Teragram Adds Hungarian to its Linguistic Suite

Teragram Adds Hungarian to its Linguistic Suite: "Hungarian's linguistic challenges are easily handled by Teragram's dictionary as it breaks apart and parses meaning from highly agglutinative words. For example, the Hungarian word 'mostohagyerekeidhez,' meaning 'to your step children' is actually composed of many smaller pieces and can be deconstructed into 'mostoha gyerek e i d hez.' In this example, Teragram's software breaks the word down into its basic elements to derive meaning: 'mostoha' (meaning 'step' in English), gyerek ('child'), 'gyereke' (the possessive marker 'e' turns the meaning to 'child belonging to'), 'gyerekei' (the plural marker 'i' turns the meaning to 'children belonging to'), 'gyerekeid' (the second person marker 'd' turns the meaning to 'the children belonging to you' i.e. 'your children'), 'gyerekeidhez' (the inflectional 'hez' turns the meaning to 'to your children'). 'The ability of Teragram's powerful linguistic engine to deconstruct words into meaningful parts is critical to improving the precision of information retrieval applications and search accuracy,' says Dr. Schabes. 'That's what our customers look to us to uniquely provide.'"
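The decomposition described in the quote can be imitated with a toy dictionary-driven segmenter. This is only a sketch under my own assumptions, not Teragram's method: the six-entry lexicon is copied from the example above, and the function name `segment` and the longest-match-with-backtracking strategy are hypothetical choices made for illustration.

```python
from typing import List, Optional

# Toy lexicon built only from the morphemes named in the quote.
# A real analyzer would need a full dictionary plus rules for which
# suffixes may follow which stems.
MORPHEMES = {
    "mostoha": "step",
    "gyerek": "child",
    "e": "possessive marker",
    "i": "plural marker",
    "d": "second person marker",
    "hez": "to",
}


def segment(word: str) -> Optional[List[str]]:
    """Split word into known morphemes, or return None if no split exists.

    Tries the longest matching prefix first and backtracks when the
    remainder of the word cannot be segmented.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        piece = word[:end]
        if piece in MORPHEMES:
            rest = segment(word[end:])
            if rest is not None:
                return [piece] + rest
    return None


if __name__ == "__main__":
    for piece in segment("mostohagyerekeidhez") or []:
        print(piece, "=", MORPHEMES[piece])
```

With this lexicon, `segment("mostohagyerekeidhez")` yields the same six pieces as the quote: mostoha, gyerek, e, i, d, hez. Because single letters such as 'e' and 'i' are in the lexicon, a sketch like this would happily over-segment other words, which hints at why real agglutinative morphology is hard.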