Wednesday, November 23, 2005

[NLP around NYC] free toolkit for syntax-driven SMT

The 2005 JHU Language Engineering Workshop has released a free toolkit for syntax-driven statistical machine translation (a.k.a. "translation by parsing"). The "GenPar" Toolkit is intended to serve as a springboard for research. Its modular design also makes it useful for educational purposes.

GenPar features:
* User, system, and design documentation.
* Flexibility -- it is dynamically configurable via nested config files.
* Intuitive, object-oriented design, making it easy to modify and extend.
* Complete validation suite.
* Fully integrated prototype SMT systems for 3 language pairs. These prototypes are certainly not state-of-the-art (so far). However, they are complete, in the sense that no additional software is required to build an MT system, apply it to new input, and automatically evaluate the results. These prototypes can also serve as blueprints/templates for other language pairs.

GenPar is downloadable from here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/GenPar.html

The accompanying "MTV" tool for visualizing tree-structured alignments is downloadable from here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/mtv.html

A report outlining the context in which these tools were created is available here:
http://www.clsp.jhu.edu/ws2005/groups/statistical/documents/finalreport.pdf

Researchers at several institutions are actively developing GenPar and MTV. We welcome inquiries from potential contributors and collaborators. Of course, we also welcome feedback from users.

Contact:
Dan Melamed
New York University
lastname AT cs DOT nyu DOT edu
