Tuesday, June 26, 2007
Human Language Technology Center at Johns Hopkins
Johns Hopkins Gazette | June 25, 2007: "The Johns Hopkins University has been awarded a long-term multimillion-dollar contract to establish and operate a Human Language Technology Center of Excellence near the Homewood campus. The center's research will focus on advanced technology for automatically analyzing a wide range of speech, text and document image data in multiple languages."
Thursday, June 21, 2007
FactSpotter from XRCE Grenoble
Searching for documents that contain specific information can be time-consuming and frustrating in today’s office environment. Xerox scientist Frédérique Segond helped develop FactSpotter, a new technology that takes ordinary search to the next level by digging into more documents, analyzing the meaning of words in context, and accepting queries in everyday language. Segond manages parsing and semantics research at Xerox Research Centre Europe in Grenoble.
Friday, May 25, 2007
Business Objects to Acquire Text Analytics Leader Inxight Software
Combination of Inxight and Business Objects to Deliver First Full Spectrum Business Intelligence Platform: "With the acquisition of Inxight Software, Inc., Business Objects expands its leadership in extending BI to embrace enterprise search. Going beyond basic keyword searches and solutions that simply provide a ranked listing of searched items, Inxight’s web services-based federated search and extraction capabilities extend the value of enterprise search engines by instantly clustering and filtering results from multiple search engines, including Google Search Appliance and Oracle Secure Enterprise Search. By providing a BI platform that leverages these capabilities, Business Objects will become the first vendor to bridge the gap between search and intelligence – delivering a broader view of data and dramatically accelerating the ability to locate hidden information in search results that might otherwise be overlooked."
Tuesday, May 15, 2007
How Google translates without understanding
How Google translates without understanding | The Register: "The Google approach is a lesson in practical software development: try things and see what sticks. It has just a few major steps:
1. Google starts with lots and lots of paired-example texts, like formal documents from the United Nations, in which identical content is expertly translated into many different languages. With these documents they can discover that 'white house' tends to co-occur with 'casa blanca,' so that the next time they have to translate a text containing 'white house' they will tend to use 'casa blanca' in the output.
2. They have even more untranslated text in each language, which lets them make models of 'well-formed' sentence fragments (for example, preferring 'white house' to 'house white'). So the raw output from the first translation step can be further massaged into (statistically) nicer-sounding text.
3. Their key for improving the system - and winning competitions - is an automated performance metric, which assigns a translation quality number to each translation attempt. More on this fatally weak link below."
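The three steps quoted above can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration, not Google's actual system: the tiny corpora, the hypothetical helper names (`cooccurrence_counts`, `best_translation`, `bigram_model`, `simple_bleu`), raw co-occurrence counts of word pairs standing in for a real alignment model, an add-one smoothed bigram model as the "well-formed fragment" check, and a simplified BLEU-style score as the automated metric (the article does not name the metric).

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (English, Spanish), whitespace-tokenized.
PARALLEL = [
    ("the white house", "la casa blanca"),
    ("a white house", "una casa blanca"),
    ("the house", "la casa"),
    ("the white car", "el coche blanco"),
]

# Hypothetical toy monolingual English text for the language model.
MONOLINGUAL = [
    "the white house is large",
    "she saw the white house",
    "the house is white",
]

def bigrams_of(tokens):
    return {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)}

def cooccurrence_counts(pairs):
    """Step 1 (simplified): count how often a source word pair and a
    target word pair appear in the same sentence pair."""
    counts = Counter()
    for src, tgt in pairs:
        for pair in product(bigrams_of(src.split()), bigrams_of(tgt.split())):
            counts[pair] += 1
    return counts

def best_translation(counts, phrase):
    """Pick the target phrase that co-occurs most often with `phrase`."""
    candidates = {t: c for (s, t), c in counts.items() if s == phrase}
    return max(candidates, key=candidates.get) if candidates else None

def bigram_model(sentences):
    """Step 2 (simplified): add-one smoothed bigram log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand, ref = candidate.split(), reference.split()
    c = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(c.values())
    if total == 0:
        return 0.0
    return sum(min(cnt, r[g]) for g, cnt in c.items()) / total

def simple_bleu(candidate, reference, max_n=2):
    """Step 3 (simplified): BLEU-style geometric mean of n-gram
    precisions times a brevity penalty, for a single sentence."""
    precisions = [ngram_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c_len, r_len = len(candidate.split()), len(reference.split())
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

if __name__ == "__main__":
    counts = cooccurrence_counts(PARALLEL)
    print(best_translation(counts, "white house"))             # casa blanca
    lm = bigram_model(MONOLINGUAL)
    print(lm("white house") > lm("house white"))               # True
    print(simple_bleu("la casa blanca", "la casa blanca"))     # 1.0
    print(simple_bleu("la blanca casa", "la casa blanca"))     # 0.0
```

Raw co-occurrence counts overweight frequent target phrases, so real statistical systems normalize them through alignment models trained on millions of sentence pairs. Likewise, full BLEU uses n-grams up to length 4 and is computed over a whole test set rather than one sentence.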
Monday, May 07, 2007
PROMT 8.0: revamped translation software
PROMT revamped translation software product line: OSP International: "Evaluation of machine translation quality is usually quite individual but PROMT claims that PROMT 8.0 analyzes the context and generates grammatically correct translation of most of linguistic structures and set expressions. The user can teach the translator, enriching its vocabulary by adding personal dictionaries and using earlier translated text pieces in further translations. The quality of translation, especially of specialized texts, also largely depends on setting up software according to the document subject. The system set-up procedure, which many users used to ignore because of its length and complexity, has been much simplified in version 8.0."
Saturday, April 07, 2007
Rosette Linguistics Platform by Basis Technology
Rosette Linguistics Platform by Basis Technology: "Basis Technology's interface module for the Rosette® Linguistics Platform (RLP) adds extensive multilingual support to Lucene quickly and easily.
RLP is the same multilingual text analysis technology used by the leading commercial search engines including Google, Yahoo!, Ask, and Live.com Search. That means users can enjoy the same quality of experience with Lucene they have come to expect with their favorite web and enterprise search engines."
Sunday, April 01, 2007
Machine Translation: Google News Headlines
Buzz of the week: it's interesting to see how each headline for the same story puts a different slant on it, from "seeks" to "on the cards" to "speaking" and, finally, to "has visions."
Google seeks world of instant translations
Boston Globe - Boston, MA, USA
Franz Och, head of statistical machine translation efforts at Google, is photographed at his office in Mountain View, California, March 20, 2007. ...
Instant translation of content on the cards for Google
IT PRO - London, Greater London, UK
Could statistical machine translation deliver real-time language translation of text and other content for the search giant? ...
Google speaking everyone's language
CNNMoney.com - USA
Google's (down $0.45 to $463.17) approach, called statistical machine translation, differs from past efforts in that it forgoes language experts who ...
Google has visions of instant online translation
Mobile Digest - London, England, UK
This 'statistical machine translation' doesn't rely directly on language experts, grammatical rules, and dictionaries, as existing systems do, ...