Saturday, September 23, 2006

IBM technology translates Arabic media broadcasts to English

IBM Research Press Resources: "Codenamed 'TALES' (for Translingual Automatic Language Exploitation System), the IBM technology processes the audio signal from Arabic television and radio stations and translates its spoken content into English text. Once this text is indexed by the CriticalTV platform, Critical Mention's clients will be able to conduct real-time searches of Arabic media, and receive alerts instantly when a search term is detected."
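
To make the "index, search, alert" step concrete, here is a minimal Python sketch of the general idea. It is not how TALES or CriticalTV actually work (the press release gives no implementation details). It assumes the speech recognition and translation stages have already produced English text, and the names `TranscriptIndex`, `watch`, `add_segment`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


class TranscriptIndex:
    """Toy inverted index over translated transcript segments (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of segments containing it
        self.watch_terms = set()          # terms that clients want alerts for

    def watch(self, term):
        """Register a search term that should trigger an alert."""
        self.watch_terms.add(term.lower())

    def add_segment(self, segment_id, english_text):
        """Index one translated segment and return the watched terms it contains."""
        tokens = set(re.findall(r"[a-z0-9']+", english_text.lower()))
        for token in tokens:
            self.postings[token].add(segment_id)
        return sorted(tokens & self.watch_terms)

    def search(self, term):
        """Return the ids of all indexed segments that contain the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = TranscriptIndex()
index.watch("election")
alerts = index.add_segment("seg-1", "The election results were announced today.")
print(alerts)                   # watched terms found in the new segment
print(index.search("results"))  # segments that mention "results"
```

Checking watched terms at indexing time, rather than polling the index afterwards, is one simple way to get the "alert as soon as a term is detected" behavior the release describes.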

Tuesday, September 12, 2006

There's no data like more data...

Intelligent Enterprise Magazine: Google, Competitors Look Toward the Ultimate Search: "'Page rank is one factor with which we work; others are classification, clustering and synonym finding,' says Peter Norvig, Google's director of search quality. Norvig adds that Google is also working with technologies such as statistical machine translation, speech recognition and entity detection. The plan is to leverage what Google 'owns' on the Web to learn as many words, and consequent word relations, as possible. That, he says, would enable intuitive, cognitive 'conversations' to take place between searcher and search engine.
'We are on our way to learning from more than 1 trillion words procured from public Web pages, where others may have a billion,' he says, adding, 'there's no data like more data. ... Regardless of how clever the algorithm, the number of words is a critical factor.'"