Friday, December 01, 2006
Saturday, November 25, 2006
When I read about the pattern apparently established by "gaydar" through the reanalysis of ra(dio)-dar, the word seemed very familiar even though it was the first time I had seen it. Then Arkadiy Gaydar (Аркадий Гайдар), a Russian writer of the Leninist era, came to mind. Looking it up on Google, I found 1,130,000 ghits for Гайдар against 930,000 ghits for the English spelling. This made me think of the usefulness of a phonetic search engine. Using finite state transducers from Xerox PARC, I built one a long time ago (1994); it ended up being used in telephone directories to look up names from an approximate spelling (or just from how they are pronounced...)
Saturday, October 21, 2006
This contrasts with other translation systems, in which linguists painstakingly code grammar rules with long lists of exceptions to each rule. Och's system recently received the highest score in a competition of translation systems conducted by the U.S. Commerce Department's National Institute of Standards and Technology."
Saturday, September 23, 2006
Tuesday, September 12, 2006
'We are on our way to learning from more than 1 trillion words procured from public Web pages, where others may have a billion,' he says, adding, 'there's no data like more data. ... Regardless of how clever the algorithm, the number of words is a critical factor.'"
Friday, August 04, 2006
Saturday, July 29, 2006
The most exciting new MT approach around. Meaningful Machines will finally let us know something more detailed. The corresponding patents have been out for a while, but I am looking forward to reading this paper...
Wednesday, May 31, 2006
Wednesday, May 03, 2006
A very informative commentary by Mark Liberman can be found here, and a not-so-exciting discussion is going on here.
Sunday, March 26, 2006
Bibliothèque Numérique Francophone: "After Gallica, after the Bibliothèque Numérique Européenne, the course is now set for the Bibliothèque Numérique Francophone. No doubt that, with this visionary squadron leader, the publishers and other sheep-like professional circles of the French book trade who have followed him into the anti-Google fight are actively contributing to the future reach of French knowledge, culture and language on the internet. Who said France was in decline?"
Monday, March 13, 2006
Tuesday, March 07, 2006
That's great - but - what's new in there? OCR? Siemens' MT (METAL)? In any case, everything seems to be two years away - even this statement is not new...
Sunday, March 05, 2006
Paul Horn, the director of IBM Research: It continues to be a big thing for IBM and for IBM Research, but it's not just WebFountain. The basic issues are, really, natural language understanding in general. What WebFountain was able to do, which made it powerful, was it would go in and would scan text documents on the Web and it would understand enough about what people were saying that you could query it about what people were saying. You could imagine that there's a lot of countries, including our own, that would care a lot about scanning documents and even open documents and crawling through them to see what people were saying. A lot of the early work on WebFountain was done in three languages--English, Arabic and Chinese--and you can guess who might sponsor that work.
WebFountain is an example of a natural language technology that allows you to essentially analyze from an intelligence point of view what people are saying, but the important point is that this is just a small piece of many, many problems that companies have and where you want to take advantage of natural language understanding, such as translating spoken English to Russian and back again.
We talked about call centers. Natural language understanding can be incredibly powerful, even if you've got a call center operator, just by monitoring the calls and trying to understand what the issues are. There's enormous amounts of natural language and analytic issues in how companies interact with their customers. WebFountain was a specific application of natural language and search technology, but it's just one.
Wednesday, January 25, 2006
Nice article about the current developments in text mining.
...Text-mining engines, which can cost as little as a few thousand dollars, take up where Google leaves off, searching articles, webpages, blogs, and e-mail (and eventually, even mobile phone calls or television broadcasts) for ideas and even emotions, rather than specific terms...