Sunday, December 26, 2004

Slashdot | Post-Googleism At IBM With Piquant

Slashdot Post-Googleism At IBM With Piquant - a discussion of this NY Times article by James Fallows. The IBM Systems Journal fall 2004 issue discusses in more detail unstructured information, which represents the vast majority of the data collected and accessible to enterprises. This data may be in various formats and may lack the organization of traditional sources such as database records. Exploiting this information requires systems for managing and extracting knowledge from large collections of unstructured data and applications for discovering patterns and relationships. This issue presents eight papers on the tools, methods, and architectures which are evolving for managing unstructured information in areas such as life science and market research. The issue also contains a paper on managing an enterprise architecture and one on autonomic computing systems.

Thursday, December 23, 2004

LXA:: Products

LXA:: Products: "The Lexalytics Knowledge Appliance® is a new class of software that solves a problem no search engine has been able to: actually make the data better than it was to begin with. The application provides the ability to search, find and extract relevant data and content from desktops, the Internet, corporate intranets and extranets. All of the important concepts from the data (people, companies, places, sentiment or tone classifications and documents summaries) are added to the original documents as enriched XML. This enriched content can be fed to any search engine or database product allowing new ways to utilize the content."

Saturday, November 13, 2004

"Applying a mix of sophisticated text-mining techniques, natural language disambiguation algorithms, and artificial intelligence methodologies to the public Web and news databases creates remarkable sets of highly structured and useful documents in a unified format, ready to be searched."

Friday, November 12, 2004

The Professor and the Adman

Textonomy, the disambiguation engine from AND and now Crystal Research, can distinguish an article about the economic sense of the word "depression" from one about its psychological sense. Using that distinction as a starting point, the Advance product prevents a publisher or contextual ad provider from serving an ad for psychotherapy alongside an article about Black Tuesday.

Textonomy Advance is now competing in the contextual advertising space.
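
How Textonomy actually tells the two senses apart is not described in the article. As a rough illustration only, here is a minimal sketch of one classic idea: count how many context words in the text overlap with a small list of cue words for each sense. The cue lists and the `guess_sense` function are my own hypothetical choices, not anything from the product.

```python
import re

# Hypothetical cue words for each sense; a real system would use far richer resources.
SENSE_CUES = {
    "economic": {"market", "stock", "unemployment", "bank", "crash", "economy"},
    "psychological": {"therapy", "mood", "patient", "anxiety", "symptoms", "treatment"},
}

def guess_sense(text):
    """Return the sense whose cue words overlap most with the text, or None if unclear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # Refuse to guess when no cue word matched or the top score is shared.
    if scores[best] == 0 or list(scores.values()).count(scores[best]) > 1:
        return None
    return best

article = "The stock market crash of 1929 began the depression and mass unemployment."
print(guess_sense(article))
```

An ad server could then skip psychotherapy ads whenever the guessed sense is not the psychological one.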

Friday, October 22, 2004

Stochasto's Natural Language Search Engine

Stochasto ASA, with offices in Oslo and Moscow, owns unique, patented technology in three areas:
- Intelligent search based on natural language
- Heuristic antivirus detection
- Protection, including encryption

These technologies are based on scientific advances in stochastic analysis and artificial intelligence, and they have been developed over many years in Russia. The company has an R&D facility in Moscow with a staff of 30, most of whom are highly experienced computer scientists.

Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature

Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature: "Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched."
Textpresso can be accessed at or via WormBase at
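
To make the two elements in the quote concrete, here is a minimal sketch of sentence-level search by term category. It is not Textpresso's code: the category names, the term lists, and the naive sentence splitter are assumptions for illustration.

```python
import re

# Hypothetical categories; the real system's ontology is far larger.
CATEGORIES = {
    "gene": {"lin-12", "glp-1", "daf-2"},
    "regulation": {"regulates", "represses", "activates"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def categories_in(sentence):
    """Return the names of all categories that have at least one term in the sentence."""
    tokens = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
    return {name for name, terms in CATEGORIES.items() if tokens & terms}

def search(text, required):
    """Return sentences that contain at least one term from every required category."""
    return [s for s in split_sentences(text) if set(required) <= categories_in(s)]

doc = ("We studied daf-2 mutants. The gene lin-12 activates a downstream target. "
       "Worms were grown at 20 C.")
print(search(doc, ["gene", "regulation"]))
```

Searching by category rather than by keyword means a query for "gene" plus "regulation" finds the second sentence even though neither word appears in the query as typed.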

Wednesday, October 20, 2004

Data Classification Using a Digital Taxonomy

Data Classification Using a Digital Taxonomy (SYS-CON)(Printview): "Another implementation challenge is ensuring that data is classified correctly. There are auto-classification tools available that attempt to derive data context by using natural-language algorithms. These tools attempt to 'understand' the content of the given data by evaluating not just the keywords, but also the circumstance. Once attained, the tools will assign the data to the proper term in the taxonomy. The accuracy of these tools won't match human classification but could be acceptable especially if the data is already tagged using some form of metadata."

Monday, October 11, 2004

Article: Device translates spoken Japanese and English | New Scientist

Article: Device translates spoken Japanese and English | New Scientist: "A handheld device that enables a user to chat in another language - without having to learn any words or phrases for themselves - has been developed by Japanese electronics firm NEC.

The system is about the size of a handheld PDA and converts spoken Japanese to English and vice versa. It is planned for launch in Japan in the next few months."

Thursday, October 07, 2004

NIST - 32 new grants made for innovative technology R&D

Medical News Today - UK

... Syntax- and Rule-Based Decoding for Statistical Machine Translation Systems Develop an integrated, statistical phrase-based and syntactic rule approach to ...

Machine Translation News

GOOGLE'S Next Targets: Clustering and Translation
eWeek - USA
... Another space in Google's research net is statistical machine translation for turning Web pages into other languages, said Peter Norvig, director of search ...

LANGUAGE Weaver Raises $4M - Thousand Oaks,CA,USA
... board. Language Weaver is developing statistical machine translation software, used for translating foreign languages into English. ...

Tuesday, October 05, 2004

NLP news

Contact Center Coordination
Line 56 News - USA
... For ABN AMRO, eGain developed a virtual assistant (bot) to guide busy financial professionals with natural language queries and avoid a 20-document search ...

Knowledge is power
MSI Magazine - Oakbrook,IL,United States
... of National employees--product managers, field sales and service engineers, and Web designers--can log onto the system and use natural language commands to ...

Saturday, October 02, 2004

Text Mining News

MICROSOFT ups the ante for BI

Computer Business Review - UK

... upgraded with richer data mining algorithms and clustering capabilities for regression, segmentation, sequence, and association analysis as well as text mining ...

Tuesday, September 28, 2004

NLP News

INFORMATION System To Help Scientists Analyze Mechanisms Of Social ...
Science Daily (press release) - USA
... to enable semantic indexing will be developed by ChengXiang Zhai, professor of computer science and an expert on processing natural language for information ...

Monday, September 27, 2004

NLP News

MARKET Central Unveils SourceWare Search Engine for Corporate ...
Business Wire (press release) - San Francisco,CA,USA
... Using sophisticated natural language processing and interactive artificial intelligence (AI) algorithms based on automated classification, SourceWare Search ...

MERCURY gives business a role in software QA - USA
... New features also include drag-and-drop creation of business process tests, and the use of natural language to construct test components and test cases. ...

CLOSING the gap between business and developers
ADT Magazine - USA
... It has drag-and-drop capabilities, as well as the ability to construct test components and test cases with natural language, scriptless, keyword-driven testing ...

TUVOX Announces TuVox Perfect Router Speech-Enabled Call Routing ...
Business Wire (press release) - San Francisco,CA,USA
... Built completely on open standards, TuVox's enterprise software can automate virtually any type of call -- including natural language call routing, self ...

Saturday, September 25, 2004

Text Mining News

VISUAL Analytics Inc. and Basis Technology to Deliver Advanced...

PR Newswire (press release) - USA
... About Basis Technology Basis Technology provides software solutions for multilingual text mining and information retrieval applications. ...

NLP news

CONTACT Centers Await Next-Generation Speech
CRM Daily - USA
... On other hand, he says, with a speech-enabled application built on a natural-language platform, an application can handle a call by asking the customer outright ...

MICROSOFT showcases research initiatives
WindowsForDevices - Palo Alto,CA,USA
... speech recognition, user-interface research, programming tools and methodologies, operating systems and networking, graphics, natural language processing, and ...

Researchers develop data-mining system for biological literature.

RESEARCHERS develop data-mining system for biological literature.

Drug - Montpellier, France

24/09/2004 - Scientists in the US have developed Textpresso, a new text-mining system for scientific literature which implements a unique search engine using ...

Friday, August 20, 2004

Language may shape human thought
reports this week about a research project on Pirahã language numerals.

Hunter-gatherers from the Pirahã tribe, whose language only contains words for the numbers one and two, were unable to reliably tell the difference between four objects placed in a row and five in the same configuration, revealed the study.
/. [a.k.a. Slashdot] responded with a lively discussion.

For more interesting stuff about Pirahã check Dan Everett's page.

Mark Liberman at Language Log has a point of view with which I agree. I don't think it's the language that determines cognitive capacity. I think cognitive capacity is independent of language, just as physical capacity is independent of hair color.

Thursday, August 19, 2004

Pros sort computer translating -- The Washington Times

The Washington Times has a short overview of the summarization and machine translation research going on at U. Maryland and Georgetown.

Saturday, July 31, 2004

Conference on E-mail and Anti-Spam

ACADEMICS Enlist in Spam Battle
eWeek - USA
... On Friday, the inaugural Conference on E-mail and Anti-Spam opened here at Microsoft Corp's campus with a decidedly different approach to fighting unwanted e-mail. Rather than touting products, speakers vetted research from universities and industry laboratories. Their approaches moved far beyond the Bayesian filtering of yesteryear to the use of sequencing techniques from bioinformatics, cryptography and natural language processing to tackle spam. ...

Tuesday, July 27, 2004

Less Lost in Translation

Microsoft is taking a novel approach to MT, according to this article in Technology Review:
... A tool introduced recently in China by Microsoft helps writers who are not native in English to write better English. Called the English Writing Wizard, it is the first product that addresses the difficult task of giving suggestions to someone who has little or no ability to distinguish between good and bad advice. Although the wizard can help with translation, it is not, strictly speaking, a machine translation tool. It is more akin to the grammar checkers familiar to users of common word-processing programs-but enhanced to work with people not native in English. A writer uncertain about how to phrase an idea in English can type it directly in Chinese and get a high-quality translation. ...

[more at Technology Review]

Monday, July 26, 2004

MICROSOFT Announces SQL Server 2005 Support for AMD Extended

MICROSOFT Announces SQL Server 2005 Support for AMD Extended ...
Yahoo News (press release) - USA
... Data mining enhancements to SQL Server 2005 Beta 2 include a new Neural Network algorithm, a Text Mining feature, query enhancements and Reporting Services ...

BATTLEFIELD Tech for Aid Workers

BATTLEFIELD Tech for Aid Workers
Wired News - USA
... E-TAP was designed and built by BBN Technologies, with the real-time machine translation software developed by Language Weaver. ...

News in Linguistics

STUDENTS turning to sign language
San Jose Mercury News (subscription) - San Jose,CA,USA
... than 145 colleges - including those in the University of California - according to a list compiled by Sherman Wilcox, chair of the linguistics department at ...

EXPERTS appeal for popularizing Esperanto, protecting language ...
Xinhua - China
... Su Jinzhi, professor with the Institute of Applied Linguistics of the State Language Work Committee, said there are two ways to solve language inequality: one ...

'VALUES' is a slope politicians slip and slide around on
San Francisco Chronicle - San Francisco,CA,USA
... Geoffrey Nunberg is a linguistics professor at Stanford University and the author of "Going Nucular: Language, Politics and Culture in Controversial Times ...

FROM Homeless Shelter To Ivy League Dorm
Hartford Courant (subscription) - Hartford,CT,USA
... and acted in the freshman play, "Jesus Hopped the A Train." Daniels took a wide range of courses, including intermediate French, linguistics, and music theory. ...

FRENCH author pens 233-page novel without any action words, draws ...
Houston Chronicle - Houston,TX,USA
... The very notion - in the words of linguistics professor Geoffrey Pullum, on a Web site about language - "nuts, bonkers, round the bend.". ...

MARINES Seek Retirees For AD - USA
Retirees with experience in the intelligence, communications, public affairs, civil affairs, linguistics, logistics and administration fields are among the ...

'¡CHISTES!' Offers Insight Into Hispanic Culture
Albuquerque Journal (subscription) - Albuquerque,NM,USA
... Southwest. A bonus for those interested in linguistics is a glossary comparing the region's Spanish to "standard" Spanish. Garcia ...

Machine Translation at Work

HUMANITARIAN effort yields brilliant technology, teamwork
San Jose Mercury News (subscription) - San Jose,CA,USA
... audience. This blending of human and machine translation capabilities makes the best use of both. The machines get us part of the way. ...

Tuesday, June 22, 2004

MultiLingual Computing, Inc.: Retrieving Information in Multiple Languages

Trey Jones, formerly at AppTek, has written this nice introductory article at MultiLingual Computing describing some very timely technologies developed at his former company. I have had the chance to see some of these technologies in action, and I can say that AppTek has put in place a really impressive system that could be very useful to many agencies today.

Monday, June 21, 2004

Vericept Teams with Expert to Create Grant Template to Aid Schools in Obtaining Federal Funding for Security in Schools

Vericept Corporation, the leading provider of risk management solutions, today announced it is currently teaming with expert Nancy Willard, executive director of the Center for Safe and Responsible Internet Use, to create a grant template to aid schools in obtaining federal funding for use of Vericept's technology, through the Safe and Drug Free Schools program. Vericept enables "virtual adult supervision" of student Internet, e-mail and network use through its Vericept Intelligence Platform. Monitoring thousands of schools across the country, Vericept's solution identifies instances of network misuse by children, such as cyber bullying, visits to Web sites that provide potentially dangerous information, online drug dealing and e-mail dialogues threatening violent acts - providing school staff with the tools and information they need to deter and prevent potentially harmful behavior.

Online Editors for Typing Foreign Characters

Tomasz Szynalski just posted on Linguist a link to Online Editors for Typing Foreign Characters. They provide buttons and keyboard shortcuts for entering foreign characters (accents, umlauts) without having to memorize Alt codes or install keyboard layouts.

I have referred my students to book publishers' sites in the past. I am glad they will now have access to these clean, ad-free editors.

The URLs are:

Tuesday, June 15, 2004

PeopleSoft Teams with iPhrase to Deliver Leading Customer Interaction Technology for PeopleSoft Enterprise CRM 8.9

iPhrase Contact Classification Server offers robust natural language functionality for multi-channel customer interactions. Unlike other solutions that rely on simple keyword matching or statistical classification, ICCS is able to understand the intent of highly dynamic and informal communications such as e-mail, chat sessions and free-form self-service queries. The product also has the unique ability to learn in real time from customer and agent interactions and increase its accuracy over time. These capabilities can reduce implementation and maintenance costs without sacrificing consistent, accurate responses.

Friday, June 04, 2004

Email product sends strong message to investors

XtraMind has been creating a positive impression not only amongst customers using their innovative product XM-MailMinder but with private investors as well.

Logos MT Technology - still in play...

It was interesting to read that Logos Machine Translation Technology, after many adventures, gets a new chance. AppTek is a big technology provider to the government. No better place to get a second, third, nth chance... If this comes true, it would be further proof that this company really "walks on water."

Eisenach, May 4, 2004 – The technology corporation GlobalWare AG has sold a comprehensive developer license for the LOGOS technology to Net2Voice amounting to 2.65 million Euros. The company, headquartered in McLean, Virginia, develops and markets multilingual speech based software solutions for multi-media, internet and telephone applications in addition to translation technologies and speech recognition modules based on transfer systems.

Tuesday, June 01, 2004

Language Weaver and P.H. Brink International Sign Beta Agreement to Develop Domain-Specific Translations

P.H. Brink to License Software for Usage in Its Automated Workflow System.

Language Weaver and P.H. Brink International Sign Beta Agreement to Develop Domain-Specific Translations; P.H. Brink to License Software for Usage in Its Automated Workflow System. This is an interesting development. All the "old" MT companies have tried this approach to make money; unfortunately, many of them did not succeed. I am surprised LW is following this proven losing approach. Are they becoming just another MT company?

Friday, May 28, 2004

A visual search engine for information exploration

It would be really great if .:. Gurusoft .:. makes it.

It would make all of us, abridged from Abridge, feel vindicated. It's great that so many of our ideas and prototypes of early 2001 are surfacing in one shape or another all over the world...

Monday, May 24, 2004

EasyAsk Listed in 'Visionary' Quadrant of Gartner's 2004 Magic Quadrant for Enterprise Search

EasyAsk's Enterprise 9 enables enterprises to quickly, easily and comprehensively search for critical business information in structured and unstructured data, regardless of format or location, and be rewarded with only relevant results. Enterprise 9 users search or browse for relevant information using keywords, phrases, natural language questions, common terms and/or business-specific nomenclature.

BlueLithium Unveils BlueTheory, First One-to-One Online Advertising Platform Connecting Web Users with Relevant Ads

BlueLithium's BlueTheory uses natural language processing to understand Web content and sends an ad that matches the relevancy of the content. The publisher receives phenomenal returns for their content, the advertiser reaches new customers, and the consumer appreciates the relevant advertising without compromising privacy or the overall Internet experience.

Scientific American: Talking to Bill Gates

Scientific American: Talking to Bill Gates - on artificial intelligence, computer sciences education and more (click here for the full interview in PDF)

Sunday, May 23, 2004

Explaining business concepts using NLP

Now, this was a surprise: usually we jump through hoops and contortions to explain NLP concepts and use any possible analogy to make them clearer. Ephraim Schwartz at InfoWorld is doing just the opposite: he is using NLP concepts to explain supply chain by numbers:

From a common-sense point of view, the way statistical analysis works seems illogical. The more complex a data set becomes, for instance, the easier it is to make predictions.

For example, take NLU (natural language understanding) research. If I begin with only the word “the,” a computer program would have far less than a 1 percent chance of predicting the remainder of the sentence. However, if I add the word “day” to follow “the,” making the sentence more complex, the likelihood of guessing the third word might be 50 percent or better. The word “day” is singular and “the day” will require a verb.

Does this mean natural language processing is becoming more mainstream?
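
The next-word point in the quoted passage can be sketched with a tiny bigram model, which predicts a word from the single word before it. This is a toy illustration, not what Schwartz or any NLU system actually runs: the six-sentence corpus and the function names are made up, and the percentages it yields say nothing about real text.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def next_word_probs(following, word):
    """Turn the follower counts for `word` into relative frequencies."""
    counts = following.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Made-up training data, purely for illustration.
corpus = [
    "the day is long",
    "the day is over",
    "the day was warm",
    "the cat sat",
    "the dog ran",
    "the night is cold",
]
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))  # spread over four different followers
print(next_word_probs(model, "day"))  # concentrated on two verbs
```

In this toy corpus the guesses after "day" are limited to two verbs, while "the" can be followed by four different nouns, which is the narrowing effect the quote describes.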

Simplifying a complex art

At the three-day-long AAACL Symposium in Montclair, NJ, I enjoyed a very nicely done presentation titled "Where have we been and where are we going" in Natural Language Processing (including speech, corpus linguistics, machine translation) from Ken Church (now a Senior Researcher at Microsoft).

Having read the article linked in the title, I agree: there are so many concepts and ideas in our field too that need to be translated into everyday language. Ken, by the way, even though he was speaking to a group of experts in the field, kept it very simple and informative.

Friday, May 21, 2004

MSN *bots ready for the testers

By the end of this year, MSN will launch Newsbot, a search tool that aggregates news from over 4000 sources worldwide, and Blogbot, a tool that produces relevant search results from blogs. Answerbot, which will feature a natural language interface for displaying search results, will arrive in the next wave following Newsbot and Blogbot.

Wednesday, May 19, 2004

The Fight Against Spam

The Fight Against Spam, by François Joseph de Kermadec -- In last week's Part 1 of this series, François Joseph de Kermadec showed you how to build the foundation for spam-fighting strategies. In Part 2, he fine-tunes this approach and digs deeper into

Even though the title is about spam, it contains a nice, non-geek introduction to Vector Space and Latent Semantic Analysis clustering.
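
For readers who want to see the vector space idea in code, here is a minimal sketch: each message becomes a bag of word counts, and two messages are compared by the cosine of the angle between their count vectors. This is not the article's code, the sample messages are invented, and it covers only the plain vector space step, not Latent Semantic Analysis.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented sample messages, for illustration only.
spam = vectorize("cheap pills buy cheap pills now")
incoming = vectorize("buy cheap pills today")
memo = vectorize("meeting notes for the budget review")

print(cosine(spam, incoming) > cosine(spam, memo))
```

A clustering step would group messages whose pairwise similarity is high; LSA would additionally reduce the dimensions of the word-count matrix first so that related words count as partial matches.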

Tuesday, May 18, 2004

LREC-2004, Lisbon, Portugal

4th INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC) "In Memory of Antonio Zampolli" expects more than 1000 participants in Lisbon.

Thursday, May 13, 2004

EZArchive Launches Advanced Search and Retrieval to Optimize Personal Media Asset Management

EZArchive’s search and retrieval capabilities will incorporate both keyword and natural language search to provide the most sophisticated means for users to quickly and easily locate digital media.

Friday, May 07, 2004

InQuira Doubles Deployments, Releases New Version

InQuira software enables customers to get information via the Web, instant messaging or e-mail. Using semantics and an industry-specific dictionary, the software searches across a variety of enterprise information repositories to quickly find the best information based on a customer query. They also apply this same technology to the call center, so that customer service representatives can handle more calls and resolve more issues faster.

Wednesday, May 05, 2004

Single click generates lists to end all lists

KnowItAll, a search engine under development at the University of Washington, Seattle, trawls the web for data and then collates it in the form of a list.

Friday, April 30, 2004

On top of the world: But for how long?

Rivals gearing up to do battle - Firmly ensconced as the Internet's most popular search service, and with a $2.7 billion public stock offering in its future, Google Inc. seems to be sitting on top of the world. But the world is round. It's easy to roll off. And there are several rich and canny competitors eager to give Google a push.

New UK centre for text mining may improve information management

The JISC, BBSRC and EPSRC have announced funding of £1m to establish a National Centre for Text Mining. The remit of the Centre, the first publicly funded centre in the world, is to contribute to the associated national and international research agenda, to establish a service for the wider academic community, and to make connections with industry.

Thursday, April 29, 2004

Air Force acquires geographic search tool

The Air Force Office of Special Investigations has recently purchased a tool developed by MetaCarta Inc. of Cambridge, Mass., that organizes and presents geographic information on suspected and potential terrorists.

Thursday, April 22, 2004

Nstein Technologies acquires KMtechnologies

Nstein Technologies Inc. (TSX-V: EIN), an emerging leader in new Business Intelligence (BI) solutions, announced today that it has acquired KMtechnologies, a Canadian software developer of web-based collaboration solutions as well as document, knowledge and customer-relations management solutions.

Wednesday, April 21, 2004

Language Weaver Honored As Company Of The Year

Language Weaver, Inc., a software company developing statistical machine translation software (SMTS), today announced it has been chosen as Company of the Year in the start-up category by the attendees at The Venture Forum produced by Larta Institute.

Friday, March 12, 2004

Research on English and Foreign Language Exploitation - REFLEX

The Department of the Interior, National Business Center (NBC), Acquisition and Property Management Division, Southwest Branch, Fort Huachuca AZ, acting as the contracting agent for the Intelligence Technology Innovation Center (ITIC) and other Intelligence Community, Defense, and Homeland Security agencies, invites the submission of research proposals for developing technology that partially or fully automates aspects of the process of acquiring information from text documents and other natural language-based sources. Government information analysts need dramatic improvements over today's capabilities in order to be able to accomplish their analytic tasks in the future.

Thursday, March 04, 2004

Robo-talk helps pocket translator

Small robots with friendly faces have helped out in the development of handheld translation gadgets to be tried out by travellers in Japan.

Monday, March 01, 2004

Translation in the Age of Terror

A new U.S. government center will connect linguists on the front lines of the war against terror with translation assistance technologies that can digitize, parse, and digest raw intelligence material.

Tuesday, February 24, 2004

"Chatting" in Iraq

A “coalition chat line” now being used at several U.S. and allied sites around Iraq enables commanders and operators who speak different languages to communicate rapidly and reliably, using the “instant messaging” practices familiar to millions of teenagers.

Language Weaver Offers New Language Translation Module For Simplified Chinese

Language Weaver, a software company developing statistical machine translation software (SMTS), today announces the commercial availability of a Simplified Chinese to English language pair module for its automated translation product. The company also announces that Franz Och, world renowned University of Southern California Information Sciences Institute (USC/ISI) computer scientist and researcher, has joined the company as a consultant and is guiding the development of the Chinese system, among other tasks.

Thursday, February 12, 2004

Creating a Digital Aristotle: A Computerized Knowledge System for Scientists and Students

Project Halo is a staged, long-term research and development initiative that aims to develop a “Digital Aristotle”: an application capable of answering novel questions and solving advanced problems in a wide range of scientific disciplines.

Wednesday, February 04, 2004

An Opportunity for New NSF Funding for Linguistics

Terascale Linguistics presentations from the January 8th 2004, Boston, LSA, Town-Hall-style Meeting.

Thursday, January 15, 2004

Language tools for fight on terror

Software to allow security officials to better search and translate documents in foreign languages, especially Arabic, has been demonstrated at a technology show in Las Vegas.

Sunday, January 04, 2004

A Fountain of Knowledge: 2004 will be the year of the analysis engine

The great strength of computers is that they can reliably manipulate vast amounts of data very quickly. Their great weakness is that they don’t have a clue as to what any of that data actually means. Computer scientists have been laboring for decades to eliminate that weakness, with some limited successes in some limited domains. Now, IBM Corp. appears to have made a major breakthrough in the field of machine understanding. [WebFountain's site]