Saturday, November 29, 2008

Official Google Blog: Our international approach to search

Official Google Blog: Our international approach to search: "... improving Google's international search. This is a tough challenge, since Google search is used in many countries and languages where our engineers have little personal knowledge. Initially, the international search improvements were done by Search Quality engineers who were passionate about their languages and countries: Lina from Sweden improved our parsing of compound words in German and Swedish; Dimitra from Greece introduced diacritical support; Ishai from Israel worked on transliteration corrections for Hebrew and Arabic; Trystan from Australia created methods for identifying local search results and ranking them together with foreign ones from the same language; Alex, a bilingual Ukrainian and Russian, introduced morphological understanding of these languages. As the importance of our international search grew, we solicited help from Googlers in all our offices. Finally, we are leveraging an international network of search specialists who help us understand search within the unique combination of their language and country."

The post is long and presents many improvements. However, after rerunning my old tests for Albanian, I see that diacritics still do not play the discriminating role they should. One of the top results when searching for 'të' is the page for Tellurium, whose chemical symbol is Te. As suggested in a previous post here, 'të' should be entered in quotes; it is the only way to be sure that the results contain this exact string. One of the pages returned, which contained as many 'ë' characters as an Albanian page would, was in Mixe (Ayuk), a language spoken in Totontepec, Oaxaca, Mexico.
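The Tellurium result can be illustrated with a minimal Python sketch of accent folding. It assumes, purely for illustration, that a search engine strips diacritics and case before matching unquoted terms, and that a quoted query keeps the diacritics. This is not a description of Google's actual pipeline, and `fold` and `matches` are hypothetical names.

```python
import unicodedata

def fold(term: str) -> str:
    """Hypothetical accent- and case-folding step: decompose to NFD,
    drop combining marks (Unicode category 'Mn'), then casefold."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()

def matches(query: str, token: str, exact: bool = False) -> bool:
    """Assumed matching rule: an unquoted query compares folded forms,
    so 'të' and 'Te' collide; a quoted (exact) query only ignores case
    and keeps the diacritics."""
    if exact:
        q = unicodedata.normalize("NFC", query).casefold()
        t = unicodedata.normalize("NFC", token).casefold()
        return q == t
    return fold(query) == fold(token)

print(matches("të", "Te"))              # unquoted: 'ë' folds to 'e', so this matches
print(matches("të", "Te", exact=True))  # quoted: the diacritic must match
```

Under these assumptions, folding makes 'ë' indistinguishable from 'e', which is exactly how a page about Te can rank for 'të'. Quoting restores the distinction but, as the Mixe example shows, it cannot tell which language a page containing 'ë' is written in.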