Baidu Beating Google in China

The China Internet Network Information Center (CNNIC) just published a survey of the Chinese search engine market.


Baidu comes out on top in market share. But Google is more popular among high-end users. And Baidu relies heavily on MP3 search to drive traffic, primarily among the younger generation.

But I found this one question to be particularly worrisome for the folks at Google:

Among users who had never used a search engine six months ago, the one they now use as their only or primary search engine:
– Baidu 48.2%
– Sohu 19.6%
– Google 12.5%
– Sina 7.1%

Google does own a 2.6% stake in Baidu, but it’s going to have to put a good portion of its newly raised $4bn to work to get the rest.

You can download the full survey report at this link.

Will Google Kill the Translation Industry?

Last week I received a “Factory Tour” invite from Google but didn’t give it much thought. I wish I had because I missed a preview of the company’s ambitious machine translation (MT) efforts.

Thankfully, Philipp Lenssen has posted a great recap of the Webcast on his site, Google Blogoscoped. It’s worth a read.

Apparently Google is taking massive libraries of source and target text and dumping them into a database, where the relationships between source and target text are analyzed and memorized. This database is then used to translate new source text. Philipp explains it better than I can:

This is the Rosetta Stone approach of translation. Let’s take a simple example: if a book is titled “Thus Spoke Zarathustra” in English, and the German title is “Also sprach Zarathustra”, the system can begin to understand that “thus spoke” can be translated with “also sprach”. (This approach would even work for metaphors surely, Google researchers will take the longest available phrase which has high statistical matches across different works.) All it needs is someone to feed the system the two books and to teach it the two are translations from language A to language B, and the translator can create what Franz Och called a “language model.” I suspect it’s crucial that the body of text is immensely large, or else the system in its task of translating would stumble upon too many unlearned phrases. Google used the United Nations Documents to train their machine, and all in fed 200 billion words. This is brute force AI, if you want —  it works on statistical learning theory only and has not much real “understanding” of anything but patterns.

This sure is brute force MT. I’ll be very interested to know just how long a string of text Google can effectively translate. More important, how will Google handle the flood of brand names, oddball terms, and local slang?
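To make the idea in Philipp's recap a bit more concrete, here is a toy sketch in Python. It is not Google's system, and nothing in the Webcast recap describes their actual implementation. It simply counts which source and target phrases show up together in translated sentence pairs, scores each pairing with a Dice coefficient, and prefers the longest phrase among equally good matches. The tiny corpus, the function names (`ngrams`, `train`, `best_translation`), and the choice of scoring are all my own assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus of (English, German) sentence pairs.
CORPUS = [
    ("thus spoke zarathustra", "also sprach zarathustra"),
    ("thus spoke the emperor", "also sprach der kaiser"),
    ("the emperor spoke", "der kaiser sprach"),
]


def ngrams(tokens, max_len):
    """Return every contiguous phrase of 1..max_len tokens."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def train(pairs, max_len=2):
    """Count how often each source/target phrase pair shares a sentence pair."""
    pair_counts, src_counts, tgt_counts = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_phrases = set(ngrams(src.lower().split(), max_len))
        tgt_phrases = set(ngrams(tgt.lower().split(), max_len))
        src_counts.update(src_phrases)
        tgt_counts.update(tgt_phrases)
        pair_counts.update(product(src_phrases, tgt_phrases))
    return pair_counts, src_counts, tgt_counts


def best_translation(phrase, model):
    """Pick the target phrase with the highest Dice score for `phrase`.

    Ties are broken in favor of the longer target phrase. Returns None
    for a phrase the model has never seen.
    """
    pair_counts, src_counts, tgt_counts = model
    scores = {
        tgt: 2 * count / (src_counts[src] + tgt_counts[tgt])
        for (src, tgt), count in pair_counts.items()
        if src == phrase
    }
    if not scores:
        return None
    return max(scores, key=lambda tgt: (scores[tgt], len(tgt.split())))


if __name__ == "__main__":
    model = train(CORPUS)
    print(best_translation("thus spoke", model))    # also sprach
    print(best_translation("good morning", model))  # None
```

With only three sentence pairs the sketch already falls over on anything it hasn't seen, which is the "unlearned phrases" problem Philipp mentions, and a brand name or bit of slang that never appears in the training text would get the same `None`.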

But let’s just assume that Google does make this ambitious project a success; how will this affect the translation industry in general and Web globalization in particular?

Assuming this all does work moderately well, companies will have an incentive to pull all text out of graphics to make the most of this free translation service. After all, if Google is providing users in Vietnam a free translation of your company’s Web site, why not do what you can to make everything translatable?

This would also be yet another blow to Macromedia Flash, not that the emergence of AJAX isn’t doing enough damage.

But what about the impact on translation vendors? I don’t think they have much to worry about, yet. The need for high-quality, human-edited translation isn’t going away anytime soon. Long term, however, all bets are off. Google should be on every translation vendor’s radar; this company has lots of money, lots of smarts, and lots of incentive to provide the world’s text in all the world’s languages.

Is Google Losing Its Global Edge?

Suw Charman says “I still think there’s a fundamental mental block regarding the rest of the world from a lot of American companies and developers.” She points out the new Google Maps application and its glaring lack of any country besides the US of A.

I was thinking the same thing myself the other day when I first tried it out.


First, I looked up my home, as I imagine most people do, then I scrolled west and west and west, thinking “Shouldn’t Japan be coming up pretty soon?” But it didn’t, just lots of blue water…

In Google’s defense, I’ll assume this is another one of their “beta” projects. It is pretty nifty and I do hope other countries are forthcoming.