Haitian Creole is now a machine translation staple

In response to the earthquake in Haiti, Microsoft quickly expanded its machine translation engine to include Haitian Creole.

Today I noticed that Google has an alpha version of its Haitian Creole engine as well.

Though it’s sad that it took a natural disaster to spur attention to a particular language, I’m glad to see the language available.

It’s hard to overestimate the importance of readily accessible machine translation. Just as search engines help us better understand the world, machine translation engines help us better understand one another.

And, yes, they’re far from perfect. But they’re far better than nothing at all. And they are finding their way into countless applications and countless fixed and mobile devices, each additional language offering another glimpse into another world.

Google Translate: Now in 51 languages

In February of this year, Google Translate surpassed 40 languages.

Six months later, Google added ten more languages. The two-year growth trajectory is illustrated below:

google_translate_languages

Google went from 13 languages to 51 languages in less than 16 months.

Not bad.

And, yes, I’m aware that we must not confuse quantity of translations with quality of translations. Your translation mileage will most certainly vary by language pair. Still, for many language pairs, Google is the only game in town.

Here are the 10 most recently added languages:

  • Albanian
  • Afrikaans
  • Belarusian
  • Icelandic
  • Irish
  • Macedonian
  • Malay
  • Swahili
  • Welsh
  • Yiddish

On a related note, 41 of these languages are now incorporated into Google Docs.

Deciphering Google Translate on your web logs

Whenever I read this site’s web logs, I’m always fascinated by the number of referrals via Google Translate.

Every month there seem to be more of them, which could mean that the quality of Google Translate is improving, or this site is doing better in the rankings, or some combination of the two. Or it could simply be that more people have discovered Google Translate.

Given my passion for country codes, it’s fair to say that I also enjoy language codes. And it is through language codes that you can figure out which languages users were translating your site “from” and “to.”

Here is one referral string from my site:

google_translate

First, you can see that the person was using Google Korea, so it’s reasonable to assume the person was translating from English into Korean. The “To” line is actually the blog post title translated into Korean.

That was an easy one.

This next one is a bit more challenging:

google_translate2

This person was using Google.com, so you have to focus on the language codes. There are two here: an “id” (which follows “hl=”) and an “en” (which follows “sl=”). This means the person was translating from English into Indonesian (Bahasa Indonesia).

Here is what the translated page looks like:

google_translate2a

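The quick and easy way to know the target language is to focus on the “hl=” string. Strictly speaking, “hl=” sets the language of the Google interface, but that usually matches the language the visitor is translating into. In the screenshot below, the target language is German.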

google_translate3
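If you would rather not read referral strings by eye, the same decoding can be scripted. What follows is only a minimal Python sketch, not a definitive tool: it assumes the referrer is an ordinary URL whose query string carries “sl=” (source language) and “hl=” as described above, and it also checks for a “tl=” (target language) parameter, which is my assumption about strings that include one. The function name and the example URL are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def translate_languages(referrer):
    """Return (source, target) language codes found in a referrer URL.

    Either value is None when the matching parameter is absent.
    """
    query = parse_qs(urlparse(referrer).query)
    source = query.get("sl", [None])[0]
    # Assumption: prefer an explicit "tl=" if the string has one,
    # otherwise fall back to "hl=" as a proxy for the target language.
    target = query.get("tl", query.get("hl", [None]))[0]
    return source, target


# Hypothetical referral string, modeled on the Indonesian example above.
example = "http://translate.google.com/translate?hl=id&sl=en&u=http://example.com/"
print(translate_languages(example))  # ('en', 'id')
```

Run over a month of referrers, the resulting pairs can be tallied with collections.Counter to see which language pairs turn up most often.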

And here is a language code reference if you want to study your web logs.

What I want to know is what percentage of web traffic is taken up by Google Translate. Anyone care to share their web log stats?

Based on my cursory analysis, I would estimate the figure to be between 5% and 10%, but that’s very rough.
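For anyone who wants to check their own numbers, here is a rough Python sketch of how such an estimate could be computed from a list of referrer strings pulled out of a web log. It rests on assumptions of mine: a referrer counts as Google Translate if its host begins with “translate.google.” or if it is a “www.google.” host with a path starting with “/translate”. Real logs may need a looser or stricter test, and the function names and sample data are hypothetical.

```python
from urllib.parse import urlparse


def is_google_translate(referrer):
    """Guess whether a referrer URL came from Google Translate (heuristic)."""
    parts = urlparse(referrer)
    host = parts.hostname or ""
    if host.startswith("translate.google."):
        return True
    # Assumption: some referrals arrive via www.google.<tld>/translate...
    return host.startswith("www.google.") and parts.path.startswith("/translate")


def translate_share(referrers):
    """Return the percentage of referrers that look like Google Translate."""
    referrers = list(referrers)
    if not referrers:
        return 0.0
    hits = sum(1 for r in referrers if is_google_translate(r))
    return 100.0 * hits / len(referrers)


# Made-up sample data; "-" stands for a request with no referrer.
sample = [
    "http://translate.google.com/translate?hl=de&sl=en&u=http://example.com/",
    "http://www.google.com/search?q=country+codes",
    "http://example.org/links.html",
    "-",
]
print(translate_share(sample))  # 25.0
```

Note that this measures the share of logged referrers, which is only a stand-in for the share of overall traffic.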