Posted on

Google and Wikipedia: Partners in (multilingual) content

In 2008, Google launched a project called Knol. Remember it?

It was designed to replace Wikipedia.

Google apparently wasn’t happy that so many of its visitors were quickly abandoning it for this nonprofit wealth of information.

But Knol died a silent death in 2012 and Wikipedia is, fortunately, still very much alive and well.

In fact, Google.org just donated $2 million to Wikipedia, a relatively small but generous donation to what I believe to be one of the greatest achievements of the internet.

Google wants Wikipedia to invest more heavily in fostering more languages. Wikipedia has already been doing this with projects such as Project Tiger for Indic languages.

In addition to the donation, Google and Wikipedia are expanding Project Tiger, an initiative to expand the content on Wikipedia into additional languages. The pilot program has already increased the amount of locally relevant content in 12 Indic languages. With the expansion, the goal is to include ten more languages.

TechCrunch

And while Google’s investment in this project sounds altruistic, Google has plans for all that new content across all those new languages.

Google has a 109-language machine translation engine — and the way this engine improves translation quality is by devouring well-translated source and target language content. And when it comes to languages such as Luganda and Māori, Google needs a great deal more content.

That’s where Wikipedia comes in.

Nevertheless, I’m glad to see Google putting money where it belongs. I visit Wikipedia at least a dozen times a week and the internet would be a sadder place without it.

PS: Wikipedia emerged on top of the 2018 Web Globalization Report Card — and is looking very good for the next report (now underway).


A More Responsive Google Translate

Google Translate is the world’s most popular translation tool. The company says it now translates 30 trillion sentences a year across 103 languages.

The key data point here is the 103 languages. No other free translation tool comes close to this range of languages. And while the quality across the lesser-used languages is quite uneven, to put it kindly, Google Translate is still the only game in town. Which means some translation, even of poor quality, is far better than none at all.

Last week, Google Translate debuted an upgraded design that is now fully responsive. Here is the new interface:

Google Translate: November 2018

And the previous interface:

Google Translate: July 2018

The functionality remains the same, but I appreciate the more prominent “detect language” selector on the text input side. Google pioneered browser-based language detection a decade ago and it is wise to call attention to this powerful feature. Many users assume that they need to know the source language before taking advantage of Google Translate.

Also nice to see are the increased default sizes of the target languages on this menu:

Target language menu

One recommendation I would make: add a generic globe icon above this menu of languages. Perhaps the downward arrow is sufficient, but I would rather use an icon that speaks across all languages.

Now, for those of you wondering about this list of languages — as in Why are they all in English? — you ask a great question. As I note in my book and reports, you want your global gateway to be globally agnostic, so each language should be presented in its native language. But that’s the rule for a global gateway. What we have here is not a global gateway, but a localized user interface — localized into English.

If I change my web browser setting to Spanish, I will be greeted with this interface:

Spanish-language interface: Target language menu

Google Translate morphs into Google Traductor
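
To make the distinction between a global gateway and a localized interface concrete, here is a minimal Python sketch. The language tables and function names are invented for illustration; this is not how Google Translate is implemented, just one way to picture the two rules.

```python
# Hypothetical data for illustration only: a handful of language codes.

# Each language written in its own language (what a global gateway should show).
ENDONYMS = {
    "en": "English",
    "es": "Español",
    "de": "Deutsch",
    "ja": "日本語",
}

# Each language written in the language of the user interface
# (what a localized menu shows).
LOCALIZED_NAMES = {
    "en": {"en": "English", "es": "Spanish", "de": "German", "ja": "Japanese"},
    "es": {"en": "inglés", "es": "español", "de": "alemán", "ja": "japonés"},
}


def global_gateway_labels(codes):
    """Labels for a global gateway: every language in its native form."""
    return [ENDONYMS[code] for code in codes]


def localized_menu_labels(codes, ui_language):
    """Labels for a localized menu: every language named in the UI language.

    Falls back to English names when the UI language is not in the table
    (an assumption made for this sketch).
    """
    names = LOCALIZED_NAMES.get(ui_language, LOCALIZED_NAMES["en"])
    return [names[code] for code in codes]
```

With the browser set to Spanish, `localized_menu_labels(["en", "de"], "es")` gives the Spanish names, while `global_gateway_labels` returns the same native-language labels no matter who is visiting.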


The humans behind machine translation

Google Translate is the world’s most popular machine translation tool.

And, despite predictions by many experts in the translation industry, the quality of Google Translate has improved nicely over the past decade. Not so good that professional translators are in any danger of losing work, but good enough that many of these translators will use Google Translate to do a first pass on their translation jobs.

But even the best machine translation software can only go so far on its own. Eventually humans need to assist.

Google has historically been averse to any solution that required lots and lots of in-person human input — unless these humans could interact virtually with the software.

Behind Google’s machine translation software are humans.

In the early days of Google Translate, there were very few humans involved. The feature that identified languages based on a small snippet of text was in fact developed by one employee as his 20% project.

Google Translate is a statistical machine translation engine, which means it relies on algorithms that digest millions of translated language pairs. These algorithms, over time, have greatly improved the quality of Google Translate.

But algorithms can only take machine translation so far.

Eventually humans must give these algorithms a little help.

Google Translate Community

So it’s worth mentioning that Google relies on “translate-a-thons” to recruit people to help improve the quality.

According to Google, more than 100 of these events have been held, resulting in the addition of more than 10 million words:

It’s made a huge difference. The quality of Bengali translations are now twice as good as they were before human review. While in Thailand, Google Translate learned more Thai in seven days with the help of volunteers than in all of 2014.

Of course, Google has long relied on a virtual community of users to help improve translation and search results. But actual in-person events are a relatively new level of outreach for the company, and I’m glad to see it.

This type of outreach will keep Google Translate on the forefront in the MT race.

If you want to get involved, join Google’s Translate Community.


Web globalization predictions for 2014


I’m optimistic about the year ahead.

I base this optimism in part on discussions I’ve had this year with dozens of marketing and web teams across about ten countries. While every company has its own unique worldview and challenges, a number of patterns have emerged. And I can tell you that there is a great deal of enthusiasm for web globalization — backed by C-level investments.

And this enthusiasm is not simply driven by China any longer — which is a healthy thing to see. Executives have a more realistic and sober view of China, and this has resulted in smarter and longer-term planning and investments. That’s not to say China won’t continue to dominate the headlines in 2014, as it most certainly will. But companies are now taking a closer look at countries such as Thailand, Indonesia, Turkey, India, and much of the Middle East.

Here are a few other trends I see emerging in the year ahead:

  • Machine translation (MT) goes mainstream. I’ll have much more to say about this in the future (you can subscribe to updates on the right) but suffice it to say, MT is not just for customer support anymore. Companies are looking to use MT as a competitive differentiator, and we’re going to see more real-world examples on customer-facing websites. And customers around the world will love it. (And, no, I’m not suggesting that human translators are in any danger of losing their jobs; quite the opposite!)
  • Responsive global websites also go mainstream. True, there are valid reasons for NOT embracing responsive websites, but for most companies, this is a clear path forward. It helps manage the chaos internally and frees up resources for mobile apps — which are becoming, for some of us, more important than the website itself.
  • Language pullback. What? Companies are going to drop languages? That’s right. Some that I’ve spoken to already have dropped a language or two, and others are considering following along. I’m never a fan of dropping languages for budgetary reasons, as this is almost always a shortsighted decision, but it’s a fact of life as companies learn to align their language strategies with their budgets. In the end, pullbacks are far from ideal but probably a sign that companies are no longer making blind assumptions that adding languages will automatically increase sales (this isn’t always the case). So even this trend, while minor, is ultimately going to be a positive one.
  • Privacy becomes a selling point. The “NSA-gate” scandal is only just beginning to be felt around the world. And the threat to American-based tech companies is very real. I will not be surprised if Google or Microsoft announces non-US hosted services (to bypass the NSA’s grip and attempt to rebuild trust with consumers). And there are already a number of startups emerging in various countries promising to keep user data safe from the “evil” American intelligence agencies. You know this is a serious issue when Apple and Google and Microsoft (and other tech companies) all agree on something.
  • A non-Latin gTLD awakens American companies. I’ve long written about why I think the Internet is still broken for non-English speakers. But now that ICANN is moving ahead with delegation of generic TLDs, I believe that one (or more) of these domains will act as a wake-up call to those companies that have long overlooked them — and I’m including a number of Silicon Valley software companies as well. I don’t want to predict what domain I think it will be (they are all available for you to see) — let me know if you have a candidate.
  • Apple drops flags from its global gateway. True, this is not my first prediction along these lines. But I do think 2014 will be the year. And this will make my life a bit easier because I won’t have to respond to any more “But Apple is using flags so why can’t we” questions.

So what do you think about the year ahead?

If you have any predictions to share, please let me know.