The humans behind machine translation

Google Translate is the world’s most popular machine translation tool.

And, despite predictions to the contrary by many experts in the translation industry, the quality of Google Translate has improved steadily over the past decade. It's not so good that professional translators are in any danger of losing work, but it is good enough that many of these translators use Google Translate to do a first pass on their translation jobs.

But even the best machine translation software can only go so far on its own. Eventually humans need to assist.

Google has historically been averse to any solution that required lots and lots of in-person human input — unless these humans could interact virtually with the software.

Behind Google’s machine translation software are humans.

In the early days of Google Translate, there were very few humans involved. The feature that identified languages based on a small snippet of text was in fact developed by one employee as his 20% project.
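Language identification from a short snippet is commonly done by comparing character n-gram frequencies against precomputed per-language profiles. Here is a minimal sketch of that general technique; the sample sentences and the trigram-overlap scoring are my own illustrative assumptions, not a description of Google's actual implementation:

```python
from collections import Counter

# Toy training data: one sentence per language. A real detector would
# build profiles from large corpora covering many more languages.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chat",
}

def trigram_profile(text):
    """Count character trigrams, padding with spaces to capture word edges."""
    text = f"  {text.lower()}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

PROFILES = {lang: trigram_profile(t) for lang, t in SAMPLES.items()}

def detect(snippet):
    """Return the language whose trigram profile best overlaps the snippet's."""
    probe = trigram_profile(snippet)

    def score(profile):
        return sum(min(probe[g], profile[g]) for g in probe)

    return max(PROFILES, key=lambda lang: score(PROFILES[lang]))
```

Even this toy version tends to pick the right language for short inputs, which hints at why one engineer could prototype the feature as a side project: the core idea is small, and the hard part is gathering enough training text.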

Google Translate is a statistical machine translation engine, which means it relies on algorithms that digest millions of translated language pairs. These algorithms, over time, have greatly improved the quality of Google Translate.

But algorithms can only take machine translation so far.

Eventually humans must give these algorithms a little help.

Google Translate Community

It's worth mentioning, then, that Google now relies on “translate-a-thons” to recruit people to help improve translation quality.

According to Google, more than 100 of these events have been held, resulting in the addition of more than 10 million words:

It’s made a huge difference. The quality of Bengali translations is now twice as good as it was before human review. Meanwhile, in Thailand, Google Translate learned more Thai in seven days with the help of volunteers than in all of 2014.

Of course, Google has long relied on a virtual community of users to help improve translation and search results. But actual in-person events are a relatively new level of outreach for the company — and I’m glad to see it.

This type of outreach will keep Google Translate at the forefront of the MT race.

If you want to get involved, join Google’s Translate Community.

Google Translate turns 80, as in languages


From Afrikaans to Zulu, the evolution of Google Translate is one of Google’s greatest success stories, yet one that few people fully appreciate — perhaps because Google is reluctant to release usage data (which I imagine is significant).

Google now supports 80 languages, having recently added Somali, Zulu, three Nigerian languages (Igbo, Hausa, Yoruba), Mongolian, Nepali, and Punjabi.

I’m not a fan of Google+. I won’t be caught dead wearing Google Glass. But I’ll be the first to sing the praises of Google Translate and Google’s ongoing investment in languages.

Google long ago set the bar for what a “global” website or web app should support in terms of languages. It raised that bar to 40 languages a few years ago and is now raising it again to 60. If Google Translate is any indicator, that bar will be raised again over the next decade.

To give you an idea of just how far Google Translate has come in the past eight years, here is a screen grab I took back in 2006:

[Screenshot: Google Translate, circa 2006]

It’s amusing to see Arabic, Japanese, Korean, and Chinese labeled as BETA languages.

And impressive to see that Google Translate has grown from roughly 10 languages to 80 languages in eight years.

PS: Google Translate is one of the reasons Google does so well in the annual Web Globalization Report Card. I’ve nearly completed the 2014 edition and, yes, Google is looking good again this year.


Web globalization predictions for 2014


I’m optimistic about the year ahead.

I base this optimism in part on discussions I’ve had this year with dozens of marketing and web teams across about ten countries. While every company has its own unique worldview and challenges, a number of patterns have emerged. And I can tell you that there is a great deal of enthusiasm for web globalization — backed by C-level investments.

And this enthusiasm is not simply driven by China any longer — which is a healthy thing to see. Executives have a more realistic and sober view of China, and this has resulted in smarter and longer-term planning and investments. That’s not to say China won’t continue to dominate the headlines in 2014, as it most certainly will. But companies are now taking a closer look at countries such as Thailand, Indonesia, Turkey, India, and much of the Middle East.

As I look ahead, here are a few other trends I see emerging in the year ahead:

  • Machine translation (MT) goes mainstream. I’ll have much more to say about this in the future (you can subscribe to updates on the right) but suffice it to say, MT is not just for customer support anymore. Companies are looking to use MT as a competitive differentiator, and we’re going to see more real-world examples on customer-facing websites. And customers around the world will love it. (And, no, I’m not suggesting that human translators are in any danger of losing their jobs; quite the opposite!)
  • Responsive global websites also go mainstream. True, there are valid reasons for NOT embracing responsive websites, but for most companies, this is a clear path forward. It helps manage the chaos internally and frees up resources for mobile apps — which are becoming, for some of us, more important than the website itself.
  • Language pullback. What? Companies are going to drop languages? That’s right. Some that I’ve spoken to already have dropped a language or two, and others are considering doing the same. I’m never a fan of dropping languages for budgetary reasons, as this is almost always a shortsighted decision, but it’s a fact of life as companies learn to align their language strategies with their budgets. In the end, pullbacks are far from ideal but probably a sign that companies are no longer making the blind assumption that adding languages will automatically increase sales (this isn’t always the case). So even this trend, while minor, is ultimately going to be a positive one.
  • Privacy becomes a selling point. The “NSA-gate” scandal is only just beginning to be felt around the world. And the threat to American-based tech companies is very real. I will not be surprised if Google or Microsoft announces non-US hosted services (to bypass the NSA’s grip and attempt to rebuild trust with consumers). And there are already a number of startups emerging in various countries promising to keep user data safe from the “evil” American intelligence agencies. You know this is a serious issue when Apple and Google and Microsoft (and other tech companies) all agree on something.
  • A non-Latin gTLD awakens American companies. I’ve long written about why I think the Internet is still broken for non-English speakers. But now that ICANN is moving ahead with delegation of generic TLDs, I believe that one (or more) of these domains will act as a wake-up call to those companies that have long overlooked them — and I’m including a number of Silicon Valley software companies as well. I don’t want to predict what domain I think it will be (they are all available for you to see) — let me know if you have a candidate.
  • Apple drops flags from its global gateway. True, this is not my first prediction along these lines. But I do think 2014 will be the year. And this will make my life a bit easier because I won’t have to respond to any more “But Apple is using flags so why can’t we” questions.

So what do you think about the year ahead?

If you have any predictions to share, please let me know.


Measuring translation quality: A Q&A with TAUS founder Jaap van der Meer

Every translation vendor offers the highest-quality translations.

Or so they say.

But how do you know for sure that one translation is better than another translation?

And, for that matter, how do you fairly benchmark machine translation engines?

TAUS has worked on this challenge for the past three years along with a diverse network of translation vendors and buyers, including Intel, Adobe, Google, Lionbridge, and Moravia (among many others).

They’ve developed something they call the Dynamic Quality Framework (DQF) and took it live earlier this month with a website, knowledge base, and evaluation tools.


To learn more, I recently interviewed TAUS founder and director Jaap van der Meer.

Q: Why is a translation quality framework needed?
In 2009 and 2010 we did a number of workshops with large enterprises with the objective of better understanding the changing landscape for translation and localization services. As part of these sessions we always do a SWOT analysis, and quality assurance and translation quality consistently popped up on the negative side of the charts: as weaknesses and threats. All the enterprises we worked with mentioned that the lack of clarity on translation quality led to disputes, delays, and extra costs in the localization process. Our members asked us to investigate this area further and to assess the possibilities for establishing a translation quality framework.

Q: You have an impressive list of co-creators. It seems that you’ve really built up momentum for this service. Were there any key drivers for this wave of interest and involvement?
Well, on top of the fact that translation quality has never been well defined in all the years the translation industry has existed, the challenges of the last few years have become so much greater because of the emergence of new content types and the increasing interest in technology and translation automation.

Q: What if the source content is poorly written (full of grammatical errors, passive voice, run-on sentences)? How does the DQF take this into account?
We work with a user group that meets every two months and reviews new user requirements. Assessing source content quality has come up as a concern, of course, and we are now studying how to take this into account in the Dynamic Quality Framework.

Q: Do you have any early success stories to share of how this framework has helped companies improve quality or efficiency?
We now have a regular user base of some 100 companies. They use DQF primarily to get an objective assessment of the quality of their MT systems. Before, they worked only with BLEU scores, which are really not very helpful in a practical environment and not a real measure of the usability of translations. Many companies also work with review comments from linguists, which tend to be subjective and biased.
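For readers unfamiliar with BLEU: it scores a candidate translation by its n-gram overlap with a reference translation, multiplied by a brevity penalty that punishes overly short output. Here is a minimal, smoothed sketch of the standard formula — this is not TAUS or DQF code, and production implementations (e.g. in NLTK) differ in smoothing details:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram precisions
    (with +1 smoothing) times a brevity penalty. Inputs are token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(
            tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(
            tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # Clipped overlap: each candidate n-gram counts only as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_ngrams[ng]) for ng, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # simple smoothing
    # Brevity penalty: no penalty if candidate is at least reference length.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

The weakness van der Meer alludes to is visible in the design: BLEU only measures surface n-gram overlap with one reference, so a perfectly usable paraphrase can score poorly, which is why DQF pairs automated metrics with human adequacy and fluency evaluation.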

Q: How can other companies take part? Do they need to be TAUS members?
Next month (December) we will start making the DQF tools and knowledge bases available for non-members. Users will then be able to sign up for just one month (to try it out) or for a year without becoming members of TAUS.

Q: The DQF can be applied not only to the more structured content used in documentation and knowledge bases but also to marketing content. How do you measure quality when content must be liberally transcreated into the target language? And what value does the DQF offer for this type of scenario?
We have deliberately chosen the name “Dynamic” Quality Framework, because of the many variables that determine how to evaluate the quality. The type of content is one of the key variables indeed. An important component of the Dynamic Quality Framework is an online wizard to profile the user’s content and to decide – based on that content profile – which evaluation technique and tool to use. For marketing text this will be very different than for instructions for use.

Q: Do you see DQF having an impact on the creation of source content as well?
Yes, even today the adequacy and fluency evaluation tools – that are part of DQF – could already be applied to source content. But as we continue working with our user group to add features and improve the platform, we will ‘dynamically’ evolve to become more effective for source content quality evaluation as well.

Q: An argument against quality benchmarks is that they can be used to suck the life (or art) out of text (both source and translated text). What would you say in response to this?
No, I don’t think so. You must realize that DQF is not a mathematical approach to assessing quality and only counting errors (as most professionals in the industry have been doing for the longest time now with the old LISA QA model or derivatives thereof). For a nice and lively marketing text the DQF content profiler will likely recommend a ‘community feedback’ type of evaluation.

Q: Where do you see the DQF five years from now in terms of functionality?
Our main focus is now on integration and reporting. Next year we will provide the APIs that allow users to integrate DQF in their own editors and localization workflows. This will make it so much easier for a much larger group of users to add DQF to their day-to-day production environment. In our current release we provide many different reports for users, but what we’d like to do next year is allow users to define their own reports and views of the data in a personalized dashboard.


Looking for a translation icon?

If you haven’t visited the Noun Project yet, take a moment and drop by.

It’s a great initiative to provide open source icons. All you have to do is provide attribution according to the Creative Commons license.

I noticed recently the addition of a translations icon.

I believe Microsoft was the first company to develop a translations icon along these lines, which was used as part of Microsoft Office.

Here’s an icon currently in use on the Bing Translator page:

Google quickly followed along with its Google Translate icon, shown here:

(Contact me if there is another company that is using a variation of this translation icon.)

To be clear, I would NOT use this icon as part of a global gateway.

This icon is not about finding localized content — it’s about getting content translated (usually via machine translation).

For the global gateway, I recommend this open source icon:

For more on global gateways, check out The Art of the Global Gateway.

UPDATE: Here’s the machine translation icon used by Yamagata Europe: