Measuring translation quality: A Q&A with TAUS founder Jaap van der Meer

Every translation vendor offers the highest-quality translations.

Or so they say.

But how do you know for sure that one translation is better than another translation?

And, for that matter, how do you fairly benchmark machine translation engines?

TAUS has worked on this challenge for the past three years along with a diverse network of translation vendors and buyers, including Intel, Adobe, Google, Lionbridge, and Moravia (among many others).

They’ve developed something they call the Dynamic Quality Framework (DQF) and they took it live earlier this month with a website, knowledgebase and evaluation tools.


To learn more, I recently interviewed TAUS founder and director Jaap van der Meer.

Q: Why is a translation quality framework needed?
In 2009 and 2010 we held a number of workshops with large enterprises with the objective of better understanding the changing landscape for translation and localization services. As part of these sessions we always do a SWOT analysis, and quality assurance and translation quality consistently popped up on the negative side of the charts: as weaknesses and threats. All the enterprises we worked with mentioned that the lack of clarity on translation quality led to disputes, delays and extra costs in the localization process. Our members asked us to investigate this area further and to assess the possibilities for establishing a translation quality framework.

Q: You have an impressive list of co-creators. It seems that you’ve really built up momentum for this service. Were there any key drivers for this wave of interest and involvement?
Well, on top of the fact that translation quality has not been well defined for as long as there has been a translation industry, the challenges in the last few years have become so much greater because of the emergence of new content types and the increasing interest in technology and translation automation.

Q: What if the source content is poorly written (full of grammatical errors, passive voice, run-on sentences)? How does the DQF take this into account?
We work with a user group that meets every two months and reviews new user requirements. Assessing source content quality has come up as a concern of course and we are studying now how to take this into account in the Dynamic Quality Framework.

Q: Do you have any early success stories to share of how this framework has helped companies improve quality or efficiency?
We have a regular user base now of some 100 companies. They use DQF primarily to get an objective assessment of the quality of their MT systems. Before, they worked with BLEU scores only, which is really not very helpful in a practical environment and not a real measurement for the usability of translations. Also, many companies work with review comments from linguists, which tend to be subjective and biased.

Q: How can other companies take part? Do they need to be TAUS members?
Next month (December) we will start making the DQF tools and knowledge bases available for non-members. Users will then be able to sign up for just one month (to try it out) or for a year without becoming members of TAUS.

Q: The DQF can be applied not only to the more structured content used in documentation and knowledgebases but also to marketing content. How do you measure quality when content must be liberally transcreated into the target language? And what value does the DQF offer for this type of scenario?
We have deliberately chosen the name “Dynamic” Quality Framework, because of the many variables that determine how to evaluate the quality. The type of content is one of the key variables indeed. An important component of the Dynamic Quality Framework is an online wizard to profile the user’s content and to decide – based on that content profile – which evaluation technique and tool to use. For marketing text this will be very different than for instructions for use.

Q: Do you see DQF having an impact on the creation of source content as well?
Yes, even today the adequacy and fluency evaluation tools – that are part of DQF – could already be applied to source content. But as we proceed working with our user group to add features and improve the platform we will ‘dynamically’ evolve to become more effective for source content quality evaluation as well.

Q: An argument against quality benchmarks is that they can be used to suck the life (or art) out of text (both source and translated text). What would you say in response to this?
No, I don’t think so. You must realize that DQF is not a mathematical approach to assessing quality and only counting errors (as most professionals in the industry have been doing for the longest time now with the old LISA QA model or derivatives thereof). For a nice and lively marketing text the DQF content profiler will likely recommend a ‘community feedback’ type of evaluation.

Q: Where do you see the DQF five years from now in terms of functionality?
Our main focus is now on integration and reporting. Next year we will provide the APIs that allow users to integrate DQF in their own editors and localization workflows. This will make it so much easier for a much larger group of users to add DQF to their day-to-day production environment. In our current release we provide many different reports for users, but what we would like to do next year is allow users to define their own reports and views of the data in a personalized dashboard.


Gabble On: Using machine translation to learn a language

Ethan Shen, who has become quite an expert on the various machine translation (MT) engines, has launched a nifty web service designed to help you improve your language skills: Gabble On.

Basically, the site leverages an MT engine (Google, Bing, Systran) to display a news article in the target language.

It’s still a work in progress, but I like the way it displays source and target sentences side by side so you can follow along sentence by sentence.

I think the site has the greatest potential for teaching vocabulary.

Ethan welcomes input so give it a test drive and tell him what you think!


Is Google the best machine translation engine? It depends…

Two weeks ago, I introduced Ethan Shen and his project to analyze the three major free machine translation (MT) engines — Google, Microsoft, and Yahoo! Babelfish — by relying on translator reviews.

Ethan has provided me with a mid-point summary of results, which I’ve included below. I was surprised to find that Microsoft and Babelfish are beating Google on some language pairs, as well as on shorter text strings. Although Google is emerging as the overall winner, and receiving some much-deserved attention from the media, it’s nice to see some healthy competition.

That said, quality is only one piece of the puzzle. The other piece, perhaps much more important, is usability. Now that Google has embedded its MT engine into Gmail and Reader, and now its Chrome client, I find I’m using Google exclusively as my MT engine.

Here are Ethan’s findings so far (emphasis mine):

At the highest level, it appears that survey participants prefer Google Translate’s results across the board.

In a few languages (Arabic, Polish, Dutch) the preference is overwhelming, with votes for Google doubling those of its nearest competitor.

However, once you remove voters who have self-defined their fluency in the source or target language as “limited,” the contest becomes closer in some of the heavily trafficked languages. For example:

  • Microsoft Bing Translator leads in German
  • Yahoo! Babelfish leads in Chinese
  • Google maintains its lead in Spanish, Japanese, and French

Observing only the self-defined “limited fluency” voters reveals a strong brand bias. If your fluency in the target translation language is limited, it would stand to reason your ability to assess the quality of the translation is very limited. And yet…

  • Limited-fluency voters chose Google over Bing by 2 to 1
  • They also chose Google over Yahoo! Babelfish by 5 to 1

As I had guessed, Yahoo! and Microsoft’s hybrid rules-based MT models performed better on shorter text passages.

For phrases below 50 characters, Google’s lead in Spanish, Japanese, and French disappears, and Microsoft’s lead in German widens.

Beyond 50 characters, Google’s relative performance seems to improve across the board.

For passages that are only one sentence, the same effect is seen, though to a lesser extent than under 50 characters.
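The breakdowns described above (dropping limited-fluency voters, then looking only at passages under 50 characters) come down to filtering votes before counting them. Here is a minimal Python sketch of that idea. It is not the survey's actual code: the record layout, the `tally` and `leader` helpers, and every vote in `VOTES` are invented purely for illustration.

```python
from collections import Counter

# Hypothetical vote records, invented for illustration only; NOT survey data.
# Each vote: (language, winning engine, voter fluency, source length in characters)
VOTES = [
    ("German", "Bing", "fluent", 32),
    ("German", "Google", "limited", 120),
    ("German", "Google", "limited", 45),
    ("Spanish", "Google", "fluent", 140),
    ("Spanish", "Babelfish", "fluent", 28),
    ("Spanish", "Google", "limited", 30),
]


def tally(votes, language, exclude_limited=False, max_chars=None):
    """Count winning votes per engine for one language, with optional filters."""
    counts = Counter()
    for lang, engine, fluency, length in votes:
        if lang != language:
            continue
        if exclude_limited and fluency == "limited":
            continue  # drop voters who rated their own fluency as limited
        if max_chars is not None and length >= max_chars:
            continue  # keep only passages shorter than max_chars
        counts[engine] += 1
    return counts


def leader(counts):
    """Return the engine with the most votes, or None on a tie or empty tally."""
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


print(leader(tally(VOTES, "German")))                        # Google
print(leader(tally(VOTES, "German", exclude_limited=True)))  # Bing
```

With these made-up votes, the German leader flips once limited-fluency voters are removed, which is the kind of shift the filtered results describe.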

On March 4th, we made a few changes to our survey: hiding the brands and randomizing the positions of the text results before voting. Since then, we have not yet collected enough data to draw conclusions, but Babelfish seems to be receiving the biggest boost, perhaps showing the effects of the recent neglect of that tool.
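For readers curious what hiding the brands and randomizing positions might look like in practice, here is a small Python sketch. It is only an assumption about how such blinding could be done, not a description of the survey site's implementation; the `blind_options` function, the "Option N" labels, and the sample strings are all hypothetical.

```python
import random


def blind_options(translations, rng=None):
    """Shuffle engine outputs and hide brand names behind neutral labels.

    translations: dict mapping engine name -> translated text.
    Returns (options, key). `options` is a list of (label, text) pairs to show
    the voter; `key` maps each label back to its engine and is kept private.
    """
    if rng is None:
        rng = random.Random()
    engines = list(translations)
    rng.shuffle(engines)  # randomize the display position of each engine
    labels = [f"Option {i + 1}" for i in range(len(engines))]
    options = [(label, translations[engine])
               for label, engine in zip(labels, engines)]
    key = dict(zip(labels, engines))
    return options, key


# Placeholder strings standing in for real engine output.
options, key = blind_options({
    "Google": "translation A",
    "Bing": "translation B",
    "Babelfish": "translation C",
})
for label, text in options:
    print(label, "->", text)  # the voter sees no engine names
```

A vote is then recorded against a label and mapped back to an engine through `key` only after it is cast, so neither the brand nor its position on the page can sway the voter.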

Clearly, Ethan needs more data to arrive at more concrete conclusions. If you’re a translator and you want to lend a hand, here is the voting site.

PS: Here’s an interview with Google’s MT guru Franz Josef Och.

China to overtake US in Web users … next month

According to the China Internet Network Information Center (CNNIC), China is poised to overtake the US in Web users very, very shortly. Here is a news article.

CNNIC says that China now has 210 million Web users, an increase of more than 73 million over the past year. These are staggering growth figures and it’s safe to project that China will overtake the US, which is hovering around 215 million Web users, sometime late next month.
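The projection is simple arithmetic, and a few lines of Python make the assumptions explicit. This is a back-of-the-envelope sketch, not CNNIC's method: it assumes China keeps adding users at last year's pace of 73 million a year, spread evenly across months, and that the US figure stays flat at 215 million. The function name and its parameters are mine.

```python
def months_until_overtake(current, target, yearly_gain, target_yearly_gain=0.0):
    """Months until `current` reaches `target`, assuming constant linear growth."""
    monthly_closing = (yearly_gain - target_yearly_gain) / 12
    if monthly_closing <= 0:
        return None  # the gap never closes under these assumptions
    gap = target - current
    return max(gap, 0) / monthly_closing


# Figures quoted above: 210M users in China, about 215M in the US, +73M per year.
months = months_until_overtake(210e6, 215e6, 73e6)
print(f"{months:.1f} months")  # 0.8 months
```

At roughly 6 million new users a month, a 5 million gap closes in under a month. If the US number is still growing, the crossover would come somewhat later.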

When it does, I’m sure the CNNIC will be the first to announce it.

Now the race is on for companies to localize their Web sites for these 210 million Web users. Starwood, for one, is ahead of many of its competitors.

What more can you say? When it comes to potential Web users, China has a lot more headroom than the US. What we’re seeing transpire was inevitable, though the timing is much faster than many of us (me included) would have predicted.

And India could one day surpass China’s numbers.

PS: I visited the CNNIC Web site to read the press release, but they hadn’t translated any content into English since last year. So I went to the source, the Chinese-language site, and used Google Translate to read their press release. Check it out for yourself. I must say that the quality of Google’s machine translations has improved considerably since it started using its own statistical machine translation software.

Machine translation gets specialized

To follow up on my last post about the transformation of the translation industry, I was just sent a press release from a maker of statistical machine translation software, Language Weaver, regarding a new product that it and across Systems will be releasing in 2008.

The product is software designed specifically for German/English translation of content that falls within the mechanical engineering, construction, automotive manufacturing, and plant construction industries. I know; it’s a mouthful.

But the gist of it is this: It’s machine translation and workflow software designed specifically for an industry.

The quality of statistical machine translation (SMT) is easy to poke fun at when you rely on the mass-market free translation engines. But once you begin to optimize the SMT engine for a specific industry, where context and terminology have been narrowed considerably, the quality suddenly gets respectable.

Statistical machine translation software and translation workflow software are about as intimidating as software can get.

Traditionally, companies have had to exert a great deal of time and energy to customize machine translation and translation workflow products to their specific industries. But I think what Language Weaver and across are doing is a sign of things to come.

And this is great news for customers because it will allow them to see the benefit of this software sooner rather than later.

Says the release:

“New digital content continues to flood the manufacturing industries. With this package, we deliver significant productivity right out of the box, already customized and integrated with a translation management system — the hard work of aligning data for the machine translation already has been done for manufacturing companies of all sizes,” says Kirti Vashee, vice president of sales and marketing for Language Weaver. “So it gets easier and faster for translators to access appropriate translations for manufacturing industry phrases and sentences, which have a technical language orientation. That helps companies to bring products to market faster and provides the functionality for translation of content that has never before been translated.”