Google Translate: Ten Years Later


I remember when Google Translate went live. Hard to believe it was 10 years ago.

I remember thinking that this relatively new technology, known as Statistical Machine Translation (SMT), was going to change everything.

At the time, many within the translation community were dismissive of Google Translate. Some viewed it as a passing phase. Very few people said that machine translation would ever amount to much more than a novelty.

But I wasn’t convinced that this was a novelty. As I wrote in 2007, I believed that the technologists were taking over the translation industry:

SMT is not by itself going to disrupt the translation industry. But SMT, along with early adopter clients (by way of the Translation Automation Users Society), and the efforts of Google, are likely to change this industry in ways we can’t fully grasp right now.

Here’s a screen grab of Google Translate from 2006, back when Chinese, Japanese, Korean and Arabic were still in BETA:

[Screenshot: Google Translate, May 2006]

Growth in languages came in spurts, but roughly at a pace of 10 languages per year.

[Chart: growth in the number of Google Translate languages]

And here is a screen grab today:

[Screenshot: Google Translate, May 2016]


Google Translate has some impressive accomplishments to celebrate:

  • 103 languages supported
  • 100 billion words translated per day
  • 500 million users around the world
  • Most common translations are between English and Spanish, Arabic, Russian, Portuguese and Indonesian
  • Brazilians are the heaviest users of Google Translate
  • 3.5 million people have made 90 million contributions through the Google Translate Community


The success of Google Translate illustrates that we will readily accept poor-to-average translations over no translation at all.

To be clear, I’m not advocating that companies use machine translation exclusively. Machine translation can go from utilitarian to ugly when it comes to asking someone to purchase something. If anything, machine translation has shown millions of people just how valuable professional translators truly are.

But professional translators simply cannot translate 100 billion words per day.

Many large companies now use machine translation, some translating several billion words per month.

Companies like Intel, Microsoft, Autodesk, and Adobe now offer consumer-facing machine translation engines. Many other companies are certain to follow.

Google’s investment in languages and machine translation has been a key ingredient in its consistent position as the best global website in the annual Web Globalization Report Card.

Google Translate has taken translation “to the people.” It has opened doors and eyes and raised language expectations around the world.

I’m looking forward to the next 10 years.

Q&A with SYSTRAN about its new cloud-based machine translation platform

It has been a decade since Google Translate took machine translation to the masses — a topic for a future post.

But most companies will not be using Google Translate anytime soon to power their machine translation efforts. They want more control over customizing the engine, leveraging existing translation memories, and other capabilities that Google doesn’t yet offer. So they turn to vendors such as Microsoft, SDL, and SYSTRAN, a company that pioneered machine translation decades ago.

SYSTRAN was acquired by a Korean machine translation company in 2014, and earlier this year it launched an online machine translation platform called SYSTRAN.io. The platform lets companies use machine translation (and other services) via an API. In other words, you don’t have to purchase an expensive enterprise license or host any software; you just connect your software to SYSTRAN’s engine. And, perhaps best of all, SYSTRAN lets anyone take a free test drive of roughly a million characters of translation per month.

To learn more, here’s a Q&A I recently conducted with the company:


What are the benefits/solutions that SYSTRAN.io provides?
SYSTRAN.io gives software developers, customer experience (CX) companies, multinational marketing departments, social media and marketing technology companies, and online gaming developers access to the same software for building multilingual applications that was once available only to large, international companies.

How many language pairs are supported?
There are up to 50 languages supported, depending on the particular module.

What is the most popular usage model (so far) for SYSTRAN.io?
In terms of volume of user queries:
So far, the most popular usage is for language translation on mobile devices.

In terms of numbers of solutions built:
Language translation within customer support forums is strong because companies and customer-support software-as-a-service providers can translate large numbers of documents in their FAQ knowledge bases. This helps decrease call volume (the highest operational cost of customer support) and increase customer satisfaction scores, because users can find their answers faster.

How do you leverage the platform to conduct “sentiment analysis” of user-generated content?
The number of channels that are part of the user experience (social media, review sites, blogs, support forums) is growing every day, and companies are receiving unstructured commentary across these platforms in many different languages. By combining SYSTRAN.io’s modules, developers can mine that content, across multiple languages, for information in any language, flag positive or negative comments, categorize those comments, and generate responses in the language of the user. Imagine 50,000 comments, of which 20,000 are negative but 500 are extremely negative and damaging. Those are the ones you want to reply to first.

For example, with the Olympics coming up, imagine a brand is sponsoring an athlete who gets caught the night before the big race using performance-enhancing drugs or cheating on his wife. It hits social media fast. How do you respond if you don’t know about it because the comments are in multiple languages? Or, at the opposite end of the spectrum, imagine many fans see an athlete wearing a particular shoe or clothing item and want to know where they can get it, and they are asking on Twitter. Right now, many e-commerce sites and marketing agencies are “listening” for those tweets in multiple languages and selling to customers online. SYSTRAN.io can make it easier for developers to build apps that listen in multiple languages and then respond in those same languages.

See this page for more info: https://platform.systran.net
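The triage step in the 50,000-comment example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SYSTRAN.io’s API: it assumes some earlier (hypothetical) analysis step has already attached a sentiment score between -1.0 and 1.0 to each comment, and the `Comment` type, the `triage` function, and both thresholds are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    language: str  # language of the original comment, so replies can match it
    score: float   # hypothetical sentiment score: -1.0 (very negative) to 1.0


def triage(comments, negative_below=0.0, severe_at_or_below=-0.8):
    """Return (negative, severe_by_language).

    negative: every comment scoring below negative_below.
    severe_by_language: the most negative comments, sorted worst-first and
    grouped by language, so each can be answered in its own language.
    """
    negative = [c for c in comments if c.score < negative_below]
    severe = sorted(
        (c for c in negative if c.score <= severe_at_or_below),
        key=lambda c: c.score,  # most negative first
    )
    severe_by_language = defaultdict(list)
    for c in severe:
        severe_by_language[c.language].append(c)
    return negative, dict(severe_by_language)


comments = [
    Comment("Love this brand", "en", 0.9),
    Comment("Un poco caro", "es", -0.3),
    Comment("Honteux, je boycotte", "fr", -0.95),
    Comment("Nie kupię tego nigdy", "pl", -0.85),
]
negative, severe = triage(comments)
print(len(negative), {lang: [c.text for c in cs] for lang, cs in severe.items()})
```

Sorting worst-first mirrors the advice in the interview: reply to the handful of extremely negative comments before working through the merely negative ones.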

Explain “anonymization” as a feature of your service
Because of regulations such as the Safe Harbor framework, law firms, financial publishers, and many other multinational firms are required to remove “personal information,” such as names, addresses, and Social Security numbers, from any information they send overseas to their counterparts in another office. In this scenario, companies need to remove, or “anonymize,” this information from the large data set. They can then send the personal information separately, in a different packet or code, and their teammates can receive and reassemble the data safely.
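The strip-then-reassemble workflow described in that answer can be sketched as follows. This is an illustration under loud assumptions, not how SYSTRAN.io implements the feature: the two regular expressions (U.S.-style Social Security numbers and e-mail addresses) are hypothetical stand-ins, and real anonymization of names and street addresses would need named-entity recognition rather than patterns.

```python
import re

# Hypothetical detection rules for illustration only. Names and street
# addresses would need named-entity recognition, which is out of scope here.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def anonymize(text):
    """Replace each PII match with a placeholder token.

    Returns the masked text plus a token -> original-value mapping that can
    be transmitted separately from the masked text.
    """
    mapping = {}
    counter = 0

    for label, pattern in PII_PATTERNS.items():
        def substitute(match, label=label):
            nonlocal counter
            counter += 1
            token = f"[{label}_{counter}]"
            mapping[token] = match.group(0)
            return token

        text = pattern.sub(substitute, text)
    return text, mapping


def reassemble(masked_text, mapping):
    """Put the original values back in place of their tokens."""
    for token, value in mapping.items():
        masked_text = masked_text.replace(token, value)
    return masked_text


record = "Client Jane Roe, SSN 123-45-6789, email jane.roe@example.com."
masked, key = anonymize(record)
print(masked)                              # safe to send overseas
print(reassemble(masked, key) == record)   # recipient restores the original
```

Keeping the mapping as a separate object is the point of the design: the masked text and the key can travel through different channels, and only someone holding both can restore the original record.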

How is SYSTRAN.io different from SYSTRAN’s Enterprise platform?
SYSTRAN.io is based on the same language translation and NLP technology that powers SYSTRAN’s enterprise offering used by Symantec, Cisco, Airbus, Ford, Toyota, BNP Paribas, Daimler, Barclays, defense and security organizations such as the U.S. intelligence community, NATO, Interpol and language service providers. It is equally robust, but security beyond what is already built into SYSTRAN.io is the responsibility of the developer of the particular application. The enterprise server, on the other hand, offers increased security as it can be installed behind an organization’s firewall. Also, the enterprise server offers 130 language pairs.

How does SYSTRAN.io compete against existing web service offerings from competitors?
We don’t know of any pure language technology companies that are offering free usage of multilingual development APIs to developers, do you? We’ve seen technology companies attempt to enter the language translation technology space but they do not have the content necessary to accomplish viable translations. Language translation technology is easy to talk about but extremely difficult to accomplish.

For SYSTRAN, this has been our 100 percent focus for nearly half a century. Now we are opening those decades of language translation content to developers. These databases have been contributed from linguistic and intelligence knowledge workers who have compiled learnings and optimizations from trillions of translations served dating back to our first client – the US Air Force in 1968 during the Cold War – to today. Our translation databases are deep and robust.

I believe you are offering a million characters of translation for free per month – is that correct? How long will this offer last?
That is correct. Once you sign up, you get a million free characters (plus free usage levels for the other APIs) per month, every month; we want to encourage people to use our tools and not burden them with a cost at the development stage. The end date is open for now.


Intel: The best global enterprise technology website of 2016

For the 2016 Web Globalization Report Card, we studied 11 enterprise technology websites:

  • Autodesk
  • Cisco Systems
  • EMC
  • IBM
  • Huawei
  • Intel
  • Oracle
  • SAP
  • Texas Instruments
  • Xerox
  • VMware

With support for 23 languages, Intel is not the language leader in this category; Cisco Systems leads with 40 languages.

But Intel leads in other ways.

Such as global navigation. First and foremost, Intel has embraced country-code domains, such as:

  • www.intel.de
  • www.intel.co.jp
  • www.intel.cn

On the China home page, the global gateway is perfectly positioned in the header. Also, note the globe icon — which makes this global gateway easy to find no matter what language you speak:

[Screenshot: Intel China home page]

Selecting the globe icon brings up this “universal” global gateway menu:

[Screenshot: Intel global gateway menu, 2015]

Universal means this menu can be used across all localized websites — because the locale names are presented in the local languages and scripts (for the markets in which they are supported). 
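The idea behind a “universal” gateway menu can be sketched as a single locale table whose labels are written in each locale’s own language and script, reused unchanged on every localized site. The locale codes and the `gateway_menu` function are hypothetical, and the list is a small sample, not Intel’s actual menu.

```python
# Hypothetical sample of a locale table. Each label is the locale's name in
# its own language and script, so visitors can find their language without
# first reading the language of the page they landed on.
LOCALE_LABELS = {
    "de-DE": "Deutsch",
    "en-US": "English",
    "ja-JP": "日本語",
    "pl-PL": "Polski",
    "pt-BR": "Português (Brasil)",
    "zh-CN": "简体中文",
}


def gateway_menu(current_locale):
    """Build the same menu for every localized site.

    Only the 'selected' flag changes from site to site; the labels are never
    translated into the current site's language.
    """
    return [
        (code, label, code == current_locale)
        for code, label in sorted(LOCALE_LABELS.items())
    ]


for code, label, selected in gateway_menu("pl-PL"):
    print(("* " if selected else "  ") + label)
```

Because the labels never change, the same menu (opened from the same globe icon) works equally well on the Chinese, Polish, or German site.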

Unfortunately, on the mobile website the globe icon is demoted to the footer. Shown here is the Polish home page:

[Screenshot: Intel Poland mobile home page]

Intel supports strong global consistency across its many local websites. Depth of local content varies and there are gaps in support content across a number of languages.

But Intel is making smart use of machine translation to allow users to self-translate content into their target language. Shown here is an excerpt from the Brazil website.

[Screenshot: machine translation option on the Intel Brazil website]

The button near the top of the page is what users select to self-translate content. Too few companies are making use of machine translation currently.

One concern, looking ahead, is that the .com design has very recently demoted the global gateway icon to the footer.

[Screenshot: Intel global gateway in the footer]

Ironically, it is the .com website that most requires a global gateway in the header because more than half of all visitors to the .com website originate outside of the US.

For more information, check out the Web Globalization Report Card.

The humans behind machine translation

Google Translate is the world’s most popular machine translation tool.

And, despite predictions by many experts in the translation industry, the quality of Google Translate has improved nicely over the past decade. Not so good that professional translators are in any danger of losing work, but good enough that many of these translators will use Google Translate to do a first pass on their translation jobs.

But even the best machine translation software can only go so far on its own. Eventually humans need to assist.

Google has historically been averse to any solution that required lots and lots of in-person human input — unless these humans could interact virtually with the software.

Behind Google’s machine translation software are humans.

In the early days of Google Translate, there were very few humans involved. The feature that identified languages based on a small snippet of text was in fact developed by one employee as his 20% project.

Google Translate is a statistical machine translation engine, which means it relies on algorithms that digest millions of pairs of translated sentences. These algorithms, over time, have greatly improved the quality of Google Translate.
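To make the idea of learning from translated sentence pairs more concrete, here is a toy sketch of IBM Model 1, a classic word-alignment model from early statistical MT research. It is not a description of Google Translate’s actual system: it estimates word-translation probabilities t(f | e) from a three-sentence German-English corpus using expectation-maximization, it omits the NULL word and everything beyond single words, and the function name is invented for the example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(foreign word f | English word e) with EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Simplified IBM Model 1: no NULL word, no length model.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # start with every translation equally likely

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (f, e) links
        total = defaultdict(float)  # expected counts of e being linked at all
        # E-step: split each foreign word's count across the English words
        # in the same sentence, in proportion to the current t values.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize expected counts into probabilities per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus)
for f, e in [("das", "the"), ("haus", "house"), ("buch", "book"), ("ein", "a")]:
    print(f"t({f} | {e}) = {t[(f, e)]:.2f}")
```

Words that keep appearing together across sentence pairs (“das” with “the”) gradually pull probability toward each other. The same basic mechanism applies to far larger corpora, though production systems layer much more on top of it.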

But algorithms can only take machine translation so far.

Eventually humans must give these algorithms a little help.

Google Translate Community

So it’s worth mentioning that Google relies on “translate-a-thons” to recruit people to help improve the quality.

According to Google, more than 100 of these events have been held, resulting in the addition of more than 10 million words:

It’s made a huge difference. The quality of Bengali translations are now twice as good as they were before human review. While in Thailand, Google Translate learned more Thai in seven days with the help of volunteers than in all of 2014.

Of course, Google has long relied on a virtual community of users to help improve translation and search results. But actual in-person events are a relatively new level of outreach for the company, and I’m glad to see it.

This type of outreach will keep Google Translate at the forefront of the MT race.

If you want to get involved, join Google’s Translate Community.

Web globalization predictions for 2014


I’m optimistic about the year ahead.

I base this optimism in part on discussions I’ve had this year with dozens of marketing and web teams across about ten countries. While every company has its own unique worldview and challenges, a number of patterns have emerged. And I can tell you that there is a great deal of enthusiasm for web globalization — backed by C-level investments.

And this enthusiasm is not simply driven by China any longer — which is a healthy thing to see. Executives have a more realistic and sober view of China, and this has resulted in smarter and longer-term planning and investments. That’s not to say China won’t continue to dominate the headlines in 2014, as it most certainly will. But companies are now taking a closer look at countries such as Thailand, Indonesia, Turkey, India, and much of the Middle East.

As I look ahead, here are a few other trends I see emerging in the year ahead:

  • Machine translation (MT) goes mainstream. I’ll have much more to say about this in the future (you can subscribe to updates on the right) but suffice it to say, MT is not just for customer support anymore. Companies are looking to use MT as a competitive differentiator, and we’re going to see more real-world examples on customer-facing websites. And customers around the world will love it. (And, no, I’m not suggesting that human translators are in any danger of losing their jobs; quite the opposite!)
  • Responsive global websites also go mainstream. True, there are valid reasons for NOT embracing responsive websites, but for most companies, this is a clear path forward. It helps manage the chaos internally and frees up resources for mobile apps — which are becoming, for some of us, more important than the website itself.
  • Language pullback. What? Companies are going to drop languages? That’s right. Some that I’ve spoken to already have dropped a language or two, and others are considering following along. I’m never a fan of dropping languages for budgetary reasons, as this is almost always a shortsighted decision, but it’s a fact of life as companies learn to align their language strategies with their budgets. In the end, pullbacks are far from ideal but probably a sign that companies are no longer making blind assumptions that adding languages will automatically increase sales (this isn’t always the case). So even this trend, while minor, is ultimately going to be a positive one.
  • Privacy becomes a selling point. The “NSA-gate” scandal is only just beginning to be felt around the world. And the threat to American-based tech companies is very real. I will not be surprised if Google or Microsoft announces non-US hosted services (to bypass the NSA’s grip and attempt to rebuild trust with consumers). And there are already a number of startups emerging in various countries promising to keep user data safe from the “evil” American intelligence agencies. You know this is a serious issue when Apple and Google and Microsoft (and other tech companies) all agree on something.
  • A non-Latin gTLD awakens American companies. I’ve long written about why I think the Internet is still broken for non-English speakers. But now that ICANN is moving ahead with delegation of generic TLDs, I believe that one (or more) of these domains will act as a wake-up call to those companies that have long overlooked them — and I’m including a number of Silicon Valley software companies as well. I don’t want to predict what domain I think it will be (they are all available for you to see) — let me know if you have a candidate.
  • Apple drops flags from its global gateway. True, this is not my first prediction along these lines. But I do think 2014 will be the year. And this will make my life a bit easier because I won’t have to respond to any more “But Apple is using flags so why can’t we” questions.

So what do you think about the year ahead?

If you have any predictions to share, please let me know.