The fact that neural machine translation has the word “machine” in it should tell you that this technology has been evolving for a very long time, dating back to when computers were known as machines and were the size of Sprinter vans.
The goal of leveraging the power of a computer to translate human speech has been with us from the very beginning, and yet the dream has remained elusive, until recently.
And we have neural machine translation (NMT) to thank for this.
Back in 2007 I wrote that the technologists were going to disrupt the translation industry:
While linguists focus on the “art” of translation, technologists focus on the “science” of translation. And that’s why we’re seeing the rebirth of machine translation as statistical machine translation (SMT). SMT brings the power of brute force computing to translation, to a degree that the pioneers of machine translation could have only imagined forty years ago.
And disrupt it they have. First there was Google Translate, which remains the world’s most popular translation engine.
Google Translate now handles more than 100 languages and, yes, quality varies significantly depending on which language pair you choose. But for the most well-traveled language pairs, quality has improved dramatically over the past decade.
And this is a generic language translation product. Imagine what you could do if you took a machine translation engine and optimized it for your company’s industry, product mix, and knowledge base. The quality should rise measurably and, for those who have done just that, it has. Microsoft was a pioneer in using machine translation to unlock vast quantities of knowledge base articles without any human post-editing.
In the 2022 Web Globalization Report Card, I specifically highlight a number of websites that have placed machine translation in the hands of users, allowing us to self-translate content on demand.
Like Airbnb, which defaults to using it:
I found this article by Reinhard Rapp to be an excellent overview of NMT as well as how to spin up your own machine translation engine (I haven’t yet gotten around to doing this). And there is ample reason for building your own translation engine — because there are so many languages in need of translation.
He writes:
It should be noted that it is almost impossible to beat the best commercial translation systems such as Google Translate, Microsoft Bing Translator, or DeepL, in the field they were designed for – namely the translation of general language. The reason is that these companies have invested considerable efforts not only in the underlying technology but also in searching the web for human translations and therefore have huge amounts of training data at their disposal for popular language pairs. But this is not the case for a plethora of lesser-used language pairs, many of which are not even covered by the big players. For such language pairs, you might have no other choice than to develop your own system.
And then there is real-time translation of speech. There is so much happening here it’s impossible to write about it all, but you’ve probably seen examples on a Microsoft or Google mobile app. Here is an example from Microsoft of how the app might work in translating a tour guide’s talk.
Though machine translation has been in development for over 40 years now, we’re in many ways just getting started. From translation and interpretation to transcription and subtitles, machine translation is becoming a critical layer of the multilingual “stack” that is powering global commerce and communications.
And things are starting to get very interesting.
To learn more about how websites are using machine translation, check out the 2022 Web Globalization Report Card.