Lior Libman of One Hour Translation has released a web tool that you can use to quickly determine if text was translated by one of the three major machine translation (MT) engines: Google Translate, Yahoo! Babel Fish, and Bing Translate.
It’s called the Translation Detector.
To use it, you enter your source text and target text, and it reports the probability that each of the three MT engines produced the translation.
How does it know this? Simple. Behind the scenes, it runs the source text through the three MT engines and then compares each engine’s output to your target text. So the caveat here is that this tool only compares against those three MT engines.
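For the curious, the comparison step described above can be sketched in a few lines of Python. This is only an illustration and not the Translation Detector's actual code: the function names are hypothetical, the engine outputs are assumed to have been fetched already and passed in as a dictionary, and the standard library's difflib similarity ratio stands in for whatever scoring the real tool uses.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase and collapse whitespace so trivial formatting
    # differences do not affect the similarity score.
    return " ".join(text.lower().split())


def score_engines(target_text, engine_outputs):
    """Return {engine_name: similarity in [0, 1]} for each MT output.

    engine_outputs is assumed to map an engine name to the text that
    engine produced for the same source text (fetched elsewhere).
    """
    target = normalize(target_text)
    return {
        name: SequenceMatcher(None, target, normalize(output)).ratio()
        for name, output in engine_outputs.items()
    }


def most_likely_engine(target_text, engine_outputs):
    # Pick the engine whose output is closest to the target text.
    scores = score_engines(target_text, engine_outputs)
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    outputs = {
        "google": "Das ist ein Test.",
        "bing": "Dies ist eine Prüfung.",
    }
    print(most_likely_engine("Das ist ein Test.", outputs))
```

A score near 1.0 for one engine suggests the target text closely matches that engine's output, while low scores across the board suggest a human translation or an engine outside the set being compared.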
Being the geek that I am, I couldn’t help but give it a test drive.
It correctly distinguished text translated by Google Translate from text translated by Bing Translate (I didn’t try Yahoo!). Below is a screenshot of what I found after inputting the Google Translate text:
Next, I input source and target text that I had copied from the Apple web site (US and Germany). I would be shocked if the folks at Apple were crunching their source text through Google Translate.
And, sure enough, here’s what the Translation Detector spit out:
So if you suspect your translator is taking shortcuts with Google Translate or another engine, this might be just the tool to test that theory.
Though in defense of translators everywhere, I’ve never heard of anyone resorting to an MT engine to cut corners.
I actually see this tool as part of something bigger: the emergence of third-party tools and vendors that evaluate, benchmark, and optimize machine translation engines. Right now, these three engines are black boxes. I wrote a while back about one person’s efforts to compare the quality of these three engines. But there are lots of opportunities here. As more people use these engines, there will be a greater need for intelligence about which engine works best for which types of text. And hopefully we’ll see vendors arise that leverage these MT engines for industry-specific functions.
UPDATE: As the commenters noted below, the quality of the results is limited if you input more than roughly 130 words, because the tool is constrained by API word-length caps.
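If you want to test a longer passage, one possible workaround is to split it into pieces that stay under that rough limit and check each piece separately. The sketch below is an assumption on my part rather than a feature of the tool: the function name is hypothetical, and the 130-word default simply mirrors the approximate figure mentioned above.

```python
def split_into_chunks(text, max_words=130):
    """Split text into consecutive chunks of at most max_words words.

    The 130-word default is only the rough limit reported by
    commenters, not a documented cap.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    sample = " ".join(["word"] * 300)
    for chunk in split_into_chunks(sample):
        print(len(chunk.split()))
```

Note that splitting on word counts can cut a sentence in half, and you would need to split the source and target text at matching points, so breaking at sentence or paragraph boundaries would likely give cleaner comparisons.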