Google Translate turned 10 years old this year, and today the company announced the Google Neural Machine Translation system (GNMT), which utilizes “state-of-the-art” neural network training techniques to break the record for machine translation quality.
Phrase-Based Machine Translation (PBMT)
Ten years ago, Google started with “Phrase-Based Machine Translation” (PBMT) as its key algorithm, which was state of the art for machine translation at the time. Since then, there have been major advances in machine intelligence, and Google has kept improving its techniques.
Neural Machine Translation (NMT)
A few years ago, Google started using Recurrent Neural Networks (RNNs) to learn the mapping between an input sentence (the sentence to be translated into another language) and an output sentence (the translated sentence).
Unlike the PBMT method, which breaks the input sentence into multiple phrases and then translates them independently of each other, the Neural Machine Translation (NMT) method works with the whole input sentence.
When NMT was first used, it showed similar accuracy to the PBMT method on small data sets. The big advantage was that the NMT method significantly simplified the translation system, requiring fewer engineering design choices. However, neural network-based techniques need significantly more processing power, and Google couldn’t use the NMT system in production for large data sets.
Google Neural Machine Translation (GNMT)
Google’s new paper, titled "Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation," describes how the company overcame the many challenges of making NMT work on large data sets. It also explains how Google built a system fast enough to be used in production in Google Translate.
Google said that its new technique is not only faster and more efficient, but also achieves almost human levels of performance for translations. The company said that it reduced translation errors by 55-85% for several language pairs, when rated by bilingual human translators.
How GNMT Works
Google illustrated how its new GNMT technique works with one example: translating a Chinese sentence into English. The method first encodes the Chinese words as a list of vectors, where each vector represents the meaning of all the words read so far.
Once the entire sentence has been read, the decoder begins generating the English sentence one word at a time. At each step, the decoder gives each encoded vector a different “weight” and pays the most attention to the vectors that are most relevant to the English word it is generating.
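The weighting step described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: the function names (`softmax`, `attend`), the two-dimensional toy vectors, and the use of a plain dot product to score relevance are all assumptions made for the example. A production system learns its scoring function and works with far larger vectors.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Weight each encoded source vector by its relevance to the decoder.

    Assumption: relevance is scored with a dot product, a common
    simplification chosen here only for illustration.
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of the encoded vectors.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical toy data: one encoded vector per source word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

In this toy run, the second and third source vectors line up with the decoder state better than the first, so they receive larger weights and dominate the context vector the decoder would use to pick its next word.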
Google Translate's Chinese To English Now Uses GNMT 100%
Google said that all Chinese to English translations - about 18 million translations per day - are now using the new GNMT system. The company said this was made possible by using its open-source TensorFlow neural network framework and its custom TPU chip. The TPUs, which promise an order of magnitude higher efficiency compared to GPUs, also seem to have enough performance to handle such large data sets.
Google noted that the new GNMT system is still far from achieving perfect translation, and it can still make errors a human would never make. For instance, it can completely drop some words, or mistranslate proper names or rare words. It may also still translate words in isolation rather than in the context of the sentence, despite the fact that the system now takes the whole sentence into account when translating it.
Chinese to English is just one of the more than 10,000 language pairs that Google Translate supports, and the company said it will work on supporting as many of them as possible over the coming months.