New Microsoft Tech Translates Speech in Near Real Time
We're one step closer to a universal translator thanks to the efforts of Microsoft Research.
On October 25, during Microsoft Research Asia’s 21st Century Computing event in Tianjin, China, the company's Chief Research Officer Rick Rashid demonstrated new speech-to-speech translation technology that not only converts spoken English into spoken Mandarin Chinese in near real time but also preserves the speaker's own voice.
In a blog post published on Thursday, Rashid said Microsoft's new translator is built on a technique called Deep Neural Networks, or DNN. Rather than relying solely on the currently standard "hidden Markov modeling" technique (which is based on training data from several speakers), it uses an approach patterned after the behavior of the human brain to better recognize speech patterns.
By taking the gray matter route, Rashid said, his team has seen a 30-percent reduction in word recognition errors compared with the older Markov method. That means only about one word in seven or eight is recognized incorrectly, compared with one in four or five for the old method.
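As a rough check on these figures (our own arithmetic, not something from Rashid's post), the relative reduction in error rate is (old rate - new rate) / old rate. A minimal sketch, where the function name and the list of end points are illustrative:

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which the error rate dropped, relative to the old rate."""
    return (old_rate - new_rate) / old_rate


# Error rates written as "one wrong word in N words".
# These pairs are the end points of the ranges quoted in the article.
for old_n, new_n in [(5, 7), (5, 8), (4, 7), (4, 8)]:
    drop = relative_reduction(1 / old_n, 1 / new_n)
    print(f"1 in {old_n} -> 1 in {new_n}: {drop:.0%} fewer errors")
```

Because each rate is 1/N, the reduction simplifies to 1 - old_N/new_N: going from one error in five words to one in seven is about 29 percent, close to the quoted 30 percent, while pairing one in four with one in eight gives 50 percent. The "seven or eight" versus "four or five" ranges therefore span roughly 29 to 50 percent depending on which end points are compared.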
"While still far from perfect, this is the most dramatic change in accuracy since the introduction of hidden Markov modeling in 1979, and as we add more data to the training we believe that we will get even better results," he said in the blog.
The demonstration consisted of two steps. As he spoke to the audience, the system converted his speech into text. It then looked up the Chinese equivalent of each word (the easy part, he said) and reordered the words to fit Chinese word order, which he described as an extremely important step for correct translation between languages.
"Of course, there are still likely to be errors in both the English text and the translation into Chinese, and the results can sometimes be humorous. Still, the technology has developed to be quite useful," he said.
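The idea behind this first stage, word-for-word lookup followed by reordering, can be illustrated with a deliberately tiny Python sketch. The glossary, the single time-word rule, and the function names are all hypothetical; this is not Microsoft's system, which is far more sophisticated than a handful of hand-written rules.

```python
# Hypothetical English -> Chinese glossary, for illustration only.
GLOSSARY = {
    "i": "我",
    "ate": "吃了",
    "rice": "米饭",
    "yesterday": "昨天",
}

# Simplified rule: Chinese puts time words before the verb, not at the end.
TIME_WORDS = {"yesterday", "today", "tomorrow"}


def reorder(words: list[str]) -> list[str]:
    """Move a sentence-final time word to just after the subject."""
    if len(words) > 2 and words[-1].lower() in TIME_WORDS:
        return [words[0], words[-1], *words[1:-1]]
    return list(words)


def lookup(words: list[str]) -> list[str]:
    """Word-for-word substitution (the 'easy part'); unknown words pass through."""
    return [GLOSSARY.get(w.lower(), w) for w in words]


def translate(sentence: str, reorder_words: bool = True) -> str:
    words = sentence.split()
    if reorder_words:
        words = reorder(words)
    # Chinese is written without spaces between words.
    return "".join(lookup(words))


print(translate("I ate rice yesterday", reorder_words=False))
print(translate("I ate rice yesterday"))
```

Without reordering, the output keeps English word order (我吃了米饭昨天), which reads unnaturally in Mandarin; moving the time word after the subject gives 我昨天吃了米饭 ("I ate rice yesterday"). Even in this toy, the lookup is trivial and the reordering is what makes the result correct, which is the point Rashid was making.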
In the next step, the text was quickly converted into spoken Chinese while retaining the properties of his own voice. "It required a text to speech system that Microsoft researchers built using a few hours speech of a native Chinese speaker and properties of my own voice taken from about one hour of pre-recorded (English) data, in this case recordings of previous speeches I’d made," he added.
Despite the team's achievements thus far, Rashid acknowledged that the results still aren't perfect – there's much work that still needs to be done in order to reach a Star Trek level of quality. "The technology is very promising, and we hope that in a few years we will have systems that can completely break down language barriers," he said.
To see and hear how this new translation system works, check out his presentation below.

They gave us the tablet, the PC, the universal translator, the stun gun, etc.
Let's not get carried away here. There's a huge gulf between painting a wooden block to look like a stun gun and actually making one. Yeah, ST had ideas ahead of their time, but that's all they were... ideas, and grown men playing pretend.
It took an actual smart guy to make those things real.
Good job!!!
but it's a console so that's kind of iffy.
Because Star Trek gave us the concept of the universal translator, a universal translator could never be patented, at least in the US. Any device described or shown in fictional media instantly becomes unpatentable. Another example is Dick Tracy's video phone watch. If anyone builds such a watch in real life, they could not get a patent on it because it appeared in Dick Tracy.
IMHO, no worries about patent litigation with this one.
Microsoft does this, Nissan makes a car that can park and drive itself, and NASA and numerous other companies work on powered exoskeletons that allow disabled people the luxury of walking.
Yet Apple comes out and markets a smartphone to non-business types, makes a smaller version of a blown-up iPod Touch, and somehow they are named the most innovative company in the world.
This world does not make sense sometimes.
probably better.