'Starship Commander' Leverages Microsoft Cognitive Services For Voice Command Recognition

Human Interact revealed Starship Commander, a virtual reality choose-your-own-adventure science fiction game that places you at the helm of an interstellar starship. But unlike any other game you’ve played, Starship Commander accepts only voice commands.

Virtual reality opens new doors to creative ideas. In the early days of this new medium, there are no norms and no rules. Guidelines for what makes a compelling virtual reality experience don’t yet exist; for now, imagination and technology are the only limitations. To that end, Alexander Mejia, Owner and Creative Director at Human Interact, sought to build something truly groundbreaking for his first VR project, Starship Commander.

Starship Commander is a first-person VR narrative set in deep space. You play as the commander of an XR71 spaceship sent on a classified cargo transport mission to the “Delta system.” But more than the story, it's the input method that will raise your eyebrows. Starship Commander accepts no physical input; you must initiate all action by speaking to the ship's computer. After all, you never see the captain of the Starship Enterprise manning the controls; commanders command.

Speech recognition is something you don't see often in games, and when you do, the implementation usually isn't all that great. Human Interact said that in its quest to bring voice commands to Starship Commander, it tried several “off-the-shelf” voice recognition technologies with little success. The team needed to be able to insert a custom dictionary to account for the made-up words in the game’s storyline, such as the names of alien races. Human Interact was also looking for a solution that could interpret natural speech so that players wouldn’t be limited to specific scripted phrases.

Human Interact turned to Microsoft’s Cognitive Services and used the Custom Speech Service to insert the custom dialect from the game’s storyline into the AI’s dictionary. During Microsoft Build 2016, Microsoft introduced 22 Cognitive Services APIs, which allow developers to integrate technologies derived from Cortana into their applications. The company demonstrated how its technology could be used to interpret the speech of a young child or to automatically identify objects in a photo and create captions to describe them. It was only a matter of time before someone found a reason to use this technology in a game.

Mejia noted that the Custom Speech Service understands how people talk and automatically generates additional recognized phrases after it receives a handful of options. He also said that Custom Speech Service cut the word recognition errors in half compared to other speech recognition services that he and his team tried.

“We were able to train the Custom Speech Service on keywords and phrases in our game, which greatly contributed to speech recognition accuracy,” said Adam Nydahl, Principal Artist at Human Interact. “The worst thing that can happen in the game is when a character responds with a line that has nothing to do with what the player just said. That’s the moment when the magic breaks down."

Human Interact said that Microsoft’s speech recognition lets you feel like you are part of the story. Instead of following a set script, you get to add your own personality to the game’s dialog. Virtual reality sells the promise of immersion, and what better way to feel immersed in an experience than to have a real dialog with the characters in the game?

Starship Commander is coming to Oculus Rift on the Oculus platform and HTC Vive on the SteamVR platform. Human Interact has not yet announced a release date for the game.

 Kevin Carbotte is a contributing writer for Tom's Hardware who primarily covers VR and AR hardware. He has been writing for us for more than four years. 

  • Achoo22
    As far as I can tell, every single one of the Cognitive APIs is a black-box, online implementation. The last thing I want in an entertainment product is an open mic piped into Microsoft's servers. No thanks.
  • uglyduckling81
    It's an interesting idea. It will be terribly implemented though, with only a few actual key words, I'm guessing. Also, if Cortana is anything to go by with an Australian accent, I will say
    Me: "Take them out"
    Cortana: "Sorry you can't buy 5 puppies in this game today"
    Me: "Ah for f**k sake, shoot something"
    Cortana: "It would be 35 and sunny at the lake today"
    Me: "Sh** game, give me a refund"
    Cortana: "Your credit card has been charged for 75 copies of Windows 10, enjoy your purchase, and thanks for buying with Microsoft"
    Me: "God f*****g damn it, how the f***............"
  • AndrewJacksonZA
    *chuckle* Thanks for the smile uglyduckling81! :-)
  • JakeWearingKhakis
    nice uglyduckling XD

    But yeah, I was at MAGFest this year and I saw people playing a role-playing tabletop game in teams of 5 pretty much doing this exact type of game. I consider myself a nerd, but man was I holding in the laughs. More power to the people who like to do this kind of thing, there is definitely a market for it.
  • sadsteve
    If this is a single player game, no thanks. I don't want to play a single player game that requires an internet connection for basic functionality. Plus, VR/AR has no attraction for me at this time. Once the goggles are about the size of a pair of sunglasses I may have an interest.
  • LORD_ORION
    "As far as I can tell, every single one of the Cognitive APIs is a black-box, online implementation. The last thing I want in an entertainment product is an open mic piped into Microsoft's servers. No thanks."

    Unfortunately this is how it has to work for the average user.
    If you have the ability to install and properly configure a speech rec server yourself, you also have the knowledge to configure your router and send game utterances back to yourself.

    Maybe I'll take a look and see if I can do it and then write a guide, if the game is any good. ;)
  • cryoburner
    Achoo22 said:
    As far as I can tell, every single one of the Cognitive APIs is a black-box, online implementation. The last thing I want in an entertainment product is an open mic piped into Microsoft's servers. No thanks.

    Yes, because they might be listening to you... giving commands to your starship. >_> I'm not fond of the idea of using online voice recognition for general computer input, but using it in the context of this game seems like much less of a concern.

    It would be nice if there were an offline backup option for the voice recognition though, even if it weren't as accurate. It seems like only a matter of time before Microsoft changes their online voice recognition servers in a way that breaks the game, and at that point the game becomes completely non-functional if it's no longer supported by the developer.

    Of course, that's assuming the game is actually something you will want to replay years down the line. If current VR games are anything to go by, this game will probably have a short run time and limited replayability, and the gameplay probably won't extend much beyond being a shiny tech demo.
  • Achoo22
    "Unfortunately this is how it has to work for the average user."
    Nah, I've seen really dumb people use Dragon Dictate since the 1990s. That these APIs are strictly server-side has absolutely nothing to do with technical limitations or inept end-users.

    "I'm not fond of the idea of using online voice recognition for general computer input, but using it in the context of this game seems like much less of a concern."
    The concern is the depth of the profile Microsoft, or anyone Microsoft supplies data to, can generate on you. It's disgusting and every little bit makes it worse. There is no context under which I wish to assist them in this endeavor.