Behind the scenes and hands on with Microsoft's personal digital assistant.
While it is tempting to dismiss Cortana, the new personal digital assistant for Windows Phone 8.1, as Microsoft's answer to Apple's Siri for the iPhone or Google Now for Android, and as yet another spin on the company's old tactic of being the fast follower rather than the innovator, keep in mind that in the hotly contested mobile arena, imitation is a considerable art form. See: Apple vs. Samsung, rounds one and two; Microsoft's initial dismissive stance on tablets; Apple's early dismissive stance on small tablets; Apple Maps; Google Play Music; Apple iTunes Radio. Etc.
Cortana will ship as part of Windows Phone 8.1 in the coming months, which is about as precise as Microsoft will get on the timing. Microsoft announced and demonstrated the new technology, which emerged from the company's Bing platform division, at Build, the company's annual developer conference. There, attendees had an opportunity to get more of a hands-on experience with Cortana and to listen to Microsoft take a few swings at Apple and Google.
Most technology derivations and imitations either attempt differentiation or build on the prior work, but differentiation doesn't necessarily beget improvement. What sounds good in focus groups, on marketing slides, or upon carefully staged reveals frequently fails to reach the cold ear of reality.
It is wise, then, to be skeptical of Cortana, a product that won't ship for several weeks, and even then will likely maintain its beta status, much as Siri did when it arrived in 2011. While my own brief experience with Cortana was hardly flawless, and I witnessed Microsoft's own team fail to get the results from Cortana they expected, this product does appear to both differentiate and build upon prior efforts. Cortana, and the claims Microsoft made regarding it, even pushed me to revisit both Siri and Google Now with a renewed purpose.
Granted, Windows Phone enjoys only 3.3% worldwide market share (now ahead of BlackBerry, which sits at 1.9%, but below iOS at 15.2% and Android at 78.6%), according to IDC's 2013 estimates, but that share is growing (Windows Phone shipments increased 90.9% in 2013, 46.7% in Q4). Cortana may well give the Windows Phone converts bragging rights on the personal assistant front. And while it's difficult to say whether a successful Cortana will be enough to earn Microsoft new converts, it's certainly not out of the question.
In addition to making Cortana available for some limited hands-on testing, Microsoft hosted a small Bing platform breakout session for the press and analysts, during which Bing executives held forth solely on Cortana. Afterward, I sat down with Stefan Weitz, director of search for Microsoft Bing, to learn more about what powers Cortana, and to dive into the areas that make the technology sound so promising.
How Bing Powers Cortana
Cortana began 18 months ago as a collaboration between Microsoft's Windows Phone and Bing groups, when Michael Calcagno, previously in Microsoft's natural language group, took on the role of architect for the Bing Information Platform. The Bing service already contained all of Cortana's key pieces, like visual recognition (recognition of physical objects) and the ability to make inferences. Microsoft just needed to build a platform to bring it together.
Microsoft had previously built an entity database, the technology that understands people, places, and things, and their relationships to other entities. The company dubs that technology "Satori," and it's what powers a search result that provides not just the simple answer to the question you asked, but all related information around it.
Microsoft has also been working on speech recognition, using deep neural networks (DNNs) to accomplish pattern recognition based on the way human brains process information. Waveforms are translated into bits and given to a speech recognition system, where natural language processing starts to make inferences about the user's intent.
Apple and Google each perform natural language processing as well, although each company applies its own intellectual property to the process.
With Cortana, most of the processing happens in the cloud, but there is also some on-device speech recognition and information processing. That is, a Cortana query gets distributed to both the device and the cloud, with the results coalescing back on the device. Some of Cortana's functions can take place entirely offline, Weitz said.
The combination of speech recognition, the use of DNNs, entity understanding, and inference comes together in a powerful way. Cortana parses what the user is talking about into a particular domain: Is it a device function? A reminder? A calendar entry? And within that domain, Cortana determines the intent of the utterance: What does the user want to do?
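To make the two-step idea concrete, here is a minimal sketch of classifying an utterance first into a domain and then into an intent within that domain. It is an illustration only and not Microsoft's implementation: the keyword tables, the `classify` function, and the "search" fallback are all assumptions, and a real assistant would rely on trained models rather than substring matching.

```python
# Hypothetical keyword tables, for illustration only.
# Dict order matters: more specific intents are listed first.
DOMAINS = {
    "reminder": ("remind",),
    "calendar": ("meeting", "appointment", "calendar"),
    "device": ("wi-fi", "bluetooth", "volume"),
}

INTENTS = {
    "reminder": {"cancel": ("cancel", "delete"), "create": ("remind",)},
    "calendar": {"cancel": ("cancel", "delete"), "create": ("add", "schedule", "book")},
    "device": {"turn_off": ("turn off", "disable"), "turn_on": ("turn on", "enable")},
}


def classify(utterance):
    """Return a (domain, intent) pair for a spoken or typed request."""
    text = utterance.lower()

    # Step 1: pick the domain (device function, reminder, calendar entry...).
    domain = next(
        (name for name, words in DOMAINS.items() if any(w in text for w in words)),
        "search",  # assumed fallback when no domain matches
    )
    if domain == "search":
        return ("search", "lookup")

    # Step 2: within that domain, work out what the user wants to do.
    intent = next(
        (name for name, words in INTENTS[domain].items() if any(w in text for w in words)),
        "unknown",
    )
    return (domain, intent)


print(classify("Remind me to call my brother"))
print(classify("Turn off Bluetooth"))
print(classify("How tall is the Space Needle?"))
```

The point of the split is that the second step only has to choose among the handful of intents that make sense inside the domain chosen by the first.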
If one key differentiating theme emerged from Microsoft's breakout session, it was that Cortana was built around the concept of task completion, rather than knowledge derived from search or voice-assisted search. "We've become beaten down by the search model," Weitz said, "so task completion has been lost." Search, he added, has "evolved to be a noun-based retrieval system for [web] pages."
The notion of task completion, or getting things done, was also the basis for Apple's Siri when it first launched.
Cortana appears on the home screen as a Windows Live Tile, but like Google Now, it is also reached through the device search function. It contains what Microsoft terms a proactive canvas, which is the information it gives you based on what it infers, and a reactive canvas, which responds to queries.
One important distinction the Microsoft team pointed out was that Cortana gives the user confirmation that it understood the request, and that it was finding the information; that is, not just a simple "OK" but a contextual, confirmative prompt.
Cortana frequently delivers query results by voice, but also in rich data presentations, rendered from the cloud. You can address Cortana by voice, but also from the keyboard, another difference from Siri. However, like Siri, Cortana has something of a personality (based on the personal assistant in Halo, voiced by Jen Taylor), with built-in, snarky answers to silly questions and a similar conversation-oriented approach. Let's not mince words: Microsoft has borrowed liberally from the 2011 Siri playbook here.
5 Key Cortana Differences
1.) Context. One of the key things Microsoft's Cortana brings to task completion is context, meaning that you can get a result from a query and ask further questions pertaining to that result.
When Weitz demonstrated with a search for good restaurants nearby, Cortana triangulated on ratings and proximity. He then asked if any in the result set were vegetarian and Cortana returned a subset of the first list. From that result, he could ask for information on one of them ("how far to the first one?" or "make a reservation with the second one"), meaning that Cortana understood that it had provided a list, and the request being made was on that list. Cortana adjusts the request vocabulary to what's on the result page.
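The follow-up behavior in that demonstration can be sketched as a session that remembers its last result list and resolves phrases like "the first one" against it. This is a toy model under stated assumptions, not how Cortana works internally: the `Restaurant` and `Session` classes, the sample data, and the keyword checks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    rating: float
    miles: float
    vegetarian: bool


# Ordinal words a follow-up might use to point at a list position.
ORDINALS = {"first": 0, "second": 1, "third": 2}


class Session:
    """Keeps the current result list so follow-ups can refer back to it."""

    def __init__(self, results):
        self.results = list(results)

    def ask(self, utterance):
        text = utterance.lower()

        # Follow-up filter: narrow the existing list instead of searching anew.
        if "vegetarian" in text:
            self.results = [r for r in self.results if r.vegetarian]
            return [r.name for r in self.results]

        # Follow-up reference: resolve "the first one" against the list.
        for word, index in ORDINALS.items():
            if word in text and index < len(self.results):
                target = self.results[index]
                if "how far" in text:
                    return f"{target.name} is {target.miles} miles away"
                if "reservation" in text:
                    return f"Requesting a reservation at {target.name}"

        return "Falling back to web search"


# Made-up sample data standing in for a "good restaurants nearby" result.
session = Session([
    Restaurant("Green Table", 4.6, 0.4, True),
    Restaurant("Smoke House", 4.5, 0.7, False),
    Restaurant("Lotus Leaf", 4.3, 1.2, True),
])
print(session.ask("Are any of these vegetarian?"))
print(session.ask("How far to the first one?"))
print(session.ask("Make a reservation with the second one"))
```

Note that "the second one" is resolved against the filtered list, not the original one, which is the behavior the demonstration implied.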
When I tried this with Apple's Siri, it got stuck on the second step, looking up the word "vegetarian" instead of finding a vegetarian restaurant, let alone one from the list of nearby restaurants.
That doesn't mean Siri lacks contextual awareness. When I asked Siri which restaurant was closest, it did re-sort the list by distance. When I asked Siri to make a reservation at one of them (I called it by name), Siri was able to determine that the restaurant didn't take reservations and gave me the information to call on my own. Upon finding a list of restaurants, you can ask Siri "is it OK for kids?" When finding out what movies are playing, you can follow that up by saying "with Russell Crowe" or "buy tickets," and Siri will follow the right path.
When I asked Siri what the weather was, got my reply, and then asked "what about New York?" Siri knew I was still talking about the weather and provided it. When I asked Siri "what about this weekend?" it gave me the upcoming weekend forecast. But when I asked Siri "what about Big Sur?" it reverted to providing me with web pages about Big Sur. Siri is being tuned for what it hears the most.
Similarly, Google Now also provides context for some things that it already knows. For example, if you ask it for pictures of the Space Needle, and then ask "how tall is it?" Google Now knows that "it" is the Space Needle. However, when I asked Google Now to show me pictures of the Hollywood sign, and then asked "where is it?" I got directions to the Space Needle. When I asked Google Now for a list of nearby restaurants, and then asked "what about Italian?" it gave me a list of nearby Italian restaurants. For some restaurants, you can even ask "show me the menu" and it will do so.
Without a longer list of examples, it's difficult to determine just yet how much more powerful Cortana will be, but you can start to see some of the promising subtleties here if everything works as promised.
2.) Inference. Cortana is like Google Now in that it mines device signals so that it can do a better job of understanding your habits, interests and priorities. At the hardware level, it's looking at location, battery state, movement (or lack of movement), and from those it might infer where home or work is. It tracks search history, looks at your calendar, and even into your e-mail. For example, it might notice flight information within an e-mail, and ask you if you want to track that flight.
Obtaining these signals requires the user to grant Cortana permission, which Microsoft believes is an important distinction.
Siri adds some of this inference-based personalization also, but it's a bit more limited. It figures out where home and work are, or you can tell it. If you ask Siri to call your brother, it will prompt you to tell it who your brother is.
Cortana promises to take things one step further. Ask it to remind you to ask your brother if you can borrow his truck, and it will provide a reminder prompt on your next interaction with him, no matter what form that interaction takes. You have to tell Siri to send you a specific reminder during a specific event (as in, "remind me to call my brother when I get home").
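The difference between a reminder tied to a person and one tied to a specific event can be sketched in a few lines. This is only an illustration of the idea as described, with hypothetical names throughout (`PersonReminders`, `add`, `on_interaction`): the reminder is stored against the person, and any kind of interaction with that person releases it.

```python
from collections import defaultdict


class PersonReminders:
    """Stores notes per person and releases them on the next interaction."""

    def __init__(self):
        self._pending = defaultdict(list)

    def add(self, person, note):
        self._pending[person.lower()].append(note)

    def on_interaction(self, person, channel):
        # The channel (call, text, e-mail...) does not matter for matching;
        # it is only echoed back so the prompt can mention it.
        notes = self._pending.pop(person.lower(), [])
        return [f"[{channel}] {note}" for note in notes]


reminders = PersonReminders()
reminders.add("Brother", "Ask to borrow the truck")
print(reminders.on_interaction("brother", "call"))  # reminder is released
print(reminders.on_interaction("brother", "text"))  # nothing left to show
```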
Google Now uses a combination of customizable cards and inference based on signals and information gleaned from the Google services (calendar, mail) running on your device. There are now dozens of cards, including commute, flight delays, reservations, travel helper cards for things like translation and currency conversion, and smart reminders for things like store chains, so you can set a reminder to buy an item when you enter a particular store. This is a very powerful combination, but while the cards are pretty powerful, the Google Now inferences don't seem to go as far as Cortana's.
Cortana will also work across services, even non-Microsoft ones. For example, it can read what's in your Google Mail. "[Google Now] is magic if you're all in on Google," Weitz said. If something syncs to your Windows Phone device (meaning the information is on the device), Cortana can use it, he added.
3.) Transparency & Customization. Transparency is one of the key hallmarks of Cortana, and an area Microsoft emphasized heavily. Executives seemed determined not to allow Cortana's omniscience to be mistaken for creepy. If Cortana infers that a location is home, it will ask you to confirm it. Because the devices Microsoft made available weren't meant for us to personalize, it's difficult to say how far these confirmations reach, but Microsoft implied that all inferences require user acceptance.
What's more, Cortana includes a "notebook" on the device user, essentially a collection of the things it learns, infers and tracks. This concept stemmed from Microsoft personnel going out and talking to real-life personal assistants and asking them what made them good at their jobs. One key finding: the assistants kept all sorts of information about their client in a notebook.
In Windows Phone 8.1, you can actually go into that notebook and edit or add information. This includes information about your interests, places of importance, music preferences, reminders, settings and even your "inner circle."
Your Inner Circle consists of those people with whom you have some heightened relationship, say a close colleague, a sibling, a friend. This Inner Circle function pulls information from what's on your phone, from the People app or Microsoft Lync or even from Facebook. You can go into the Inner Circle entry in the Notebook and assign relationships or nicknames (up to three), and you can even tell the phone's "quiet hours" mode that some of those people are allowed in (this doesn't extend to those you've pulled into the Inner Circle from Facebook, Weitz said).
Google Now has a similar concept to Cortana's Notebook. Your personal settings, or what Google determines about your actions with the device and its services, are easily accessible. While you can make some alterations and additions, those are fairly limited compared to what we saw with Cortana. For example, in Google Now, there are two Places that matter: Home and Work. That's it. In Cortana, you can add favorite places manually.
In Google Now you can add sports teams, and you can tell it your preferred mode of transportation, but you have to pick only one. You can give it stocks to follow, and what TV and video streaming service you prefer (Hulu, Amazon Prime, Netflix, etc.). There's also a bucket for everything else, a little hodgepodge of what it infers your interests are, but you can't manually add to that inferred list, as we're promised you can do in Cortana.
4.) Self-tuning. There's a great deal of behind-the-scenes work constantly going on in Cortana, especially getting it to understand a user's intent and self-tuning based on user behavior. For instance, if you do a voice-based search and it provides the wrong results, which it did during some of our brief testing, it recognizes that it has made a mistake when you ask the question a second time. On the back end, the platform then adjusts.
Or when a query ends in a web results page, that might be a signal to Cortana that it has failed to provide a more precise result, and it learns and adjusts, listening more intently to the next query to see what you meant, assuming you ask the question in a slightly different way. Weitz said that the Cortana/Bing service combines a certain level of human, or manual, modeling when mistakes reach a particular threshold, in addition to the more automated, machine-based learning.
One example Weitz used to illustrate Cortana's self-tuning nature was a request for the location of a good "BBQ joint," which Cortana didn't initially understand as a request for a restaurant. In short order, based on his follow-up question, Cortana learned and added that term to its vernacular.
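One very naive way to picture that kind of learning: when a term fails and the user immediately rephrases with a term the system does know, treat the failed term as an alias for the same category. The sketch below assumes exactly that rule and nothing more. The `TermLearner` class and its starting vocabulary are hypothetical, and the real service, per Weitz, blends automated learning with manual modeling.

```python
class TermLearner:
    """Learns an alias when a failed term is followed by a successful rephrase."""

    def __init__(self, known):
        self.known = dict(known)  # term -> category
        self.missed = None        # most recent term that was not understood

    def lookup(self, term):
        term = term.lower()
        category = self.known.get(term)
        if category is None:
            # Remember the miss; the next query may explain what was meant.
            self.missed = term
            return None
        if self.missed is not None:
            # The user rephrased right after a miss: adopt the missed term.
            self.known[self.missed] = category
            self.missed = None
        return category


learner = TermLearner({"barbecue restaurant": "restaurant"})
print(learner.lookup("BBQ joint"))            # not understood yet
print(learner.lookup("barbecue restaurant"))  # the follow-up succeeds
print(learner.lookup("BBQ joint"))            # now treated as a restaurant
```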
As a simple test, I asked Siri: "how chilly is it?" Siri gave me the weather. Google Now didn't even understand the question. When I asked Siri and Google Now if I should wear a jacket, both services gave me the weather. In other words, all of these services are working on this at some level. What's more, both Apple and Google have had years and hundreds of millions of customers using the services to help tune them.
Apple has also fine-tuned its algorithms to understand regional accents, both in the U.S. and for a variety of other languages (and their dialects). That's right: Siri can alter its language model based on whether you have a Boston or a Texas accent. You can also teach Siri how to pronounce names. And just hit the little "?" upon entering Siri mode and there's a pretty incredible list and drill-down into the many things you can extract from Siri. Ask Siri what planes are above you, to find gas stations along your route, or to dictate a message.
One thing that would be useful for all of these services is the ability to watch the decay of your interest -- say in basketball once March Madness ends -- and start to move that out of your information stream. With Cortana and Google Now, you can manually tell the system you're no longer interested in a topic.
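None of the services appears to do this automatically today, but the idea is easy to sketch. One common approach, assumed here purely for illustration, is exponential decay: a topic's score halves every fixed number of days without a fresh signal, and topics that fall below a threshold drop out of the stream. The function names, the 14-day half-life, and the 0.2 cutoff are all arbitrary assumptions.

```python
def interest_score(initial, days_since_last_signal, half_life_days=14.0):
    """Halve the score for every half-life that passes without a new signal."""
    return initial * 0.5 ** (days_since_last_signal / half_life_days)


def active_topics(topics, threshold=0.2):
    """Keep only topics whose decayed score is still at or above the cutoff.

    `topics` maps a topic name to (initial_score, days_since_last_signal).
    """
    return [
        name
        for name, (initial, days) in topics.items()
        if interest_score(initial, days) >= threshold
    ]


# Six weeks after the tournament ends, basketball has decayed below the cutoff.
print(active_topics({"basketball": (1.0, 42), "weather": (1.0, 1)}))
```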
5.) Cortana APIs. Microsoft is also providing Cortana APIs, so developers can give the digital assistant direct access to the databases and processes within an application. Weitz said that if the app's web service can support a deep call, there's an almost limitless amount of access Cortana can be granted. Cortana can check on your Facebook status, but there's also no reason you couldn't ask Cortana to search for a particular Facebook post.
There's a new version of Skype in Windows Phone 8.1, and Cortana can access it if you ask it to "get me [person's name]." You can add content to a Hulu queue. Only this small handful of apps will be part of the Cortana rollout (those just mentioned, plus Flixster and Twitter), but other apps could easily start to take advantage of this as well with some simple API calls.
Google has not provided API access to Google Now.
Apple doesn't provide APIs for Siri, either, but the list of applications and services that Siri supports is pretty robust, including Facebook, Twitter, OpenTable (Siri will use your credentials with the on-device app to make restaurant reservations), Fandango, MLB and Yahoo. Oh, and don't forget Microsoft's Bing.