We have seen Microsoft’s HoloLens in action many times, enough to know that it works. But until now, we didn’t know exactly how it worked.
We knew that it was a self-contained augmented reality (or “mixed reality,” as Microsoft calls it) HMD. We knew it had fundamentally mobile hardware inside (an Intel SoC) that ran Windows 10, and that it had sensors and an inertial measurement unit (IMU). We also knew it had a “Holographic Processing Unit” (HPU), but without any details, Microsoft might as well have called it the “magic black box thingy.” Other than the fact that the optical system was projection-based, we were in the dark on that, too.
Now, Microsoft has finally revealed more details about HoloLens--what’s under the hood, what the system components are, and how they all work together.
At a presentation over the weekend, we got the goods. “HoloLens enables holographic computing natively,” began the presenter. “You don’t have to place any markers in the environment, no reliance on external cameras or other devices. No tethers. You don’t need a cell phone, you don’t need a laptop. All you need to interact with a fully built-in Windows 10 mobile computer is 'GGV'--gaze, gesture and voice.”
And then we got down to the nitty-gritty details.
[Update: You can also read our follow-up piece, Microsoft HoloLens: HPU Architecture, Detailed.]
Optics
One of the biggest mysteries of the HoloLens has been the optics. Unlike VR headsets, which put OLED displays right in front of your eyes and have you view them through glass lenses, HoloLens is an optical see-through device. That is, you see the real world through the device’s clear lenses, and images (holograms) are projected out in front of you, up to several meters away.
The components of the HoloLens’ optical system break down as follows: Microdisplay → imaging optics → waveguide → combiner → gratings (which perform exit pupil expansion).
That’s a lot of vocabulary words. Hang tight, we’ll explain.
Working backwards, we must understand “exit pupil” versus “entrance pupil.” In this case, the entrance pupil is the pupil of your eye, and the exit pupil is where the projected light leaves the optics on its way to it. The key to the whole display system working correctly is exit pupil expansion, which enlarges what’s called the “eye box”--the region within which your eye can move and still see the full image. In order for a device like HoloLens to work, you need a large, expandable eye box.
VR enthusiasts have certainly encountered the term interpupillary distance (IPD), which is, simply put, the distance between your pupils. This distance is different for everyone, and that’s problematic in VR and AR: you either need a way to mechanically adjust for IPD, or the visuals don’t work very well. HoloLens instead expands the exit pupil in two dimensions, which means the eye box extends both horizontally and vertically. Microsoft claimed that this capability gives HoloLens the largest eye box in the industry.
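To make the eye box idea concrete, here’s a toy sketch in Python (our own illustration; the dimensions are made up and are not HoloLens specs). The point is that with a two-dimensionally expanded eye box, a wearer’s pupils just need to land inside the box--no mechanical IPD adjustment required.

```python
# Illustrative sketch, not Microsoft's implementation: does a pupil offset
# (from the optical axis) land inside a 2D-expanded eye box? All dimensions
# below are hypothetical placeholders, not HoloLens specs.

def pupil_in_eye_box(pupil_x_mm: float, pupil_y_mm: float,
                     box_width_mm: float, box_height_mm: float) -> bool:
    """True if the pupil sits inside an eye box expanded both
    horizontally and vertically."""
    return (abs(pupil_x_mm) <= box_width_mm / 2 and
            abs(pupil_y_mm) <= box_height_mm / 2)

# A wearer whose IPD is 4mm wider than the optics' centerline spacing:
# each pupil sits 2mm off-center horizontally (and, say, 1.5mm vertically).
print(pupil_in_eye_box(2.0, 1.5, box_width_mm=10.0, box_height_mm=8.0))  # True
```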
So that’s the goal. To get there, Microsoft started with what it calls “light engines,” or more simply, “projectors.” In the HoloLens, these are tiny liquid crystal on silicon (LCoS) displays, like you’d find in a regular projector. There are two HD 16:9 light engines mounted at the bridge of the lenses (under the IMU, which we’ll discuss later). These shoot out images, which must pass through a combiner; this is what combines the projected image and the real world.
HoloLens uses total internal reflection (TIR), which, depending on how you shape the prism, can bounce light internally or aim it at your eye. With IR, this can be leveraged for eye tracking. According to the presenter on stage, “The challenge doing it this way is that the volume gets large if you want to do a large FoV.” Microsoft’s solution is to use waveguides in the lenses. He said these are difficult to manufacture in glass, so Microsoft applied a surface coating that allows it to create a series of diffraction gratings.
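The TIR condition itself is just Snell’s law: light hitting the inside surface of the waveguide at an angle steeper (relative to the surface normal) than the critical angle can’t escape, and bounces along inside instead. A quick back-of-envelope in Python, using an illustrative refractive index (typical optical glass is around 1.5; Microsoft hasn’t disclosed the actual waveguide material):

```python
import math

def critical_angle_deg(n_waveguide: float, n_outside: float = 1.0) -> float:
    """Angle of incidence (from the surface normal) beyond which light is
    totally internally reflected rather than refracting out into the air."""
    return math.degrees(math.asin(n_outside / n_waveguide))

print(f"{critical_angle_deg(1.5):.1f}")  # ~41.8 degrees: steeper rays stay trapped
```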
It’s crucial, he noted, to get all of this right. Otherwise, the holograms will “swim” in your vision, and you’ll get nauseous.
He elaborated on how it all works: There’s a “display at the top going through the optics, coupled in through the diffraction grating, [and] it gets diffracted inside the waveguide.” Then it gets “out-coupled.” “How you shape these gratings will determine if you can do two-dimensional exit pupil expansion,” he added. You can use a few different types of gratings to make RGB color holograms. (If you’ve looked closely at a HoloLens HMD, you may have noticed these layered plates that form the holographic lenses.)
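The physics doing the work in those gratings is the classic grating equation. A short sketch (with a made-up grating pitch, not Microsoft’s numbers) shows both how a grating steers light and why red, green and blue get their own grating layers--the output angle depends on wavelength:

```python
import math

def diffracted_angle_deg(theta_in_deg: float, wavelength_nm: float,
                         pitch_nm: float, order: int = 1) -> float:
    """First-order form of the grating equation:
    sin(theta_out) = sin(theta_in) + order * wavelength / pitch."""
    s = math.sin(math.radians(theta_in_deg)) + order * wavelength_nm / pitch_nm
    if abs(s) > 1.0:
        raise ValueError("No propagating beam at this diffraction order")
    return math.degrees(math.asin(s))

# The same hypothetical 700nm-pitch grating bends each color differently:
for name, wavelength_nm in [("red", 630), ("green", 532), ("blue", 465)]:
    print(name, round(diffracted_angle_deg(0.0, wavelength_nm, 700), 1))
# red 64.2, green 49.5, blue 41.6 (degrees)
```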
(Very) simply put: With a VR HMD, you essentially have two tiny monitors millimeters from your face, and you view them through glass lenses. By contrast, HoloLens spits out a projected image, which is then combined, diffracted and layered to produce images you can see in space.
Sensors Sensing Sensibly
When it comes to XR (that is, any kind of virtual, augmented or mixed reality), sensors are paramount. Whether it’s head tracking, eye tracking, depth sensing, room mapping or what have you, the quality and even the speed of the sensors can make or break the XR experience.
Considering the difficulty of doing what we’ve seen HoloLens do, the sensors on the device have been a subject of much interest.
The sensor bar on the HoloLens comprises four “environment understanding cameras,” two on each side; a depth camera; an ambient light sensor; and a 2MP photo/HD video camera. Some of these are off-the-shelf parts, whereas Microsoft custom-built others.
The environmental sensing cameras provide the basis for head tracking, and the (custom) time of flight (ToF) depth camera serves two roles: It helps with hand tracking, and it also performs surface reconstruction, which is key to being able to place holograms on physical objects. (This is not a novel approach--it’s precisely what Intel is doing with its RealSense 400-series camera on Project Alloy.)
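The ToF principle itself is simple: the camera measures how long (in practice, via the phase shift of modulated infrared light) emitted light takes to bounce off a surface and return, and depth is half the round trip at the speed of light. A minimal illustration:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def tof_depth_m(round_trip_ns: float) -> float:
    """Depth from a round-trip light travel time: distance = c * t / 2."""
    return SPEED_OF_LIGHT_M_PER_S * (round_trip_ns * 1e-9) / 2

print(f"{tof_depth_m(13.34):.2f} m")  # ~2.00 m: a wall two meters away
```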
These sensors work in concert with the optics module (described above) and the IMU, which is mounted on the holographic lenses, right above the bridge of your nose.
Said the presenter, “Environment cameras provide you with a fixed location in space and pose,” and the IMU is working fast, “so as you move your head around...you need to be able to feed your latest pose information into the display as quickly as possible.” He said that HoloLens can do all of this in <10ms, which, again, is key to preventing “swimming” and also to ensuring that holograms stay locked to their position in the real world space.
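To see why that <10ms figure matters, consider a toy first-order predictor (our illustration, not Microsoft’s actual tracking algorithm): the renderer extrapolates the latest IMU-derived pose forward by the pipeline’s latency, so the hologram is drawn where your head will be rather than where it was.

```python
def predict_yaw_deg(current_yaw_deg: float,
                    gyro_yaw_rate_deg_per_s: float,
                    latency_ms: float) -> float:
    """First-order extrapolation of head yaw across the display latency."""
    return current_yaw_deg + gyro_yaw_rate_deg_per_s * (latency_ms / 1000.0)

# Turning your head at a brisk 200 deg/s with a 10ms pose-to-photon budget:
# rendering with the stale pose would misplace holograms by 2 degrees,
# which is exactly the "swimming" the presenter warned about.
print(predict_yaw_deg(30.0, 200.0, 10.0))  # 32.0
```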
The Heart Of The System
The optics and sensor bar have been more mysterious than the overall system architecture--we had previously sussed out that HoloLens had an Intel Cherry Trail chip, an IMU and the Holographic Processing Unit (HPU)--but beyond that, we had no details until now.
Obviously, the HoloLens has a custom mainboard. (Note the absence of fans or heatsinks, by the way.) The system is essentially mobile hardware, with 64GB of eMMC storage and 2GB of LPDDR3 RAM (1GB each for the SoC and the HPU). It’s based on the x86 architecture and runs Windows 10.
The Cherry Trail SoC (although precisely which one is still unknown) does much of the heavy lifting, but it’s aided by the HPU, which is currently at “HPU 1.0.” As you may have surmised, the SoC handles the OS, applications and shell, and the HPU’s job is offloading all the remaining tasks from the SoC--all of the sensors connect to the HPU, which processes all the data and hands it off as a tidy package to the SoC.
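Conceptually, the dataflow looks something like the sketch below. To be clear, the names and packet fields here are ours, invented purely for illustration; Microsoft has not published the HPU’s actual interface.

```python
from dataclasses import dataclass

@dataclass
class FusedSensorPacket:
    """The 'tidy package' idea: raw camera, depth and IMU streams reduced
    to compact results the SoC can consume directly."""
    head_pose: tuple           # position + orientation in the room
    hand_joints: list          # tracked hand positions, if hands are visible
    surface_mesh_delta: bytes  # incremental update to the room's geometry

def hpu_process(camera_frames, depth_frame, imu_samples) -> FusedSensorPacket:
    # Stand-in for the heavy per-pixel and per-sample work the HPU
    # offloads from the SoC: tracking, surface reconstruction, sensor fusion.
    return FusedSensorPacket(head_pose=(0.0,) * 6,
                             hand_joints=[],
                             surface_mesh_delta=b"")
```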
Looking at the system architecture diagram, you can see it all nicely laid out. Note especially which components are custom and which are off-the-shelf. In particular, it's interesting that the battery is custom--actually, it’s “batteries,” plural. What that means exactly, we don’t know, but it’s an intriguing detail.
You can also see that there are four microphones, which benefit from noise suppression. Further on the audio side, there are stereo speakers mounted on the headband, right where your ears would be (imagine that). Some have criticized the HoloLens speakers as being too weak, with the audio swallowed up by ambient noise. That may be true, but they do offer impressive spatial (and directional) sound, and you get that without headphones (although you can add some cans if you like). In demos, the spatiality is almost creepy.
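One cue behind that directional effect is the interaural time difference (ITD): the sub-millisecond delay between a sound reaching your near ear and your far ear, which spatial audio engines synthesize per sound source. A back-of-envelope sketch (the head width is a typical value, not a HoloLens parameter):

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0

def itd_microseconds(azimuth_deg: float, head_width_m: float = 0.18) -> float:
    """Simplified ITD model: extra path to the far ear is roughly
    head width * sin(azimuth), traveled at the speed of sound."""
    return (head_width_m * math.sin(math.radians(azimuth_deg))
            / SPEED_OF_SOUND_M_PER_S * 1e6)

print(f"{itd_microseconds(90):.0f}")  # ~525 microseconds for a sound dead to one side
```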
For connectivity, there’s Wi-Fi and Bluetooth--although, again, Microsoft has not specified which types of either.
HPU 1.0
A true deep dive into the architecture of the HPU is beyond the scope of the data we have presently, but Microsoft did provide some illuminating details.
For starters, HPU 1.0 is a TSMC 28nm HPC, and it has 65 million logic gates, 8MB SRAM, and the aforementioned 1GB LPDDR3 RAM, on a 12x12 mm BGA package. It has its own I/O, as well as PCIe and MIPI, and 12 “compute nodes.” Wrapping up his description, the presenter deadpanned, “The rest of the chip comprises the on-chip fabric, some shared fixed-function accelerators, the logic block, [and] a bunch of SRAM.” He also noted that the compute data it outputs is “very compact,” and the chip consumes less power than the SoC.
We will continue to press Microsoft for more information about HoloLens. But the above is a great, long-awaited start.