Spatial Hearing, Surround Sound & A Lot Of Voodoo
What Is Spatial Hearing?
A true reproduction of the frequency spectrum is only part of what we're looking for. We also need to assess how well individual sound sources are resolved, and how easily they may be spatially located. This, in turn, represents the playback's precision, in which spatial hearing plays an important role.
The human body has two ears. Between them, the head serves as an acoustic barrier. But how do we actually perceive spatial information relayed through acoustic signals, and what enables us to attribute acoustic events to a specific source in space? This is based on two factors: interaural time differences (when a given sound reaches each ear) and interaural intensity differences (the difference in sound pressure level between the ears).
Usable information about the spatial position of a sound source due to differences in intensity and delay time can only be recognized and processed by the ears and brain if the sound has distinct characteristics (sudden occurrence, spectrum, pressure level, etc.). For example, it is almost impossible to spatially separate sources of background noise, such as in the woods or in a city, when the observer is surrounded by a large number of similar sources. Faster and higher-contrast changes in tonal characteristics make it easier to localize the distinct source of a sound.
Transmission Delay Differences
Transmission delay refers to the difference in time between a sound wave's arrival at one ear and its arrival at the other. If the source is not directly in front of the observer (deviating by at least 3°), the sound logically reaches the ear closer to the source first. The resulting delay depends on the difference in distance the sound wave must travel to reach each ear. Our ears are capable of detecting time differences as small as 10 to 30 microseconds!
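The interaural time difference described above can be estimated with Woodworth's classic spherical-head approximation. The head radius and speed of sound below are rough textbook values, not measurements from this article; the sketch just shows the order of magnitude involved.

```python
import math

# Rough textbook figures (assumptions, not measured values):
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m, a common average used in spherical-head models

def interaural_time_difference(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation of the ITD, in seconds.

    azimuth_deg: angle of the source from straight ahead
    (0 = directly in front, 90 = directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (math.sin(theta) + theta)

# A source only 3 degrees off-center already produces a measurable delay:
print(f"{interaural_time_difference(3) * 1e6:.1f} microseconds")
# A source directly to the side produces the maximum delay:
print(f"{interaural_time_difference(90) * 1e6:.1f} microseconds")
```

Under these assumptions, a 3° offset yields a delay on the order of tens of microseconds, comfortably within the 10 to 30 microsecond sensitivity mentioned above, while a source directly to the side yields roughly two thirds of a millisecond.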
Difference In Intensity
A difference in intensity (or loudness) occurs when a sound wave's wavelength is small compared to the size of the head, so that the head acts as an obstacle, reflecting and diffracting the wave. As the picture shows, this creates a so-called sound shadow on the opposite side. The effect only occurs for frequencies above approximately 2 kHz and grows with rising frequency. For the longer wavelengths of lower-pitched sounds, the head is not really an obstacle.
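The ~2 kHz threshold falls directly out of the relationship wavelength = speed of sound / frequency. The head diameter below is an assumed typical value, used only to illustrate why the shadow effect starts roughly where it does.

```python
# Wavelength = speed of sound / frequency. The head only casts a usable
# sound shadow once the wavelength shrinks to about its own diameter.
SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)
HEAD_DIAMETER = 0.175   # m (an assumed typical head width)

def wavelength(frequency_hz: float) -> float:
    """Wavelength in meters of a sound wave at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz

for f in (100, 500, 2000, 8000):
    role = "head casts a shadow" if wavelength(f) <= HEAD_DIAMETER \
        else "head is not an obstacle"
    print(f"{f:5d} Hz -> wavelength {wavelength(f) * 100:6.1f} cm ({role})")
```

At 2 kHz the wavelength is about 17 cm, roughly the width of a head, which is why intensity differences only become a useful localization cue above that point.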
Localization Of Acoustic Events
If an acoustic event occurs outside the head (for example, generated by loudspeakers), we are confronted with a process called localization. Interpreting data provided by the ears enables our brains to locate the event's origin spatially.
Incidentally, the head is always in motion to provide more precise spatial localization, since turning, raising, lowering, and tilting facilitates localization across all three planes (X, Y, and Z). In this case, and only in this case, the result would be true three-dimensional sound. But this cannot be achieved using a normal speaker setup, which positions all sound sources at more or less the same height.
Special Properties Of Headphones
When you put headphones over your ears, perception of the stimulus always occurs right at the head! And when sound waves produced by the headphones are in sync, the sound's source is always perceived to be in the middle of the head.
Lateralisation is the sound source's apparent movement from the middle of the head to one side. This headphone-specific perception of a supposed sound source "wandering" is either due to a time difference (signals are played back with an offset in time) or intensity difference.
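Both lateralisation triggers, a time offset and an intensity offset, can be sketched in a few lines. This is a minimal illustration using made-up, uncalibrated values, not a model of any real headphone's processing.

```python
import math

SAMPLE_RATE = 48000  # samples per second (an assumed common rate)

def lateralize(mono, delay_samples=0, right_gain=1.0):
    """Turn a mono signal into a stereo pair that lateralises the image.

    Delaying the right channel (a time difference) or attenuating it
    (an intensity difference) both pull the perceived source toward
    the left ear. The parameter values are illustrative only.
    """
    left = list(mono)
    # Prepend silence for the delay, scale for the intensity difference:
    right = [0.0] * delay_samples + [s * right_gain for s in mono]
    right = right[:len(mono)]  # keep both channels the same length
    return left, right

# A short 440 Hz test tone:
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(480)]

# ~0.3 ms of interaural delay, enough to shift the image clearly left:
left, right = lateralize(tone, delay_samples=14)
```

Played back over headphones, either manipulation alone moves the phantom source off-center, which is exactly the headphone-specific "wandering" described above.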
Ears are almost never completely identical. So, with a different sensitivity for both ears, a lateralisation towards the better ear may also occur without a difference in stimulus! This is why an exact adjustment of sound balance is always the first step in optimizing the listening experience.
Surround Sound With Headphones
However, lateralisation involves much more! A phase shift, that is, an offset in the waveform by the time it reaches the ear, can also change the perceived position of the sound source. This is because the auricle itself is of great importance for locating acoustic events; it serves as both a sound receptor and a filter, linearly distorting incoming sound waves in different ways depending on the direction of and distance from the source.
Each person's ear cups are unique. Hence, everybody perceives sound a little differently. The shape of the auricle alone influences how a sound wave bounces off the exterior, how it enters the ear canal, and how it is transported to the eardrum. Even hair on the auricle plays a role in this process!
What does all of this have to do with headphones? True three-dimensional hearing requires movement of the head along all axes, which isn't technically possible with tightly seated binaural headphones.
Spatial representation of a sound source's actual position along the Z-axis (in front of or behind the head) or the Y-axis (below or above the head) is impossible to convey with headphones that only image along one dimension!
Even if the software works with phase shifting, manipulates the interaural time difference and intensity, and modifies the frequency spectrum so that sounds behind the head seem a little deeper or duller, it simply cannot create a real spatial sound experience. After all, it relies on experience collected from real-life sound events to compensate, and such an illusion can sometimes be fooled.
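The "duller" cue that virtual-surround processing uses for rear sources amounts to cutting treble. A one-pole low-pass filter is the simplest possible sketch of that idea; the coefficient below is illustrative and not taken from any real product.

```python
def lowpass(samples, alpha=0.2):
    """One-pole low-pass filter: y[n] = alpha*x[n] + (1-alpha)*y[n-1].

    Smaller alpha removes more treble, making the sound duller. This is
    the kind of spectral manipulation virtual surround uses to suggest a
    source behind the head; the value 0.2 is an arbitrary illustration.
    """
    out, prev = [], 0.0
    for x in samples:
        prev = alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

# Alternating +1/-1 is the highest frequency a sample stream can carry;
# the filter attenuates it heavily, while a steady (low-frequency)
# signal passes through almost unchanged.
buzz = [1.0, -1.0] * 8
print(max(abs(s) for s in lowpass(buzz)))
```

The ear then has to interpret that duller spectrum as "behind me," which is precisely why the effect depends on learned listening experience rather than on any true spatial information in the signal.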
What Is The Benefit Of Several Drivers Per Earpiece?
Headphones with several drivers installed at different angles can help with two-dimensional representation because, depending on the auditory sensation and hearing experience, sound that reaches the auricle at different angles can give rise to something like a spatial sound sensation.
The downside of such systems, however, is the uncertainty inherent to multi-source sound creation. The various drivers can inadvertently influence each other in disadvantageous ways, through phase shifts or cancellation, due to their close proximity to each other. Enjoying music is not really possible with such a configuration. Even linear reproduction across a wide spectrum is hard to achieve.
Nevertheless, at least some viable multi-driver headsets have appeared on the market, some of which convey a quite convincing illusion. Even in these cases, though, this sensation is interpreted subjectively and can never be transferred from one person to another.
5.1- or 7.1-channel sound delivered through headphones is always imaginary, and can only be achieved by the brain relying on previous experience. Even then, the result will only be two-dimensional at best. Not even proper loudspeakers are able to convey real 3D; they're unable to reproduce more than the X- and Z-axis.
So What Gives Us The Best Performance?
Personally, I prefer a good set of stereo headphones with detail-rich sound reproduction. Not only the reproducible frequency range and its linear character play an important role, but also the system's ability to accurately reproduce several acoustic events overlapping or occurring simultaneously.
The separation of individual sources and their precise spatial placement within the big acoustic picture is often referred to, in typical hi-fi jargon, as the sound stage. A wide sound stage indicates good spatial reproduction and high audio resolution; without it, the result sounds murky and undifferentiated. If sounds and noises blend into an indistinguishable acoustic mess, spatial representation collapses like a house of cards.
Is The Difference Apparent To Everyone?
The answer can be yes and no. We repeated all of our tests several times using recorded surround material with a total of six test subjects (three male and three female) aged between 16 and 50 years. They auditioned different headsets in a randomized order. Only two of our test subjects were able to spatially locate sound sources correctly using the "real" surround headphones. Three people got it right in at least some of the test cases, while one person could only guess. Using the virtual surround headsets with one driver per earpiece, only two candidates reported noticing anything at all. But nobody managed to get a 100% score.
More complex and louder ambient noise increased our margin of error. Furthermore, during our blind tests, none of the test candidates were able to tell whether or not the headset used was equipped with just one or up to three drivers (+bass) per earpiece. Interestingly, two people even claimed to have perceived surround sound while using just the stereo reference headphones. This, in turn, goes to show how all of this is actually the human brain at work. So, we can't definitively answer the original question with a yes or no.
Beyond claims of surround sound or the marketing-friendly Dolby certificates they get pinned with, headphones should first of all have some basic qualities: good sound resolution for individual acoustic events across the widest possible frequency spectrum, linear sound reproduction across said spectrum, inconspicuous transient response, and high sound level stability. With these, everything else falls into place.
Very good 5.1-channel systems with multiple drivers per earpiece may be able to provide the illusion of spatial representation, with some additional help provided by the human brain, but in turn they usually suffer from a lack of proper sound resolution and clean reproduction of complex scenarios with many overlapping sound sources.