Posted on

What Is… Stereophony?

This post is part of my What Is… series that explains spatial audio techniques and terminology.

OK, you know what stereo is. Everyone knows what stereo is. So why bother writing about it? Well, because it allows us to introduce some links between the reproduction system and spatial perception before moving on to systems which use much more than 2 loudspeakers.

Before going any further, this post will deal with amplitude panning. Time panning will be left for another day. I also won’t be covering stereo microphone recording techniques because that could fill up its own series of posts.

The Playback Setup

A standard stereo setup is two loudspeakers placed symmetrically at (pm30^{circ}) to the left and right of the listener. We will assume for now that there is only a single listener equidistant from both loudspeakers. The loudspeaker basis angle can be wider or narrower but if they get too wide there is a hole-in-the-middle problem. Too narrow and we reduce the range of positions at which the source can be placed. Placing the loudspeakers at (pm30^{circ}) gives a good compromise between these two, balancing sound image quality with potential soundstage width.

A standard stereo listening arrangement.
A standard stereo listening arrangement.
The tangent law prediction of perceived source angle for different level differences
The tangent law prediction of perceived source angle for different level differences

Placing the Sound

Amplitude panning takes a mono signal and sends copies to the two output channels with (potentially) different levels. When played back over two loudspeakers the level difference between the two channels controls the perceived direction of the sound source. With amplitude panning the perceived image will remain between the loudspeakers. If we know the level difference between the two channels then we can predict the perceived direction using a panning law. The two most famous of these are the tangent law and the sine law. The tangent law is defined as
begin{equation}
frac{tantheta}{tantheta_{0}} = frac{G_{L} – G_{R}}{G_{L} + G_{R}}
end{equation}
where (theta) is the source direction, (theta_0) is the angle between either loudspeaker and the front (30 degrees in the case illustrated above) and (G_{L}) and (G_{R}) are the linear gains of the left and right loudspeakers.

The ITD produced for a source panned with loudspeaker level differences generated by the tangent law.
The ITD produced for a source panned with loudspeaker level differences generated by the tangent law.

How It Works

Despite being simple conceptually and very common, the psychoacoustics of stereo are actually quite complex. We’ll stick to discussing how it relates to the main spatial hearing cues.

As long as both loudspeakers are active, signals from both loudspeakers will reach both ears. Due to the layout symmetry, both ears receive signals at the same time but with different intensities corresponding to the level differences of the loudspeakers. Furthermore, since it has further to travel, the signal from the left loudspeaker will reach the right ear slightly later than the signal from the right loudspeaker. The opposite is true for the right ear. This time difference combined with the intensity difference gives rise to interference that generates phase differences at the ears. These phase differences are interpreted as time differences, moving the sound between the loudspeakers.

The ITD (below 1400 Hz) is shown in the figure and is roughly linear with panning angle. This is pretty close to exactly what we see for a real sound source moving between these angles. This works pretty well for loudspeakers at (pm30^{circ}) or less, but once the angle gets bigger the relationship becomes slightly less linear.

These strong, predictable ITD cues mean that any sound source with a decent amount of low frequency information will allow us to place the image pretty precisely. Content in higher frequency ranges won’t necessarily be in the same direction as long frequency content because ILD becomes the main cue.

Even though stereo gives rise to interaural differences that similar to those of a real source, that does not mean it is a physically-based spatial audio system (like HOA and WFS). The aim is to produce a psychoacoustically plausible (or at least pleasing) sound scene. Psychoacoustically-based spatial audio systems tend to use the loudspeakers available to fit some aim (precise image, broad source) without regards to if the resulting sound scene ressembles anything a real sound source would emit. 

So, there you have a quick overview of stereo from a spatial audio perspective. There are other issues that will be cover later because they relate to other spatial audio techniques. For example, what if I’m not in the sweet spot? What if the speakers are to the side or I turn my head? What if I add a third (or forth or fifth) active loudspeaker? Why do some sounds panned to the centre sound elevated? All of these remaining and non-trivial points shows just how complex perception of even a simple spatial audio system can be.

Posted on

What Is… Spatial Hearing?

This post is part of the What Is… series that explains spatial audio techniques and terminology.

Spatial hearing is how we are able to locate the direction of a sound source. This is generally split in to azimuth (left/right) and elevation (vertical) localisations. Knowing how we localise is essential to understanding the spatial audio technologies. Human spatial hearing is a complex topic with lots of subtleties so we’ll ease in with some of the main concepts.

Interaural Time Difference (ITD)

ITD for a Neumann KU100 dummy head averaged across all frequencies below 1400 Hz
ITD for a Neumann KU100 dummy head averaged across all frequencies below 1400 Hz

Consider a single sound source near to a listener. The sound source will radiate sound waves that will travel through the air to listener. These waves will reach the nearer (ipsilateral) ear of the listener earlier than the further (contralateral). This produces a time difference between the signals at both eardrums known as the interaural time difference (ITD). The brain can extract the time difference by comparing the two signals and will use this as an estimate of the direction of the sound. Whichever ear is leading in time dictates whether the sound is heard to the left or the right. The graph shows the average ITD for frequencies up to 1400 Hz. It has a clear sinusoidal shape that varies predictably with azimuth, making it a useful localisation cue.

ITD cues are mainly evaluated at low frequencies (below approximately 1400 Hz). This is the frequency range at which the wavelength of the sound is long enough when compared to the size of the head to avoid phase ambiguity. Above this frequency the phase can “wrap” around and it not possible to tell if there have been, say, 0.5 cycles, 1.5 cycles etc.

Luckily, we can use another method to localise in higher frequencies.

Interaural Level Difference (ILD)

Interaural level difference (ILD) for a Neumann KU100 dummy head in the horizontal plane up to 15 kHz

As frequency increases and the wavelength becomes shorter than the size of the listener’s head, acoustic shadowing becomes important, producing an interaural level difference (ILD). The shadowing causes the level at the contralateral ear to be reduced compared to the ipsilateral. This is in contrast to low frequencies where the wavelengths are so large that the level differences to not vary significantly with source direction (unless the sound source is very close!).

Where ITD exhibits a sinusoidal shape, making direction estimation relatively simple, ILD can vary in a complex manner with source direction. This is due to how the sound waves interact with the head and doesn’t mean that the biggest level difference happens as \(pm90^circ\). In fact, this ILD is actually lower at \(pm90^circ\) than at some less lateral positions. This is known as the acoustic bright spot. The complex ILD patterns are shown in the graph where the more yellow/blue the colour the larger the ILD. Yellow means the left ear is greater than the right and blue the right is greater than the left.

ITD and ILD are work well for differentiating between left and right. But imagine a sound source starts directly infront of you, moves in an arc over your head to finish directly behind you. At no point do ITD and ILD have any value other than zero but we can still perceive the elevation of the sound source. How are we able to do this?

Spectral Cues

The frequency spectra for a sound source directly in front of and above the listener. Note the significant notch at 8 kHz for the frontal source that is missing in the elevated source.
The frequency spectra for a sound source directly in front of and above the listener. Note the significant notch at 8 kHz for the frontal source that is missing in the elevated source.

The outer ears (pinnae) are a very complex shape. They cause the sounds to be filtered in a way that is highly direction dependent. This leads to peaks and notches in the frequency response of the source spectrum that can be used to evaluate the direction, primarily for elevation. The frequencies of the peaks and notches are highly individual, depending strongly on the shape of the outer ears. This is something that the brain learns and it can use this internal template to incoming sounds and give an estimate of localisation.

For example, the graph to the left shows the frequency spectra for a sound source at two different positions: in front and above. The frontal source has a deep notch at 8 kHz which is not the case for the elevated source. This could be used to differentiate between the two elevations, even though the signals at the left and right ears would be (nearly) identical.

Localisation accuracy tends to be much less accurate for elevation than it is for azimuthal (left/right) judgements. This can have implications for how we might design a spatial audio system or on how well they can work.

Is that it?

Not by a long shot! We haven’t covered things like interaural envelope difference, distance estimation, the effect of head movement, the precedence effect, the ventriloquist effect but these are the main principles we need to understand to get to grips with the basics of spatial audio.