Posted on

What’s Missing From Your 3D Sound Toolbox?

Audio for VR/AR is getting a lot of attention these days, now that people are realising how essential good spatial audio is for an immersive experience. But we still don’t have as many tools as are available for stereo. Not even close!

This is because Ambisonics has to handled carefully when processing in order to keep the correct spatial effect – even a small phase change between channels significantly alter the spatial effect – so there are very few plugins that can be used after the sound has been encoded.

To avoid this problem we can apply effects and processing before spatial encoding, but then we are restricted in what we can do and how we can place it. It is also not an option if you are using an Ambisonics microphone (such as the SoundField, Tetra Mic or AMBEO VR), because it is already encoded! We need to be able to process Ambisonics channels directly without destroying the spatial effect.

So, what is missing from your 3D sound toolbox? Is there a plugin that you would reach for in stereo that doesn’t exist for spatial audio? Maybe you want to take advantage of the additional spatial dimensions but don’t have a tool to help you do that. Whatever you need, I am interested in hearing about it. I have a number of plugins that will be available soon that will fulfil some technical and creative requirements, but there can always be more! In fact, I’ve already released the first one for free. I am particularly interested in creative tools that would be applied after encoding but before decoding.

With that in mind, I am asking what you would like to see that doesn’t exist. If you are the first person to suggest an idea (either via the form or in the comments) and I am able to make it into a plugin then you’ll get a free copy! There is plenty of work to do to get spatial audio tools to the level of stereo but, with your help, I want to make a start.

Posted on 1 Comment

Free Ambisonics Plugin: o1Panner

I am working on some spatial audio plugins to provide some more tools for VR/AR audio and I am kicking things off with a freebie: the o1Panner. It is free to download from the Shop.

What is it?

The o1Panner a simple first-order Ambisonics encoder with a width control.

How to use it

There are two display types: top-down and rectangular. The azimuth, elevation and width are controlled in different ways in each of these views. The views are selected by right clicking on the display.

For the top-down view, azimuth is controlled by clicking and dragging on the main display, the elevation is controlled by holding shift and dragging up/down and width is controlled by holding ctrl and dragging up/down.

For the rectangular view, azimuth and elevation correspond to the x- and y-coordinates respectively and width is controlled by holding ctrl and dragging up/down.

What does it output?

The output is AmbiX (SN3D/ACN) Ambisonics. This is the format used by Google for YouTube 360 and is quickly being adopted as the standard for Ambisonics and HOA.

What’s coming up?

I am working on several Ambisonics and HOA plugins that will be available in 2018. Some of them will do things that other plugins do, but most of them should do something new. Some of them will do something more creative and experimental. If you want to see a certain effect for spatial audio, just get in touch and let me know what you want. If you’re the first person to suggest a plugin that gets developed then you will get a free copy to say thanks!

What about HOA?

The industry is rapidly moving on from first-order Ambisonics and embracing HOA. For example, ProTools recently added support up to third-order Ambisonics. Higher order tools are in the pipeline, so check back soon.

Stay Up To Date

If you want to keep current with upcoming plugin news and about updates to the o1Panner, subscribe to the mailing list:

Posted on

Fundamentals of Communication Acoustics

MOOCs can be a great way of following a world-class course on a topic without having to enrol in a university and pay the associated fees.

For those interested in spatial audio (and audio more generally) there is a MOOC starting on 23rd October called the Fundamentals of Communication Acoustics that looks like it covers a lot of important topics. It’s followed up by Applications of Communication Acoustics.

I’m considering auditing these, because it never hurts to refresh the basics and the course is taught by some very talented people so I’ll probably even learn plenty of new things!

The MOOC is available on the EdX platform: here.

Posted on

What Is… Higher Order Ambisonics?

This post is part of a What Is… series that explains spatial audio techniques and terminology.

The last post was a brief introduction to Ambisonics covering some of the main concepts of first-order Ambisonics. Here I’ll give an overview of what is meant by Higher Order Ambisonics (HOA). We’ll be sticking to the more practical details here and leaving the maths and sound field analysis for later.


Higher Order Ambisonics (HOA) is a technique for storing and reproducing a sound field at a particular point to an arbitrary degree of spatial accuracy. The degree of accuracy to which the sound field can be reproduced depends on several elements: the number of loudspeakers available at the reproduction stage, how much storage space you have, computer power, download/transmission limits etc. As with most things, the more accuracy you want the more data you need to handle.

Encoding

Spherical harmonics used for third-order HOA (image by Dr Franz Zotter https://commons.wikimedia.org/wiki/File:Spherical_Harmonics_deg3.png)

In its most basic form, HOA is used to reconstruct a plane wave by decomposing the sound field into spherical harmonics. This process is known as encoding. Encoding creates a set of signals that depend on the position of the sound source, with the channels weighted depending on the source direction. The functions become more and more complex as the HOA order increases. The spherical harmonics are shown in the image up to third-order. These third-order signals include, as a subset, the omnidirectional zeroth-order and the first-order figure-of-eights. Depending on the source direction and the channel, the signal can also have its polarity inverted (the darker lobes).

An infinite number of spherical harmonics are needed to perfectly recreate the sound field but in practice the series is limited to a finite order \(M\). An ambisonic reconstruction of order \(M > 1\) is referred to as Higher Order Ambisonics (HOA).

An HOA encoded sound field requires \((M+1)^{2}\) channels to represent the scene, e.g 4 for first-order, 9 for second, 16 for third, etc. We can see that very quickly we require a very large number of audio channels even for relatively low orders. However, as with first-order Ambisonics, it is possible to do rotations of the full sound field relatively easily, allowing for integration with head tracker information for VR/AR purposes. The number of channels remains the same no matter how many sources we include. This is a great advantage for Ambisonics.

Decoding

The sound field generated by order 1, 3, 5 and 7 Ambisonics for a 500 Hz sine wave. The black circle in the middle is approximately the size of a listener’s head.

The encoded channels contain the spatial information of the sound sources but are not intended to be listened to directly. A decoder is required that converts the encoded signals to loudspeaker signals. The decoder has to be designed for your particular listening arrangement and takes into account the positions of the loudspeakers. As with first-order Ambisonics, regular layouts on a circle or sphere provide the best results.

The number of loudspeakers required is at least the number of HOA encoded channels coming in.

A so-called Basic decoder provides a physical reconstruction of the sound field at the centre of the array. The size of this physically accurately reconstructed area increases with increasing order but decreases with frequency. Low frequency ranges can be reproduced physically (holophony) but eventually the well-reproduced region becomes smaller than the size of a human head and decoding is generally switched to a max rE decoder, which is designed to optimise psychoacoustic cues.

The (slightly trippy) animation shows orders 1, 3, 5 and 7 of a 500 Hz sine wave to demonstrate the increasing size of the well-reconstructed region at the centre of the array. All of the loudspeakers interact to recreate the exact sound field at the centre but there is some unwanted interference out of the sweet spot.

Why HOA?

Since the number of loudspeakers has to at least match the number of HOA channels the cost and practicality are often the main limiting factor. How many people can afford 64 loudspeakers needed for a 7th order rendering? So why bother encoding things to a high order if we are limited to lower order playback? Two reasons: future-proofing and binaural.

First, future-proofing. One of the nice properties of HOA is that you can select a subset of channels to use for a lower order rendering. The first four channels in a fifth-order mix are exactly the same as the four channels of a first-order mix (see the spherical harmonic images above). We can easily ignore the higher order channels without having to do any approximative down-mixing. By encoding at a higher order than might be feasible at the minute you can remain ready for a future when loudspeakers cost the same as a cup of coffee (we can dream, right?)!

Second, binaural. If the limiting factors to HOA are cost and loudspeaker placement issues then what if we use headphones instead? A binaural rendering uses headphones to place a set of virtual loudspeakers around the listener. Now our rendering is only limited by the number of channels our PC/laptop/smartphone can handle at any one time (and the quality of the HRTF). The aXMonitor is an example of an Ambisonics-to-binaural decoder that can be loaded into any DAW that accepts VST format plugins, plug Pro Tools | Ultimate in AAX format.

The Future

As first-order Ambisonics makes its way into the workflow of not just VR/AR but also music production environments, we’re already seeing companies preparing to introduce HOA. Facebook already includes a version of second-order Ambisonics in its Facebook 360 Spatial Workstation [edit: They are now up to third order!]. Google have stated that they are working to expand beyond first-order for YouTube. I have worked with VideoLabs to include third-order Ambisonics in VLC Media player. This is in the newest version of VLC.

Microphones for recording higher than first-order aren’t at the stage of being accessible to everyone yet, but there are tools, like the a7 Ambisonics Suite that will let you encoded mono signals up to seventh-order, as well as to process the Ambisonics signal. There are also up-mixers like Harpex if you want to get higher orders from existing first-order recordings.

All of this means that if you can encoded your work in higher orders now, you should. You do not want to have to go back to your projects to rework them in six months or a year when you can do it now.

Posted on

What Is… Stereophony?

This post is part of my What Is… series that explains spatial audio techniques and terminology.

OK, you know what stereo is. Everyone knows what stereo is. So why bother writing about it? Well, because it allows us to introduce some links between the reproduction system and spatial perception before moving on to systems which use much more than 2 loudspeakers.

Before going any further, this post will deal with amplitude panning. Time panning will be left for another day. I also won’t be covering stereo microphone recording techniques because that could fill up its own series of posts.

The Playback Setup

A standard stereo setup is two loudspeakers placed symmetrically at \(\pm30^{\circ}\) to the left and right of the listener. We will assume for now that there is only a single listener equidistant from both loudspeakers. The loudspeaker basis angle can be wider or narrower but if they get too wide there is a hole-in-the-middle problem. Too narrow and we reduce the range of positions at which the source can be placed. Placing the loudspeakers at \(\pm30^{\circ}\) gives a good compromise between these two, balancing sound image quality with potential soundstage width.

A standard stereo listening arrangement.
A standard stereo listening arrangement.

The tangent law prediction of perceived source angle for different level differences
The tangent law prediction of perceived source angle for different level differences

Placing the Sound

Amplitude panning takes a mono signal and sends copies to the two output channels with (potentially) different levels. When played back over two loudspeakers the level difference between the two channels controls the perceived direction of the sound source. With amplitude panning the perceived image will remain between the loudspeakers. If we know the level difference between the two channels then we can predict the perceived direction using a panning law. The two most famous of these are the tangent law and the sine law. The tangent law is defined as
\begin{equation}
\frac{\tan\theta}{\tan\theta_{0}} = \frac{G_{L} – G_{R}}{G_{L} + G_{R}}
\end{equation}
where \(\theta\) is the source direction, \(\theta_0\) is the angle between either loudspeaker and the front (30 degrees in the case illustrated above) and \(G_{L}\) and \(G_{R}\) are the linear gains of the left and right loudspeakers.

The ITD produced for a source panned with loudspeaker level differences generated by the tangent law.
The ITD produced for a source panned with loudspeaker level differences generated by the tangent law.

How It Works

Despite being simple conceptually and very common, the psychoacoustics of stereo are actually quite complex. We’ll stick to discussing how it relates to the main spatial hearing cues.

As long as both loudspeakers are active, signals from both loudspeakers will reach both ears. Due to the layout symmetry, both ears receive signals at the same time but with different intensities corresponding to the level differences of the loudspeakers. Furthermore, since it has further to travel, the signal from the left loudspeaker will reach the right ear slightly later than the signal from the right loudspeaker. The opposite is true for the right ear. This time difference combined with the intensity difference gives rise to interference that generates phase differences at the ears. These phase differences are interpreted as time differences, moving the sound between the loudspeakers.

The ITD (below 1400 Hz) is shown in the figure and is roughly linear with panning angle. This is pretty close to exactly what we see for a real sound source moving between these angles. This works pretty well for loudspeakers at \(\pm30^{\circ}\) or less, but once the angle gets bigger the relationship becomes slightly less linear.

These strong, predictable ITD cues mean that any sound source with a decent amount of low frequency information will allow us to place the image pretty precisely. Content in higher frequency ranges won’t necessarily be in the same direction as long frequency content because ILD becomes the main cue.

Even though stereo gives rise to interaural differences that similar to those of a real source, that does not mean it is a physically-based spatial audio system (like HOA and WFS). The aim is to produce a psychoacoustically plausible (or at least pleasing) sound scene. Psychoacoustically-based spatial audio systems tend to use the loudspeakers available to fit some aim (precise image, broad source) without regards to if the resulting sound scene ressembles anything a real sound source would emit. 

So, there you have a quick overview of stereo from a spatial audio perspective. There are other issues that will be cover later because they relate to other spatial audio techniques. For example, what if I’m not in the sweet spot? What if the speakers are to the side or I turn my head? What if I add a third (or forth or fifth) active loudspeaker? Why do some sounds panned to the centre sound elevated? All of these remaining and non-trivial points shows just how complex perception of even a simple spatial audio system can be.