Posted on

## aXMonitor Update: Personalised Binaural with SOFA Support

The aXMonitor plugins have been updated to version 1.3.2 today. If you have already bought one of the aXMonitor plugins, you can download the update from your account. You should remove any old versions of the plugin from your system to avoid conflicts.

Today’s update is all about getting more flexibility and personalisation for binaural rendering of Ambisonics. This is probably the most requested feature update for any of my plugins, so I am very happy to be able to announce the new feature:

• Load an HRTF stored in a .SOFA file for custom binaural rendering.

This allows you to produce binaural rendering for up to seventh order Ambisonics with whatever HRTF you want, providing you with the flexibility you need to produce the highest quality spatial audio content possible.

If you aren’t sure why so many people want personal HRTF support, keep reading.

Binaural 3D audio can be vastly improved by listening with a personalised HRTF (head-related transfer function). It’s the auditory equivalent of wearing someone else’s glasses vs wearing your own. Sure, you can see most of what is going on with someone else’s glasses, but you lose detail and precision. Wear your own and everything comes into focus!

With that in mind, the aXMonitor plugins have been updated to allow you to load a custom HRTF that is stored in a .SOFA file. Now you can use your own individual HRTF (if you have it) or one that you know works well for you. Once an HRTF has been loaded it will be available to all instances of the plugin, including those in other projects.

## What is a .SOFA file?

A .SOFA file contains a lot of information about a measured HRTF (though it can be used for other things as well). You can read more about them here.

## Where to get custom HRTFs

You can find a curated list of .SOFA databases here. The best thing to do is to try a few of them until you find one that gives you an accurate perception of the sound source directions. Pay particular attention to the elevation and front-back confusions, since these are what personalised HRTFs help most with.

If you want an HRTF that fits your head/ears exactly then your options are a bit more limited. You can find somewhere, usually an academic research institute, that has an anechoic chamber and the appropriate equipment. Then you put some microphones in your ears and sit still for 20-120 minutes (depending on their system). Once it’s done, you have your HRTF!

But if you don’t fancy going to all of that trouble, there are some options for getting a personalised HRTF more easily. A method by 3D Sound Labs requires only a small number of photographs and they claim good results. Finnish company IDA also offers a similar service.

## Get the aXMonitor

So if you like the sound of customised binaural rendering then you can purchase the aXMonitor from my online shop. Doing so will help support independent development of tools for spatial audio.


## 50% Discount on aXPanner and aXMonitor

Today I’m having an April sale and putting a 50% discount on my Ambisonic panning and decoding plugins for Windows (VST) and MacOS (VST/AU): aXPanner and aXMonitor. This offer runs until the 30th April 2018.

The aXPanner converts mono and stereo signals to YouTube360 compatible AmbiX-format Ambisonics. The aXMonitor decodes these Ambisonic signals to two-channel stereo and binaural (3D audio over headphones) formats to allow easy monitoring. Together they form the essential signal chain for spatial audio and are a great way to get started with Ambisonics.

You can check out my short tutorial on getting started with a basic Ambisonics chain here.

The aXPanner and aXMonitor are available in three levels of spatial resolution: first, third and seventh order Ambisonics. Higher orders increase the spatial fidelity of the sound scene.
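To give a feel for what the different orders mean in practice, here is a small sketch of the channel count for each one. It relies only on the standard relationship that a full-sphere Ambisonic signal of order N carries (N + 1)² channels; the function name is just an illustrative choice.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of channels in a full-sphere (3D) Ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# The three orders mentioned above.
for order in (1, 3, 7):
    print(f"order {order}: {ambisonic_channel_count(order)} channels")
```

So first, third and seventh order correspond to 4, 16 and 64 channels respectively, which is worth keeping in mind when checking how many channels per track your DAW supports.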

This 50% discount can be combined with additional 20% bundle discounts for additional savings.

You can read more details about them in my web store:


## aXMonitor Update: Google Resonance Audio HRTFs

• The HRTFs used for binaural 3D sound have been regenerated using Google’s own Resonance Audio toolkit for VR audio. These are the same HRTFs used by Google in YouTube 360. The code released by Google is only up to 5th order, but was actually quite simple to extend to 7th order.
• A gain control has been added to boost or cut the overall level for convenience.

The minor update is a fix to make sure the plugin reports the correct latency to the host when using the Binaural or UHJ Super Stereo (FIR) methods.

Google have just open sourced their Resonance Audio SDK, including all sorts of tools for spatial audio rendering. This update to the aXMonitor ensures that you can mix your content on HRTFs that will be widely used across the industry.

The aXMonitor is available in 3 versions, providing up to first, third and seventh order Ambisonics-to-binaural decoding.

So if you’d like to start mixing your VR/AR/MR audio content just head over to my store. With your support, I can continue to update the aX Ambisonics Plugins to bring you the tools you want and need.


## Product Spotlight: aXCompressor

The aXCompressor is a compressor VST plugin (Windows and Mac) made specifically for Ambisonics signals. It comes in three variations: first order (a1), third order (a3) and seventh order (a7), allowing you to process . They accept any Ambisonics format that has the W channel as the first channel. This means it works with both the more modern AmbiX format and the legacy FuMa format.

There are plenty of Ambisonics encoders and decoders but not so many things to process between these two points on the signal chain. I wanted to help bring some of the tools we take for granted when working in stereo to VR/AR and immersive audio, hence the aX Plugins. If you’re interested in trying out any of the plugins, including the aXCompressor, you can download the demo versions. You can support future development by making a purchase from my web shop.
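To illustrate why a compressor for Ambisonics needs some care, here is a minimal sketch of one common approach: work out a single gain from the W channel and apply it to every channel, so the ratios between channels (and so the directional information) are untouched. This is an assumption made for illustration and not a description of how the aXCompressor works internally. The function names, threshold and ratio are hypothetical, and there is no attack or release smoothing.

```python
import math


def compressor_gain(level_db, threshold_db=-20.0, ratio=4.0):
    """Static linear gain of a hard-knee compressor for an input level in dB."""
    over = level_db - threshold_db
    if over <= 0.0:
        return 1.0  # below threshold: leave the signal alone
    reduction_db = over * (1.0 - 1.0 / ratio)
    return 10.0 ** (-reduction_db / 20.0)


def compress_frame(frame, threshold_db=-20.0, ratio=4.0):
    """Apply one shared gain, derived from W (frame[0]), to every channel.

    `frame` holds one sample per Ambisonic channel with W first.
    """
    w = abs(frame[0])
    level_db = 20.0 * math.log10(w) if w > 0.0 else -math.inf
    gain = compressor_gain(level_db, threshold_db, ratio)
    return [gain * sample for sample in frame]


loud = compress_frame([1.0, 0.5, 0.0, -0.5])
print(loud)  # every channel is scaled by the same factor
```

Compressing each channel independently would change the balance between W and the directional channels, which is exactly the kind of processing that moves or smears the image.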


## What’s Missing From Your 3D Sound Toolbox?

Audio for VR/AR is getting a lot of attention these days, now that people are realising how essential good spatial audio is for an immersive experience. But we still don’t have as many tools as are available for stereo. Not even close!

This is because Ambisonics has to be handled carefully when processing in order to keep the correct spatial effect – even a small phase change between channels can significantly alter the spatial effect – so there are very few plugins that can be used after the sound has been encoded.

To avoid this problem we can apply effects and processing before spatial encoding, but then we are restricted in what we can do and how we can place it. It is also not an option if you are using an Ambisonics microphone (such as the SoundField, Tetra Mic or AMBEO VR), because it is already encoded! We need to be able to process Ambisonics channels directly without destroying the spatial effect.

So, what is missing from your 3D sound toolbox? Is there a plugin that you would reach for in stereo that doesn’t exist for spatial audio? Maybe you want to take advantage of the additional spatial dimensions but don’t have a tool to help you do that. Whatever you need, I am interested in hearing about it. I have a number of plugins that will be available soon that will fulfil some technical and creative requirements, but there can always be more! In fact, I’ve already released the first one for free. I am particularly interested in creative tools that would be applied after encoding but before decoding.

With that in mind, I am asking what you would like to see that doesn’t exist. If you are the first person to suggest an idea (either via the form or in the comments) and I am able to make it into a plugin then you’ll get a free copy! There is plenty of work to do to get spatial audio tools to the level of stereo but, with your help, I want to make a start.


## Ambisonics to Stereo Comparison

In my last post I detailed two methods of converting Ambisonics to stereo. Equations and graphs are all very good, but there’s nothing better than being able to listen and compare for yourself when it comes to spatial audio.

With that in mind, I’ve made a video comparing different first-order Ambisonics to stereo decoding methods. I used some (work-in-progress) VST plugins I’m working on for the encoding and decoding. I recommend watching the video with the highest quality setting to best hear the difference between the decoders.

There are 4 different decoders:

• Cardioid decoder (mid-side decoding)
• UHJ (IIR) – UHJ stereo decoding implemented with an infinite impulse response filter.
• UHJ (FIR) – UHJ stereo decoding using a finite impulse response filter.
• Binaural – Using the Google HRTF.
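For anyone curious about what the FIR option involves, the 90 degree phase shift that UHJ needs can be approximated with a windowed FIR Hilbert transformer. The sketch below is a textbook design and not the filter used in the plugins. The tap count, the Hamming window and the function names are all assumptions, and the sign of the phase shift would need checking against whichever UHJ convention you follow.

```python
import math


def hilbert_fir(num_taps=63):
    """Hamming-windowed FIR approximation of a 90 degree phase shifter.

    num_taps must be odd. The filter delays its output by
    (num_taps - 1) // 2 samples, which is the latency a host has to know about.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    centre = (num_taps - 1) // 2
    taps = []
    for k in range(num_taps):
        m = k - centre
        # Ideal Hilbert response: 2 / (pi * m) for odd m, zero for even m.
        ideal = 2.0 / (math.pi * m) if m % 2 != 0 else 0.0
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_filter(taps, signal):
    """Direct-form FIR convolution (slow, but dependency-free)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


taps = hilbert_fir(63)
cosine = [math.cos(0.5 * math.pi * n) for n in range(200)]
shifted = fir_filter(taps, cosine)  # roughly a sine, delayed by 31 samples
```

An IIR design gets a similar phase difference using a pair of all-pass filters with far less delay, at the cost of the phase shift being an approximation spread across both signal paths.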

The cardioid decoder more quickly moves to, and sticks in, the left and right channels as the source moves, while this is more gradual with the UHJ decoder. To me, the UHJ decoding is much smoother than the cardioid, making it perhaps a bit easier to get a nice left-right distribution that uses all of the space, while cardioid leads to some bunching at the extremes.

The binaural decoder has more externalisation than UHJ and cardioid decoding, but it also has pretty significant colouration changes. It potentially allows some perception of height as well, which the others don’t.

The VSTs in the video are part of a set I’ve been working on that should be available some time in 2018. If you’re interested in getting updates about when they’re released, sign up here:


## Ambisonics Over Stereo

Ambisonics, especially Higher Order Ambisonics, is great for 3D sound applications. But what if you have spent a long time mixing for a 3D audio format but want to share it with listeners who are only listening in stereo?

The first thing to consider is whether they’re going to be using headphones or loudspeakers. If they’re using headphones then you can create a binaural mix in the usual way. If they are using loudspeakers then binaural is no longer an option (unless you want to go down the fragile transaural route). In this post we will focus on how you can decode from first order Ambisonics to stereo using one of two common options.

## Mid-Side Decoding

The first option is probably the simplest – treat the Ambisonics signal as a mid-side recorded scene by taking the W and Y channels, with W being the mid and Y being the side. Then you can make your left and right (L and R) stereo playback channels using \begin{eqnarray} L = 0.5(W+Y),\\ R = 0.5(W-Y) \end{eqnarray}

This is effectively the same as recording a sound field with two cardioid microphones pointing directly left and right. Sounds panned to 90 degrees will play only through the left loudspeaker and those at -90 degrees through the right.
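Here is a minimal sketch of the mid-side decode for a single horizontal source. It assumes AmbiX-style SN3D gains (W = 1, Y = sin(azimuth)) with positive azimuth to the left; the function names are just for illustration.

```python
import math


def encode_wy(azimuth_deg):
    """W and Y gains for a horizontal source (assumes SN3D: W = 1, Y = sin(azimuth))."""
    return 1.0, math.sin(math.radians(azimuth_deg))


def mid_side_decode(w, y):
    """Left and right feeds, with W as the mid signal and Y as the side signal."""
    return 0.5 * (w + y), 0.5 * (w - y)


for azimuth in (90, 30, 0, -30, -90, 150):
    left, right = mid_side_decode(*encode_wy(azimuth))
    print(f"{azimuth:>4} degrees: L = {left:.3f}, R = {right:.3f}")
```

A source at 90 degrees ends up only in the left channel and one at -90 degrees only in the right. Note that 30 and 150 degrees give identical outputs, because only sin(azimuth) enters the decode. With a FuMa-style W (scaled by 1/√2) the hard-panned cases would no longer cancel completely in the opposite channel unless the weights are adjusted.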

The advantage of this sort of decoding is that it is very conceptually simple and, as long as your DAW can handle the routing, it is even possible to do without any dedicated plugins. It also results in pure amplitude panning, meaning that it has all of the advantages and disadvantages of standard intensity-stereo. However, we’ve got another option to choose from when we want to play back over a stereo system that has some advantages.

## UHJ Stereo

A more complex and interesting technique is UHJ. We’re only going to go over how UHJ works for stereo listening, but it is worth noting that UHJ is mono compatible and that a 4-channel version exists from which full first-order Ambisonics information can be retrieved via correct decoding. 3-channel UHJ can get you a 2D (horizontal) decoder by retrieving the W, X and Y channels. A nice property of the 3- and 4-channel versions is that they contain the stereo L and R channels as a subset. This means, importantly, 2-channel UHJ does not require a decoder when played back over two loudspeakers. All you need to do is take the first two channels of the audio stream.

The stereo L and R channels can be calculated using the following equations: \begin{eqnarray} \Sigma &=& 0.9397W + 0.1856X \\ \Delta &=& j(-0.3420W + 0.5099X) + 0.6555Y\\ L &=& 0.5(\Sigma + \Delta)\\ R &=& 0.5(\Sigma - \Delta)\end{eqnarray} where $$j$$ is a 90 degree phase shift.

As you can see from these equations, converting to UHJ from first-order Ambisonics results in signals with phase differences between the L and R channels. This creates quite a different impression to the kind of mid-side decoding mentioned above. There will obviously be some room for personal taste as to whether or not UHJ is actually preferred to mid-side decoding. With UHJ, sound sources placed to the rear of the listener are more diffuse when reproduced over a stereo arrangement than those at the front, while for mid-side decoding there is no sonic distinction between a sound panned to 30 degrees or to 150 degrees.
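The phase differences are easiest to see by treating a single-frequency source as a complex gain (a phasor), so that the 90 degree phase shift becomes multiplication by `1j`. The sketch below does that for a horizontal source. It assumes FuMa-style B-format gains (W = 1/√2, X = cos(azimuth), Y = sin(azimuth)) and uses the commonly published UHJ coefficients rounded to four places (0.9397, 0.1856, 0.3420, 0.5099, 0.6555). The function name is hypothetical, and a real implementation needs a broadband phase shifter rather than a multiplication.

```python
import cmath
import math


def uhj_stereo_gains(azimuth_deg):
    """Complex L and R gains of 2-channel UHJ for one horizontal source."""
    az = math.radians(azimuth_deg)
    w = 1.0 / math.sqrt(2.0)  # assumption: FuMa-style W, 3 dB below the source
    x = math.cos(az)
    y = math.sin(az)
    sigma = 0.9397 * w + 0.1856 * x
    delta = 1j * (-0.3420 * w + 0.5099 * x) + 0.6555 * y
    return 0.5 * (sigma + delta), 0.5 * (sigma - delta)


for azimuth in (0, 30, 90, 150, 180):
    left, right = uhj_stereo_gains(azimuth)
    level_db = 20.0 * math.log10(abs(left) / abs(right))
    phase_deg = math.degrees(cmath.phase(left / right))
    print(f"{azimuth:>4} degrees: L/R level {level_db:6.1f} dB, phase {phase_deg:7.1f} degrees")
```

Unlike the mid-side case, the results for 30 and 150 degrees differ, because X appears in both the sum and difference signals. A source straight ahead gives equal levels in both channels but with a phase offset between them.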

Beyond front-back distinction, UHJ can actually result in some sounds appearing to originate from outside the loudspeaker pair by a small amount. This is why it is sometimes referred to as Super Stereo. In my experience, this effect is very dependent on the sound being played, both its frequency content and how transient it is.

Because UHJ stereo relies on phase differences between the two channels, any post-processing or mastering applied should preserve the phase relationship between L and R, otherwise there is a very real risk that the final presentation will be phase-y and spatially blurred.

Figure 1 shows the localisation curves for a sound played back over a stereo system where the signal in the Ambisonics domain is panned fully round the listener. Obviously the sound stays to the front, but the actual trajectories between UHJ and mid-side decoding are quite different. (These localisation curves were calculated using the energy vector model of localisation, so they are most appropriate for mid/high frequencies and broadband sounds).
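For reference, the energy vector direction for a two-loudspeaker setup can be computed in a few lines. This is a generic sketch of the energy vector model and not the exact script used to generate the curves. The loudspeakers are assumed to sit at ±30 degrees with positive angles to the left, and the function name is illustrative.

```python
import math


def energy_vector_angle(gain_left, gain_right, speaker_angle_deg=30.0):
    """Direction (in degrees) of the energy vector for a stereo pair.

    Gains may be real or complex; only their magnitudes are used.
    """
    e_left = abs(gain_left) ** 2
    e_right = abs(gain_right) ** 2
    total = e_left + e_right
    if total == 0.0:
        raise ValueError("at least one gain must be non-zero")
    a = math.radians(speaker_angle_deg)
    # Energy-weighted sum of the unit vectors pointing at each loudspeaker.
    x = (e_left + e_right) * math.cos(a) / total
    y = (e_left - e_right) * math.sin(a) / total
    return math.degrees(math.atan2(y, x))


print(energy_vector_angle(1.0, 1.0))  # centred image
print(energy_vector_angle(1.0, 0.0))  # image at the left loudspeaker
```

Because the predicted direction is an energy-weighted average of the two loudspeaker directions, it can never fall outside the loudspeaker pair. That also means this model cannot show the outside-the-loudspeakers effect mentioned above, which comes from the phase relationship between the channels.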

Which of the two stereo loudspeaker decoding strategies you’ll want to use will depend on the needs of your project. Mid-side decoding is simpler and results in pure amplitude panning. UHJ can result in images outside of the loudspeaker base, but relies on the phase information being preserved. If you want to retrieve any spatial information then UHJ is absolutely the way to go.

## Tools for Stereo Decoding

I have an old Ambisonics to UHJ transcoder VST that you can download here, but I am not sure how compatible it is with newer versions of Windows and Mac OSX. To remedy that, I’ve been working on an updated version that will provide simple first-order to stereo decoding. Just select which method you want to use and pump some Ambisonics through it. Keep an eye out in the near future for when it is made available!

I’m curious to hear from anyone who has used both techniques what you prefer. Leave a comment below!


## What Is… Stereophony?

This post is part of my What Is… series that explains spatial audio techniques and terminology.

OK, you know what stereo is. Everyone knows what stereo is. So why bother writing about it? Well, because it allows us to introduce some links between the reproduction system and spatial perception before moving on to systems which use much more than 2 loudspeakers.

Before going any further, this post will deal with amplitude panning. Time panning will be left for another day. I also won’t be covering stereo microphone recording techniques because that could fill up its own series of posts.

## The Playback Setup

A standard stereo setup is two loudspeakers placed symmetrically at $$\pm30^{\circ}$$ to the left and right of the listener. We will assume for now that there is only a single listener equidistant from both loudspeakers. The loudspeaker basis angle can be wider or narrower but if they get too wide there is a hole-in-the-middle problem. Too narrow and we reduce the range of positions at which the source can be placed. Placing the loudspeakers at $$\pm30^{\circ}$$ gives a good compromise between these two, balancing sound image quality with potential soundstage width.

## Placing the Sound

Amplitude panning takes a mono signal and sends copies to the two output channels with (potentially) different levels. When played back over two loudspeakers the level difference between the two channels controls the perceived direction of the sound source. With amplitude panning the perceived image will remain between the loudspeakers. If we know the level difference between the two channels then we can predict the perceived direction using a panning law. The two most famous of these are the tangent law and the sine law. The tangent law is defined as

\begin{eqnarray} \frac{\tan\theta}{\tan\theta_{0}} = \frac{G_{L} - G_{R}}{G_{L} + G_{R}} \end{eqnarray}

where $$\theta$$ is the source direction, $$\theta_0$$ is the angle between either loudspeaker and the front (30 degrees in the case illustrated above) and $$G_{L}$$ and $$G_{R}$$ are the linear gains of the left and right loudspeakers.
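The tangent law only fixes the ratio between the two gains, so turning it into a panner needs one extra choice. Writing r = tan θ / tan θ₀, any gains proportional to (1 + r, 1 − r) satisfy the law. The sketch below then normalises them to constant power, which is an assumption on my part (constant amplitude is another option). Positive angles are taken to be towards the left loudspeaker, and the function name is illustrative.

```python
import math


def tangent_law_gains(theta_deg, theta0_deg=30.0):
    """Constant-power left/right gains that satisfy the tangent law."""
    if abs(theta_deg) > theta0_deg:
        raise ValueError("the source must lie between the loudspeakers")
    ratio = math.tan(math.radians(theta_deg)) / math.tan(math.radians(theta0_deg))
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(g_left, g_right)  # scale so g_left**2 + g_right**2 == 1
    return g_left / norm, g_right / norm


for theta in (-30, -15, 0, 15, 30):
    g_left, g_right = tangent_law_gains(theta)
    print(f"{theta:>4} degrees: G_L = {g_left:.3f}, G_R = {g_right:.3f}")
```

A centred source gets equal gains of about 0.707 (-3 dB) in each loudspeaker, and a source at θ₀ goes entirely to one loudspeaker.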

## How It Works

Despite being simple conceptually and very common, the psychoacoustics of stereo are actually quite complex. We’ll stick to discussing how it relates to the main spatial hearing cues.

As long as both loudspeakers are active, signals from both loudspeakers will reach both ears. Due to the layout symmetry, both ears receive signals at the same time but with different intensities corresponding to the level differences of the loudspeakers. Furthermore, since it has further to travel, the signal from the left loudspeaker will reach the right ear slightly later than the signal from the right loudspeaker. The opposite is true for the left ear. This time difference combined with the intensity difference gives rise to interference that generates phase differences at the ears. These phase differences are interpreted as time differences, moving the sound between the loudspeakers.

The ITD (below 1400 Hz) is shown in the figure and is roughly linear with panning angle. This is pretty close to exactly what we see for a real sound source moving between these angles. This works pretty well for loudspeakers at $$\pm30^{\circ}$$ or less, but once the angle gets bigger the relationship becomes slightly less linear.

These strong, predictable ITD cues mean that any sound source with a decent amount of low frequency information will allow us to place the image pretty precisely. Content in higher frequency ranges won’t necessarily be in the same direction as low frequency content because ILD becomes the main cue.

Even though stereo gives rise to interaural differences that are similar to those of a real source, that does not mean it is a physically-based spatial audio system (like HOA and WFS). The aim is to produce a psychoacoustically plausible (or at least pleasing) sound scene. Psychoacoustically-based spatial audio systems tend to use the loudspeakers available to fit some aim (precise image, broad source) without regard to whether the resulting sound scene resembles anything a real sound source would emit.

So, there you have a quick overview of stereo from a spatial audio perspective. There are other issues that will be covered later because they relate to other spatial audio techniques. For example, what if I’m not in the sweet spot? What if the speakers are to the side or I turn my head? What if I add a third (or fourth or fifth) active loudspeaker? Why do some sounds panned to the centre sound elevated? All of these remaining and non-trivial points show just how complex perception of even a simple spatial audio system can be.


## What Is… Spatial Audio?

This post is the first in a What Is… series. The idea is to explain different techniques, terminology and concepts related to spatial audio. This will range from the most common terms right through to some more obscure topics. And where better to start than what “spatial audio” even means!

Spatial audio (with some exceptions) has generally been confined to academia but is rapidly finding applications in virtual reality (VR). There are even moves to bring it to broadcasting so it can be enjoyed by people in the comfort of their living rooms. As spatial audio moves from labs to living rooms it is worth exploring all of the different techniques that have been developed up to this point.

However, defining spatial audio can quickly become rather philosophical. For example, is a mono recording spatial audio? If I take a single microphone to a concert hall and record a performance then I have captured the sense of space, through echoes and reverberation, not just the performances themselves. This means that the space is encoded into the signal – we can tell if a recording is made in a dry studio or a cathedral. For the purposes of this series I will not be considering this to be spatial audio. Instead, I will be defining spatial audio as any audio encoding or rendering technique that allows for direction to be added to the source. How well this is reproduced to the listener will depend on the encoding and playback system but, in general, a spatial audio system will allow different sounds placed in different positions to be directionally differentiated.

There are a large number of different spatial audio techniques available and which one you want to use will depend on the final use. These techniques include (but are in no way limited to):

• Stereophony
• Vector Base Amplitude Panning (VBAP)
• Ambisonics and Higher Order Ambisonics (HOA)
• Binaural rendering (using HRTFs over headphones)
• Wave Field Synthesis (WFS)
• Loudspeaker diffusion
• Discrete loudspeaker techniques

Each of these will be explained in more detail in future posts but you can see from this non-exhaustive list that there are already quite a few techniques to choose between. To further complicate things, some of these techniques can be combined in order to take advantage of different properties of both. For example, Ambisonics and binaural can be combined in VR and augmented reality (AR) to give a headphone-based rendering that can be easily rotated (a nice property of Ambisonics).
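To show just how easy that rotation is, here is a sketch of a yaw (left-right) rotation of a first-order signal. Only the X and Y channels mix together, while W and Z pass through unchanged. The argument order (W, X, Y, Z) and the function name are illustrative choices and not a particular channel-ordering standard. Positive angles rotate the scene anticlockwise when seen from above.

```python
import math


def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate a first-order Ambisonic sample about the vertical axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return w, c * x - s * y, s * x + c * y, z


# A source straight ahead (X = 1, Y = 0) rotated by 90 degrees ends up on the
# left (X = 0, Y = 1, up to rounding).
print(rotate_yaw(1.0, 1.0, 0.0, 0.0, 90.0))
```

For head tracking you would rotate the scene by the opposite of the head's yaw before binaural decoding, so that sources stay fixed in the world as the listener turns.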

Spatial audio techniques can also be divided between those that aim to produce a physically accurate sound field in (at least some of) the listening area and those that are not concerned with matching a “real” sound field. HOA and WFS can both be used to recreate a holophonic sound scene using an array of loudspeakers. Meanwhile, stereo and VBAP do not recreate any target sound field but are still able to produce sounds in different directions.

Whether or not the spatial audio technique is physically-based, we also have to consider what is potentially the most important element in the whole chain: the listener! All of these techniques rely on how we perceive the sound and there are any number of confounding factors that can take our nicely defined (in a mathematical sense) system and throw many of our assumptions out the window. Therefore, this What Is… series will also include elements of spatial hearing and psychoacoustics that are essential to consider when working with spatial audio.

So, spatial audio can take a number of forms, each with its own advantages, disadvantages, limits and creative possibilities. It is these, along with the technical and psychoacoustic underpinnings, that I will expand upon in upcoming blog posts.

If there are any aspects of spatial audio that you’d like to have explained then leave a comment below.