By Paul Power, Canford Technical Support Engineer
Background
There is currently a large amount of research into new audio reproduction formats that aim to deliver a 3D listening experience. Interest in 3D surround systems is not new, however. One of the first was proposed as early as the 1970s by Michael Gerzon, using a spatial audio rendering technology called Ambisonics. This was a complete system covering the recording, transmission and reproduction of 3D audio. However, Ambisonics was introduced during the era when quadraphonic systems were being developed, and it subsequently failed to catch on because of the expense of buying the extra equipment needed to play quadraphonic recordings.
Dolby, which began in the early 1970s by marketing noise reduction systems and developing methods of encoding and decoding three screen channels and a mono surround channel, prevailed and went on to develop the surround systems that ultimately led to the discrete 5.1 system used today. Dolby now markets its 3D ‘Atmos’ system which, in addition to the normal surround channels, includes height channels. It can be found in select cinemas worldwide, with larger cinemas utilising up to 64 channels in total. Numerous home surround receivers are also now Dolby Atmos enabled, utilising for example a 9.1 system (usually written 5.1.4), which encompasses the 5.1 system and adds four height channels.
Psychoacoustics
To understand the influence of 3D surround systems on our perception, it is necessary to look at how our hearing works. Humans use two main mechanisms to locate sound sources in the horizontal plane. One is the difference in arrival time between the ears, which works for frequencies up to about 800 Hz. The other is the level difference between the ears, which works for frequencies above about 1200 Hz. For frequencies between 800 and 1200 Hz it is thought that a combination of both is used. When sound sources are located in the vertical plane, however, the listener relies more on the spectral filtering of the sound caused by the combination of the listener’s pinna, head and shoulders, known as a head related transfer function, or HRTF for short. This works for high frequency sounds because their wavelengths are short, so the head, pinna and torso form an appreciable barrier and filter the sound.
Our ability to locate a sound source in the vertical plane is not as accurate as in the horizontal plane. A source elevated directly in front of the listener is placed symmetrically with respect to the head, so the sound reaches both ears at the same time and at the same level. With no time or level differences to use, the listener relies solely on the filtering of the head, pinna and shoulder combination to locate the source. Psychoacoustic research has shown, however, that sounds with certain frequency content are perceived as coming from a particular elevated position because of this filtering, an effect attributed to what have been termed directional bands.
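The horizontal-plane cues described above can be put into rough numbers. The sketch below is illustrative only: the 800 Hz and 1200 Hz boundaries are taken from the text, while the time-difference estimate uses Woodworth's spherical-head approximation, which this article does not mention. The head radius, the speed of sound and both function names are assumptions chosen for the example.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air at about 20 degrees C


def interaural_time_difference(azimuth_deg):
    """Estimate the arrival-time difference between the ears, in seconds.

    Uses Woodworth's spherical-head approximation (an assumption, not taken
    from the article) for a distant source 0 to 90 degrees off centre.
    """
    if not 0 <= azimuth_deg <= 90:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def dominant_cue(frequency_hz):
    """Return the horizontal localisation cue the article gives for a frequency."""
    if frequency_hz <= 800:
        return "time difference"
    if frequency_hz > 1200:
        return "level difference"
    return "time and level differences combined"


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = interaural_time_difference(azimuth) * 1e6
        print(f"{azimuth:>2} degrees off centre: about {itd_us:.0f} microseconds")
    for freq in (250, 1000, 4000):
        print(f"{freq} Hz: {dominant_cue(freq)}")
```

With these assumed values, a source directly to one side gives a time difference of roughly 0.66 ms, and a source straight ahead gives none, which is why a source on the centre line offers no time cue at all.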
Production
A new reproduction format brings the challenge of how to use it effectively while taking psychoacoustics into account. When a mix engineer creates a stereo mix, for example, they have limited space in which to place sounds: within the stereophonic angle, or sometimes slightly outside it depending on the sound source. The sound image is therefore confined in front of the listener, and the engineer has to use equalisation, panning and effects such as reverb to create space and depth so that sounds are not masked. Put simply, masking occurs when two sounds happen at the same time and one covers up (masks) the other, either because they are similar in frequency or because one is louder. Masking can also occur across time. A sound that has just stopped can mask one that follows it, known as post masking, and a sound can be masked by another that occurs just after it, known as pre masking.
When surround systems were introduced, the sound image was spread around the listener, giving sounds more space and, in a word, unmasking them. In everyday environments sounds arrive from many different directions, which lets us be selective and home in on particular sounds. The classic example is a party where you are interested in another person’s conversation: you are able to filter out the noise around you and home in on the conversation of interest. This is known as the cocktail party effect. In theory, therefore, the move from 2D surround to 3D should provide further unmasking and give us the ability to listen in more detail.
Workflow
New reproduction systems also raise the question of how to create content for them, in terms of recording techniques, workflows and mixing methods, and of how archived material, created in 5.1 for example, could be repurposed for 3D reproduction. For recording live performances, a number of 3D microphone techniques are currently being developed. Some build on existing 2D surround microphone arrays by adding microphones above the standard surround array, generally using one microphone to cover each speaker channel of the 3D surround system, including the height channels. Other 3D recording solutions use, for example, the Ambisonic Soundfield microphone. Its four closely spaced capsules are combined to give the equivalent of three figure-of-eight microphones and an omnidirectional microphone, essentially four microphones in one. This allows a single-point 3D capture that is independent of the reproduction format, and its output can be decoded to virtually any speaker configuration, within reason. The Soundfield microphone was one of the technologies conceived by Michael Gerzon in the 1970s and has been used extensively in broadcast for several years to create content for 5.1 systems. A newer, higher resolution microphone called the Eigenmike, which houses thirty-two capsules in one microphone, has also been introduced. It follows the Ambisonic principles, although at the moment it has only been used for research.
Workflows for 3D sound reproduction are currently being researched for broadcast. Broadcasters need to consider enhancing the end user’s experience while still catering for legacy formats such as stereo and 5.1, so a workflow is needed that is independent of the reproduction format.
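The idea of a capture that is independent of the reproduction format can be illustrated with first-order Ambisonics. The sketch below encodes one mono sample into the four traditional B-format signals (W for the omnidirectional component, X, Y and Z for the three figure-of-eight components) and then decodes the same four signals to two different speaker layouts. It is a minimal example under stated assumptions: the decoder simply points a virtual cardioid at each speaker, which is far simpler than a production decoder, and the layouts, angles and function names are hypothetical.

```python
import math


def direction(azimuth_deg, elevation_deg):
    """Unit vector for a direction: azimuth anticlockwise from front, elevation upwards."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el))


def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z)."""
    x, y, z = direction(azimuth_deg, elevation_deg)
    # W carries the traditional 1/sqrt(2) weighting.
    return (sample / math.sqrt(2), sample * x, sample * y, sample * z)


def decode(b_format, layout):
    """Decode B-format to speaker feeds using a virtual cardioid per speaker.

    A deliberately basic decoder, used here only to show that the same
    B-format signal can feed any layout.
    """
    w, x, y, z = b_format
    feeds = {}
    for name, (azimuth_deg, elevation_deg) in layout.items():
        ux, uy, uz = direction(azimuth_deg, elevation_deg)
        feeds[name] = 0.5 * (math.sqrt(2) * w + x * ux + y * uy + z * uz)
    return feeds


# Hypothetical layouts as (azimuth, elevation) in degrees.
QUAD = {"front_left": (45, 0), "front_right": (-45, 0),
        "rear_left": (135, 0), "rear_right": (-135, 0)}
QUAD_PLUS_HEIGHT = dict(QUAD, **{"top_front_left": (45, 45), "top_front_right": (-45, 45),
                                 "top_rear_left": (135, 45), "top_rear_right": (-135, 45)})

if __name__ == "__main__":
    signal = encode_b_format(1.0, azimuth_deg=45, elevation_deg=0)
    for layout_name, layout in (("quad", QUAD), ("quad plus height", QUAD_PLUS_HEIGHT)):
        print(layout_name)
        for speaker, feed in decode(signal, layout).items():
            print(f"  {speaker}: {feed:.3f}")
```

The encoding step never refers to a speaker layout, so adding height speakers changes only the decode call. This mirrors the need described above for a workflow that serves both new and legacy formats.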
In the film world, 3D production currently uses Dolby Atmos to create content for cinema and home use. Atmos uses what is called an object based workflow, which does not rely on a fixed speaker layout. Instead, each sound source is tagged with metadata that includes its position and sound level, so both cinemas with Atmos systems and cinemas with traditional surround systems can play the content without a separate version having to be created for each system. Object based production workflows are also part of broadcasters’ research into new audio formats, as there is a push to enhance the end user’s listening and viewing experience. As a result, several universities across the UK are working in partnership with the BBC to research and develop these new methods. It does seem, therefore, that 3D audio has come of age and could soon become part of everyday production and reproduction.
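As an illustration of the object based workflow described above, the sketch below tags a sound source with position and level metadata and renders the same object to two different speaker layouts. It is not Dolby's renderer or metadata format. The `AudioObject` class, the `render_gains` function, the constant-power panning between adjacent speakers and the speaker angles are all assumptions made for the example, and only horizontal position is handled.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical object metadata: a name, a horizontal position and a level."""
    name: str
    azimuth_deg: float   # 0 = front, positive = towards the left (assumed convention)
    level_db: float      # level relative to full scale


# Assumed speaker azimuths in degrees for two horizontal layouts.
STEREO = {"L": 30, "R": -30}
SURROUND_5_0 = {"L": 30, "R": -30, "C": 0, "Ls": 110, "Rs": -110}


def render_gains(obj, speaker_azimuths):
    """Return per-speaker gains for one object on a given layout.

    Pans with constant power between the two speakers either side of the
    object. Objects behind a stereo pair are panned across the wide rear gap,
    which a real renderer would treat differently.
    """
    ring = sorted(speaker_azimuths.items(), key=lambda item: item[1] % 360)
    target = obj.azimuth_deg % 360
    linear = 10 ** (obj.level_db / 20)
    gains = {name: 0.0 for name in speaker_azimuths}
    for i, (name_a, az_a) in enumerate(ring):
        name_b, az_b = ring[(i + 1) % len(ring)]
        span = (az_b - az_a) % 360 or 360.0   # angle from speaker A round to speaker B
        offset = (target - az_a) % 360        # angle from speaker A round to the object
        if offset <= span:
            fraction = offset / span
            gains[name_a] = linear * math.cos(fraction * math.pi / 2)
            gains[name_b] = linear * math.sin(fraction * math.pi / 2)
            break
    return gains


if __name__ == "__main__":
    dialogue = AudioObject("dialogue", azimuth_deg=0.0, level_db=-6.0)
    for layout_name, layout in (("stereo", STEREO), ("5.0", SURROUND_5_0)):
        print(layout_name, render_gains(dialogue, layout))
```

The object itself never names a speaker. On the assumed 5.0 layout the dialogue lands in the centre speaker, while on stereo it is shared equally between left and right, which is the sense in which one mix can serve systems with different layouts.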