Meters, Mixing Levels, Loudness (and the new ITU spec)
In Lecture 1 you were introduced to the complex relationship between sound level and perceived loudness. Understanding this relationship turns out to be very important to us as designers of sound for a number of reasons:
- Our perception of loudness is not constant (or even remotely linear) at different frequencies, so it is possible to have high level signals that nonetheless sound weak (and vice versa) depending on their frequency content.
- The levels at which we monitor in the studio have an impact on how we hear our work, and consequently on how the work translates to different spaces and systems, because our relative perception of frequency is not constant across different levels.
- Our judgement of frequency balance and loudness is not constant with time. This is particularly true if we tire our ears out with working – it becomes harder to make reasonable decisions.
- To get the best out of our equipment, we need to understand how it works and interconnects; this means knowing about the various different dB scales we will encounter, and how they should align.
- As of September 2012 a number of areas have adopted a recent ITU recommendation on a loudness (rather than level) based form of metering and specification for broadcast. Other sectors are investigating and following suit, so it is quite likely that this will be a standard and required practice for many of you in your professional dealings.
Insofar as the ITU recommendation arose as an attempt to circumvent the sonic race to the bottom of the ‘loudness war’, this whole issue of how we relate the levels of our equipment, our perceptions of loudness and our working practices occurs at a complex intersection of technology, psychology, aesthetics, economics, politics, philosophy, etc…
Levels and dB (again)
So, to recap, in Lecture 1 you were introduced to the decibel (dB) and the notion of sound pressure level. Specifically,
$\text{dB SPL} = 20 \log_{10}(p / p_0)$, where
$p_0$ is the pressure that represents the nominal threshold of human hearing (0.00002 pascals), and $p$ is our measured pressure.
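As a quick numerical check of the formula above, here is a minimal Python sketch. The function name `db_spl` and the example pressures are illustrative assumptions rather than anything from Lecture 1.

```python
import math

P0 = 0.00002  # reference pressure in pascals (nominal threshold of hearing)

def db_spl(pressure_pa):
    """Sound pressure level in dB SPL for a pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P0)

print(round(db_spl(P0), 1))   # the reference pressure itself: 0.0
print(round(db_spl(1.0), 1))  # a pressure of 1 pascal: 94.0
```

Note that doubling the pressure adds about 6 dB, whatever the starting level.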
You also met dBfs, the decibel scale for digital signal levels. In this case the reference quantity is whatever the maximum sample value for a given word length (the number of bits being used; 16, 24, etc.) is (i.e. how many different numbers you can represent with a given number of binary bits). You can work this out simply by
$\text{max value} = 2^{\text{word length}} - 1$. So for 8 bits, 255; 16 bits, 65535, and so on.
This means that the maximum possible ratio is 1 (i.e. max value / max value), and therefore the top of the scale is always 0 dBfs. (The log of 1, in any base, always equals zero, just as any non-zero number raised to the power 0 always equals 1.) The bottom of the scale is indicated by the dynamic range afforded by a particular word length, which can be worked out by
$\text{dynamic range} = 20 \log_{10}\left(2^{\text{word length}} \cdot \sqrt{3/2}\right) \approx (6.02 \times \text{word length}) + 1.76$ dB.
So the bottom of the digital scale is approximately -98 dB for 16 bits, and approximately -146 dB for 24 bits. In practice the noise floor will be above this, of course (dithering raises it slightly, and there will be a noise floor inherited from the recording environment). If you see -96 and -144 dB around, this is because the above is often approximated without the √(3/2) term, or simply by 6 times the number of bits. Be aware that things are slightly different for floating-point formats, such as we use in most DAWs (although the converters are still fixed-point 16 or 24 bit).
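The word-length arithmetic above is easy to check for yourself. The following is a small sketch in Python; the function names are my own, chosen for illustration.

```python
import math

def max_sample_value(word_length):
    """Largest unsigned integer representable with the given number of bits."""
    return 2 ** word_length - 1

def dynamic_range_db(word_length):
    """Theoretical dynamic range in dB, including the sqrt(3/2) term."""
    return 20 * math.log10(2 ** word_length * math.sqrt(3 / 2))

for bits in (8, 16, 24):
    print(bits, max_sample_value(bits), round(dynamic_range_db(bits), 2))
```

Dropping the `math.sqrt(3 / 2)` factor gives the rounder figures of roughly 96 dB and 144 dB that you will often see quoted.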
Finally, we need to meet the electrical decibel. Somewhat confusingly, there are (at least) three different references we might encounter. Most commonly, we come across two: dBu and dBV. Here, we are dealing with ratios of voltages to some reference. For dBu, this is 0.775 V RMS unloaded; for dBV, 1 V RMS. (Watch the capital letter: a lower-case 'dBv' is an older name for dBu, so it is best avoided.)
High end audio gear is normally referenced to a standard line level of +4 dBu. Cheaper, ‘semi-pro’ and consumer equipment uses -10 dBV. Be aware that because these use different references, the difference between them is not 14 dB! When you work it out properly, it is closer to 12 dB.
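To see where the figure of roughly 12 dB comes from, convert both line levels to volts and compare them. This is a minimal sketch; the function and constant names are illustrative, and the 1 V scale is written as dBV here.

```python
import math

DBU_REF_VOLTS = 0.775  # 0 dBu reference, volts RMS
DBV_REF_VOLTS = 1.0    # 0 dBV reference, volts RMS

def dbu_to_volts(level_dbu):
    return DBU_REF_VOLTS * 10 ** (level_dbu / 20)

def dbv_to_volts(level_dbv):
    return DBV_REF_VOLTS * 10 ** (level_dbv / 20)

pro = dbu_to_volts(4)         # professional line level, about 1.23 V RMS
consumer = dbv_to_volts(-10)  # consumer line level, about 0.316 V RMS
difference_db = 20 * math.log10(pro / consumer)
print(round(pro, 3), round(consumer, 3), round(difference_db, 2))
```

The difference comes out at a little under 11.8 dB, which is why a naive 14 dB gain change between the two kinds of equipment will be about 2 dB out.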
So, we have three different ‘realms’ to consider in our signal chains, and correspondingly three different types of decibel:
digital (dBfs) ↔ electrical (dBu / dBV) ↔ acoustic (dB SPL)
Meters: Peak and Averaging
In Lecture 1 you were also introduced to the notion that our perception of loudness is not based so much upon instantaneous peak values in a signal, but more closely resembles the average energy over time.
The meters on a lot of our equipment, and in our DAWs (at least at the moment), tend to be peak meters (PPMs or similar); that is, they report peak values of the signal. However, it is important to realise that neither digital nor analogue PPMs are necessarily instantaneous. Analogue variants tend to be quasi-PPMs because they are still performing some (albeit very quick) averaging operation on the signal. Similarly, digital PPMs (obviously) aren’t updating at the full sample rate and, moreover, may only report a clip if some number of successive samples exceeds 0 dBfs.
On some other gear (and available as plug-ins) we can get meters with a slower averaging process, most commonly the VU (Volume Unit) meter. These use a moving average of the signal over 300 milliseconds (which is still quite quick). Whilst these correlate more closely with subjective loudness than PPMs, the correspondence is still pretty approximate; the same deflection from a 30Hz signal and a 1kHz signal will have markedly different perceived volumes, because the meter is still a linear measure of (average) amplitude.
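The difference between the two kinds of meter is easiest to see with a short click in otherwise silent audio. The sketch below is deliberately crude: it treats the averaging meter as a plain 300 ms moving average of the rectified signal, ignores the ballistics of a real VU meter, and uses a made-up sample rate of 1 kHz to keep the numbers small. All names are illustrative.

```python
import math

def peak_level(samples):
    """Largest absolute sample value, as an idealised peak meter would report."""
    return max(abs(s) for s in samples)

def averaged_level(samples, sample_rate, window_s=0.3):
    """Highest reading of a moving average of |signal| over window_s seconds."""
    window = max(1, int(round(window_s * sample_rate)))
    running = 0.0
    highest = 0.0
    for i, s in enumerate(samples):
        running += abs(s)
        if i >= window:
            running -= abs(samples[i - window])  # drop the sample leaving the window
        highest = max(highest, running / window)
    return highest

def to_db(ratio):
    return 20 * math.log10(ratio)

SAMPLE_RATE = 1000  # hypothetical low rate, for illustration only
signal = [0.0] * 1000
for n in range(500, 510):  # a 10 ms full-scale click
    signal[n] = 1.0

print(round(to_db(peak_level(signal)), 1))
print(round(to_db(averaged_level(signal, SAMPLE_RATE)), 1))
```

The peak reading sits at full scale, whilst the averaged reading is almost 30 dB lower, which is much closer to how insignificant a single click actually sounds.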
You will notice also that all VU meters and most PPMs have a dB scale; ‘dB with reference to what?’ you may well ask. The answer is: it kinda varies; where we set the 0 dB point on a VU meter or PPM is not relative to a fixed physical quantity as with SPL or dBu, but is a matter of convention, and not everyone uses the same convention! What fun!
Reference Levels, Alignment and Monitoring Levels
This always gives people trouble but is incredibly important, so let’s pause and review the story so far:
- We have a number of different dB scales for describing levels of phenomena (pressure, voltage, digital amplitude) relative to some reference quantity.
- We know that human hearing is significantly non-linear, and that our perception of loudness in particular is a complex function of sound pressure level, frequency content, time and other factors.
- We have a working practice that encompasses digital, electrical and acoustic realms, and a need to produce work that translates well between diverse variants of this basic system recipe.
- The most common instrumentation we are offered for monitoring signal levels in our equipment corresponds, at best, approximately to our perception of actual volume.
Moreover, we know from experience that our ears can be fooled; with fatigue we are liable to start making misjudgements and might (for instance) start to over emphasise high frequency content, whilst our auditory system is busy defensively damping our high frequency sensitivity; we might also (and often do) mistake something louder for something better and over-work our audio. A lot of this can be brought under control, at least to some extent, by being systematically rigorous in how we set our equipment up: how the levels of components relate to each other (alignment) and at what level we do our work (monitoring level).
Why is this important? It does, after all, seem kind of dull. Well, here’s the thing, and it’s so important it gets red letters:
The crucial skill in doing good work with sound is trusting your ears. However, there are so many interfering factors and uncertainties that, in order to stand any chance of this happening, we have to bring as many of those factors under control as we can.
So, first is the matter of how we line up our digital and electrical realms. Admittedly, this will often be the area in which you have least control (at least until you have your own studios) and some cheaper equipment doesn’t even offer the opportunity to try. Nonetheless, it is good to be aware of best practices.
The process is actually pretty simple. All you need to do is decide upon (sensible) reference levels for your electrical and digital equipment, and then make sure they line up. Likewise, given that there is (sometimes) some degree of choice over where the 0 point on meters might sit, this too needs to be lined up with your sensible reference level.
Fortunately, there are standard practices for this. Less fortunately, there are different standards depending on where / for whom you are working (and furthermore, not all gear is well designed enough to either behave in spec, or allow calibration). In Europe, the EBU R68 recommendation uses a reference level of -18 dBfs = 0 dBu; in European and UK film post, 0 dB VU is often set to 0 dBu. In the US, SMPTE RP 155 uses a reference of -20 dBfs = +4 dBu = 0 dB VU. There are, however, other references in use; apparently -15 dBfs is common in Germany, for instance.
The basic goal, however, is the same: to ensure that there is sufficient headroom throughout the chain. This, in turn, is based on some set of assumptions about the range of the material (the EBU spec, for instance, is predicated on having a peak level of -9 dBfs, thus allowing 9 dB of headroom before clipping) and about the clipping point of the analogue equipment relative to these levels. High quality equipment will operate comfortably with any of these conventions, whereas cheaper gear (where the manufacturer is more likely to have skimped on quality components and adequate supply voltages) could start to become nonlinear (i.e. distort) before these limits are reached. The solution in this case (short of buying some better equipment) is to move the reference down (e.g. 0 dBu becomes -8 dBVU), albeit whilst grumbling.
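Once a reference point has been chosen, relating the digital and electrical scales is just a matter of adding offsets. Here is a minimal sketch using the two alignments mentioned above; the function name and the dictionary are my own illustration.

```python
def dbfs_to_dbu(level_dbfs, ref_dbfs, ref_dbu):
    """Analogue level (dBu) implied by a digital level under a given alignment."""
    return level_dbfs - ref_dbfs + ref_dbu

# (digital reference in dBfs, electrical reference in dBu)
ALIGNMENTS = {
    "EBU R68": (-18.0, 0.0),
    "SMPTE": (-20.0, 4.0),
}

for name, (ref_dbfs, ref_dbu) in ALIGNMENTS.items():
    clip_point = dbfs_to_dbu(0.0, ref_dbfs, ref_dbu)
    print(f"{name}: 0 dBfs corresponds to {clip_point:+.0f} dBu")
```

So the analogue side needs to pass +18 dBu cleanly under the EBU alignment and +24 dBu under the SMPTE one, which is exactly where cheaper gear can run out of headroom.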
Ok, so we’ve lined up our digital and analogue levels (or it’s been done for us). Now what? We calibrate our amplifiers such that our reference levels are associated with some reference SPL.
Now, this is murkier territory than the alignment of our dBfs and dBu levels, and it is worth explaining both why the commonly used numbers are chosen and why there will necessarily be some variation. The numbers you see most commonly are in the range 79-85 dB SPL for a -20 dBfs reference (which is quite loud if you’re listening to it constantly). The broad reasoning is that it is in this region that the ear tends to exhibit the least spectacular frequency variation, and it is loud enough that loud passages have impact, without being dangerously loud for prolonged exposure. The variation comes about in part due to individual difference and preference, and also because the upper end of this range feels generally too loud in smaller spaces, particularly those without acoustic treatment.
Setting this up is pretty easy. You need some pink noise at -20 dBfs, and a sound level meter (set to C weighting, slow). Put the meter where your head would be at the sweet spot and measure the level coming from each speaker on its own; adjust to make each speaker equal. There are obviously more involved extensions to this, including placing and tuning sub-woofers, looking at level at different frequencies, adding acoustic treatment etc., but that would take us too far off-topic.
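The adjustment step amounts to a subtraction per speaker. As a small sketch, assuming a target of 83 dB SPL (one choice from the range discussed above) and some made-up meter readings:

```python
def speaker_trims(measured_spl, target_spl=83.0):
    """Gain change in dB needed for each speaker to hit the target SPL."""
    return {name: target_spl - level for name, level in measured_spl.items()}

# Hypothetical readings taken with -20 dBfs pink noise, one speaker at a time.
readings = {"left": 81.5, "right": 84.0}
print(speaker_trims(readings))
```

A positive trim means turning that speaker's amplifier up, a negative one means turning it down; re-measure after adjusting, since trim controls are rarely calibrated.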
The mastering engineer Bob Katz made a proposal a few years ago that is, in some ways, a precursor to the new ITU recommendation described below. Katz suggested that at lower monitoring levels than the ~83 dB SPL region, the inclination of the engineer is to make it louder, and that this translates to greater use of dynamic range compression (so as to lower the peaks and enable raising the average level). In an attempt to resist the tendency of the loudness war to produce recordings with less and less dynamic range, and more and more distortion, Katz suggested that one could set fixed attenuation points of one’s master volume in relation to a 0 dB point set at -20 dBfs = 83 dB SPL. Leaving the volume at 0 dB would work for the most dynamic material (similar to established practice in the film industry), and by turning it down by 6-8 dB people could work on more dynamically constrained material.
The K-System made no particular recommendation about metering, beyond Katz’s suggestion that a move to slower meters may be beneficial and that the system was amenable to more sophisticated loudness models than RMS (such as A-, B- or C-weighted measures, LAeq or Zwicker’s). However, despite the emphasis on trying to reclaim dynamics and on coupling reference level to monitoring level, there was nothing really to stop people using the K-system as a slightly more sophisticated way of targeting RMS levels with their masters.
The measures just mentioned illustrate some different attempts to take account of the complexity of how we hear in relation to linear measurements of physical phenomena, albeit with different levels of sophistication and with different goals in mind. Below I will introduce a new international recommendation for loudness based metering that is being adopted in many areas of audio production, and in some countries is becoming a legal requirement for the delivery of certain types of content. Before proceeding, however, it is worthwhile to emphasise just how complex and partially understood our hearing is, even for such a fundamental and basic aspect as the perceived loudness of a sound.
Whilst there is a reasonable level of understanding about what happens in the outer, middle and inner parts of the ear (‘the periphery’), much of what goes on in the brain remains poorly understood. There are a couple of things worth noting:
- In the inner ear, sound is decomposed into different frequency bands. However, these bands are not linear in frequency (nor even, it seems, linear in pitch), but form a set of non-uniform, highly overlapping zones referred to as critical bands.
- Sounds within the same critical band will give rise to a number of psychoacoustic phenomena as they interfere with each other. For example we can get a sense of dissonance or ‘roughness’ from tones that are close together. Within these channels, there is also a mechanism that acts like a dynamic range compressor, so that the change in perceived loudness will be different for two sounds in the same band than it would be for two sounds in distant bands.
- Sounds in neighbouring bands of different levels can obscure each other, something called masking. This forms part of the basis for lossy compression formats like MP3 and AAC.
- Once sounds leave the periphery and enter the central nervous system all kinds of quite radical information reduction takes place. Moreover, there are feedback channels to the periphery that seem to affect, at a physical level, how the ear behaves. Remarkably, it seems that our sense of expectation plays a role in how this information reduction takes place, which is important both for our ability to focus on particular sounds in complex auditory scenes, and gives an indication of how tired or confused ears can be fooled into making strange mixing decisions!
James Johnston, one of the researchers behind the AAC format, has a series of relatively accessible blog posts on the complexities of hearing (although orientated towards stressing the importance of proper blind testing procedures): Part 1, Part 2, Part 3, Part 4.
So, the ear is complex and adaptive. Meanwhile, the loudness war has given rise to a number of practical problems in almost all fields of audio distribution: in music, recordings are produced with ever dwindling dynamic range and ever greater distortion; in broadcast, viewers are subjected to radical level jumps between segments; and so on.
The New Loudness Standards, and What They Might Mean for Us
Depending on the field in which one worked, standard production practices have tended to be orientated around peak normalisation of material. That is, lining up recordings by their peak level; in music, recordings are now often lined right up to 0 dBfs; in broadcast a common specification is to peak at -9 dBfs (which allows some headroom in recognition of the fact that analogue QPPMs don’t truly measure signal peaks). As is hopefully clear by now, this has almost no bearing on how loud we perceive the material to be.
A set of new recommendations has been developed since 2006, and is now becoming standard or legally stipulated practice in some industry areas and countries. As such, there is a high likelihood that you will encounter them in your forthcoming professional practices and it is therefore worthwhile to get a handle on what they mean, and how to use them, as soon as possible.
The core proposal comes in the form of a recommendation from the ITU (International Telecommunication Union) on how to measure the loudness of a signal (ITU BS.1770-2). Local bodies, such as the EBU in Europe and ATSC in the US, have then published (pretty similar) guidelines on working practices derived from the ITU spec. The common thrust of these proposals (actually, now rules in some places!) is that they enable a move away from normalising by peaks to normalising by (approximate) loudness. So, material that has been more compressed in dynamic range (so as to appear louder when peak normalised) loses its ‘advantage’ by being reduced in gain so that its perceived loudness is on par with more dynamic material.
ITU BS.1770: Measuring Loudness
As is evident from the above, measuring perceived loudness has the potential to be a highly complex affair. One of the things that is remarkable about ITU 1770 is that the scheme it proposes is very simple! It turned out from the ITU’s (apparently quite rigorous) research that this simple model performed about as well as considerably more complex models based on psychoacoustics. Simplicity has a number of advantages, particularly insofar as the resulting scheme is extremely easy to implement, and computationally cheap.
The first thing that happens is that a frequency weighting curve is applied to the signal by filtering. This curve is called the ‘k-weighting’ (but has nothing to do with Bob Katz).
So what happens is that low frequencies are de-emphasised (register as less loud) and high frequencies register as louder. This weighting is followed by a similar averaging process (as with a VU meter). The respective loudnesses from different channels are then summed and converted to a logarithmic scale (like dB) to give a loudness figure.
Loudness Units and Programme Loudness
That means we have yet more units of measurement! The ITU define a relative unit of loudness called the LU (very much like VU above), which can be used for setting reference points on meters and describing ranges or differences in loudness. There is also an absolute unit defined relative to 0 dBfs; the ITU and the US standard currently call this LKFS (Loudness, k-weighted, relative to 0dBfs), whereas the EBU use LUFS. They are completely equivalent (and hopefully one or the other will be dropped for clarity’s sake).
The idea is that adjusting the gain of your mix by n dB should result in a corresponding change of n LU.
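To make the measurement chain concrete, here is a minimal sketch of an ungated k-weighted loudness measure in plain Python. The two sets of filter coefficients and the -0.691 offset are the values I understand BS.1770 to publish for 48 kHz audio, so do check them against the recommendation before relying on them. The function names and the optional per-channel `gains` argument are my own illustration, and no gating is applied.

```python
import math

# Biquad coefficients (b0, b1, b2, a1, a2), assumed valid for 48 kHz audio only.
SHELF_FILTER = (1.53512485958697, -2.69169618940638, 1.19839281085285,
                -1.69065929318241, 0.73248077421585)   # high-frequency boost
HIGHPASS_FILTER = (1.0, -2.0, 1.0,
                   -1.99004745483398, 0.99007225036621)  # low-frequency cut

def biquad(samples, coeffs):
    """Run one second-order filter section over a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def k_weight(samples):
    return biquad(biquad(samples, SHELF_FILTER), HIGHPASS_FILTER)

def loudness_lkfs(channels, gains=None):
    """Ungated loudness of a list of channels, each a list of 48 kHz samples."""
    gains = gains or [1.0] * len(channels)
    total = 0.0
    for samples, gain in zip(channels, gains):
        weighted = k_weight(samples)
        total += gain * sum(v * v for v in weighted) / len(weighted)  # mean square
    return -0.691 + 10 * math.log10(total)

# One second of a full-scale 997 Hz sine in a single channel.
sine = [math.sin(2 * math.pi * 997 * n / 48000) for n in range(48000)]
quieter = [0.5 * s for s in sine]  # the same tone, about 6 dB down

print(round(loudness_lkfs([sine]), 2))
print(round(loudness_lkfs([sine]) - loudness_lkfs([quieter]), 2))
```

The full-scale tone should read close to -3 LKFS, and halving its amplitude (a gain change of about 6 dB) lowers the reading by the same number of LU.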
The fundamental measurement that the recommendation is concerned with is the loudness of a whole item – a programme, a commercial, a trailer, a movie, etc. This measure is derived by tracking the average LU over a whole programme (called the integrated measure).
Gating: Dealing with Dynamics
There remained a problem, however. Material with a high dynamic range (i.e. lots of quiet with a few noisy episodes) would end up being measured as artificially quiet, and then would not line up properly when loudness-normalised. The solution to this problem was to introduce a mechanism such that relatively quiet sections would not contribute to the loudness measure for the whole segment. What has been (quite recently) agreed upon is that material that is 10 LU or more below the average loudness of the programme will not be measured (so it is adaptively gated out). Furthermore, there is an absolute gate at -70 LUFS that stops extraneous background noise contributing to the loudness figure.
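The two gates can be sketched in a few lines, assuming we already have a list of loudness readings (in LUFS) for successive short blocks of the programme. How those blocks are produced is left out here, and the function names are illustrative.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0
RELATIVE_GATE_LU = -10.0

def mean_loudness(block_loudness):
    """Average a list of loudness values in the power domain, not in dB."""
    power = sum(10 ** (l / 10) for l in block_loudness) / len(block_loudness)
    return 10 * math.log10(power)

def gated_loudness(block_loudness):
    """Integrated loudness after the absolute and then the relative gate."""
    above_absolute = [l for l in block_loudness if l > ABSOLUTE_GATE_LUFS]
    if not above_absolute:
        return float("-inf")  # nothing but silence or background noise
    relative_threshold = mean_loudness(above_absolute) + RELATIVE_GATE_LU
    kept = [l for l in above_absolute if l > relative_threshold]
    return mean_loudness(kept)

# Hypothetical programme: loud passages, long quiet passages, near silence.
blocks = [-20.0] * 10 + [-50.0] * 10 + [-80.0] * 5

print(round(mean_loudness(blocks), 2))   # ungated average
print(round(gated_loudness(blocks), 2))  # gated measure
```

Without gating, the quiet stretches drag the figure down by nearly 4 LU; with gating, the measure reflects the passages that actually determine how loud the programme feels.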
True Peak Levels
Remember how I said that 0 dBfs was the top of the digital scale? Well that’s only kind of true. Sorry.
It turns out that it is possible, upon signal reconstruction (digital to analogue conversion) to end up with levels that are effectively above this nominal maximum. This is because digital audio is just a kind of model of the eventual signal that will come from the loudspeakers – consider that it would be possible to sample either side of a peak, and thus get digital peak readings that were actually too low.
This means that the peak meters in our DAW are even less useful than we thought, as they could miss situations that would actually cause clipping upon playback. Whether this happens or not is largely down to whether or not the equipment manufacturer has built-in headroom for signals that overshoot. As normal, more expensive equipment tends to do this, cheaper gear doesn’t; so a signal peaking around 0dBfs could sound fine in the studio, but distort on consumer equipment.
It turns out that the lower our sample rate, the more inaccurate our DAW peak meters are at gauging the true peak level. Again, the ITU opted for a simple approach, and mandate the measuring of peak values with a meter that oversamples (converts the signal to a higher sample rate – 192kHz, in this case) and measures the peaks of this up-sampled signal. The idea is that measuring peak values relative to this (plus a bit of headroom) will be more accurate, and cause less unintended distortion.
This gives us another unit! dBTP – dB true peak. This just describes a peak value measured with a compliant peak meter.
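The effect is easy to demonstrate with a sine wave at a quarter of the sample rate, phased so that the samples always land either side of the waveform peaks. The sketch below estimates the inter-sample peak by 4x oversampling with a Hann-windowed sinc interpolator. That interpolator is my own simple stand-in, not the filter specified by the ITU, and the function names and parameters are illustrative.

```python
import math

def _windowed_sinc(d, half_width):
    """Hann-windowed sinc interpolation kernel, zero for |d| >= half_width."""
    if abs(d) >= half_width:
        return 0.0
    if d == 0:
        return 1.0
    window = 0.5 * (1 + math.cos(math.pi * d / half_width))
    return window * math.sin(math.pi * d) / (math.pi * d)

def sample_peak_db(samples):
    """Peak of the stored sample values, in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples))

def true_peak_db(samples, oversample=4, half_width=16):
    """Estimated inter-sample peak in dB relative to full scale.

    Only the interior of the signal is examined, so that the interpolator
    always has real samples on both sides.
    """
    peak = 0.0
    for i in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = i + k / oversample  # position between samples i and i + 1
            value = sum(samples[n] * _windowed_sinc(t - n, half_width)
                        for n in range(i - half_width + 1, i + half_width + 1))
            peak = max(peak, abs(value))
    return 20 * math.log10(peak)

signal = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(128)]
print(round(sample_peak_db(signal), 2), round(true_peak_db(signal), 2))
```

The stored samples never exceed about -3 dBfs, yet the reconstructed waveform reaches very nearly full scale. Normalise that file to 0 dBfs by its sample peak and the true peak lands about 3 dB over.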
Working Practices: EBU R128
On the basis of the ITU recommendation, the EBU (as well as other organisations worldwide) have developed a set of working practices. These are principally aimed at broadcast, but elements are being considered in other areas (such as game design, h/t Varun) and are of general value, so worth understanding!
The fundamental aspect of the EBU (and related) specifications is to establish a reference loudness level as measurable by an ITU 1770 compliant meter. In Europe this level is -23 LUFS. They also mandate that the peak level of a programme should be no higher than -1 dBTP, and suggest a statistical mechanism for determining the range of loudness of a programme – i.e. how dynamic it is.
Furthermore, they establish a specification for metering, based on the ITU 1770 measurement. As well as an overall programme level and a true peak meter, EBU compliant meters have a ‘momentary’ loudness measure based on a 400ms averaging window (slightly slower than a VU meter), and a ‘short-term’ loudness based on a 3-second window.
There is a great deal more material available about this system:
This video presentation by Florian Camerer is well worth setting aside an hour to watch in order to get acquainted with these ideas:
So, like, what?
Now, why have I gone into such detail? Some of you do / will work in broadcast or similar, and therefore, this will affect you directly. But those who do not may be wondering at the relevance of all this. Well, two things: first, even outside broadcast, this is quite likely to have a bearing on the distribution or production of your work; second, it should be of intrinsic interest as it represents a wholehearted effort to develop coherent working practices that should make our audio sound better.
If you deal mainly with music production, it is worth bearing in mind that many media players are starting to adopt some form of loudness normalisation, based upon BS.1770. iTunes has its own proprietary (of course) ‘sound check’ system; others make use of ‘ReplayGain’ meta-data. They both work in the same way, however, which is to scan your library with a loudness measure, and then simply turn down material that is above reference (-18 LUFS for Sound Check, -16.5 LUFS seems to be standard in ReplayGain based players).
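In other words, the player's job reduces to a subtraction with a ceiling at zero. Here is a minimal sketch of that behaviour as described above; the function name and the example loudness figures are made up, and real players may differ in detail.

```python
def playback_gain_db(measured_lufs, reference_lufs):
    """Gain a player would apply: turn loud material down, leave the rest alone."""
    return min(0.0, reference_lufs - measured_lufs)

REFERENCE_LUFS = -18.0  # the Sound Check figure quoted above

print(playback_gain_db(-9.0, REFERENCE_LUFS))   # heavily compressed master
print(playback_gain_db(-20.0, REFERENCE_LUFS))  # more dynamic master
```

The squashed master is simply turned down by 9 dB on playback, so everything it sacrificed in dynamics buys it nothing in loudness.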
Off the back of all this, I have some suggestions for your studio practices that I hope will help you deliver optimal work, as well as get you acquainted with the ITU/EBU/ATSC workflows:
- Get hold of an EBU compliant meter. The Melda one is free in its basic form; there are other free or cheap ones as well (and some very expensive ones with fancy logging facilities and what have you).
- Practice working at -23 LUFS. Don’t worry about getting individual pieces / sounds to hit -23 on the integrated measure (just yet); this makes more sense for broadcast in any case. Just get used to having -23 LUFS as your meter 0-point that you mix around – i.e. treat -23 as a maximum level (or thereabouts).
- With this in mind, get to know your monitoring level at -23 LUFS – borrow a sound level meter from the office even.
- If you’re producing music, then you might consider -18 or -16.5 as alternative references; it shouldn’t matter though, as you can always just turn it up 😉
- DON’T, whatever you do, succumb to the temptation to try and make your submitted work louder with aggressive dynamic range manipulation; we’d much, much rather hear the most dynamic, detailed and spacious work you can do and leave the loudness war out of it. If you can submit work at -23 LUFS, all the better (though it is worth noting that you have done so, in order that whoever is marking it is aware). (In any case, get in the habit of annotating your work with its absolute loudness value.)