Understanding Loudness

This tutorial explores background concepts and useful tips for making the best use of the Dolby Media Processing APIs to measure and correct loudness in media.

Introduction to Loudness

Your audience controls the volume of their playback: they can turn it up or down to their desired listening level. What platforms want to avoid is forcing consumers to adjust the volume every time one piece of media transitions to the next.

For example:

  • from one song to another
  • from one sitcom episode to a commercial break
  • from a podcast into a sponsor pitch
  • from social media updates to multimedia ads in the timeline

Loudness is the subjective perception of the level of sound. The loudness level should be kept consistent for the best listener experience. The reliable control of audio loudness has long been an issue for content creators, producers, and broadcasters. There have been various approaches to the normalization of loudness used in broadcasting which resulted in inconsistencies.

The fundamental problem comes down to the lack of a standardized method of measuring the loudness of an audio program. There have historically been some objective measures, but beginning in the early 2000s the International Telecommunication Union (ITU) started work to develop a loudness metric for use in broadcasting. The published recommendation ITU-R BS.1770 "Algorithms to measure audio programme loudness and true-peak level" describes a measurement algorithm that has become the "standard" for loudness of media in broadcast and online streaming services.

Measuring Loudness

How do you measure loudness?

The algorithm specified in Recommendation ITU-R BS.1770 provides an objective measure that estimates the subjective loudness of an audio signal. This algorithm can be used by content providers, distributors, and hardware devices to reproduce audio programs at a similarly perceived loudness (with or without video). The objective loudness measure can provide a single loudness value for the program. This value can be used to calculate a global offset that can be applied to the program to align it with a common loudness target.
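As a minimal sketch of the global offset described above: because a loudness reading behaves like a decibel scale, the gain to apply is simply the target minus the measured value. The function names and the -16 LKFS target are illustrative assumptions and are not part of any Dolby API.

```python
def loudness_offset_db(measured_lkfs: float, target_lkfs: float) -> float:
    """Gain in dB that moves a program from its measured loudness to the target."""
    return target_lkfs - measured_lkfs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to the linear factor applied to every sample."""
    return 10 ** (gain_db / 20)


# Hypothetical example: a program measured at -20.55 LKFS, aligned to -16 LKFS.
offset = loudness_offset_db(measured_lkfs=-20.55, target_lkfs=-16.0)
scale = db_to_linear(offset)
print(f"apply {offset:+.2f} dB (multiply samples by {scale:.3f})")
```

A positive offset raises the level of the whole program and a negative offset lowers it; the dynamics within the program are unchanged.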

How does it work?

The ITU-R BS.1770 loudness algorithm integrates the frequency-weighted power of all audio channels over a finite time period, typically from the beginning to the end of the audio media or program. The weighted power of all channels is windowed and summed, using a block size of 400 ms with a 75% overlap. An absolute level gate, set at -70 LKFS, removes all the silent or low-level blocks, and then a relative level gate, set 10 LU below the loudness of the remaining blocks, is applied, keeping only the "loudest" blocks. The result is a single loudness value in units of LKFS (loudness, K-weighted, relative to full scale). Note that LKFS is equivalent to a decibel scale: a 1 dB increase in signal level raises the loudness reading by 1 LKFS. This is often referred to as a 1 LU increase in loudness. The EBU uses LUFS in place of LKFS, but the two are identical measures.
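The two-stage gating can be sketched in a few lines. This is a simplified illustration and not a compliant meter: it assumes a single channel whose samples have already been K-weighted (the frequency-weighting filter and per-channel weights are omitted), and it assumes the 400 ms block with 75% overlap used by BS.1770. All function names are hypothetical.

```python
import math


def block_powers(samples, sample_rate, block_s=0.4, overlap=0.75):
    """Mean-square power of overlapping blocks of (already K-weighted) samples."""
    size = int(round(block_s * sample_rate))
    step = int(round(size * (1 - overlap)))
    return [
        sum(x * x for x in samples[start:start + size]) / size
        for start in range(0, len(samples) - size + 1, step)
    ]


def block_loudness(power):
    """Loudness of one block in LKFS; -0.691 is the BS.1770 calibration offset."""
    return -0.691 + 10 * math.log10(power)


def gated_loudness(powers, absolute_gate=-70.0, relative_gate=-10.0):
    """Integrated loudness of block powers after absolute and relative gating."""
    # Absolute gate: drop silent or very quiet blocks.
    loud = [p for p in powers if p > 0 and block_loudness(p) > absolute_gate]
    if not loud:
        return float("-inf")
    # Relative gate: the threshold sits 10 LU below the loudness of the blocks kept so far.
    threshold = block_loudness(sum(loud) / len(loud)) + relative_gate
    kept = [p for p in loud if block_loudness(p) > threshold]
    return block_loudness(sum(kept) / len(kept))
```

Because quiet blocks are discarded before averaging, long pauses or fades do not pull the reported program loudness down.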

When the entire program, i.e. all channels from beginning to end, is measured using the ITU-R BS.1770 algorithm, the result represents the loudness of the entire program and is often referred to as the full program mix loudness or program loudness, in units of LKFS. The loudness of the dialog or speech is also commonly measured, where only the parts of the media or program that contain speech/dialog are measured using ITU-R BS.1770. This is sometimes referred to as anchor-based loudness. Speech/dialog is the most common anchor, and the resulting measure is referred to as the Dialog Loudness.

Aligning media/content using Dialog Loudness ensures that the dialog level in media is consistent across programs and/or channels. This is commonly used in broadcast, especially with wide-dynamic-range content such as movies and premium episodic content. Access to the isolated dialog of a program is not always possible, so a method to measure the loudness of the speech/dialog in already mixed media is required.

Gating Techniques

Gating improves the reliability of loudness estimation by controlling when and to what degree audio is filtered.

Dolby Dialog Intelligence is a technology that detects speech/dialog, extracts it, and measures the extracted speech using the ITU-R BS.1770 algorithm. It is an industry-proven method for dialog/speech loudness measurements. The process of using an algorithm to extract the speech and measure its loudness is commonly referred to as a speech-gated loudness measurement. This contrasts with the standard BS.1770 measurement, which employs a relative-level-gated (or simply level-gated) loudness measurement that only measures the "loudest parts" of the program.

The program loudness and dialog loudness measurements are based on the entire program/media and are meant to provide an estimate of the loudness of the program/media as a whole. Short-term loudness is another loudness metric based on BS.1770, but it uses a 3-second window instead of the entire program, and no relative-level gate is applied. It provides a more localized measure of loudness in the program at the time the short-term loudness is measured, and is defined in recommendations such as ITU-R BS.1771.
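A short-term loudness trace can be sketched as a sliding 3-second average with no relative gate. This sketch assumes the input is a list of mean-square powers of K-weighted audio, one value per 100 ms step; that step size and the function name are illustrative assumptions.

```python
import math


def short_term_loudness(step_powers, step_s=0.1, window_s=3.0):
    """Loudness (LKFS) of each 3-second window, advanced one step at a time."""
    steps_per_window = int(round(window_s / step_s))
    trace = []
    for end in range(steps_per_window, len(step_powers) + 1):
        mean_power = sum(step_powers[end - steps_per_window:end]) / steps_per_window
        # A window of pure digital silence has no finite loudness.
        trace.append(-0.691 + 10 * math.log10(mean_power) if mean_power > 0 else float("-inf"))
    return trace
```

Each value describes only the most recent three seconds, so the trace rises and falls with the program instead of collapsing it into a single number.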

Loudness Range (LRA) is often used to gauge the dynamics of a program or media. It is a statistical metric based on the short-term loudness values of the program and is meant to indicate the range of loudness a program has, measured in LU.
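As a rough sketch of how such a statistic can be derived, the function below follows the general approach published by the EBU for Loudness Range: gate the short-term values (an absolute gate at -70 and a relative gate 20 LU below the gated average), then take the spread between the 10th and 95th percentiles. The nearest-rank percentile used here is a simplification, and the function name is hypothetical.

```python
import math


def loudness_range(short_term_values):
    """Approximate Loudness Range (LU) from a list of short-term loudness values."""
    # Absolute gate: ignore silence and very quiet passages.
    audible = [v for v in short_term_values if v > -70.0]
    if not audible:
        return 0.0
    # Relative gate: 20 LU below the power-average of the remaining values.
    mean_power = sum(10 ** (v / 10) for v in audible) / len(audible)
    threshold = 10 * math.log10(mean_power) - 20.0
    kept = sorted(v for v in audible if v > threshold)

    def percentile(fraction):
        # Nearest-rank percentile; a simplification of the published method.
        return kept[round(fraction * (len(kept) - 1))]

    return percentile(0.95) - percentile(0.10)
```

Using percentiles instead of the absolute minimum and maximum keeps a single loud effect or a brief quiet moment from dominating the result.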

Speech-gated measurement is useful in cases where media has multiple components, such as background music and dialog. It is reasonable for a human listener to expect the loudness of the dialog to be the anchor by which the loudness of the media is judged.

True Peak Measurement

Audio signals can be represented in the analog domain by a continuous waveform, or in the digital domain by a discretely sampled sequence of values. Knowing the maximum level of the audio signal is important to avoid clipping in downstream devices, which can compromise the user experience. The typical way of measuring the maximum level of a digital signal is the sample peak, which is the maximum absolute value of the sampled audio. This value may not be the maximum peak of the signal in the analog domain, where the true peak may exceed the sample peak. Estimating the true peak of the audio signal in the digital domain is therefore useful, and details of how to measure it are contained in the second part of Recommendation ITU-R BS.1770. The units for sample peak are dBFS, whereas for true peak they are dBTP.

[Figure: true peak and sample peak illustrated]
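The difference between the two peaks can be demonstrated with a small sketch. It estimates the true peak by 4x oversampling with a generic Hann-windowed sinc interpolator; this is an assumption made for illustration and is not the specific interpolation filter given in BS.1770. The function names are hypothetical.

```python
import math


def sample_peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)


def estimate_true_peak(samples, oversample=4, half_width=16):
    """Largest absolute value of the waveform interpolated between samples."""
    peak = 0.0
    for k in range(len(samples) * oversample):
        t = k / oversample  # position in units of the original sample period
        first = max(0, math.ceil(t - half_width))
        last = min(len(samples) - 1, math.floor(t + half_width))
        value = 0.0
        for n in range(first, last + 1):
            u = t - n
            sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
            window = 0.5 * (1 + math.cos(math.pi * u / half_width))  # Hann taper
            value += samples[n] * sinc * window
        peak = max(peak, abs(value))
    return peak


# A full-scale sine at a quarter of the sample rate, sampled 45 degrees off its crests:
# every sample lands at about 0.707 even though the waveform itself reaches 1.0.
tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]
print(f"sample peak: {20 * math.log10(sample_peak(tone)):.2f} dBFS")
print(f"true peak estimate: {20 * math.log10(estimate_true_peak(tone)):.2f} dBTP")
```

In this example the sample peak under-reports the level by roughly 3 dB, which is the kind of hidden overshoot a true-peak measurement is meant to catch.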

Recommendations and Standards

The following standards and recommendations are reviewed here because they affect loudness measurement and conformance.

ITU-R BS.1770

As stated above, this is the basis for most of the broadcasting and streaming loudness recommendations around the world. It has been revised a few times since inception:

BS.1770-0/1: The original recommendation which did not describe any form of gating mechanism to remove quiet or silent passages from the loudness measurement.

BS.1770-2: Addition of an absolute-level gate and relative-level gate to the loudness algorithm.

BS.1770-3: Optional emphasis and DC blocking of the true peak algorithm were removed. They were rarely used in industry implementations.

BS.1770-4: Annex 3 was added, which extends the original loudness algorithm to the measurement of immersive audio channel configurations, such as 5.1.2. Note that the extended algorithm in Annex 3 is backward compatible with Annex 1 for stereo and 5.1 channel configurations.

Note that for loudness measurements, meters compliant with BS.1770-2, BS.1770-3, and BS.1770-4 should produce identical results for mono, stereo, and 5.1 channel content.

ATSC A/85

Advanced Television Systems Committee (ATSC) A/85.

See the Specification for more details.

EBU R.128

The European Broadcasting Union (EBU) R.128. Note that in R.128 the unit of loudness used is LUFS.

See the Specification for more details.

Metering Functions

A metering function is used to measure loudness and can be used to check for compliance as required or recommended by various global and regional loudness recommendations.

This includes:

  • Dialog/Speech loudness based on Dolby Dialog Intelligence using ITU-R BS.1770
  • Full-program mix/integrated loudness using ITU-R BS.1770-4
  • True peak as defined in ITU-R BS.1770-4
  • Short-term and momentary loudness defined in ITU-R BS.1771
  • Loudness Range as defined in EBU R.128
  • Sample peak

Profiles

The following profiles are defined for identifying the constraints used in recommendations for different platforms and standards.

Loudness profile      Maximum loudness (LKFS)   Minimum loudness (LKFS)   Maximum true peak (dBTP)
standard_a85          -22                       -26                       -2
standard_r128         -22.5                     -23.5                     -1
service_amazon        -13                       -15                       -1
service_apple         -15                       -17                       -1
service_facebook      -15                       -17                       -1
service_pandora       -13                       -15                       -1
service_spotify       -13                       -15                       -1
service_soundcloud    -13                       -15                       -1
service_vimeo         -15                       -17                       -1
service_youtube       -12                       -14                       -1
playback_laptop       -14                       -18                       -1
playback_mobile       -15                       -17                       -1
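To show how these constraints are used, here is a minimal sketch that checks a measurement against a few of the profiles above. The dictionary layout, the function name, and the wording of the "below minimum" message are assumptions made for illustration; the other two messages are modeled on the sample validation response shown later in this tutorial.

```python
# A few rows transcribed from the profiles table:
# (maximum loudness, minimum loudness, maximum true peak)
PROFILES = {
    "standard_a85": (-22.0, -26.0, -2.0),
    "standard_r128": (-22.5, -23.5, -1.0),
    "service_spotify": (-13.0, -15.0, -1.0),
}


def check_profile(measured, true_peak, profile):
    """Return a list of human-readable problems; an empty list means the media conforms."""
    loudness_max, loudness_min, peak_max = PROFILES[profile]
    problems = []
    if measured > loudness_max:
        problems.append(
            f"measured loudness exceeds maximum specified ({measured:g} > {loudness_max:g})"
        )
    if measured < loudness_min:
        problems.append(
            f"measured loudness is below minimum specified ({measured:g} < {loudness_min:g})"
        )
    if true_peak > peak_max:
        problems.append(
            f"measured true peak exceeds maximum specified ({true_peak:g} > {peak_max:g})"
        )
    return problems


print(check_profile(-20.55, 0.01, "standard_a85"))
```

Each profile is effectively a loudness window plus a peak ceiling, so a file can fail for being too loud, too quiet, or for peaking too high.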

Dolby Media Processing APIs

Support for loudness recommendations and standards ensures consumers have a pleasant end-user experience. You can use Dolby Media Processing APIs to learn about and correct the loudness of your media.

With these APIs you can answer questions like:

  • Does my media conform to a specific broadcasting standard or recommendation?
  • Will my media be adjusted by a platform once uploaded?
  • What percentage of media is attributable to dialog?
  • What is the peak loudness level (in decibels) for user-generated content?
  • How do I fix the loudness so that content will be accepted?

Analyze Loudness

The Analyze API can be used to get information about the loudness of a media file without making changes to the audio. This can be useful for quality control (QC) of content.

Sample output from a request to /media/analyze on a mono audio file:

"result": {
    "audio": {
        "loudness": {
            "gating_mode": "speech",
            "measured": -20.55,
            "range": 5.85,
            "sample_peak": 0,
            "true_peak": 0.01
        }
    }
}

The data returned in this sample indicates that the dialog/speech loudness, measured using the speech gating mode, is -20.55 LKFS with a loudness range of 5.85 LU. The sample peak is 0 dBFS and the true peak is 0.01 dBTP.

The following are some of the parameters you can use to change how loudness is measured.

metering_mode

The metering_mode specifies the perception model to use. It can be any one of the ITU-R BS.1770 models mentioned earlier.

  • "1770-1"
  • "1770-2"
  • "1770-3"
  • "1770-4"

dialog_intelligence

If you want to disable Dolby Dialog Intelligence algorithms and dialog-gated loudness metering you can set this value to false. When disabled, the loudness estimate will be level-gated.

speech_threshold

When dialog_intelligence is enabled you can control the speech_threshold for determining whether a segment is classified as dialog or not. This value is specified as a percentage from 0 to 100, with a default value of 15%. That means content in which more than 15% is recognized as dialog is classified as dialog and measured with dialog gating, while anything less is not dialog gated.
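The decision described above can be expressed as a small sketch. It only mirrors the rule as stated in this section; the function name and the idea of passing in a precomputed speech percentage are assumptions, and the actual classification is performed inside the API.

```python
def is_dialog_gated(speech_percent, speech_threshold=15.0, dialog_intelligence=True):
    """Return True when a measurement would use dialog (speech) gating."""
    if not dialog_intelligence:
        # With Dialog Intelligence disabled, the estimate is level-gated instead.
        return False
    return speech_percent > speech_threshold


print(is_dialog_gated(40.0))                         # mostly speech, default threshold
print(is_dialog_gated(40.0, speech_threshold=50.0))  # stricter threshold
```

Raising the threshold makes the measurement fall back from dialog gating more readily, which can suit music-heavy content with only occasional speech.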

Example

This example measures loudness in its simplest form, just an input file.

curl -X POST https://api.dolby.com/media/analyze \
  --header "x-api-key: $DOLBYIO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": "dlb://in/example.mp3"
  }'
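For readers who prefer to script the call, the same request can be assembled with Python's standard library. This is a sketch that only builds the request object and does not send it; the function name is hypothetical, and the Content-Type header is an assumption that is commonly needed when posting a JSON body.

```python
import json
import urllib.request

ANALYZE_URL = "https://api.dolby.com/media/analyze"


def build_analyze_request(input_url, api_key, loudness=None):
    """Build (but do not send) a POST request for the Analyze API."""
    body = {"input": input_url}
    if loudness is not None:
        # Optional loudness settings, for example {"profile": "service_amazon"}.
        body["loudness"] = loudness
    return urllib.request.Request(
        ANALYZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("dlb://in/example.mp3", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it; that step is left out here so the sketch can be run without credentials.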

Validation

Loudness validation can check whether the audio complies with a specific loudness profile. The loudness profile can be specified in the input parameters. Additional loudness constraints may be provided for custom requirements. You can find the available pre-defined options in the profiles table.

Here's an example of how to do a simple validation for conformance to the published loudness standards for Amazon's platform:

curl -X POST https://api.dolby.com/media/analyze \
  --header "x-api-key: $DOLBYIO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": "dlb://in/example.mp3",
    "loudness": {
      "profile": "service_amazon"
    }
  }'

You can specify your own custom profile for validation as well.

curl -X POST https://api.dolby.com/media/analyze \
  --header "x-api-key: $DOLBYIO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": "dlb://in/example.mp3",
    "loudness": {
      "profile": "custom",
      "custom": {
        "metering_mode": "1770-2",
        "dialog_intelligence": true,
        "speech_threshold": 50
      }
    },
    "validation": {
      "loudness": {
        "loudness_max": -20
      }
    }
  }'

In the response, you will find a pass/fail validation result along with detail that gives insight into how to correct any issue. For example, the Analyze API may return the following response:

"result": {
    "validation": {
        "loudness": {
            "detail": "measured loudness exceeds maximum specified (-20.55 > -22); measured true peak exceeds maximum specified (0.01 > -2)",
            "pass": false
        }
    }
}
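A script can turn a response like this into a list of individual problems. The sketch below assumes the fragment above is wrapped in an enclosing JSON object and that multiple problems are separated by semicolons, as they are in this sample; the function name is hypothetical.

```python
import json


def validation_failures(response):
    """Return the individual failure messages, or an empty list if validation passed."""
    loudness = response["result"]["validation"]["loudness"]
    if loudness["pass"]:
        return []
    # Assumption: the detail string joins multiple problems with semicolons.
    return [part.strip() for part in loudness["detail"].split(";") if part.strip()]


sample = json.loads("""
{
  "result": {
    "validation": {
      "loudness": {
        "detail": "measured loudness exceeds maximum specified (-20.55 > -22); measured true peak exceeds maximum specified (0.01 > -2)",
        "pass": false
      }
    }
  }
}
""")

for failure in validation_failures(sample):
    print("FAIL:", failure)
```

Here both the loudness and the true peak are over their limits, so both would need to be addressed before the media conforms.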

If your media doesn't pass validation, you can use the Enhance API to correct it.

Controlling Loudness with Enhance

The Enhance API allows you to correct and update your media to conform with loudness recommendations. If you deliver content to a popular platform, it may follow standards that enforce a limit on the loudness of your media. The platform's correction could be a brick-wall adjustment that takes no account of the creative decisions made during content creation. To protect your content, you can control the loudness yourself in a way that still conforms while taking your preferences into consideration.

When you make a request to the /media/enhance endpoint you can specify custom parameters. Let's look at an example:

curl -X POST https://api.dolby.com/media/enhance \
  --header "x-api-key: $DOLBYIO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": "dlb://in/example.mp3",
    "output": "dlb://out/example-spotify.mp3",
    "content": {
      "type": "podcast"
    },
    "audio": {
      "loudness": {
        "target_level": -18,
        "speech_threshold": 15,
        "peak_reference": "sample",
        "peak_limit": -2
      }
    }
  }'

The target_level in this example is set to -18. The value you use will depend on the type of content you are producing, how it will be delivered, and what devices it will be played back on. You want to choose a loudness target to match the desired outcome. Some rules of thumb:

  • Podcasts are often targeted to be between -16 and -18 LKFS
  • Music streaming services usually require content to conform to -14 LKFS
  • Television playback often limits loudness to -23/-24 LKFS based on regional standards

The pre-defined loudness profiles can be a useful reference if trying to conform to a specific platform or device.

The speech_threshold is a way to tune the Dolby Dialog Intelligence speech-gated loudness measurement. See Gating Techniques for more background. The reported loudness measurement could be slightly off if you have mixed-media content, so explicitly setting this value based on your understanding of the content can yield better results.

Setting the correct peak_limit reduces clipping and distortion. Many standards and platform delivery specifications specify a maximum sample-peak or true-peak level. Content that exceeds that level can be rejected, so setting this value proactively helps ensure successful delivery.

Finally, peak_reference gives control over whether to use the sample peak or the true peak as the highest level of the waveform. True peak is a more accurate representation of the audio, but delivery specs sometimes call for a sample peak limit.
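To see why target_level and peak_limit interact, consider the following sketch. A plain gain change moves the peak by the same number of dB as the loudness, so raising a program to its target can push the peak past the limit. This is only back-of-the-envelope arithmetic with hypothetical function names; it does not describe how the Enhance API processes audio internally.

```python
def peak_after_gain(measured_loudness, measured_peak, target_level):
    """Peak level (dB) after applying the static gain that reaches the target loudness."""
    gain_db = target_level - measured_loudness
    return measured_peak + gain_db


def overshoot_db(measured_loudness, measured_peak, target_level, peak_limit):
    """How far the peak would exceed peak_limit after gain; 0.0 if it stays within the limit."""
    return max(0.0, peak_after_gain(measured_loudness, measured_peak, target_level) - peak_limit)


# Using the earlier sample measurement (-20.55 LKFS, 0.01 dBTP) with the settings
# from the Enhance request above (target_level -18, peak_limit -2).
print(f"{overshoot_db(-20.55, 0.01, target_level=-18, peak_limit=-2):.2f} dB over the limit")
```

Any overshoot has to be handled by peak limiting instead of gain alone, which is why the peak limit and peak reference are specified alongside the loudness target.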