Create Audio Podcasts from Video By Using a Transcode API in Post-Production

SUMMARY

Learn to produce audio-only mp3s from recorded mp4 videos as a podcasting tool use case.


The popularity of podcasting has created new opportunities for many brands, artists, communities, and organizations to share their stories. With so many options available to listeners, getting noticed can be challenging. To maximize visibility for content discovery, creators need to produce and deliver content efficiently, where and how the audience wants to listen to it.

Dolby.io offers a collection of APIs that allow developers to build tools for production workflows that podcasters can use to create all the variations needed for publishing on a variety of platforms with consistently high quality.

Record Content Once, Reuse Everywhere

Among the tips & tricks shared by experienced streamers and podcasters you’ll find a suggestion to create content that can be captured once and then reused and shared in multiple forms. To make the most of this create-once, use-everywhere approach, a Transcode API can help by creating variations of an asset for different purposes:

– Host a live stream on Twitch, Facebook, and/or YouTube
– Produce an audio-only podcast of it delivered on Apple, Spotify, and/or Stitcher
– Package the recording to make it available on-demand from YouTube or Vimeo
– Create short promo-videos or clips from the session to share on social media

Pro-Tip: Record a live stream and then reuse the content for other purposes.

Using Dolby.io APIs, a software developer can build an application with a customized workflow that creates all of these assets efficiently.

✓ Use the Communications SDKs to host an interactive session with the host and guests
✓ Use a Custom Mixer Layout for views in a style similar to Open Broadcaster Software (OBS)
✓ Broadcast the session with RTMP to various live streaming platforms
✓ Use the Recording API to store and retrieve a copy

You can find details on these first few steps from other posts such as Set up Live Stream to Twitch, Creating Custom Layout for Streaming, Deliver Video Conference Recordings with Webhooks, and many more.

Once you have a recording from the interactive session, the Media APIs become powerful tools for the post-production workflow.

✓ Use the Transcode API to extract audio-only media
✓ Use the Enhance API to improve the audio quality
✓ Use the Transcode API to create streaming formats

Let’s look more closely at this first example.

How to Create an Audio-Only Podcast from a Video Recording

The tools you use to capture and stream video may constrain the file format you end up with. For instance, if you recorded a session with the Communications SDK you would have an MP4 using the H.264 codec at 1080p resolution (1920 x 1080) and 30fps. If you are using QuickTime or another tool you may end up with a MOV using the H.265 / HEVC codec. The Transcode API helps you produce the variations you need from any supported input format, regardless of what you started with.

Let’s look at an example of extracting only the audio stream from such a video file, using the Spotify Podcast Delivery Specification as the target:

We recommend either high bitrate (128kbps+) MP3 (only MP3 is supported for passthrough) or MP4 with AAC-LC. A maximum duration of 12 hours (roughly 2GB @ 320 Kbps) is recommended/supported.

Spotify Podcast Delivery Specification v1.9
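
The "roughly 2GB" figure in the quoted recommendation can be sanity-checked with a little arithmetic. The sketch below is only an estimate: it assumes a constant bitrate, treats 1 kbps as 1000 bits per second, and ignores container and metadata overhead. The function name is made up for illustration.

```python
def audio_size_bytes(bitrate_kbps: int, duration_seconds: float) -> float:
    """Approximate size of a constant-bitrate audio stream.

    Assumes 1 kbps = 1000 bits per second and ignores container overhead.
    """
    return bitrate_kbps * 1000 * duration_seconds / 8


twelve_hours = 12 * 60 * 60  # 43,200 seconds

size_320 = audio_size_bytes(320, twelve_hours)
size_128 = audio_size_bytes(128, twelve_hours)

print(f"{size_320 / 1e9:.2f} GB at 320 kbps")  # 1.73 GB at 320 kbps
print(f"{size_128 / 1e9:.2f} GB at 128 kbps")  # 0.69 GB at 128 kbps
```

At the recommended minimum of 128 kbps, the same 12 hours comes to well under half the size of the 320 kbps case, which is worth weighing when choosing an output bitrate.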

Working with REST APIs

The Dolby.io Media APIs are fundamentally REST endpoints that take a JSON body payload. We’ll focus on how to construct the elements of a request rather than the mechanics of how to interface from your preferred programming language (Python, JavaScript, Java, Golang, etc.) and environment.

(1) Define Your Storage

The Transcode API is frequently used to generate multiple output files. To avoid needing to specify your storage credentials multiple times, you define each location once in a storage list. Each storage object has a unique id that can be used to correlate any listed inputs or outputs. For example, you could write the audio output into a different output folder from other variations you may want to create.

You can do this with your preferred cloud storage provider such as Microsoft Azure or Google Cloud. In this example, I’ll use AWS S3, but see the How-to Transcode with Your Own Cloud Storage guide which explains some of these alternatives.

I’m using a bucket I created called dolbyio. The id can be anything I want it to be; in this case I based it on the name of the bucket itself. The url is smart and recognizes the s3:// pattern. Finally, for auth, I’m using an IAM user with permissions set to read and write to this bucket.

"storage": [
    {
        "id": "dolbyio-storage-id",
        "bucket": {
            "url": "s3://dolbyio/tests/transcode/",
            "auth": {
                "key": "...",
                "secret": "..."
            }
        }
    }
]

Note how the url contains the folder path as well, so you may need more than one storage location depending on how you want to separate data.

(2) Correlate Inputs with Storage

With the storage defined, you set the ref of a source (or destination) to the id of the storage location it should use. This ability to correlate inputs with storage means that you can take any number of inputs from any number of storage locations to fit the use case.

"inputs": [
    {
        "source": {
            "ref": "dolbyio-storage-id",
            "filename": "videocast-original.mov"
        }
    }
]

The filename is appended to the url defined in the storage section. If you want to store inputs and outputs in different folder locations you will need to use more than one storage location.
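
To make the ref/id correlation concrete, here is a minimal Python sketch of how a source or destination maps to a full object URL, following the description above (look up the storage entry by id, then append the filename to its url). It is an illustration only, not the service's actual resolution logic. The `resolve_url` helper, the second storage id, and its folder are hypothetical.

```python
from __future__ import annotations

# The first entry mirrors the storage example; the second id and folder are
# hypothetical, added only to show a separate output location.
storage = [
    {
        "id": "dolbyio-storage-id",
        "bucket": {
            "url": "s3://dolbyio/tests/transcode/",
            "auth": {"key": "...", "secret": "..."},
        },
    },
    {
        "id": "podcast-output-id",  # hypothetical
        "bucket": {
            "url": "s3://dolbyio/tests/podcast/",  # hypothetical folder
            "auth": {"key": "...", "secret": "..."},
        },
    },
]


def resolve_url(storage: list[dict], location: dict) -> str:
    """Join the url of the storage entry named by `ref` with the filename."""
    urls = {entry["id"]: entry["bucket"]["url"] for entry in storage}
    if location["ref"] not in urls:
        raise KeyError(f"no storage entry with id {location['ref']!r}")
    # The filename is appended as-is, so the storage url should end with "/".
    return urls[location["ref"]] + location["filename"]


source = {"ref": "dolbyio-storage-id", "filename": "videocast-original.mov"}
destination = {"ref": "podcast-output-id", "filename": "podcast-audio-only.mp3"}

print(resolve_url(storage, source))
print(resolve_url(storage, destination))
```

Checking refs locally like this before submitting a job is one way to catch a mistyped id early.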

(3) Define Outputs Needed

The Transcode API supports different kinds of files to help streamline the result you want. In our case, if we just want an MP3 with only the audio, we just need to specify that kind of result.

"outputs": [
    {
        "destination": {
            "ref": "dolbyio-storage-id",
            "filename": "podcast-audio-only.mp3"
        },
        "kind": "mp3"
    }
]

This produces an MP3 with a sample rate of 44100 Hz and a bitrate of 128 kbps by default.

If instead we needed to provide an MP4 we could do that as well. Since MP4 is a type of container, we’ll want to specify that we want to remove the video from the original, and we can then provide specifics if we want a particular codec or bitrate_kb in the result.

"outputs": [
    {
        "destination": {
            "ref": "dolbyio-storage-id",
            "filename": "podcast-audio-only.mp3"
        },
        "kind": "mp3"
    },
    {
        "destination": {
            "ref": "dolbyio-storage-id",
            "filename": "podcast-audio-only.mp4"
        },
        "kind": "mp4",
        "video": "remove",
        "audio": [{
            "codec": "aac_lc",
            "bitrate_kb": 320
        }]
    }
]

The defaults happen to be aac_lc and 320 kbps but this demonstrates some of the flexibility. To package this up you’ll make a POST https://api.dolby.com/media/transcode request with this combined JSON as the body.

The Getting Started with Transcoding Media guide walks through the steps of formulating that request with sample code for some popular environments such as Python, JavaScript, or even cURL from a command line shell.
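
As a rough sketch of what "combined JSON" means here, the Python below assembles the storage, inputs, and outputs fragments into one body and prepares the POST request using only the standard library. Two things are assumptions rather than facts from this post: that the three fragments sit side by side as top-level keys, and that the API accepts a bearer token in an `Authorization` header. The helper names are made up for illustration; check the guide mentioned above for the authoritative request format.

```python
import json
import urllib.request

TRANSCODE_URL = "https://api.dolby.com/media/transcode"

# Assumption: the three fragments are sibling top-level keys of the body.
body = {
    "storage": [
        {
            "id": "dolbyio-storage-id",
            "bucket": {
                "url": "s3://dolbyio/tests/transcode/",
                "auth": {"key": "...", "secret": "..."},
            },
        }
    ],
    "inputs": [
        {"source": {"ref": "dolbyio-storage-id", "filename": "videocast-original.mov"}}
    ],
    "outputs": [
        {
            "destination": {
                "ref": "dolbyio-storage-id",
                "filename": "podcast-audio-only.mp3",
            },
            "kind": "mp3",
        }
    ],
}


def build_request(body: dict, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for a transcode job."""
    return urllib.request.Request(
        TRANSCODE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; your account may differ.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


request = build_request(body, api_token="YOUR_API_TOKEN")  # placeholder token
print(request.get_method(), request.full_url)
```

Nothing is sent until you pass the prepared request to `urllib.request.urlopen(request)`, which makes it easy to inspect the serialized body first.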

Building Podcast Applications

Dolby.io provides several APIs that can help build applications that support podcasting use cases. Whether you are looking to create a web app, a mobile app, or a post-production media workflow, the Transcode API provides a way to automate generating variations of your user-generated media. In this project we demonstrated how to create an audio-only variation of a video file, but we’ll revisit this topic to discuss other ways of improving the content creation experience for your users.

Jayson DeLancey

Developer Relations

Jayson DeLancey leads the Developer Relations team for Dolby.io. With 20+ years of software development experience, he is inspired by the blend of creativity and technology he sees from our customers. He devotes himself to improving the everyday developer experiences so that developers can focus their attention on the fun parts of writing code.
