Modern video capture has never offered more choices. Camera hardware and sensors record clearer images with deeper color and brightness data. Artificial Intelligence (AI) and Machine Learning (ML) are allowing us to create novel video experiences. Video capture that used to cost tens of thousands of dollars has now been democratized to fit inside your pocket as a feature of your cell phone.
If you’re like me, the opportunity for high-quality, lower-cost content creation gets you excited! But all of this modern complexity comes at a cost.
To preserve all of this data, we use larger video resolutions and higher video bitrates that retain the details of what we’re looking at. We use higher (and variable) frame rates to more finely represent time. Video and audio codecs are constantly evolving to become more efficient and offer better quality for each bit spent on storing all of this information. New standards and technologies emerge to define how we collectively read, write, and transmit this data in an interoperable manner.
Evolution of Post-Production
While the technology may change, one rule of thumb remains true over the years: capture and post-production should be done in the highest quality possible. You want to derive your delivery encodes, or the compressed formats that you transmit to your audience over the internet, from a high-quality input.
Keeping up with which device can capture which format, and how to provide the best viewing experience to a wide audience, can be a challenge to wade through, let alone implement. Post-production is no longer just a single computer with editing software, as we leverage online services to scale our workflows.
The Dolby.io Media APIs are a solution to the problems posed by some of these modern challenges, and they are particularly powerful when used together. The Transcode API can solve not only the compression and conversion piece but in some cases can replace an editing system altogether, allowing you to programmatically edit, clean up, and make your content ready for distribution at any scale.
Modern Video Workflows at Scale
What is modern video production today, anyway? If we think about big Hollywood blockbusters, we know studios spend millions of dollars on perfecting every last detail all the way through the workflow.
But for consumer applications, the landscape is shifting. Audio and video are becoming an increasingly critical part of connecting with our audiences and reaching customers. Applications today are making it effortless to broadcast ourselves in a few clicks, while video personalization is being used to deliver customized experiences leading to higher engagement rates.
Developers are looking for services and solutions that make these processes easier, more automated, and more scalable.
What is Transcoding?
Let’s examine transcoding. Transcoding is the conversion of one format to another. Common use cases for transcoding include:
- Compressing a high-bitrate capture, or “mezzanine”, format to make it smaller and more cost-efficient to send to someone else,
- Taking a format that can’t be played in a particular browser or on a particular device and converting it to something that is compatible,
- “Normalizing” a diverse set of ingest media files to a single format (think user-uploads),
- Creating a keyframe-aligned, multi-bitrate, multi-resolution set of videos for delivery with an Adaptive Bitrate (ABR) format, such as HTTP Live Streaming (HLS), for a better playback quality of experience (QoE),
- Converting a source video to match the delivery specifications of a delivery platform or service,
- Converting an older file format to a more modern one, or
- Creating a low-bitrate proxy to preview the contents of a larger media file or a mezzanine file that can’t be played over the web.
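To make the last use case a little more concrete, here is a minimal Python sketch that assembles an FFmpeg command line for a low-bitrate H.264/AAC proxy. The function name, the default height, and the bitrates are my own assumptions for illustration, not recommendations, and the sketch only builds the argument list rather than running FFmpeg.

```python
from typing import List


def build_proxy_command(
    source: str,
    destination: str,
    height: int = 360,            # assumed proxy height, illustrative only
    video_bitrate: str = "500k",  # assumed bitrates, illustrative only
    audio_bitrate: str = "64k",
) -> List[str]:
    """Return an FFmpeg argument list that produces a small preview file."""
    return [
        "ffmpeg",
        "-i", source,
        # Scale to the target height; -2 keeps the aspect ratio with an even width.
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264",
        "-b:v", video_bitrate,
        "-c:a", "aac",
        "-b:a", audio_bitrate,
        # Move the MP4 index to the front so playback can start sooner.
        "-movflags", "+faststart",
        destination,
    ]


if __name__ == "__main__":
    print(" ".join(build_proxy_command("mezzanine.mov", "proxy.mp4")))
```

Passing the resulting list to `subprocess.run` would execute it on a machine where FFmpeg is installed.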
As our workflows move to the cloud, we look to combine steps. Combining steps reduces API calls and manual intervention, data egress, and code complexity, freeing up resources to focus on the strategic and interesting part of running your business.
Transcoding is no longer just about file conversion; we extend it to make media more “programmatic”. Examples of combining functionality with transcoding include:
- Adding an introduction video to the beginning of your main content, removing the need to use a non-linear editor,
- Adding text and graphical overlays for video personalization,
- Audio normalization and leveling for a consistent, high-quality playback experience, and
- Audio cleanup and enhancement.
Real-World Transcoding Applications
Our Transcode API is designed to take the complexity out of media conversion and of handling user-generated content (UGC) for web and app playback. Let’s examine a couple of real-world use cases and the process of adding video to your app.
I. Testimonial App
You design an app where users record or upload testimonial videos reviewing products, and you publish those on an app or website. You may encounter some of the common challenges faced in collecting user-generated content:
- Not all devices are the same. iOS, Android, and web browser apps can record in different audio and video codecs which can lead to playback compatibility issues,
- Accepting web-uploads can increase the diversity in ingested formats,
- Some devices record in high-dynamic-range (HDR) video, which can introduce further incompatibility in playback, and
- Some files contain rotational metadata and variable frame rates, which can be challenging to handle correctly.
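As a rough illustration of spotting the HDR, rotation, and variable frame rate cases up front, here is a small Python sketch that inspects one video stream entry shaped like the JSON that `ffprobe -show_streams -of json` produces. The function name and the returned labels are hypothetical, the field names reflect my understanding of ffprobe output, and the frame rate comparison is only a heuristic.

```python
from fractions import Fraction
from typing import Dict, List, Optional

# Transfer characteristics reported for HDR video (PQ and HLG).
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}


def _parse_rate(rate: str) -> Optional[Fraction]:
    """Parse a rate such as "30000/1001"; return None when it is unknown."""
    num, _, den = rate.partition("/")
    den = den or "1"
    if int(den) == 0:
        return None
    return Fraction(int(num), int(den))


def flag_ingest_issues(stream: Dict) -> List[str]:
    """Return labels for properties of a video stream that need extra care."""
    issues = []

    if stream.get("color_transfer") in HDR_TRANSFERS:
        issues.append("hdr")

    # Rotation may appear as a legacy tag or as display-matrix side data.
    rotation = stream.get("tags", {}).get("rotate")
    if rotation is None:
        for side_data in stream.get("side_data_list", []):
            if "rotation" in side_data:
                rotation = side_data["rotation"]
                break
    if rotation is not None and int(rotation) % 360 != 0:
        issues.append("rotated")

    # Heuristic (an assumption, not a guarantee): differing nominal and
    # average frame rates often indicate a variable frame rate.
    nominal = _parse_rate(stream.get("r_frame_rate", "0/0"))
    average = _parse_rate(stream.get("avg_frame_rate", "0/0"))
    if nominal and average and nominal != average:
        issues.append("variable_frame_rate")

    return issues
```

A check like this only tells you that a file needs attention; correcting it still requires a conversion step.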
You have a few options for processing these files:
- Manually download them and edit them in a non-linear editor,
- Build your own solution utilizing software such as FFmpeg, or
- Use an online service such as Dolby.io.
While manually processing files can yield excellent results, it can be ruled out fairly quickly: this process doesn’t scale well and is expensive to hire out. Free tools such as FFmpeg are powerful and can give you a lot, but they are complicated to install and maintain, and costly and complex to scale and support. Stringing together complex operations such as stitching, overlays, and audio processing with them can be very difficult.
Your other option is to use an online service for processing these videos. The Dolby.io Transcode API abstracts away the complexity of all of these operations into simple, programmatic, job-based REST API processing and conversion. It makes smart default decisions for you, optimizing for quality and speed, yet it allows for more advanced configuration when required. It’s built on a modern, distributed, scalable infrastructure so that when your app takes off, Dolby.io can handle the spikes of activity seamlessly.
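To give a feel for what “job-based” means in practice, here is a generic Python sketch of the submit-then-poll pattern that job-based services typically use. It is not the Dolby.io client: `fetch_status` is a hypothetical callable standing in for whatever performs the status request, and the status names and timing defaults are assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it reaches a terminal state or the poll budget runs out.

    `fetch_status` is a hypothetical callable that returns a dict with a
    "status" key; the terminal status names used here are assumed.
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") in {"Success", "Failed"}:
            return job
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

Injecting `sleep` keeps the helper easy to test without real delays.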
II. Web Conferencing or Broadcast App
Let’s explore an example app that handles conferencing or broadcast situations where your customers take calls and give advice live on your platform. Perhaps your broadcast uses the Dolby.io Communications or Streaming APIs for the back-end, or maybe it comes from another provider such as Zoom. When the show ends, you want to quickly and automatically enable a replay for viewers who missed the broadcast or who want to re-watch.
You will want to use a technology that scales seamlessly to hundreds, if not thousands, of viewers. Adaptive bitrate streaming (ABR) creates multiple resolutions and bitrates, often called a “ladder”, from your broadcast.
ABR delivers an excellent viewing experience to users, even under less-than-ideal internet conditions. The HTTP Live Streaming (HLS) format of this technology provides a scalable, robust, high-quality experience.
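For readers who have not seen one, the top-level HLS playlist is simply a text file that lists each rung of the ladder with its bandwidth and resolution. Here is a minimal Python sketch that writes such a playlist; the function name, the tuple layout, and the example URIs are assumptions for illustration, and a production playlist would usually carry more attributes (codecs, frame rate, audio groups).

```python
from typing import List, Tuple

# (width, height, peak bandwidth in bits per second, URI of the media playlist)
Rung = Tuple[int, int, int, str]


def build_master_playlist(rungs: List[Rung]) -> str:
    """Return the text of a basic HLS playlist that lists every rung."""
    lines = ["#EXTM3U"]
    # List the lowest bandwidth first; the player picks among the variants.
    for width, height, bandwidth, uri in sorted(rungs, key=lambda r: r[2]):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist([
        (1280, 720, 3_000_000, "720p.m3u8"),
        (640, 360, 800_000, "360p.m3u8"),
    ]))
```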
Some challenges you may face for creating your replay:
- Creating keyframe-aligned, multi-bitrate and resolution ladders is complex to get right and can require multiple steps,
- “Packaging” this ladder into the HLS format can require yet another tool to configure, support, and scale, and
- Adding more functionality such as burning in a company logo, stitching on an introduction, or trimming off the beginning or end of the show is difficult.
The Dolby.io Transcode API has smart defaults and can work with almost no configuration, such as a simple “give me an HLS output”. Or, if you know that you only want to have 4 layers of video (e.g., rungs on your ladder), you can specify 4 video layers and the API will determine what bitrates and resolutions make sense given your input video. If you’re a more advanced user and know exactly what rungs you want on your ladder, down to the bitrate and resolution, you also have full control when needed.
One customer, Tractus Events, recently shared how they use the Transcode API to process their event recordings in this way, and they are considering a lightweight cloud-based editing engine built with our Transcode API.
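To illustrate the kind of decision involved in choosing rungs from the input video, here is a deliberately simple Python sketch. It is my own heuristic, not the logic the Transcode API uses: the candidate heights and bitrates are assumed example values, and the function name is hypothetical.

```python
from typing import List, Tuple

# Candidate rungs as (height, video bitrate in kbps); assumed example values.
CANDIDATE_RUNGS = [
    (2160, 16000),
    (1080, 5000),
    (720, 3000),
    (480, 1500),
    (360, 800),
    (240, 400),
]


def pick_ladder(
    source_width: int, source_height: int, layers: int
) -> List[Tuple[int, int, int]]:
    """Return up to `layers` rungs as (width, height, kbps), never upscaling."""
    usable = [(h, kbps) for h, kbps in CANDIDATE_RUNGS if h <= source_height]
    ladder = []
    for height, kbps in usable[:layers]:
        # Preserve the aspect ratio and round the width to an even number.
        width = round(source_width * height / source_height / 2) * 2
        ladder.append((width, height, kbps))
    return ladder


if __name__ == "__main__":
    for rung in pick_ladder(1920, 1080, 4):
        print(rung)
```

Even this toy version shows why the source matters: a 360p upload asked for 4 layers only yields the rungs at or below 360p.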
Simplifying Workflows by Combining the Dolby.io APIs
As you can tell from these examples, media processing is not a straightforward topic. I have been there personally, searching Stack Overflow for answers, seeking community advice, playing with a variety of video tools and trying to sort out “the right way” to perform these complex operations.
We’ve built the Transcode API to be simple to operate by default for the media novice, yet to expose more advanced control for those who know exactly what they want. The Transcode API utilizes Dolby Hybrik, Dolby’s enterprise-grade transcoding engine; I like to think of it as “industrial strength”. Hybrik powers some of the biggest online streaming platforms in the media and entertainment industry, so you can be assured that our quality, scalability, and reliability are top-notch.
Transcode + Analyze APIs
The Transcode API is even more impactful when not used in isolation. The Dolby.io Media APIs are meant to be a suite of tools that you can pick and choose from to achieve amazing media experiences.
Utilize our Analyze APIs to determine who is speaking when in a broadcast and get that information back as actionable, time-based metadata. Learn more about where periods of silence exist, loudness, quality, or what problems exist in your audio. You can use this machine-readable result information to trim out periods of silence with the Transcode API or extract a presentation from a specific person.
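As a sketch of how time-based metadata can drive trimming, the Python below turns a list of silent segments into the list of ranges worth keeping. The `(start, end)` tuple shape, the function name, and the 2-second threshold are assumptions for illustration; the actual Analyze API output has its own structure that you would map into this form.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def keep_ranges(
    duration: float, silences: List[Segment], min_silence: float = 2.0
) -> List[Segment]:
    """Return the ranges left after dropping silences of at least `min_silence` seconds."""
    kept = []
    cursor = 0.0
    for start, end in sorted(silences):
        if end - start < min_silence:
            continue  # short pauses are left in place
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


if __name__ == "__main__":
    print(keep_ranges(60.0, [(0.0, 3.0), (20.0, 21.0), (40.0, 45.0)]))
```

Each kept range can then become one trimmed segment in a follow-up transcode job.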
Transcode + Enhance APIs
If you’re dealing with user-generated content (UGC), you will of course be dealing with background noise, inconsistent audio levels, speech artifacts such as pops and hums, and the general issues that come with mobile device microphones. Our Enhance API is developed from all of Dolby’s audio research and is designed to tackle all of these problems and deliver best-in-class audio cleanup and speech isolation.
It’s genuinely surprising to hear the difference it makes! I am happy to announce that we have combined the Transcode and Enhance APIs so you can process your audio in the same step as your file conversion.
This will help you get more done with fewer API calls – you’ll be able to call the Transcode API and apply the Enhance API configuration to each input file. Now it will be simple to take all of your testimonial videos, for instance, and normalize the format AND get clean and consistent user-audio, all in one step.
Dig in and Learn More
At Dolby.io, we’re on a mission to empower developers to create powerful, media-rich experiences without dealing with the complexity of the technology.
We leverage the decades-long experience and expertise that Dolby has built through scientific approaches leading to many innovations in the media domain. Our Communications APIs allow for real-time communication, interaction, and audio spatialization. Our Real-time Streaming API, through the acquisition of Millicast, allows for sub-second latency of broadcasts to massive audiences. Our Media APIs enable sophisticated file-based workflows, providing deep analysis and the most advanced audio processing on the planet, along with simple audio and video editing and conversion capabilities. This arsenal of media tools is all quite accessible with our straightforward, pay-as-you-go pricing, and our apps, samples, and guides make getting started a snap.
The Media APIs are really a suite meant to complement each other and be an extension to our Communications and Real-time Streaming APIs. I hope you’ll see the value in using them together as I do!