Audio research
The Dolby Applied AI team advances the state of the art in audio processing by using and developing core AI techniques.
Our approach

We work in an open research environment, leveraging the latest deep learning technologies, and contribute to the advancement of AI for audio. Our inventions both fuel progress in the field and give Dolby's products a competitive advantage. Some of our research supports Dolby.io APIs.
Meet the Dolby Applied AI Researchers

Joan Serrà
Team Leader, Google Scholar

Jordi Pons
Researcher, Google Scholar

Gauri Jagatap
Researcher, Google Scholar

Gautam Bhattacharya
Researcher, Google Scholar

Xiaoyu Liu
Researcher, Google Scholar

Santiago Pascual
Researcher, Google Scholar

Roy Fejgin
Researcher

Publications
Full-band general audio synthesis with score-based diffusion
Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds. Such models operate on band-limited signals…
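As a rough illustration of the sampling side of such models, here is a minimal annealed Langevin sampler over a decreasing noise schedule. The score function is a toy Gaussian stand-in rather than a trained network, and every size and constant is an assumption, not a detail from the paper.

```python
import numpy as np

# Toy stand-in for a trained score network s(x, sigma) ~ grad_x log p(x);
# here it is the exact score of unit-Gaussian data perturbed by noise sigma.
def score_fn(x, sigma):
    return -x / (1.0 + sigma ** 2)

def annealed_langevin_sample(n, length, sigmas, steps_per_sigma=5, eps=2e-5, seed=0):
    """Sample by annealed Langevin dynamics over decreasing noise levels."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigmas[0], size=(n, length))
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2   # step size scaled per noise level
        for _ in range(steps_per_sigma):
            z = rng.normal(size=x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

sigmas = np.geomspace(1.0, 0.01, num=20)          # geometric noise schedule
audio = annealed_langevin_sample(n=2, length=16000, sigmas=sigmas)
print(audio.shape)                                # (2, 16000)
```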
Quantitative evidence on overlooked aspects of enrollment speaker embeddings for target speaker separation
Single channel target speaker separation (TSS) aims at extracting a speaker’s voice from a mixture of multiple talkers given an enrollment utterance of that speaker. A typical deep learning TSS framework consists of an upstream model…
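A minimal sketch of that upstream/downstream split, with placeholder architectures and feature sizes that are assumptions rather than the systems studied in the paper: an enrollment encoder produces a speaker embedding, which conditions a masking separator applied to the mixture.

```python
import torch
import torch.nn as nn

class ToyTSS(nn.Module):
    """Target speaker separation skeleton: upstream enrollment encoder,
    downstream separator conditioned on the speaker embedding."""
    def __init__(self, feat_dim=80, emb_dim=128, hidden=256):
        super().__init__()
        self.upstream = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.downstream = nn.GRU(feat_dim + emb_dim, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, feat_dim), nn.Sigmoid())

    def forward(self, mixture_feats, enroll_feats):
        _, h = self.upstream(enroll_feats)
        emb = h[-1]                                       # (batch, emb_dim)
        # Broadcast the embedding over time and concatenate with the mixture.
        emb_t = emb.unsqueeze(1).expand(-1, mixture_feats.size(1), -1)
        out, _ = self.downstream(torch.cat([mixture_feats, emb_t], dim=-1))
        return self.mask_head(out) * mixture_feats        # masked mixture features

model = ToyTSS()
mix = torch.randn(4, 100, 80)     # (batch, frames, mel bins)
enroll = torch.randn(4, 150, 80)  # enrollment utterance of the target speaker
print(model(mix, enroll).shape)   # torch.Size([4, 100, 80])
```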
Adversarial permutation invariant training for universal sound separation
Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so. In this work, we complement PIT with…
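Plain PIT amounts to evaluating the training loss under every permutation of the estimated sources and keeping the cheapest assignment per example. A toy PyTorch version, exhaustive and therefore only sensible for small source counts (the adversarial complement from the paper is not shown):

```python
import itertools
import torch
import torch.nn.functional as F

def pit_loss(est, ref, loss_fn=F.l1_loss):
    """Permutation invariant loss for est, ref of shape (batch, n_src, time)."""
    n_src = est.shape[1]
    per_perm = []
    for perm in itertools.permutations(range(n_src)):
        l = loss_fn(est[:, list(perm), :], ref, reduction="none").mean(dim=(1, 2))
        per_perm.append(l)                      # per-example loss, this permutation
    # (n_perms, batch) -> best permutation for each example, then batch mean.
    return torch.stack(per_perm).min(dim=0).values.mean()

est = torch.randn(8, 3, 16000, requires_grad=True)
ref = torch.randn(8, 3, 16000)
loss = pit_loss(est, ref)
loss.backward()
print(loss.item())
```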
Universal speech enhancement with score-based diffusion
Removing background noise from speech audio has been the subject of considerable effort, especially in recent years due to the rise of virtual communication and amateur recordings. Yet background noise is not the only unpleasant disturbance…
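Models of this family are typically trained with denoising score matching conditioned on the degraded input. A hedged sketch of one training step, with a placeholder convolutional network and a sigma-squared weighted objective; none of these choices are claimed to match the published model.

```python
import torch
import torch.nn as nn

class CondScoreNet(nn.Module):
    """Placeholder conditional score network: sees the perturbed clean signal
    and the degraded recording as two channels, predicts the score."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, hidden, 9, padding=4), nn.ReLU(),
            nn.Conv1d(hidden, 1, 9, padding=4),
        )

    def forward(self, x_pert, cond):
        return self.net(torch.stack([x_pert, cond], dim=1)).squeeze(1)

def dsm_step(model, clean, degraded, sigma):
    """Denoising score matching: perturb the clean target at level sigma and
    regress the score -noise / sigma**2, conditioned on the degraded input."""
    noise = torch.randn_like(clean) * sigma
    pred = model(clean + noise, degraded)
    target = -noise / sigma ** 2
    return (sigma ** 2) * ((pred - target) ** 2).mean()  # weight evens out scales

model = CondScoreNet()
clean, degraded = torch.randn(4, 16000), torch.randn(4, 16000)
loss = dsm_step(model, clean, degraded, sigma=0.5)
loss.backward()
```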
On loss functions and evaluation metrics for music source separation
We investigate which loss functions provide better separations by benchmarking an extensive set of them for music source separation. To that end, we first survey the most representative audio source separation losses we identified…
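For context, three losses that commonly appear in such benchmarks, in minimal PyTorch form; the exact set, weightings, and evaluation protocol of the paper are not reproduced here.

```python
import torch

def l1_wave(est, ref):
    """Time-domain L1 distance."""
    return (est - ref).abs().mean()

def spec_l1(est, ref, n_fft=1024, hop=256):
    """L1 distance between magnitude spectrograms."""
    win = torch.hann_window(n_fft)
    E = torch.stft(est, n_fft, hop, window=win, return_complex=True).abs()
    R = torch.stft(ref, n_fft, hop, window=win, return_complex=True).abs()
    return (E - R).abs().mean()

def neg_si_sdr(est, ref, eps=1e-8):
    """Negative scale-invariant SDR: project est onto ref, compare energies."""
    ref_zm = ref - ref.mean(-1, keepdim=True)
    est_zm = est - est.mean(-1, keepdim=True)
    s = (est_zm * ref_zm).sum(-1, keepdim=True) / (ref_zm.pow(2).sum(-1, keepdim=True) + eps) * ref_zm
    e = est_zm - s
    return -(10 * torch.log10(s.pow(2).sum(-1) / (e.pow(2).sum(-1) + eps) + eps)).mean()

est, ref = torch.randn(2, 44100), torch.randn(2, 44100)
for fn in (l1_wave, spec_l1, neg_si_sdr):
    print(fn.__name__, float(fn(est, ref)))
```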
Deep Performer: Score-to-audio music performance synthesis
Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer — a novel system for score-to-audio music…
Upsampling layers for music source separation
Upsampling artifacts are caused by problematic upsampling layers and by the spectral replicas that emerge while upsampling. Depending on the upsampling layer used, such artifacts can be either tonal artifacts (additive high-frequency noise)…
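The replica mechanism is easy to see numerically: zero-insertion, the first step of a strided or transposed-convolution upsampler, mirrors a baseband tone into an image frequency. A small numpy demonstration with arbitrarily chosen frequencies:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone sampled at 8 kHz

up = np.zeros(2 * x.size)
up[::2] = x                        # zero-insertion: nominal 2x upsampling to 16 kHz

def peak_freqs(sig, rate, k):
    """Frequencies of the k largest magnitude-spectrum bins."""
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / rate)
    return np.sort(freqs[np.argsort(spec)[-k:]])

print(peak_freqs(x, sr, k=1))       # [1000.]: the original tone
print(peak_freqs(up, 2 * sr, k=2))  # [1000. 7000.]: the tone plus its spectral replica
```

Unless a later layer filters it out, that 7 kHz image is exactly the kind of content that surfaces as an audible artifact.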
Adversarial Auto-Encoding for Packet Loss Concealment
Communication technologies like voice over IP operate under constrained real-time conditions, with voice packets being subject to delays and losses from the network. In such cases, the packet loss concealment (PLC) algorithm reconstructs…
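A heavily stubbed sketch of where a PLC module sits in the receive path. The concealment function below is a trivial fade-repeat placeholder standing in for a learned generator such as the paper's adversarial auto-encoder; frame size and context length are arbitrary assumptions.

```python
import numpy as np

FRAME = 320  # 20 ms at 16 kHz (assumed packetization)

def conceal(context):
    """Placeholder: repeat the last frame with a slight fade. A learned PLC
    model would synthesize a plausible continuation from `context` instead."""
    return context[-FRAME:] * 0.8

def receive_stream(frames, lost):
    """Reassemble a stream, concealing frames whose indices were lost."""
    out = []
    for i, frame in enumerate(frames):
        if i in lost and out:                       # need past context to conceal
            context = np.concatenate(out)[-4 * FRAME:]
            out.append(conceal(context))
        else:
            out.append(frame)
    return np.concatenate(out)

rng = np.random.default_rng(0)
frames = [rng.standard_normal(FRAME) for _ in range(10)]
audio = receive_stream(frames, lost={3, 7})
print(audio.shape)  # (3200,)
```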
On tuning consistent annealed sampling for denoising score matching
Score-based generative models provide state-of-the-art quality for image and audio synthesis. Sampling from these models is performed iteratively, typically employing a discretized series of noise levels and a predefined scheme. In this note, we…
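The "consistent" variant of annealed sampling ties the injected noise to the step-size parameter eta so the iterate's noise level stays exactly on a geometric schedule. A self-contained toy, with the score network replaced by the exact score of a Gaussian, and eta and the schedule length chosen arbitrarily:

```python
import numpy as np

def score_fn(x, sigma):
    # Exact score of unit-Gaussian data perturbed with noise level sigma.
    return -x / (1.0 + sigma ** 2)

def cas_sample(n, length, sigma_max=1.0, sigma_min=0.01, L=50, eta=0.5, seed=0):
    """Consistent annealed sampling over a geometric noise schedule."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, num=L)
    gamma = sigmas[1] / sigmas[0]                # constant ratio of the schedule
    # beta keeps the iterate's noise level exactly at sigma_{t+1} after each step.
    beta = np.sqrt(max(0.0, 1.0 - (1.0 - eta) ** 2 / gamma ** 2))
    x = rng.normal(0.0, sigma_max, size=(n, length))
    for t in range(L - 1):
        x = x + eta * sigmas[t] ** 2 * score_fn(x, sigmas[t])
        x = x + beta * sigmas[t + 1] * rng.normal(size=x.shape)
    return x

print(cas_sample(2, 1000).std())  # approaches the data std as sigma_min -> 0
```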
Multichannel-based learning for audio object extraction
The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata. While rendering an object-based production into a multichannel mix is…
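A toy rendering of that paradigm, assuming nothing about Dolby's actual formats or renderers: an object is a track plus position metadata, and a naive constant-power panner sums objects into a stereo bed.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class AudioObject:
    track: np.ndarray   # mono samples
    azimuth: float      # position metadata: -1.0 (hard left) .. 1.0 (hard right)

def render_stereo(objects, length):
    """Constant-power pan each object and sum into a 2-channel mix."""
    mix = np.zeros((2, length))
    for obj in objects:
        theta = (obj.azimuth + 1.0) * np.pi / 4.0   # map azimuth to [0, pi/2]
        n = min(obj.track.size, length)
        mix[0, :n] += np.cos(theta) * obj.track[:n]
        mix[1, :n] += np.sin(theta) * obj.track[:n]
    return mix

rng = np.random.default_rng(0)
objs = [AudioObject(rng.standard_normal(48000), azimuth=-0.7),
        AudioObject(rng.standard_normal(48000), azimuth=0.3)]
print(render_stereo(objs, 48000).shape)  # (2, 48000)
```

Learning to invert this kind of rendering, recovering objects from the multichannel mix, is essentially the extraction problem the paper tackles.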
Latest news and events
Join the Dolby Applied AI team
We are always looking to bring the greatest minds onto our AI team.
