While incorporating custom audio processing into the Web SDK of Dolby.io’s Communications API, we wanted the audio experience to be as good as possible, to show off the Dolby Difference. Typically, web libraries for conference calls, video encoding, and audio transmission use WebRTC, which allows only the standard media processing and encoding built into the browser. At Dolby, however, we have our own technology for audio communications – Dolby Voice – which in many cases provides a superior experience over the open-source solutions built into browsers. It is available as components for various mobile and desktop platforms and can already be experienced in native Dolby.io applications.
A Browser Runtime for Dolby Voice
Although browser vendors have put extensive work into optimizing JavaScript runtimes, the JavaScript language imposes practical limitations, and in particular is far from ideal for the heavy numeric computations that make up the bulk of media processing.
WebAssembly was developed specifically to fill this gap. It is a web standard specifying a virtual machine that runs in the browser and is controlled through JavaScript APIs. Compared to previously available browser VMs like the JVM, or indeed the JavaScript runtime itself, there are some notable differences. First, WebAssembly has very limited access to external resources: effectively, it can only communicate with the controlling JavaScript code, through exported functions and registered callbacks. This avoids the security problems experienced in the past, where each new VM became another attack surface for malicious code. Second, it is what might be called a low-level VM, closely matching the typical native architectures of client machines. It operates on linear, byte-oriented memory and internally uses only simple integer and floating-point types. It does not interface directly with JavaScript’s object model, exposing its inputs and outputs only as numbers or byte buffers, and leaving it to the JavaScript side to marshal and unmarshal structured data. For this reason it may not be a perfect replacement for your JavaScript application logic, but it is just the right tool for intensive arithmetic on byte buffers containing numeric arrays. An additional benefit over JavaScript is that this low-level model makes it suitable for real-time processing, without having to contend with, for example, stop-the-world garbage collection cycles.
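As a minimal illustration of this numeric interface, consider a toy module hand-assembled for this post (not Dolby Voice code): JavaScript instantiates the bytecode and calls an exported function, with only plain numbers crossing the boundary.

```javascript
// A tiny WebAssembly module, hand-assembled for illustration, exporting
// a single function add(i32, i32) -> i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic number + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                // function section: func 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                          // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b                     // local.get 0; local.get 1; i32.add; end
]);

const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule);

// Only plain numbers cross the boundary; structured data has to be
// marshalled through the module's linear memory by the JavaScript side.
console.log(instance.exports.add(2, 3)); // 5
```

In a real media pipeline the exported functions would instead read and write sample buffers placed in the module’s linear memory.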

This design allows the binary WebAssembly bytecode to be efficiently translated into the client machine’s native code, and also allows that bytecode to be compiled from higher-level programming languages, in particular so-called native languages like C, C++, or Rust. Given that the Dolby Voice code is written in C and C++, this technology lets us run our algorithms in the browser.
Enabling this in Dolby.io
In order to use WASM technology in your Dolby.io Communications project, you must be using Web SDK version 3.5 or greater. See what is featured in this release here, which includes audio examples showcasing the difference between standard WebRTC and Dolby Voice with WASM.
To install or upgrade the SDK, we suggest using NPM, running the following command:
npm i @voxeet/voxeet-web-sdk
Alternatively, you can load the SDK directly in your HTML from a CDN. Note, however, that this URL contains no @version designator, so it always fetches the latest version of the SDK, which may introduce unexpected changes if your application is not actively maintained.
<script type="text/javascript" src="https://unpkg.com/@voxeet/voxeet-web-sdk"></script>
After upgrading your SDK version, you will need to enable the “dvwc” parameter in the Conference Join Options:
// Create a Dolby Voice conference
const createOptions = {
  alias: "wasm",
  params: {
    dolbyVoice: true
  }
};
const conference = await VoxeetSDK.conference.create(createOptions);

// Join the Dolby Voice conference using Dolby Voice Codec
const joinOptions = {
  dvwc: true
};
await VoxeetSDK.conference.join(conference, joinOptions);
Note that the DVC codec is supported only on Chrome and Edge on desktop operating systems. Otherwise, the SDK triggers the DolbyVoiceNotSupported exception. For more information, see the Supported Environments document.
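If you need to support browsers where the codec is unavailable, one option is to catch this exception and rejoin without it. Below is an illustrative sketch: the helper name and the fallback policy are our own, and only the dvwc flag and the exception name come from the SDK.

```javascript
// Try to join with the Dolby Voice Codec first; if the platform does not
// support it, fall back to standard WebRTC audio. Illustrative helper,
// not part of the SDK.
async function joinWithDvwcFallback(sdk, conference) {
  try {
    return await sdk.conference.join(conference, { dvwc: true });
  } catch (error) {
    if (error.name === "DolbyVoiceNotSupported") {
      // e.g. Firefox, Safari, or a mobile browser
      return sdk.conference.join(conference, { dvwc: false });
    }
    throw error; // unrelated failure: let it propagate
  }
}

// Usage: await joinWithDvwcFallback(VoxeetSDK, conference);
```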
Sharing Audio Data Within the Browser
Running the algorithms, however, is only half the work. We need to ensure that they are hooked up to audio devices and exchange data with the server. Until recently, the only browser API that allowed application code to access input and output audio data was the deprecated ScriptProcessorNode. It has the disadvantage of performing its processing in the browser’s main event loop, which may also be busy doing other things; if it is busy enough, the output device eventually runs out of audio to play, resulting in gaps and glitches. In the last couple of months, however, the multi-threaded alternative – AudioWorklet – was added to Safari, thus becoming available on all major browsers.
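To make the underrun problem concrete, here is a much-simplified sketch of the rendering side (our own illustrative code, not the SDK’s implementation): an AudioWorklet render callback must fill each 128-sample block from whatever decoded audio has been queued, and has no choice but to substitute silence when the queue runs dry.

```javascript
// Fill one 128-sample output block from a queue of decoded frames.
// Returns false on underrun so later stages can smooth over the glitch.
function renderBlock(queue, output /* Float32Array of length 128 */) {
  const frame = queue.shift();
  if (frame === undefined) {
    output.fill(0); // underrun: the decoding side fell behind
    return false;
  }
  output.set(frame);
  return true;
}

// Inside an AudioWorkletProcessor this would run on the dedicated audio
// rendering thread, off the main event loop:
//
// class DecoderSink extends AudioWorkletProcessor {
//   process(inputs, outputs) {
//     renderBlock(this.queue, outputs[0][0]);
//     return true;
//   }
// }
// registerProcessor("decoder-sink", DecoderSink);
```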

Rendering audio in a multi-threaded web application generates its own set of problems. Web APIs provide two methods of communicating between threads: SharedArrayBuffer (SAB) and message channels. SAB coupled with atomic operations provides a concurrency model similar to C or C++ multithreading, with various threads operating on common memory, which can be very efficient, if somewhat unsafe and hard to manage. The bigger problem is that it is not universally available across browsers: after it was found to facilitate the Spectre and Meltdown exploits, it was disabled in most browsers. It has since come back, but not everywhere, and as a potential security risk it is still subject to additional restrictions, namely requiring cross-origin isolation.
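As an illustration of the SAB approach (a sketch with invented names and sizes, not the SDK’s code), a single-producer, single-consumer sample queue can be built over one SharedArrayBuffer, with atomic operations on the read and write indices making the hand-off between threads safe:

```javascript
// One shared buffer holds two i32 indices followed by the sample storage.
const CAPACITY = 1024;
const sab = new SharedArrayBuffer(8 + CAPACITY * 4);
const idx = new Int32Array(sab, 0, 2);          // idx[0] = read, idx[1] = write
const samples = new Float32Array(sab, 8, CAPACITY);

// Producer side (e.g. the decoding thread).
function push(value) {
  const w = Atomics.load(idx, 1);
  const next = (w + 1) % CAPACITY;
  if (next === Atomics.load(idx, 0)) return false; // full: drop rather than block
  samples[w] = value;
  Atomics.store(idx, 1, next);                     // publish only after the write
  return true;
}

// Consumer side (e.g. the audio rendering thread).
function pop() {
  const r = Atomics.load(idx, 0);
  if (r === Atomics.load(idx, 1)) return null;     // empty
  const value = samples[r];
  Atomics.store(idx, 0, (r + 1) % CAPACITY);
  return value;
}
```

Because both sides only ever advance their own index atomically, no locks are needed and the real-time consumer never blocks.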
The message-passing API, on the other hand, is universally supported, but it too comes with problems. Sending and receiving objects between threads can fill up the JavaScript heap, triggering unacceptably long garbage-collection pauses in audio processing. We manage this through careful memory management and some audio post-processing to get rid of the unpleasant glitches that occur when playback drops abruptly to silence. In other use cases, such as streaming playback, many of these problems could be solved simply by buffering a few hundred milliseconds of audio; for communication scenarios, however, we want to keep latency to a minimum, as it degrades the experience, especially in multi-person conference calls, so we have to resort to more sophisticated solutions instead.
Networking with WASM
On the network side, the main problem is that the preferred solution, which would be to give application code direct access to UDP or DTLS datagrams, simply isn’t available. The solutions typically used for browser-server communication, HTTP/2 and WebSockets, are based on the TCP protocol and therefore suffer from the head-of-line blocking problem: losing or delaying a packet means subsequent data on the connection is not available to the application until the missing packet is received, retransmitted if necessary. This is the opposite of what we want for voice communications – Dolby Voice algorithms are designed to accept data from the server as soon as it arrives, concealing gaps if necessary, but keeping latency low.
Our best alternative so far is WebRTC’s Data Channel feature, which allows transmission of arbitrary data and can be configured to deliver messages out of order and drop them rather than retransmit. This generally performs significantly better than TCP-based communication, but the protocol stack it runs on isn’t really optimized for the kind of high-volume real-time data we use it for, and it can misbehave quite badly if the network becomes sufficiently lossy, even temporarily. This behavior is another thing we needed to compensate for.
Kranky Geek and Conclusions
The area of media processing and real-time communication in the browser is attracting a lot of interest right now and seeing many exciting innovations. For the upcoming implementation of Dolby Voice in our Communications API Web SDK, our priority was to create a solution that works today across all major browsers. There are alternatives that solve some of these problems and could allow an even better experience, but they are currently available only on one browser or a subset of browsers, with Google Chrome typically taking the lead. These offer a way forward to further improve our implementation. As things stand, with Dolby Voice in the Web SDK using the power of WebAssembly, we are seeing a significant improvement over standard WebRTC audio quality.
If you want to learn more you can see the replay of our talk at Kranky Geek 2021 below, and read our WebSDK 3.5 Announcement.