Recording Audio on iOS with Examples
Casper Mok

There are a few different sets of iOS libraries that provide audio recording functionality. It can be confusing to a beginner which API to use. This article briefly shows the purpose of each of these APIs and their pros and cons.

  • AVAudioRecorder
  • AVCapture
  • AVAudioEngine
  • AudioToolbox

AVAudioRecorder

AVAudioRecorder is the simplest audio recording API in iOS. You give it the file URL to record to and a dictionary of audio format settings, such as mono or stereo and AAC or linear PCM (WAV). You can then start and stop the recording whenever you want.
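
As a rough illustration of those format settings, the sketch below builds two settings dictionaries, one for AAC and one for uncompressed linear PCM (the format usually stored in WAV files). The helper name recorderSettings(stereo:compressed:) and the 44.1 kHz sample rate are assumptions chosen for this example; they do not come from the article's own code.

```swift
import AVFoundation

// Hypothetical helper: builds a settings dictionary for AVAudioRecorder.
func recorderSettings(stereo: Bool, compressed: Bool) -> [String: Any] {
    var settings: [String: Any] = [
        AVSampleRateKey: 44_100.0,              // assumed sample rate
        AVNumberOfChannelsKey: stereo ? 2 : 1   // stereo or mono
    ]
    if compressed {
        // AAC, typically stored in an .m4a file
        settings[AVFormatIDKey] = Int(kAudioFormatMPEG4AAC)
        settings[AVEncoderAudioQualityKey] = AVAudioQuality.high.rawValue
    } else {
        // 16-bit little-endian integer PCM, typically stored in a .wav file
        settings[AVFormatIDKey] = Int(kAudioFormatLinearPCM)
        settings[AVLinearPCMBitDepthKey] = 16
        settings[AVLinearPCMIsFloatKey] = false
        settings[AVLinearPCMIsBigEndianKey] = false
    }
    return settings
}

let monoAAC = recorderSettings(stereo: false, compressed: true)
let stereoPCM = recorderSettings(stereo: true, compressed: false)
```

Either dictionary could be passed as the settings argument of AVAudioRecorder(url:settings:), with a file extension that matches the chosen format.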

Its disadvantage is that it doesn’t expose the audio sample buffers while it records, so you can’t analyze or process the audio before it is written to the target file. It also doesn’t support video recording.

AVAudioRecorder Example

In this example, startRecording() decides where to save the audio and configures the recording settings before it starts recording.

We also need a way to finish: finishRecording() stops the recording and then sets the recorder instance to nil.

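import AVFoundation

// The declarations below are members of the class that owns the recorder.
// AVAudioRecorder needs a file URL, so record into the Documents directory.
let filename = FileManager.default
    .urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("out.m4a")
// See the Apple API reference for details of the settings
let settings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC), // AAC, to match the .m4a file
    AVEncoderBitRatePerChannelKey: 96000
]
var audioRecorder: AVAudioRecorder?

func startRecording() {
    // Create the recorder for the chosen file and settings,
    // then start recording.
    do {
        audioRecorder = try AVAudioRecorder(url: filename, settings: settings)
        audioRecorder?.delegate = self
        audioRecorder?.record()
    } catch {
        finishRecording()
    }
}

func finishRecording() {
    audioRecorder?.stop()
    audioRecorder = nil
}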

We assign self to be the delegate of the audio recorder.  This means you need to conform your class to the AVAudioRecorderDelegate protocol to handle callbacks from AVAudioRecorder.

func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
    // Show in UI the audio recording has stopped
    audioRecordingStopped(successfully: flag)
}
  
func audioRecorderEncodeErrorDidOccur(_ recorder: AVAudioRecorder, error: Error?) {
    // Show in UI, error happened in audio recording
    audioRecording(with: error)
}

AVCapture

AVCapture consists of AVCaptureDevice, AVCaptureDeviceInput, AVCaptureSession, and AVCaptureOutput. It provides a simple processing architecture that resembles a graph of input and output nodes. You can intercept the sample buffers and run some analysis and processing on them. It can also capture video.

However, it is still not designed for any serious signal analysis and processing. It is missing some desirable configurations.

AVCaptureSession Example

We need to prepare an AVCaptureSession for the audio input device. This example only shows audio capture.

self.captureSession = AVCaptureSession()
  
let queue = DispatchQueue(label: "AudioSessionQueue", attributes: [])
let captureDevice = AVCaptureDevice.default(for: AVMediaType.audio)
var audioInput : AVCaptureDeviceInput? = nil
  
do {
    // Creating the input throws if the device cannot be opened.
    // No configuration lock is needed just to create an input.
    if let captureDevice = captureDevice {
        audioInput = try AVCaptureDeviceInput(device: captureDevice)
    }
} catch {
    // Handle capture devices could not be set
}

We can then add input to the session.

// Add audio input
if let audioInput = audioInput, captureSession.canAddInput(audioInput) {
    captureSession.addInput(audioInput)
} else {
    // throw inputsAreInvalid
}

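We also need to prepare the session output. In this example, we use AVCaptureAudioFileOutput, so the output is written to a file. We could also use AVCaptureAudioDataOutput if we wanted to intercept the input audio sample buffers; the intercepted data can be used for any processing needs. Note that Apple’s documentation lists AVCaptureAudioFileOutput as a macOS class, so check its availability for your target platform; on iOS, AVCaptureAudioDataOutput is the option to reach for.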

audioOutput = AVCaptureAudioFileOutput()
if captureSession.canAddOutput(audioOutput!) {
    captureSession.addOutput(audioOutput!)
} else {
    // throw outputIsInvalid
}
captureSession.startRunning()
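
For the AVCaptureAudioDataOutput alternative mentioned above, a minimal sketch might look like the following. The class name AudioBufferReceiver and its onSamples hook are hypothetical. The delegate method and setSampleBufferDelegate(_:queue:) are the AVFoundation calls that deliver each CMSampleBuffer.

```swift
import AVFoundation

// Hypothetical receiver that intercepts audio sample buffers from a capture session.
final class AudioBufferReceiver: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    let output = AVCaptureAudioDataOutput()
    private let queue = DispatchQueue(label: "AudioSessionQueue")

    // Assumed hook: called with the number of samples in each delivered buffer.
    var onSamples: ((Int) -> Void)?

    // Adds the data output to the session; returns false if the session rejects it.
    func attach(to session: AVCaptureSession) -> Bool {
        guard session.canAddOutput(output) else { return false }
        session.addOutput(output)
        // Buffers are delivered to the delegate on the given serial queue.
        output.setSampleBufferDelegate(self, queue: queue)
        return true
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Analyze or process the buffer here before storing or discarding it.
        onSamples?(CMSampleBufferGetNumSamples(sampleBuffer))
    }
}
```

With this approach you are responsible for writing the audio to a file yourself if you want to keep it.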

With the capture session running, we can call functions to start and stop recording.

We assign our class as the recording delegate, so the class must conform to the AVCaptureFileOutputRecordingDelegate protocol to handle the callback from AVCaptureAudioFileOutput.

func startRecording() {
    guard let captureSession = self.captureSession, captureSession.isRunning else {
        // Error, capture session is not yet ready
        return
    }
    let paths = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)
    let fileUrl = paths[0].appendingPathComponent("output.m4a")
    try? FileManager.default.removeItem(at: fileUrl)
    audioOutput!.startRecording(to: fileUrl, outputFileType: .m4a, recordingDelegate: self)
}
 
func stopRecording() {
    guard let captureSession = self.captureSession, captureSession.isRunning else {
        // Error, capture session is not yet ready
        return
    }
    audioOutput?.stopRecording()
}
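
The recording delegate callback itself is not shown above. A sketch of the method that AVCaptureFileOutputRecordingDelegate requires could look like this. The RecordingObserver class and its lastResult property are hypothetical names used only for illustration; in the article's setup the same method would live in the class that passes self as the recordingDelegate.

```swift
import AVFoundation

// Hypothetical delegate that remembers how the last recording ended.
final class RecordingObserver: NSObject, AVCaptureFileOutputRecordingDelegate {
    private(set) var lastResult: Result<URL, Error>?

    // Required callback: invoked when the output has finished writing the file.
    func fileOutput(_ output: AVCaptureFileOutput,
                    didFinishRecordingTo outputFileURL: URL,
                    from connections: [AVCaptureConnection],
                    error: Error?) {
        if let error = error {
            // Report the failure, for example by showing it in the UI.
            lastResult = .failure(error)
        } else {
            lastResult = .success(outputFileURL)
        }
    }
}
```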

AVAudioEngine

AVAudioEngine is a wrapper around Audio Units, the low-level software audio modules in iOS/macOS. It provides a framework for managing audio capture, recording, and playback. Apple highly recommends it.

You dynamically create an audio graph and processing chain. You then install a tap to peek at the audio buffers and analyze them. To process audio, you add a processing node to the graph. This library can also be used as an AudioUnit App Extension and act as an audio processing plug-in to other audio apps. It is highly configurable, and some default processing nodes are ready to use.

Its disadvantage is that it doesn’t support video capture natively. It also takes more effort to implement and requires deeper knowledge of audio engineering and buffer management.
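
To make the idea of a processing node concrete, here is a small sketch that inserts one of the built-in effect nodes, AVAudioUnitReverb, between a player node and the main mixer. The choice of reverb, the preset, and the 40% wet/dry mix are assumptions made for illustration only.

```swift
import AVFoundation

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let reverb = AVAudioUnitReverb()

// Configure the effect node (values chosen arbitrarily for the example).
reverb.loadFactoryPreset(.mediumHall)
reverb.wetDryMix = 40

// Nodes must be attached to the engine before they can be connected.
engine.attach(player)
engine.attach(reverb)

// Graph: player -> reverb -> main mixer.
// Passing nil keeps each source node's current output format.
engine.connect(player, to: reverb, format: nil)
engine.connect(reverb, to: engine.mainMixerNode, format: nil)
```

Other built-in effect nodes can be attached and connected in the same way.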

AVAudioEngine Example

We need to prepare an AVAudioEngine instance and the corresponding AVAudioFile for output.

var engine = AVAudioEngine()
var file: AVAudioFile?
var player = AVAudioPlayerNode() // Optional

func prepareAudioOutputFile(for outputURL: URL) throws {
    // Create the file with the settings of the format the input node delivers
    file = try AVAudioFile(forWriting: outputURL, settings: engine.inputNode.outputFormat(forBus: 0).settings)
    engine.attach(player)
    engine.connect(player, to: engine.mainMixerNode, format: engine.mainMixerNode.outputFormat(forBus: 0)) // configure graph
    try engine.start()
}

We can then implement startRecording() as follows. Because the tap hands you each buffer, you can process it before writing it to the output.

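func startRecording() {
    let inputNode = engine.inputNode
    // Tap in the input node's own output format; a mismatched format may raise an exception
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputNode.outputFormat(forBus: 0)) { [weak self] (buffer, time) in
        // Process the buffer here if needed, then append it to the file
        try? self?.file?.write(from: buffer)
    }
}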
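
As an example of the kind of analysis a tap makes possible, the sketch below computes the RMS level of the first channel of a buffer. The function name rmsLevel(of:) is hypothetical, and it assumes the buffer holds Float32 samples; it returns nil for any other sample format.

```swift
import AVFoundation

// Hypothetical helper: root-mean-square level of channel 0.
// Returns nil when the buffer is empty or does not contain Float32 samples.
func rmsLevel(of buffer: AVAudioPCMBuffer) -> Float? {
    guard let channels = buffer.floatChannelData, buffer.frameLength > 0 else {
        return nil
    }
    // View the first channel's samples without copying them.
    let samples = UnsafeBufferPointer(start: channels[0], count: Int(buffer.frameLength))
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}
```

Inside the tap closure, you could call rmsLevel(of: buffer) before writing the buffer to the file, for instance to drive a level meter.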

To stopRecording() when complete, you remove the tap from the input node.

func stopRecording() {
    engine.inputNode.removeTap(onBus: 0)
}

Audio Toolbox

Audio Toolbox is a low-level, C programming interface in iOS/macOS. With it you can do recording, playback, and stream parsing.

It is highly configurable but can be difficult to use.  It is missing the ability to manage input, output, and processing nodes.  The Audio Unit Processing Graph Services are deprecated. In addition, it doesn’t support native video capture.
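
To give a feel for the C-style interface, here is a sketch that describes a 16-bit PCM format and creates an input audio queue with Audio Queue Services. The helper makePCMFormat(sampleRate:channels:) is a hypothetical name. A real recorder would also allocate and enqueue buffers, start the queue, and write the captured bytes somewhere.

```swift
import AudioToolbox

// Hypothetical helper: describes interleaved 16-bit signed-integer linear PCM.
func makePCMFormat(sampleRate: Double, channels: UInt32) -> AudioStreamBasicDescription {
    let bytesPerFrame = channels * 2 // 2 bytes per 16-bit sample, per channel
    return AudioStreamBasicDescription(
        mSampleRate: sampleRate,
        mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked,
        mBytesPerPacket: bytesPerFrame,   // one frame per packet for PCM
        mFramesPerPacket: 1,
        mBytesPerFrame: bytesPerFrame,
        mChannelsPerFrame: channels,
        mBitsPerChannel: 16,
        mReserved: 0)
}

// Called by the queue each time a buffer has been filled with captured audio.
let inputCallback: AudioQueueInputCallback = { _, queue, buffer, _, _, _ in
    // buffer.pointee.mAudioData holds buffer.pointee.mAudioDataByteSize bytes of PCM.
    // Hand the buffer back so the queue can reuse it.
    _ = AudioQueueEnqueueBuffer(queue, buffer, 0, nil)
}

var format = makePCMFormat(sampleRate: 44_100, channels: 1)
var queue: AudioQueueRef?
// A status of noErr means the input queue was created.
let status = AudioQueueNewInput(&format, inputCallback, nil, nil, nil, 0, &queue)
```

Even this partial setup shows how much bookkeeping the API leaves to you compared with the higher-level options above.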

It doesn’t seem to be a framework that is promoted by Apple for most use cases.

Summary

If you only need audio-only recording and don’t need real-time audio analysis or processing, then AVAudioRecorder is your best choice.

If you need some real-time analysis of captured audio, but not significant processing, AVCapture may be the best fit for audio and video.

If you need serious real-time audio capture analysis and processing, AVAudioEngine is highly extensible and configurable.


API              Sample Buffer Access  Video  Effort
AVAudioRecorder  No                    No     Low
AVCapture        Yes                   Yes    Mid
AVAudioEngine    Yes                   No     High
Audio Toolbox    Yes                   No     Very High

Tags: ios