Recording Audio on iOS with Examples

There are several sets of iOS libraries that provide audio recording functionality, and it can be hard for a beginner to know which API to use. This article briefly describes the purpose of each of these APIs and their pros and cons.

AVAudioRecorder
AVCapture
AVAudioEngine
Audio Toolbox

AVAudioRecorder

AVAudioRecorder is the simplest audio recording API in iOS. You tell it the file path to record to and the audio format settings, such as stereo or mono and AAC or WAV. You can then start and stop the recording whenever you want.

Its disadvantage is that it doesn’t give you access to the audio sample buffers while it records. That means you can’t analyze or process the audio before it is written to the target file. It also doesn’t support video recording.

AVAudioRecorder Example

In this example, startRecording() decides where to save the audio and configures the recording settings before it starts recording.

We also need a way to finish recording: finishRecording() stops the recording and then sets the recorder instance to nil.

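import AVFoundation

// These members live inside a class that conforms to AVAudioRecorderDelegate.
var audioRecorder: AVAudioRecorder?

// Record into the app's Documents directory (AVAudioRecorder needs a file URL).
let filename = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("out.m4a")
let settings: [String: Any] = [
    // Assumption: AAC, to match the .m4a file extension
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
    AVEncoderBitRatePerChannelKey: 96000
]

func startRecording() {
    // Decide where to save the audio, configure the
    // recording settings, then start recording.
    do {
        // See Apple API reference for details of settings
        audioRecorder = try AVAudioRecorder(url: filename, settings: settings)
        audioRecorder?.delegate = self
        audioRecorder?.record()
    } catch {
        finishRecording()
    }
}

func finishRecording() {
    audioRecorder?.stop()
    audioRecorder = nil
}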

We assign self as the delegate of the audio recorder. This means your class needs to conform to the AVAudioRecorderDelegate protocol to handle callbacks from AVAudioRecorder.

func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
    // Show in UI the audio recording has stopped
    // (audioRecordingStopped is an app-defined helper)
    audioRecordingStopped(successfully: flag)
}

func audioRecorderEncodeErrorDidOccur(_ recorder: AVAudioRecorder, error: Error?) {
    // Show in UI, error happened in audio recording
    // (audioRecordingFailed is an app-defined helper)
    audioRecordingFailed(with: error)
}
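
The settings dictionary passed to AVAudioRecorder controls the format choices mentioned earlier (stereo or mono, AAC or WAV). As a minimal sketch, here are two possible dictionaries: one for mono AAC and one for stereo 16-bit linear PCM, the format usually stored in WAV files. The constant names and the 44.1 kHz sample rate are illustrative assumptions, not part of the original example.

```swift
import AVFoundation

// Hypothetical names and values, chosen for illustration only.

// Mono AAC, typically written to an .m4a file.
let aacMonoSettings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
    AVSampleRateKey: 44_100.0,
    AVNumberOfChannelsKey: 1,
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
]

// Stereo 16-bit integer linear PCM, typically written to a .wav file.
let pcmStereoSettings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatLinearPCM),
    AVSampleRateKey: 44_100.0,
    AVNumberOfChannelsKey: 2,
    AVLinearPCMBitDepthKey: 16,
    AVLinearPCMIsFloatKey: false,
    AVLinearPCMIsBigEndianKey: false
]
```

Either dictionary can be passed as the settings argument of AVAudioRecorder(url:settings:), as long as the file extension of the URL matches the chosen format.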

AVCapture

AVCapture consists of AVCaptureDevice, AVCaptureDeviceInput, AVCaptureSession, and AVCaptureOutput. Together they provide a simple, graph-like processing architecture of input and output nodes. You can intercept the sample buffers and do some analysis and processing on them. It can also capture video.

However, it is not designed for serious signal analysis and processing, and it is missing some desirable configuration options.

AVCaptureSession Example

We need to prepare an AVCaptureSession for the audio input device. This example only shows audio capture.

self.captureSession = AVCaptureSession()

let queue = DispatchQueue(label: "AudioSessionQueue", attributes: [])
var audioInput: AVCaptureDeviceInput? = nil

// Creating the input does not require locking the device for configuration.
if let captureDevice = AVCaptureDevice.default(for: .audio) {
    do {
        audioInput = try AVCaptureDeviceInput(device: captureDevice)
    } catch {
        // Handle capture devices could not be set
    }
} else {
    // Handle no audio capture device being available
}

We can then add input to the session.

// Add audio input (skipped safely if the input could not be created)
if let audioInput = audioInput, captureSession.canAddInput(audioInput) {
    captureSession.addInput(audioInput)
} else {
    // throw inputsAreInvalid
}

We also need to prepare the session output. In this example, we use AVCaptureAudioFileOutput, so the output is written to a file. Check platform availability before relying on it: Apple’s documentation lists AVCaptureAudioFileOutput as a macOS class. If we wanted to intercept the input audio sample buffers instead, we could use AVCaptureAudioDataOutput, and the intercepted data can be used for any processing needs.

audioOutput = AVCaptureAudioFileOutput()
if captureSession.canAddOutput(audioOutput!) {
    captureSession.addOutput(audioOutput!)
} else {
    // throw outputIsInvalid
}
captureSession.startRunning()

With the capture session running, we can call functions to start and stop recording.

We assign our class as the recording delegate, so the class needs to conform to the AVCaptureFileOutputRecordingDelegate protocol to handle the callbacks from AVCaptureAudioFileOutput.

func startRecording() {
    guard let captureSession = self.captureSession, captureSession.isRunning else {
        // Error, capture session is not yet ready
        return
    }
    let paths = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)
    let fileUrl = paths[0].appendingPathComponent("output.m4a")
    try? FileManager.default.removeItem(at: fileUrl)
    audioOutput!.startRecording(to: fileUrl, outputFileType: .m4a, recordingDelegate: self)
}
 
func stopRecording() {
    guard let captureSession = self.captureSession, captureSession.isRunning else {
        // Error, capture session is not yet ready
        return
    }
    audioOutput?.stopRecording()
}
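
The AVCaptureAudioDataOutput alternative mentioned earlier can be sketched roughly as follows. Instead of writing to a file, the output delivers each captured sample buffer to a delegate on a queue of your choosing. The class name AudioBufferReceiver and the method attach(to:) are hypothetical names introduced for this illustration, and the session is assumed to already have an audio input.

```swift
import AVFoundation

// A minimal sketch: receives audio sample buffers from a capture session.
// `AudioBufferReceiver` and `attach(to:)` are hypothetical names.
final class AudioBufferReceiver: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    let output = AVCaptureAudioDataOutput()
    private let queue = DispatchQueue(label: "AudioSampleBufferQueue")

    // Adds the data output to an already configured session.
    // Returns false if the session rejects the output.
    func attach(to session: AVCaptureSession) -> Bool {
        guard session.canAddOutput(output) else { return false }
        output.setSampleBufferDelegate(self, queue: queue)
        session.addOutput(output)
        return true
    }

    // Called on `queue` for every captured audio buffer.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        let frameCount = CMSampleBufferGetNumSamples(sampleBuffer)
        // Analyze or process the samples here.
        print("Received \(frameCount) audio frames")
    }
}
```

Keep a strong reference to the receiver for as long as the session runs, since the output does not retain its delegate.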

AVAudioEngine

AVAudioEngine is a wrapper around Audio Units, the low-level software audio modules in iOS/macOS. It provides a framework for managing audio capture, recording, and playback. It is highly recommended by Apple.

You dynamically create an audio graph and processing chain. You can then install a tap to peek at the audio buffers and analyze them. To process audio, you add a processing node to the graph. The library can also be used in an Audio Unit app extension that acts as an audio processing plug-in for other audio apps. It is highly configurable, and some default processing nodes are ready to use.

Its disadvantage is that it doesn’t support video capture natively. It also takes more effort to implement and requires deeper knowledge of audio engineering and buffer management.

AVAudioEngine Example

We need to prepare an AVAudioEngine instance and the corresponding AVAudioFile for output.

import AVFoundation

var engine = AVAudioEngine()
var file: AVAudioFile?
var player = AVAudioPlayerNode() // Optional

func prepareAudioOutputFile(for outputURL: URL) {
    do {
        // Create the file in the format the input node delivers
        let inputFormat = engine.inputNode.outputFormat(forBus: 0)
        file = try AVAudioFile(forWriting: outputURL, settings: inputFormat.settings)
        engine.attach(player)
        // Configure graph
        engine.connect(player, to: engine.mainMixerNode, format: engine.mainMixerNode.outputFormat(forBus: 0))
        try engine.start()
    } catch {
        // Handle the file or the engine failing to start
    }
}

We can implement startRecording() as follows. Because the tap hands us each buffer, we can process it before writing it to the output file.

func startRecording() {
    // Tap the input node in its own output format
    let format = engine.inputNode.outputFormat(forBus: 0)
    engine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { (buffer, time) in
        do {
            try file?.write(from: buffer)
        } catch {
            // Handle the buffer failing to write
        }
    }
}
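
As one example of analyzing a buffer inside the tap, the following sketch computes a root-mean-square (RMS) level, which is a common basis for a simple level meter. The function names rmsLevel(of:) and rmsLevel(ofFirstChannelIn:) are hypothetical helpers, and the buffer is assumed to hold Float32 samples, which is what taps usually deliver.

```swift
import AVFoundation

// Root-mean-square level of a block of samples (0 for an empty block).
func rmsLevel(of samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    var sumOfSquares: Float = 0
    for sample in samples {
        sumOfSquares += sample * sample
    }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

// Reads the first channel of a PCM buffer and returns its RMS level.
// Returns nil if the buffer does not hold Float32 samples.
func rmsLevel(ofFirstChannelIn buffer: AVAudioPCMBuffer) -> Float? {
    guard let channels = buffer.floatChannelData else { return nil }
    let frames = Int(buffer.frameLength)
    let samples = Array(UnsafeMutableBufferPointer(start: channels[0], count: frames))
    return rmsLevel(of: samples)
}
```

Inside the tap block, rmsLevel(ofFirstChannelIn: buffer) could be called before the buffer is written to the file. Copying the samples into an array keeps the sketch short; a production meter would more likely read the channel pointer directly.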

To stop recording, stopRecording() removes the tap from the input node.

func stopRecording() {
    engine.inputNode.removeTap(onBus: 0)
}
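
The tap only observes the audio. To actually process it, a processing node has to sit in the graph, as described above. The following is a minimal sketch that routes the microphone through one of the built-in effect nodes, AVAudioUnitReverb, on its way to the main mixer. The helper name makeReverbChain(in:) and the preset and mix values are assumptions for illustration, and the sketch assumes an input device is available and microphone access has been granted.

```swift
import AVFoundation

// A minimal sketch: input -> reverb -> main mixer.
// `makeReverbChain(in:)` is a hypothetical helper name.
func makeReverbChain(in engine: AVAudioEngine) -> AVAudioUnitReverb {
    let reverb = AVAudioUnitReverb()
    reverb.loadFactoryPreset(.mediumHall)
    reverb.wetDryMix = 30 // percentage of processed ("wet") signal, 0...100

    let input = engine.inputNode
    let format = input.outputFormat(forBus: 0)

    // Nodes must be attached to the engine before they are connected.
    engine.attach(reverb)
    engine.connect(input, to: reverb, format: format)
    engine.connect(reverb, to: engine.mainMixerNode, format: format)
    return reverb
}
```

To record the processed signal rather than the raw input, the tap could be installed on the returned reverb node instead of on the input node. Because this graph also plays the microphone through the output, headphones help avoid feedback while testing.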

Audio Toolbox

Audio Toolbox is a low-level, C programming interface in iOS/macOS. With it you can do recording, playback, and stream parsing.

It is highly configurable but can be difficult to use. It lacks a way to manage input, output, and processing nodes, and the Audio Unit Processing Graph Services are deprecated. It also doesn’t support video capture natively.

Apple doesn’t seem to promote this framework for most use cases.
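
To give a sense of the extra effort involved, here is a rough sketch of recording with Audio Queue Services, one of the recording interfaces in Audio Toolbox. You describe the format by hand, allocate and enqueue the buffers yourself, and hand each buffer back to the queue after reading it. The function name startInputQueue(handler:), the handler signature, and the format and buffer sizes are assumptions made for this illustration.

```swift
import AudioToolbox

// 16-bit, mono, 44.1 kHz linear PCM. Values are illustrative assumptions.
var recordingFormat = AudioStreamBasicDescription(
    mSampleRate: 44_100,
    mFormatID: kAudioFormatLinearPCM,
    mFormatFlags: kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked,
    mBytesPerPacket: 2,
    mFramesPerPacket: 1,
    mBytesPerFrame: 2,
    mChannelsPerFrame: 1,
    mBitsPerChannel: 16,
    mReserved: 0
)

// Creates and starts an input queue that passes the bytes of each filled
// buffer to `handler`. Both names are hypothetical.
func startInputQueue(handler: @escaping (UnsafeRawPointer, Int) -> Void) -> AudioQueueRef? {
    var newQueue: AudioQueueRef?
    let status = AudioQueueNewInputWithDispatchQueue(
        &newQueue, &recordingFormat, 0, DispatchQueue(label: "AudioQueueInput")
    ) { inQueue, inBuffer, _, _, _ in
        handler(UnsafeRawPointer(inBuffer.pointee.mAudioData),
                Int(inBuffer.pointee.mAudioDataByteSize))
        // Hand the buffer back to the queue so it can be refilled.
        AudioQueueEnqueueBuffer(inQueue, inBuffer, 0, nil)
    }
    guard status == noErr, let queue = newQueue else { return nil }

    // Three buffers of roughly 0.1 seconds each.
    let bufferByteSize = UInt32(recordingFormat.mSampleRate * 0.1) * recordingFormat.mBytesPerFrame
    for _ in 0..<3 {
        var buffer: AudioQueueBufferRef?
        guard AudioQueueAllocateBuffer(queue, bufferByteSize, &buffer) == noErr,
              let allocated = buffer else { continue }
        AudioQueueEnqueueBuffer(queue, allocated, 0, nil)
    }

    guard AudioQueueStart(queue, nil) == noErr else {
        AudioQueueDispose(queue, true)
        return nil
    }
    return queue
}
```

When recording is finished, the returned queue would be stopped with AudioQueueStop(queue, true) and released with AudioQueueDispose(queue, true). Writing the bytes to a file is left out here; that step would typically use the Audio File Services that are also part of Audio Toolbox.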

Summary

If you only need audio recording and don’t need real-time audio analysis or processing, then AVAudioRecorder is your best choice.

If you need some real-time analysis of the captured audio, but not significant processing, then AVCapture may be the best fit for both audio and video.

If you need serious real-time analysis and processing of the captured audio, then AVAudioEngine is highly extensible and configurable.

API	Sample Buffer Access	Video	Effort
AVAudioRecorder	No	No	Low
AVCapture	Yes	Yes	Medium
AVAudioEngine	Yes	No	High
Audio Toolbox	Yes	No	Very High
