Hi everyone,
I’m working on a project that involves streaming audio over WebSockets, and I need to compress the audio to reduce bandwidth usage. I’m currently using AVAudioEngine to capture and process audio in PCM format (AVAudioPCMBuffer), but I want to compress the buffer into Opus (or another efficient codec) before sending it over the network.
Has anyone worked with compressing an AVAudioPCMBuffer into Opus format within a tap on the inputNode, or could you recommend the best approach for compressing the PCM buffer into a different format? I haven’t been able to find a working solution for this.
Any advice or code examples would be greatly appreciated!
Thanks in advance,
Ondřej
--
My current code without the compression:
inputNode.installTap(onBus: .zero, bufferSize: 1440, format: nil) { [weak self] buffer, time in
guard let self else {
return
}
// 1. Send data
// a) Convert the buffer into the desired format
if let outputBuffer = buffer.convert(toFormat: Self.websocketInputFormat) {
// b) Use the converted buffer
// TODO: compress it into a different format
if let data = outputBuffer.convertToData() {
self.sendAudio(data)
}
}
// 2. Get sound level
self.visualizeRecorderBuffer(buffer)
}
func convert(toFormat outputFormat: AVAudioFormat) -> AVAudioPCMBuffer? {
let outputFrameCapacity = AVAudioFrameCount(
round(Double(frameLength) * (outputFormat.sampleRate / format.sampleRate))
)
guard
let outputBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: outputFrameCapacity),
let converter = AVAudioConverter(from: format, to: outputFormat)
else {
return nil
}
    var providedInput = false
    converter.convert(to: outputBuffer, error: nil) { _, inputStatus in
        // Supply the source buffer exactly once, then report that no more data
        // is available so the converter doesn't re-read the same samples.
        guard !providedInput else {
            inputStatus.pointee = .noDataNow
            return nil
        }
        providedInput = true
        inputStatus.pointee = .haveData
        return self
    }
    return outputBuffer
}
static private let websocketInputFormat = AVAudioFormat(
commonFormat: .pcmFormatInt16,
sampleRate: 16000,
channels: 1,
interleaved: false
)!
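For reference, here is the direction I'm exploring for the compression step. It's a rough, unverified sketch: the OpusEncoder type, the 48 kHz / 960-frames-per-packet settings, and the one-buffer-per-call input block are my own assumptions, not something I found documented. It feeds the converted PCM buffer to an AVAudioConverter targeting kAudioFormatOpus and reads the encoded packets out of an AVAudioCompressedBuffer:
import AVFoundation

// Rough sketch, not verified: `OpusEncoder`, the 48 kHz / 960-frames-per-packet
// settings, and the one-buffer-per-call input block are my assumptions.
final class OpusEncoder {
    private let converter: AVAudioConverter
    private let opusFormat: AVAudioFormat

    init?(inputFormat: AVAudioFormat) {
        // Core Audio's Opus codec uses 20 ms packets (960 frames at 48 kHz);
        // I'm assuming the converter resamples the 16 kHz PCM input as needed.
        var description = AudioStreamBasicDescription(
            mSampleRate: 48_000,
            mFormatID: kAudioFormatOpus,
            mFormatFlags: 0,
            mBytesPerPacket: 0,      // variable-size packets
            mFramesPerPacket: 960,
            mBytesPerFrame: 0,
            mChannelsPerFrame: 1,
            mBitsPerChannel: 0,
            mReserved: 0
        )
        guard
            let opusFormat = AVAudioFormat(streamDescription: &description),
            let converter = AVAudioConverter(from: inputFormat, to: opusFormat)
        else { return nil }
        self.opusFormat = opusFormat
        self.converter = converter
    }

    /// Encodes one PCM buffer and returns the raw Opus packet bytes, or nil.
    func encode(_ pcmBuffer: AVAudioPCMBuffer) -> Data? {
        let outputBuffer = AVAudioCompressedBuffer(
            format: opusFormat,
            packetCapacity: 8,
            maximumPacketSize: converter.maximumOutputPacketSize
        )
        var consumed = false
        var error: NSError?
        let status = converter.convert(to: outputBuffer, error: &error) { _, inputStatus in
            // Feed exactly one PCM buffer per call; leftover frames stay
            // buffered inside the converter for the next call.
            if consumed {
                inputStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            inputStatus.pointee = .haveData
            return pcmBuffer
        }
        guard status != .error, outputBuffer.byteLength > 0 else { return nil }
        // Note: for streaming, each packet would normally be framed separately
        // using outputBuffer.packetDescriptions rather than concatenated blindly.
        return Data(bytes: outputBuffer.data, count: Int(outputBuffer.byteLength))
    }
}
In the tap, this would replace the TODO: create one encoder up front and call something like `if let opusData = opusEncoder.encode(outputBuffer) { self.sendAudio(opusData) }`. I'd still appreciate confirmation that this is roughly the right shape.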
AVAudioEngine
Use a group of connected audio node objects to generate and process audio signals and perform audio input and output.
Posts under the AVAudioEngine tag:
Hi! I'm developing a music player app that switches between ApplicationMusicPlayer and AVAudioEngine. I'm facing an issue when moving from playback via ApplicationMusicPlayer to AVAudioEngine while the app is in the background. Based on testing, the issue seems to be that the app cannot take audio focus while in the background, resulting in the error AVAudioSessionErrorCodeCannotInterruptOthers.
I would like to check whether ApplicationMusicPlayer has its own audio focus, separate from the app's own audio focus. If it does, is there anything I can do to ensure that ApplicationMusicPlayer returns focus to the app?
(I notice that the issue does not occur when moving playback from AVAudioEngine to ApplicationMusicPlayer; I'm not sure why the opposite direction fails.)
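For context, this is roughly how I activate the session before handing playback to AVAudioEngine (a sketch of my real code; the specific error check is just something I added to confirm which error comes back, not a fix):
import AVFoundation

// Sketch: activate the session before starting AVAudioEngine; the catch clause
// only identifies the "cannot interrupt others" case.
func activateSessionForEnginePlayback() {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playback, mode: .default)
        try session.setActive(true)
    } catch let error as NSError
        where error.code == AVAudioSession.ErrorCode.cannotInterruptOthers.rawValue {
        // The system refused activation because something else holds audio focus.
        print("Cannot interrupt others: \(error)")
    } catch {
        print("Session activation failed: \(error)")
    }
}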
I have a visionOS app that plays audio using AVAudioEngine and presents both a window and an immersive space. If I close the window, the audio session gets interrupted and attempting to restart the session and audio engine has no effect. I need to dismiss the app, then reopen it, which reopens the main window, in order for audio to start playing again.
This is in all visionOS 2 betas. Note that I have background audio enabled for my app.
I'm experiencing stuttering every time I record something with my iOS app on iOS 18 beta. The code ran fine on previous iOS versions.
The stuttering occurs for the first 2 seconds. Here's an example:
https://soundcloud.com/thomas-walther-219010679/ios-18-stuttering
The way I set up AVAudioEngine and AVAudioSession was vetted quite thoroughly during sessions at WWDC '23. Here is how the engine and the tap are configured:
let engine = AVAudioEngine()
let recorderNode = AVAudioMixerNode()
engine.attach(recorderNode)
engine.connect(engine.mainMixerNode, to: engine.outputNode, format: engine.outputNode.inputFormat(forBus: 0))
engine.connect(recorderNode, to: engine.mainMixerNode, format: recordingOutputFormat)
engine.connect(engine.inputNode, to: recorderNode, format: engine.inputNode.inputFormat(forBus: 0))
let bufferSize: AVAudioFrameCount = 4096
recorderNode.installTap(onBus: 0, bufferSize: bufferSize, format: nil) { [weak self] buffer, time in
guard let self = self else { return }
do {
// Write recording to disk
try audioFile.write(from: buffer)
} catch {
// ...
}
}
I tried setting a different buffer size, but with no luck. I also can't see any hangs in Instruments. Do you have any pointers on how to debug this?
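One variant I may try next (a sketch only; `writeQueue` is my own addition and `audioFile` is the same AVAudioFile as above): keep the tap block minimal and move the disk I/O onto a serial queue so the write never blocks the thread that delivers audio buffers.
// Sketch: file writes happen off the tap callback.
let writeQueue = DispatchQueue(label: "recorder.file-writes")

recorderNode.installTap(onBus: 0, bufferSize: bufferSize, format: nil) { buffer, _ in
    // If the engine turns out to reuse buffer objects, a copy would be needed
    // before dispatching; I haven't confirmed either way.
    writeQueue.async {
        do {
            try audioFile.write(from: buffer)
        } catch {
            print("Audio file write failed: \(error)")
        }
    }
}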
I need to find a way to allow recording from the mic while outputting two different sound streams to two different devices (speaker and headphones).
I've done a fair bit of reading around using AVAudioSession.Category.multiroute but haven't found any modern examples. @theanalogkid posted a nice example using obj-C nine years ago, but others have noted that the code isn't readily translatable to Swift.
To make matters worse, this is one of the very few examples on how to properly use multirouting. The official documentation is lacking, to say the least, and the WWDC 2012 session is, well, old enough to attend middle school and be a Taylor Swift fan, but definitely not in Swift. The few relevant forum posts here are spread over this middle schooler's life span and likely outdated, with most having no responses other than the poster's own plightful echo. They don't paint a pretty picture of .multiroute's health, with a recent poster noting that volume buttons don't work in this mode, contacting DTS and finding that there's no fix; another finding that it just doesn't work for certain devices, etc.
Audio is giving me enough of a headache so I'd like to avoid slogging through this if possible. .multiroute feels like the developer mode of AVAudioSession, but without documentation.
tl;dr - Without using .multiroute, is there a way for an app to output to two different devices while simultaneously recording audio? If .multiroute is the only way to achieve this, can someone give me a quick rundown of how this category works?
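For reference, this is as far as my understanding of the basic setup goes, as a minimal sketch: just activating the category and inspecting the aggregate route. The per-channel mapping into AVAudioEngine is the part I still haven't figured out, and the function name here is my own.
import AVFoundation

// Sketch: activate .multiRoute and list the outputs the session aggregates.
func activateMultiRoute() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.multiRoute, mode: .default, options: [])
    try session.setActive(true)

    // With .multiRoute, every attached output appears in the current route,
    // each contributing its own channels to one aggregate output device.
    for output in session.currentRoute.outputs {
        let channels = output.channels?.map(\.channelNumber) ?? []
        print("Output: \(output.portType.rawValue), channels: \(channels)")
    }
}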
I am developing a visionOS app that captions speech in real environments. Currently, I am using Apple's built-in speech recognizer. However, when testing the app on a Vision Pro, the device seemed to pick up only the user's voice (that is, the voice of the person wearing the Vision Pro). For example, while the speech recognition task is running and another person in front of me is talking, the system does not pick up their speech well.
I tried to set the AVAudioSession to be equally sensitive to all directions:
private func configureAudioSession() {
do {
try audioSession.setCategory(.record, mode: .measurement)
try audioSession.setActive(true)
if #available(visionOS 1.0, *) {
let availableDataSources = audioSession.availableInputs?.first?.dataSources
if let omniDirectionalSource = availableDataSources?.first(where: {$0.preferredPolarPattern == .omnidirectional}) {
try audioSession.setInputDataSource(omniDirectionalSource)
}
}
} catch {
print("Failed to set up audio session: \(error)")
}
}
And here is how I set up the speech recognition and configure the microphone inputs:
private func startSpeechRecognition(completion: @escaping (String) -> Void) {
do {
// Cancel the previous task if it's running.
if let recognitionTask = recognitionTask {
recognitionTask.cancel()
self.recognitionTask = nil
}
// The AudioSession is already active, creating input node.
let inputNode = audioEngine.inputNode
try inputNode.setVoiceProcessingEnabled(false)
// Create and configure the speech recognition request
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a recognition request") }
recognitionRequest.shouldReportPartialResults = true
// Keep speech recognition data on device
if #available(iOS 13, *) {
recognitionRequest.requiresOnDeviceRecognition = true
}
// Create a recognition task for speech recognition session.
// Keep a reference to the task so that it can be canceled.
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
// var isFinal = false
if let result = result {
// Update the recognizedText
completion(result.bestTranscription.formattedString)
} else if let error = error {
completion("Recognition error: \(error.localizedDescription)")
}
if error != nil || result?.isFinal == true {
// Stop recognizing speech if there is a problem
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
}
}
// Configure the microphone input
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognitionRequest?.append(buffer)
}
audioEngine.prepare()
try audioEngine.start()
} catch {
completion("Audio engine could not start: \(error.localizedDescription)")
}
}
Description:
I am developing a recording-only application that supports background recording using AVAudioEngine. The app segments the recording into 60-second files for further processing. For example, a 10-minute recording results in ten 60-second files.
Problem:
The application functions as expected in the background. However, after the app receives an interruption (such as a phone call) and the interruption ends, I can successfully restart the recording. The problem arises when the app then transitions to the background; it fails to restart the recording. Specifically, after ending the call and transitioning the app to the background, the app encounters an error and is unable to restart AVAudioSession and AVAudioEngine. The only resolution is to close and restart the app, which is not ideal for user experience.
Steps to Reproduce:
1. Start recording using AVAudioEngine.
2. The app records and saves 60-second segments.
3. Receive an interruption (e.g., an incoming phone call).
4. End the call.
5. Transition the app to the background.
6. Transition the app to the foreground and the session will be activated again.
7. Attempt to restart the recording.
Expected Behavior:
The app should resume recording seamlessly after the interruption and background transition.
Actual Behavior:
The app fails to restart AVAudioSession and AVAudioEngine, resulting in a continuous error. The recording cannot be resumed without closing and reopening the app.
How I’m Starting the Recording:
Configuration:
internal func setAudioSessionCategory() {
do {
try audioSession.setCategory(
.playAndRecord,
mode: .default,
options: [.defaultToSpeaker, .mixWithOthers, .allowBluetooth]
)
} catch {
debugPrint(error)
}
}
internal func setAudioSessionActivation() {
if UIApplication.shared.applicationState == .active {
do {
try audioSession.setPrefersNoInterruptionsFromSystemAlerts(true)
try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
if audioSession.isInputGainSettable {
try audioSession.setInputGain(1.0)
}
try audioSession.setPreferredIOBufferDuration(0.01)
try setBuiltInPreferredInput()
} catch {
debugPrint(error)
}
}
}
Starting AVAudioEngine:
internal func setupEngine() {
if callObserver.onCall() { return }
inputNode = audioEngine.inputNode
audioEngine.attach(audioMixer)
audioEngine.connect(inputNode, to: audioMixer, format: AVAudioFormat.validInputAudioFormat(inputNode))
}
internal func beginRecordingEngine() {
audioMixer.removeTap(onBus: 0)
audioMixer.installTap(onBus: 0, bufferSize: 1024, format: AVAudioFormat.validInputAudioFormat(inputNode)) { [weak self] buffer, _ in
guard let self = self, let file = self.audioFile else { return }
write(file, buffer: buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
recordingTimer = Timer.scheduledTimer(withTimeInterval: recordingInterval, repeats: true) { [weak self] _ in
self?.handleRecordingInterval()
}
} catch {
debugPrint(error)
}
}
On the try audioEngine.start() call, I receive error code 561145187 in the catch block.
Logs/Error Messages:
• Error code: 561145187
Request:
I would appreciate any guidance or solutions to ensure the app can resume recording after interruptions and background transitions without requiring a restart.
Thank you for your assistance.
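For completeness, here is the interruption handling I am experimenting with. It is only a sketch: the Recorder wrapper is illustrative, and the idea is to reactivate the session and engine when the interruption ends rather than gating reactivation on UIApplication.shared.applicationState == .active as my current code does.
import AVFoundation

// Sketch: reactivate the session and restart the engine when an interruption
// ends with the .shouldResume hint.
final class Recorder {
    let audioEngine = AVAudioEngine()

    func observeInterruptions() {
        NotificationCenter.default.addObserver(
            forName: AVAudioSession.interruptionNotification,
            object: AVAudioSession.sharedInstance(),
            queue: .main
        ) { [weak self] notification in
            guard
                let self,
                let info = notification.userInfo,
                let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
                let type = AVAudioSession.InterruptionType(rawValue: typeValue)
            else { return }

            switch type {
            case .began:
                self.audioEngine.pause()
            case .ended:
                let optionsValue = info[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
                if AVAudioSession.InterruptionOptions(rawValue: optionsValue).contains(.shouldResume) {
                    do {
                        try AVAudioSession.sharedInstance().setActive(true)
                        try self.audioEngine.start()
                    } catch {
                        debugPrint(error)
                    }
                }
            @unknown default:
                break
            }
        }
    }
}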
Topic: Media Technologies > Audio. Tags: AVAudioNode, AVAudioSession, AVAudioEngine, AVFoundation
I'm building an app that will allow users to record voice notes. The recording functionality itself is working great; I'm now trying to implement changes to the audio session to manage audio coming from other apps. I want it so that if audio is playing from a different app and the user opens my app, that audio keeps playing. When we start recording, any third-party audio should stop, and it can then resume again when we stop recording.
This is my main audio setup code:
private var audioEngine: AVAudioEngine!
private var inputNode: AVAudioInputNode!
func setupAudioEngine() {
audioEngine = AVAudioEngine()
inputNode = audioEngine.inputNode
audioPlayerNode = AVAudioPlayerNode()
audioEngine.attach(audioPlayerNode)
let format = AVAudioFormat(standardFormatWithSampleRate: AUDIO_SESSION_SAMPLE_RATE, channels: 1)
audioEngine.connect(audioPlayerNode, to: audioEngine.mainMixerNode, format: format)
}
private func setupAudioSession() {
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker, .allowBluetooth])
try audioSession.setPreferredSampleRate(AUDIO_SESSION_SAMPLE_RATE)
try audioSession.setPreferredIOBufferDuration(0.005) // 5ms buffer for lower latency
try audioSession.setActive(true)
// Add observers
setupInterruptionObserver()
} catch {
audioErrorMessage = "Failed to set up audio session: \(error)"
}
}
This is all called upon app startup so we're ready to record whenever the user presses the record button.
However, currently when this happens, any outside audio stops playing.
I isolated the issue to this line: inputNode = audioEngine.inputNode
When that's commented out, the audio will play -- but I obviously need this for recording functionality.
Is this a bug? Expected behavior?
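One approach I'm considering, as a sketch only: defer both session activation and the first access to inputNode until the user actually taps record, then release the session when recording stops. The method names here are mine, and I'm assuming setupAudioEngine() no longer touches inputNode and that the tap handling lives elsewhere.
// Sketch: touch inputNode and activate the session only at record time.
func startRecording() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)

    // Access inputNode only after the session is configured for recording.
    let inputNode = audioEngine.inputNode
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        // Hand the buffer to the recorder here.
    }
    audioEngine.prepare()
    try audioEngine.start()
}

func stopRecording() throws {
    audioEngine.inputNode.removeTap(onBus: 0)
    audioEngine.stop()
    // Let other apps resume their audio.
    try AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
}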
Hi there community,
First and foremost, a big thank you to everyone who takes the time to read this.
TL;DR: How, if even possible, can I record multiple audio streams simultaneously on an iOS application (iPad/iPhone)?
I'm working on a recorder for the iPad to gather data for a machine learning project focused on speech recognition. Our goal is to capture extensive speech data, which requires recording from multiple microphones. Specifically, I need to record from all mics connected to our Scarlett 4i4 audio interface and, most importantly, also record from the built-in mic on the iPad or iPhone at the same time.
As a newcomer to Swift development, I initially explored AVAudioRecorder. However, I quickly realized that it only supports one active audio input at a time, making multi-channel recording impossible (perhaps you can prove me wrong; that would make my day). Next, I transitioned to AVAudioEngine, but encountered the same limitation: I couldn't manage to get input nodes for both the built-in mic and the Scarlett interface channels simultaneously. The application started behaving oddly, often resulting in identical audio data being recorded across all files.
Determined to find a solution, I delved deeper into the Core Audio framework, specifically using Audio Toolbox. My approach involved creating and configuring multiple Audio Units, each corresponding to a different audio input device. Here's a brief overview of my current implementation:
Listing Available Input Devices: I used AVAudioSession to enumerate all available input devices.
Creating Audio Units: For each device, I created an Audio Unit and attempted to configure it for recording.
Setting Up Callbacks: I set up input and output callbacks to handle the audio processing.
Despite my efforts over the last few days, I haven't had much success. The callbacks for the Audio Units don't seem to be invoked correctly, and I'm struggling to achieve simultaneous multi-channel recording. Below is a snippet of my latest attempt:
let audioUnitCallback: AURenderCallback = { (
inRefCon: UnsafeMutableRawPointer,
ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
inTimeStamp: UnsafePointer<AudioTimeStamp>,
inBusNumber: UInt32,
inNumberFrames: UInt32,
ioData: UnsafeMutablePointer<AudioBufferList>?
) -> OSStatus in
guard let ioData = ioData else {
return noErr
}
print("Input callback invoked")
let audioUnit = inRefCon.assumingMemoryBound(to: AudioUnit.self).pointee
var bufferList = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: 1,
mDataByteSize: 0,
mData: nil
)
)
let status = AudioUnitRender(audioUnit, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames, &bufferList)
if status != noErr {
print("AudioUnitRender failed: \(status)")
return status
}
// Copy rendered data to output buffer
let buffer = UnsafeMutableAudioBufferListPointer(ioData)[0]
buffer.mData?.copyMemory(from: bufferList.mBuffers.mData!, byteCount: Int(bufferList.mBuffers.mDataByteSize))
buffer.mDataByteSize = bufferList.mBuffers.mDataByteSize
print("Rendered audio data")
return noErr
}
let outputCallback: AURenderCallback = { (
inRefCon: UnsafeMutableRawPointer,
ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
inTimeStamp: UnsafePointer<AudioTimeStamp>,
inBusNumber: UInt32,
inNumberFrames: UInt32,
ioData: UnsafeMutablePointer<AudioBufferList>?
) -> OSStatus in
guard let ioData = ioData else {
return noErr
}
print("Output callback invoked")
// Process the output data if needed
return noErr
}
In essence, I'm stuck and in need of guidance. Has anyone here successfully implemented multi-channel recording on iOS, especially involving both built-in microphones and external audio interfaces? Any shared experiences, insights, or suggestions on how to proceed would be immensely appreciated.
Thank you once again for your time and assistance!
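For reference, step 1 above (listing the available input devices) is roughly this; the function name is mine and the prints are just diagnostics:
import AVFoundation

// Sketch: list every input port the session can see, with its data sources.
func listAvailableInputs() {
    let session = AVAudioSession.sharedInstance()
    for port in session.availableInputs ?? [] {
        print("Input port: \(port.portName) [\(port.portType.rawValue)]")
        for source in port.dataSources ?? [] {
            print("  data source: \(source.dataSourceName)")
        }
    }
    // As far as I can tell, only one of these ports can be the active input
    // route at a time (selected via setPreferredInput), which may be the root
    // of the "identical data in all files" behavior I'm seeing.
}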
Topic: Media Technologies > Audio. Tags: AudioToolbox, AVAudioNode, AVAudioSession, AVAudioEngine
My main iPhone app already supports VoIP with CallKit. I now want to support VoIP in the Apple Watch app as well, but I have some questions:
1. How should the watch app handle the audio data and the network connection for VoIP? Can it rely on the iPhone app?
2. How can VoIP with CallKit work on a watch that is connected only via Bluetooth?
3. If the main app already supports CallKit-based VoIP, what does it take to support CallKit on the watch? Do we need to implement CallKit, networking, and audio again, independently, on watchOS?
4. How can a third-party app support dialing a number from the watch, the same way it works on iPhone, where the user can place a call from the system call history?
Any help is appreciated, thanks in advance.
Hi! I have a music app using AVAudioEngine. Right now, I have set it up to play multi-channel tracks and show "Multichannel" in the volume controls. However, I am unable to figure out how to get it to use Dolby Atmos.
Is there something that needs to be enabled? Is it even possible with AVAudioEngine? I've seen some apps that are able to play Dolby Atmos, but they don't have an EQ feature, so I'm guessing they are not using AVAudioEngine.
I'm using AVAudioEngine to play AVAudioPCMBuffers. I'd like to synchronize some events with the playback. For example, if the audio's frame position is >= some point and less than another point, trigger some code.
So I'm looking at - (void)installTapOnBus:(AVAudioNodeBus)bus bufferSize:(AVAudioFrameCount)bufferSize format:(AVAudioFormat * __nullable)format block:(AVAudioNodeTapBlock)tapBlock;
Now I have the frame positions calculated (predetermined before the audio is scheduled; I've already made all the necessary computations). So I just need to fire code at certain points during playback:
[playerNode installTapOnBus:bus
bufferSize:bufferSize
format:format
block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
//Inspect current audio here and fire...
}];
[playerNode scheduleBuffer:fullbuffer
atTime:startTime
options:0
completionCallbackType:AVAudioPlayerNodeCompletionDataPlayedBack
completionHandler:^(AVAudioPlayerNodeCompletionCallbackType callbackType)
{
// some code is here, not important to this question.
}];
The problem I'm having is figuring out where in the full buffer I am within the tap block. The tap block passes chunks (not the full audio buffer). I tried using the when parameter of the block to calculate the frame position relative to the entire audio, but I have been unsuccessful so far. I'm assuming the when parameter is relative to the buffer passed in the tap block (not to the entire audio buffer I scheduled).
Not installing a tap and just using a timer before scheduling my fullBuffer has given me good results but I'd rather avoid using a timer if possible and use sample time.
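In case it helps frame the question, this is the direction I am experimenting with, written in Swift for brevity. It is a sketch: playerNode is the same AVAudioPlayerNode the buffer is scheduled on, and triggerFrame is a made-up example value. The idea is to map the tap's when timestamp into the player's sample timeline via playerTime(forNodeTime:).
// Sketch: convert the node timestamp to the player's timeline inside the tap.
let triggerFrame: AVAudioFramePosition = 48_000   // e.g. fire one second into playback

playerNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, when in
    // Sample time here counts from when play() was called on the player node.
    guard let playerTime = playerNode.playerTime(forNodeTime: when) else { return }
    let chunkStart = playerTime.sampleTime
    let chunkEnd = chunkStart + AVAudioFramePosition(buffer.frameLength)
    if (chunkStart..<chunkEnd).contains(triggerFrame) {
        // Fire the event associated with this frame position.
        print("Reached frame \(triggerFrame)")
    }
}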
Topic: Media Technologies > Audio. Tags: AVAudioNode, AVAudioSession, AVAudioEngine, AVFoundation
Hi, I'm relatively new to iOS development and kindly ask for some feedback on a strategy to achieve this desired behavior in my app.
My Question:
What would be the best strategy for precisely timed sound-effect playback while the app is in the background? Is this even possible?
Context:
I created a basic countdown timer app (targeting iOS 17 with Swift/SwiftUI). Countdown sessions can last 30-60 minutes. When the timer is started, it progresses through a series of sub-intervals and plays a short sound for each one. I used AVAudioPlayer and everything works fine when the app is in the foreground. I'm considering switching to AVAudioEngine because precise timing is very important, and the AIs tell me this would have better precision.
I'm already setting "App plays audio or streams audio/video using AirPlay" in my Plist, and have configured:
AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: .mixWithOthers)
Curiously, when testing on my iPhone 13 mini, sounds sometimes still play when the app is in the background, but not always.
What I've considered:
Background Tasks: Would they make any sense for this use-case? Seems like not if the allowed time is short & limited by the system.
Pre-scheduling all Sounds: Not sure this would even work and seems like a lot of memory would be needed (could be hundreds of intervals).
ActivityKit Alerts: works but with a ~50ms delay which is too long for my purposes.
Pre-Render all SFX to 1 large audio file: Seems like a lot of work and processing time and probably not worth it. I hope there's a better solution.
I'd really appreciate any feedback.
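To make the pre-scheduling option above concrete, here is the kind of thing I have in mind. It is only a sketch: the names are mine, and I have not verified how gaps between the scheduled times are rendered.
import AVFoundation

// Sketch: schedule one short tick buffer repeatedly at explicit sample times.
func scheduleTicks(engine: AVAudioEngine,
                   player: AVAudioPlayerNode,
                   tickBuffer: AVAudioPCMBuffer,
                   intervalsInSeconds: [Double]) throws {
    engine.attach(player)
    engine.connect(player, to: engine.mainMixerNode, format: tickBuffer.format)
    try engine.start()

    let sampleRate = tickBuffer.format.sampleRate
    // The same short buffer object is scheduled many times, so memory use stays
    // small even with hundreds of sub-intervals.
    for seconds in intervalsInSeconds {
        let frame = AVAudioFramePosition(seconds * sampleRate)
        let when = AVAudioTime(sampleTime: frame, atRate: sampleRate)
        player.scheduleBuffer(tickBuffer, at: when, options: [], completionHandler: nil)
    }
    player.play()   // sample time 0 in the player's timeline starts here
}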
I'm using an AVAudioConverter object to decode an OPUS stream for VoIP. The decoding itself works well; however, whenever the stream stalls (no more audio packets are available to decode because of network instability), this is audible as crackling or an abrupt stop in the decoded audio. Opus can mitigate this: packet loss can be signalled to the C library by passing a null data pointer to
int opus_decode_float (OpusDecoder * st, const unsigned char * data, opus_int32 len, float * pcm, int frame_size, int decode_fec), see https://opus-codec.org/docs/opus_api-1.2/group__opus__decoder.html#ga9c554b8c0214e24733a299fe53bb3bd2.
However, with AVAudioConverter using Swift I'm constructing an AVAudioCompressedBuffer like so:
let compressedBuffer = AVAudioCompressedBuffer(
format: VoiceEncoder.Constants.networkFormat,
packetCapacity: 1,
maximumPacketSize: data.count
)
compressedBuffer.byteLength = UInt32(data.count)
compressedBuffer.packetCount = 1
compressedBuffer.packetDescriptions!
.pointee.mDataByteSize = UInt32(data.count)
data.copyBytes(
to: compressedBuffer.data
.assumingMemoryBound(to: UInt8.self),
count: data.count
)
where data: Data contains the raw OPUS frame to be decoded.
How can I specify data loss in this context and cause the AVAudioConverter to output PCM data whenever no more input data is available?
More context:
I'm specifying the audio format like this:
static let frameSize: UInt32 = 960
static let sampleRate: Float64 = 48000.0
static var networkFormatStreamDescription =
AudioStreamBasicDescription(
mSampleRate: sampleRate,
mFormatID: kAudioFormatOpus,
mFormatFlags: 0,
mBytesPerPacket: 0,
mFramesPerPacket: frameSize,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0
)
static let networkFormat =
AVAudioFormat(
streamDescription:
&networkFormatStreamDescription
)!
I've tried (1) setting byteLength and packetCount to zero, and (2) returning nil while setting .haveData in the AVAudioConverterInputBlock I'm using, with no success.
I’m using AVAudioEngine to get a stream of AVAudioPCMBuffers from the device’s microphone using the usual installTap(onBus:) setup.
To distribute the audio stream to other parts of the program, I’m sending the buffers to a Combine publisher similar to the following:
private let publisher = PassthroughSubject<AVAudioPCMBuffer, Never>()
I’m starting to suspect I have some kind of concurrency or memory management issue with the buffers, because when consuming the buffers elsewhere I’m getting a range of crashes that suggest some internal pointer in a buffer is NULL (specifically, I’m seeing crashes in vDSP.convertElements(of:to:) when I try to read samples from the buffer).
These crashes are in production and fairly rare — I can’t reproduce them locally.
I never modify the audio buffers, only read them for analysis.
My question is: should it be possible to put AVAudioPCMBuffers into a Combine pipeline? Does the AVAudioPCMBuffer class not retain/release the underlying AudioBufferList’s memory the way I’m assuming? Is this a fundamentally flawed approach?
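One mitigation I'm considering, as a sketch: publish a copy of each tap buffer so downstream consumers never read memory the engine might reuse. The deepCopy helper here is my own, not an AVFoundation API.
import AVFoundation

// `deepCopy` clones the format, frame length, and sample data so the published
// buffer owns its memory independently of the tap's buffer.
extension AVAudioPCMBuffer {
    func deepCopy() -> AVAudioPCMBuffer? {
        guard let copy = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameLength) else {
            return nil
        }
        copy.frameLength = frameLength
        let source = UnsafeMutableAudioBufferListPointer(
            UnsafeMutablePointer(mutating: audioBufferList)
        )
        let destination = UnsafeMutableAudioBufferListPointer(copy.mutableAudioBufferList)
        for (src, dst) in zip(source, destination) {
            guard let srcData = src.mData, let dstData = dst.mData else { continue }
            dstData.copyMemory(from: srcData,
                               byteCount: Int(min(src.mDataByteSize, dst.mDataByteSize)))
        }
        return copy
    }
}

// In the tap block:
// if let safeCopy = buffer.deepCopy() { publisher.send(safeCopy) }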