Speech

Speech Recognition Problem in iOS 18.0

It looks like Apple has added some new API(s) to SFSpeechRecognition. My app, which is currently listed on the App Store, features speech recognition. Yet trying to use it under iOS 18.0 throws this error:

-[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"

What happens is that after several words are transcribed and displayed, the next sentence makes the previous words disappear. That is probably what this portion of the error text means: "Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"". The problem occurs ONLY when the app is running under iOS 18.0. Even when it's compiled in Xcode 16.0, everything works fine using iOS 17.5. Any suggestions?

App & System Services General Speech

37

9

7.2k

Nov ’24

Voice to Text on a Beta platform

I'm writing an app that uses on-device voice to text for recognising scientific terms. It works fine on my phone, but now in beta my first tester cannot make it work. All the permission requests are working: under Privacy & Security, Microphone and Speech Recognition are both enabled on the target device, where the user granted the app permission. Is there something else I'm missing? Incidentally, my phone, the target phone, and my Xcode are all fully up to date. Thanks.

Privacy & Security General Beta Entitlements Speech Siri and Voice

0

553

Aug ’24

Which methods in which framework can separate an audio file into two files?

I'm having trouble using SFSpeechRecognizer & SFSpeechRecognitionTask to show me the words from an audio file. I found a solution on Stack Overflow that separates the audio file into smaller files. How would I do that programmatically using Swift in a macOS app Xcode project? I would prefer not to separate the file into smaller files. I will submit another post with more information for that.
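A minimal sketch of one way to split a file with AVFoundation's AVAudioFile, offered as an illustration and not as a confirmed answer from the thread. The helper names (frameRanges, splitAudioFile, SplitError), the "part-N.caf" output naming, and the 32,768-frame copy buffer are all assumptions made for this example. The output is written as uncompressed PCM in a CAF container, which is also an assumption.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Divides `totalFrames` into `parts` contiguous frame ranges.
/// Hypothetical helper: any remainder is spread over the first ranges.
func frameRanges(totalFrames: Int64, parts: Int) -> [Range<Int64>] {
    guard totalFrames > 0, parts > 0 else { return [] }
    let base = totalFrames / Int64(parts)
    let remainder = totalFrames % Int64(parts)
    var ranges: [Range<Int64>] = []
    var start: Int64 = 0
    for index in 0..<parts {
        let length = base + (Int64(index) < remainder ? 1 : 0)
        if length > 0 { ranges.append(start..<(start + length)) }
        start += length
    }
    return ranges
}

#if canImport(AVFoundation)
enum SplitError: Error { case bufferAllocationFailed }

/// Copies each frame range of `source` into its own file inside `directory`.
func splitAudioFile(at source: URL, into parts: Int, directory: URL) throws -> [URL] {
    let input = try AVAudioFile(forReading: source)
    let format = input.processingFormat
    let chunk: AVAudioFrameCount = 32_768   // assumed copy-buffer size
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: chunk) else {
        throw SplitError.bufferAllocationFailed
    }
    var outputs: [URL] = []
    let ranges = frameRanges(totalFrames: input.length, parts: parts)
    for (index, range) in ranges.enumerated() {
        let url = directory.appendingPathComponent("part-\(index + 1).caf")
        let output = try AVAudioFile(forWriting: url, settings: format.settings)
        input.framePosition = range.lowerBound   // seek to the start of this part
        var remaining = range.upperBound - range.lowerBound
        while remaining > 0 {
            let count = AVAudioFrameCount(min(Int64(chunk), remaining))
            try input.read(into: buffer, frameCount: count)
            if buffer.frameLength == 0 { break }  // unexpected end of file
            try output.write(from: buffer)
            remaining -= Int64(buffer.frameLength)
        }
        outputs.append(url)
    }
    return outputs
}
#endif
```

Calling splitAudioFile(at:into:directory:) with parts set to 2 would produce two files, each of which could then be handed to its own recognition request.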

Media Technologies Audio AudioToolbox iOS Swift Speech

3

0

729

Aug ’24

IPCAUClient.cpp:139 IPCAUClient: can't connect to server (-66748) <0x104309130>

When using AVSpeechSynthesizer(), I get an error after a couple of seconds: "IPCAUClient.cpp:139 IPCAUClient: can't connect to server (-66748) <0x104309130>", and then it speaks the text. The second time I call speak, there is no delay or error, and it speaks immediately. Where do this error and delay come from, and how can I resolve them?

Initialization code:

    self.audioSession = AVAudioSession.sharedInstance()
    // 2) handle audio session first, before trying to read the text
    do {
        try audioSession.setCategory(.playback, mode: .voicePrompt, options: .duckOthers)
        try audioSession.setActive(false)
    } catch let error {
        Logger.model.debug("❓\(error.localizedDescription)")
    }
    speechSynthesizer = AVSpeechSynthesizer()
    speechSynthesizer.usesApplicationAudioSession = true

Speak code:

    let utterance = AVSpeechUtterance(string: text)
    utterance.preUtteranceDelay = 0.1
    utterance.rate = 0.5
    utterance.pitchMultiplier = 0.75
    utterance.prefersAssistiveTechnologySettings = false
    self.speechSynthesizer.speak(utterance)

The last statement gives this error message!

Media Technologies Audio Speech AVAudioSession

3

6

1.1k

Aug ’24

iOS sound recognition: to what extent can developers access Apple's built-in sound recognition?

Hi, I am currently developing an app whose core functionality relies on detecting user laughter in the background. In our early stages we noticed Apple's built-in Sound Recognition feature. At its core, I am guessing that Sound Recognition requires permission from the user to access the microphone 24/7. Currently, with the conventional approach of background audio recording, a yellow indicator appears at the top of the iPhone screen to show that recording is in progress. This is not the case for Sound Recognition. If all sound processing/recognition is kept on-device, is there any way to avoid the yellow dot and detect laughter in a way that is similar to how Apple's Sound Recognition does it? In the Sound Recognition interface in the Settings app, the only detectable "people" sounds are baby crying, coughing, and shouting. Is it also possible to add laughter to this list somehow? Thank you in advance.

Media Technologies Audio Speech Sound Analysis

2

0

850

Aug ’24

Configuring Apple Vision Pro's microphones to effectively pick up other speakers' voices

I am developing a visionOS app that captions speech in real environments. Currently, I am using Apple's built-in speech recognizer. However, when I tested the app with a Vision Pro, the device seemed to pick up only the user's voice (in other words, the voice of the person wearing the Vision Pro). For example, when the speech recognition task is running and another person in front of me is talking, the system does not pick up the speech well.

I tried to set the AVAudioSession to be equally sensitive in all directions:

    private func configureAudioSession() {
        do {
            try audioSession.setCategory(.record, mode: .measurement)
            try audioSession.setActive(true)
            if #available(visionOS 1.0, *) {
                let availableDataSources = audioSession.availableInputs?.first?.dataSources
                if let omniDirectionalSource = availableDataSources?.first(where: { $0.preferredPolarPattern == .omnidirectional }) {
                    try audioSession.setInputDataSource(omniDirectionalSource)
                }
            }
        } catch {
            print("Failed to set up audio session: \(error)")
        }
    }

And here is how I set up the speech recognition and configure the microphone inputs:

    private func startSpeechRecognition(completion: @escaping (String) -> Void) {
        do {
            // Cancel the previous task if it's running.
            if let recognitionTask = recognitionTask {
                recognitionTask.cancel()
                self.recognitionTask = nil
            }

            // The AudioSession is already active, creating input node.
            let inputNode = audioEngine.inputNode
            try inputNode.setVoiceProcessingEnabled(false)

            // Create and configure the speech recognition request
            recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
            guard let recognitionRequest = recognitionRequest else {
                fatalError("Unable to create a recognition request")
            }
            recognitionRequest.shouldReportPartialResults = true

            // Keep speech recognition data on device
            if #available(iOS 13, *) {
                recognitionRequest.requiresOnDeviceRecognition = true
            }

            // Create a recognition task for speech recognition session.
            // Keep a reference to the task so that it can be canceled.
            recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
                // var isFinal = false
                if let result = result {
                    // Update the recognizedText
                    completion(result.bestTranscription.formattedString)
                } else if let error = error {
                    completion("Recognition error: \(error.localizedDescription)")
                }

                if error != nil || result?.isFinal == true {
                    // Stop recognizing speech if there is a problem
                    self.audioEngine.stop()
                    inputNode.removeTap(onBus: 0)
                    self.recognitionRequest = nil
                    self.recognitionTask = nil
                }
            }

            // Configure the microphone input
            let recordingFormat = inputNode.outputFormat(forBus: 0)
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
                self.recognitionRequest?.append(buffer)
            }

            audioEngine.prepare()
            try audioEngine.start()
        } catch {
            completion("Audio engine could not start: \(error.localizedDescription)")
        }
    }

Media Technologies Audio Speech AVAudioSession AVAudioEngine AVFoundation

0

951

Jul ’24

Is there a memory leak in localspeechrecognition?

Hello! I have noticed this in Sonoma and in the betas for Sequoia, the ARM variants. I am using the example from https://github.com/sveinbjornt/hear?tab=readme-ov-file in an attempt to cobble together an all-in-one transcription and low-level grammar checker utilizing LanguageTool. I have noticed that the RAM usage, specifically the swap, just keeps on climbing while it is processing an audio file. It is... quite amazing to see exactly how much swap the dang thing can use in a pinch. Frighteningly so, considering the Mini I am using only has 256 GB of storage. Throw an eight-hour mp3 audiobook at the process and see for yourself. I am aware that localspeechrecognition wasn't really designed with the idea that people would be throwing audio files at it, so it is understandable that it wouldn't be equipped to handle this situation gracefully. I am a novice programmer here. Seriously - this is my first major stab at programming since dabbling with QBasic back in elementary school. Thus, this question: if there is a memory leak, is there a way to shunt the swap being used by the app to an external drive? I am willing to take the performance hit if it keeps the internal SSD from paying the ferryman sooner than expected due to excessive swap usage. Thanks!

App & System Services General Speech

2

0

735

Jul ’24

Random crash from AVFAudio library

Hi everyone! I'm getting random crashes when I use the Speech Recognizer functionality in my app. This is an old bug (it has been on the Apple Forums for 8 years), and I would really appreciate it if anyone from Apple could find a fix for these crashes. Can anyone also please help me understand what I could do to keep the Speech Recognizer functionality available in my app while avoiding these crashes (for example, whether there is another native library or a CocoaPods library available)? Here is my code and also the crash log for it.

Code:

    func startRecording() {
        startStopRecordBtn.setImage(UIImage(#imageLiteral(resourceName: "microphone_off")), for: .normal)
        if UserDefaults.standard.bool(forKey: Constants.darkTheme) {
            commentTextView.textColor = .white
        } else {
            commentTextView.textColor = .black
        }
        commentTextView.isUserInteractionEnabled = false
        recordingLabel.text = Constants.recording

        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSession.Category.record)
            try audioSession.setMode(AVAudioSession.Mode.measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            showAlertWithTitle(message: Constants.error)
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        let inputNode = audioEngine.inputNode
        guard let recognitionRequest = recognitionRequest else {
            fatalError(Constants.error)
        }
        recognitionRequest.shouldReportPartialResults = true

        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
            var isFinal = false
            if result != nil {
                self.commentTextView.text = result?.bestTranscription.formattedString
                isFinal = (result?.isFinal)!
            }
            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
                self.startStopRecordBtn.isEnabled = true
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            // CRASH HERE
            self?.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            showAlertWithTitle(message: Constants.error)
        }
    }

Here is the crash log:

Thank you very much for reading this!

Machine Learning & AI General Speech AVFoundation

3

0

1k

Sep ’24

SFSpeechRecognitionResult discards previous transcripts with on-device option set to true

Hi everyone, I might need some help with on-device recognition. It seems that the speech recognition task discards whatever it has transcribed once a new sentence starts (or what it believes is a new sentence) during a single audio session, when requiresOnDeviceRecognition is set to true. This doesn't happen with requiresOnDeviceRecognition set to false. System environment: macOS 14 with Xcode 15, deploying to iOS 17. Thank you all!
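One possible app-side workaround for the behaviour described above is to keep earlier text yourself whenever a partial transcript appears to start over. The sketch below is only an illustration and does not come from the thread. TranscriptAccumulator is a hypothetical type, and its reset rule (a new partial less than half the length of the previous one) is an assumed heuristic that can misfire, for example when the previous sentence was very short.

```swift
import Foundation

/// Keeps text that would otherwise be lost when the recognizer restarts
/// its transcript at a new sentence. Hypothetical helper, heuristic only.
struct TranscriptAccumulator {
    private(set) var committed: [String] = []
    private(set) var current = ""

    /// Feed every partial result here, for example
    /// result.bestTranscription.formattedString from the result handler.
    mutating func update(with partial: String) {
        guard !partial.isEmpty else { return }
        // Assumed heuristic: a much shorter partial means the recognizer
        // dropped the previous sentence and started a new one.
        if partial.count * 2 < current.count {
            committed.append(current)
        }
        current = partial
    }

    /// Everything heard so far, committed sentences first.
    var fullText: String {
        (committed + [current]).filter { !$0.isEmpty }.joined(separator: " ")
    }
}

// Usage with simulated partial results.
var accumulator = TranscriptAccumulator()
for partial in ["Hello", "Hello world", "Hello world again", "Next", "Next sentence"] {
    accumulator.update(with: partial)
}
print(accumulator.fullText)
```

Displaying fullText instead of the latest partial keeps the earlier sentence on screen. The threshold would need tuning against real recognizer output.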

Machine Learning & AI General Speech

13

4

2.2k

Oct ’24

Error thrown while using the speech recognition service in my app

Recently I updated to Xcode 14.0. I am building an iOS app to convert recorded audio into text. I got an exception while testing the application in the simulator (iOS 16.0):

[SpeechFramework] -[SFSpeechRecognitionTask handleSpeechRecognitionDidFailWithError:]_block_invoke Ignoring subsequent recongition error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)" Error Domain=kAFAssistantErrorDomain Code=1107 "(null)"

I would like to know what these error codes mean and why this error occurred.

Programming Languages Swift iOS Speech Swift Debugging

20

3

12k

Feb ’25

iOS 15 - AVSpeechSynthesizerDelegate didCancel not getting called

In iOS 15, calling stopSpeaking on AVSpeechSynthesizer results in the didFinish delegate method being called instead of didCancel. This works fine in iOS 14 and earlier versions.
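If the app only needs to distinguish a stop it requested itself from a natural finish, one option is to record the request locally and reinterpret the callback. This is a sketch under that assumption, not a confirmed fix. StopTracker, UtteranceOutcome, and SpeechController are hypothetical names introduced for illustration.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

enum UtteranceOutcome: Equatable { case finished, cancelled }

/// Remembers that a stop was requested, so a later didFinish callback
/// can be treated as a cancellation. Hypothetical helper.
struct StopTracker {
    private var stopRequested = false

    mutating func noteStopRequested() { stopRequested = true }
    mutating func clear() { stopRequested = false }

    /// Call from either delegate callback; pass true from didCancel.
    mutating func resolve(reportedCancel: Bool) -> UtteranceOutcome {
        defer { stopRequested = false }
        return (reportedCancel || stopRequested) ? .cancelled : .finished
    }
}

#if canImport(AVFoundation)
final class SpeechController: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var tracker = StopTracker()
    var onOutcome: ((UtteranceOutcome) -> Void)?

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func stop() {
        // Set the flag first in case the callback arrives during the call.
        tracker.noteStopRequested()
        if !synthesizer.stopSpeaking(at: .immediate) { tracker.clear() }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        onOutcome?(tracker.resolve(reportedCancel: false))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        onOutcome?(tracker.resolve(reportedCancel: true))
    }
}
#endif
```

With this arrangement the app sees .cancelled after its own stop() call regardless of which delegate method the system invokes.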

Machine Learning & AI General Speech

7

2

1.5k

Sep ’24

Post

Replies

Boosts

Views

Activity
