I'm experimenting with downloading an audio file of spoken content, using the Speech framework to transcribe it, then using FoundationModels to clean up the formatting to add paragraph breaks and such. I have this code to do that cleanup:
private func cleanupText(_ text: String) async throws -> String? {
print("Cleaning up text of length \(text.count)...")
let session = LanguageModelSession(instructions: "The content you read is a transcription of a speech. Separate it into paragraphs by adding newlines. Do not modify the content - only add newlines.")
let response = try await session.respond(to: .init(text), generating: String.self)
return response.content
}
The content length is about 29,000 characters. And I get this error:
InferenceError::inferenceFailed::Failed to run inference: Context length of 4096 was exceeded during singleExtend..
Is 4096 a reference to a max input length? Or is this a bug?
This is running on an M1 iPad Air, with iPadOS 26 Seed 1.
Is 4096 a reference to a max input length? Or is this a bug?
That is not a bug. The framework has a context size limit, which is 4096 tokens as of today, meaning the total number of tokens in the whole session (instructions, prompts, and responses combined) must not exceed 4096.
As @ziopiero mentioned, a token is not the same as a character or word in the input. How text maps to tokens depends on the tokenizer; as a rough rule, assume 3 to 4 characters per token in English, and one character per token in Japanese or Chinese. By that estimate, your 29,000-character input alone is roughly 7,000 to 10,000 tokens, well over the limit.
When you hit the context size limit, as the error message shows, consider shortening the prompt in a creative way, or starting a new session if the current session has accumulated multiple prompts and responses. In your case, splitting your input into smaller chunks may help.
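To illustrate the chunking suggestion, here is a minimal sketch that splits the transcript on sentence boundaries and cleans up each chunk in a fresh LanguageModelSession. The chunk size of 12,000 characters is an assumption (roughly 3,000 tokens at 3-4 characters per token, leaving headroom for instructions and output), not a documented limit, and the sentence splitting is approximate:

```swift
import Foundation
import FoundationModels

// Assumed budget: ~12,000 characters ≈ ~3,000 tokens, leaving headroom
// in the 4096-token context window for instructions and the response.
let maxChunkCharacters = 12_000

/// Split text into chunks of at most `limit` characters, breaking on
/// sentence boundaries so paragraphs stay coherent. Approximate: it
/// re-appends ". " after each split piece.
func chunked(_ text: String, limit: Int) -> [String] {
    var chunks: [String] = []
    var current = ""
    for sentence in text.components(separatedBy: ". ") where !sentence.isEmpty {
        let piece = sentence + ". "
        if current.count + piece.count > limit, !current.isEmpty {
            chunks.append(current)
            current = ""
        }
        current += piece
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

private func cleanupText(_ text: String) async throws -> String {
    var result = ""
    for chunk in chunked(text, limit: maxChunkCharacters) {
        // A fresh session per chunk keeps each request within the context window.
        let session = LanguageModelSession(instructions: "The content you read is a transcription of a speech. Separate it into paragraphs by adding newlines. Do not modify the content - only add newlines.")
        let response = try await session.respond(to: .init(chunk))
        result += response.content + "\n"
    }
    return result
}
```

Because each chunk gets its own session, no chunk's tokens accumulate into the next request; the trade-off is that the model cannot see paragraph context across chunk boundaries.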
Best,
——
Ziqiao Chen
Worldwide Developer Relations.