Testing the Foundation Models framework with a health-focused recipe generation app. The on-device approach is appealing, but performance is rough: it's taking 20+ seconds just to get a recipe name and description. The same content from the Claude API takes about 4 seconds.
I know it's beta and on-device has different tradeoffs, but this is approaching unusable territory for a real-time user experience. Streaming helps psychologically but doesn't mask the underlying latency. The privacy and cost benefits are compelling, but not if users abandon the feature before it completes.
Anyone else seeing similar performance? Is this expected for beta, or are there optimization techniques I'm missing?
You can start by profiling your app with Instruments, as discussed in the WWDC25 code-along session (starting at 24:32). How to set up the Foundation Models instrument is detailed here.
The Foundation Models instrument reports the number of tokens the model generates. From there, you can calculate how many tokens per second you are getting. The number can vary a lot, but if it is consistently much worse than 20–30 tokens/s, I'd suggest that you file a feedback report and share your report ID here.
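For illustration (the figures are made up): if the trace showed roughly 300 generated tokens for a request that took 20 seconds end to end, that would be about 15 tokens/s, noticeably below that range.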
The WWDC25 session also discusses how to use prewarm and includesSchemaInInstructions to improve performance in appropriate cases. You can check whether those techniques apply to your app; there is a rough sketch of the prewarming idea below.
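In case it helps, here is a minimal sketch of prewarming with a reused session. The RecipeIdea type, property names, instructions, and prompt text are placeholders I made up for illustration, not anything from your app:

```swift
import FoundationModels

// Hypothetical output type; the property names and guides are placeholders.
@Generable
struct RecipeIdea {
    @Guide(description: "A short, appetizing recipe name")
    var name: String

    @Guide(description: "A one-sentence description of the dish")
    var description: String
}

final class RecipeGenerator {
    // Reuse one session instead of creating a new one per request.
    private let session = LanguageModelSession(
        instructions: "You suggest healthy recipe ideas."
    )

    // Call this early (for example, when the recipe screen appears)
    // so the model is loaded before the user asks for a recipe.
    func warmUp() {
        session.prewarm()
    }

    func generateIdea() async throws -> RecipeIdea {
        let response = try await session.respond(
            to: "Suggest one healthy dinner recipe.",
            generating: RecipeIdea.self
        )
        return response.content
    }
}
```

Reusing the session across requests also means you pay the warm-up cost at most once rather than on every generation.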
Best,
——
Ziqiao Chen
Worldwide Developer Relations.