Unpredictable performance when using structured output

Question

Created 1w

Replies 1

Boosts 0

Participants 2

Hey, when generating responses with structured output and the non-streaming API, a request sometimes takes 3s and sometimes 10-20s. I am firing the same request back to back while testing the app. Is this by design, or is there somewhere I can learn more about what contributes to such variation?


Answer 1

DTS Engineer OP

Apple

1w

The model uses speculative decoding under the hood, so yes, some responses can be slower or faster than others: the speedup depends on how many of the drafted tokens are accepted, which varies from request to request. In general, regarding performance, I'd start by profiling the app with Instruments.app, as mentioned here. If you see that something is unusually slow, please share the details, and I'll be interested in taking a closer look from there.
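Before reaching for Instruments, it can help to put numbers on the variation. The following is only a rough sketch of a timing harness, not an official recipe: `measureLatency`, `LatencySummary`, and the `generateStructuredResponse()` call in the usage comment are hypothetical names, and it assumes an OS version where `ContinuousClock` is available (iOS 16 / macOS 13 or later).

```swift
import Foundation

/// Wall-clock latency samples, in seconds, with a few summary values.
/// (Hypothetical helper type, not part of any Apple framework.)
struct LatencySummary {
    let samples: [Double]

    var fastest: Double { samples.min() ?? 0 }
    var slowest: Double { samples.max() ?? 0 }

    var median: Double {
        guard !samples.isEmpty else { return 0 }
        let sorted = samples.sorted()
        let mid = sorted.count / 2
        // Average the two middle values when the count is even.
        return sorted.count % 2 == 0
            ? (sorted[mid - 1] + sorted[mid]) / 2
            : sorted[mid]
    }
}

/// Runs `request` `runs` times, one after another, and records how long
/// each call takes. The closure stands in for your own model request.
func measureLatency(
    runs: Int,
    request: () async throws -> Void
) async rethrows -> LatencySummary {
    let clock = ContinuousClock()
    var samples: [Double] = []

    for _ in 0..<runs {
        let start = clock.now
        try await request()
        let elapsed = start.duration(to: clock.now)

        // Duration stores whole seconds plus attoseconds (1e-18 s).
        let parts = elapsed.components
        samples.append(Double(parts.seconds) + Double(parts.attoseconds) * 1e-18)
    }
    return LatencySummary(samples: samples)
}

// Usage, from an async context (generateStructuredResponse is a placeholder
// for your own non-streaming structured-output call):
//
//   let summary = try await measureLatency(runs: 10) {
//       _ = try await generateStructuredResponse()
//   }
//   print(summary.fastest, summary.median, summary.slowest)
```

If the slowest sample is far above the median, those are the runs worth capturing in an Instruments trace and sharing.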

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.
