How to analyse CPU usage with Core Image?

I have a complex Core Image pipeline which I'm keen to optimise. I'm aware that calling back to the CPU can have a significant impact on performance. What is the best way to find out where this is happening?

Answered by DTS Engineer in 829320022

Hello,

To better understand Core Image performance bottleneck(s) in general we recommend profiling with Instruments using the Metal System Trace or Game Performance templates.

To investigate Core Image performance in particular, check out Discover Core Image debugging techniques. You'll want to generate a Core Image graph to determine the most expensive stages of your image processing pipeline.

@Etresoft's reference is also good but somewhat out of date (being in the reference library).
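As a programmatic complement to the Core Image graph (which, as far as I recall, that session produces via the `CI_PRINT_TREE` environment variable), you can also ask Core Image for basic statistics about a single render. The following is only a minimal sketch, assuming `CIRenderDestination`, `CIRenderTask` and `CIRenderInfo` are available on your deployment target. The helper name `renderStatistics` and the test image are made up for illustration.

```swift
import CoreImage

/// Hypothetical helper: renders `image` offscreen and returns the statistics
/// Core Image reports for that render (pass count, pixels processed, kernel time).
func renderStatistics(for image: CIImage, using context: CIContext) throws -> CIRenderInfo {
    precondition(!image.extent.isInfinite, "image needs a finite extent")
    let extent = image.extent.integral
    precondition(!extent.isEmpty, "image needs a non-empty extent")

    // Shift the image so its extent starts at the destination's origin.
    let shifted = image.transformed(
        by: CGAffineTransform(translationX: -extent.origin.x, y: -extent.origin.y))

    // Offscreen RGBA8 bitmap that only serves as a render target.
    let width = Int(extent.width)
    let height = Int(extent.height)
    let bytesPerRow = width * 4
    let bitmap = UnsafeMutableRawPointer.allocate(byteCount: bytesPerRow * height,
                                                  alignment: 16)
    defer { bitmap.deallocate() }

    let destination = CIRenderDestination(bitmapData: bitmap,
                                          width: width,
                                          height: height,
                                          bytesPerRow: bytesPerRow,
                                          format: .RGBA8)

    // Start the render and block until Core Image has finished it.
    let task = try context.startTask(toRender: shifted, to: destination)
    return try task.waitUntilCompleted()
}

// Usage with a stand-in image; substitute the output of your own pipeline.
let context = CIContext()
let source = CIImage(color: .red).cropped(to: CGRect(x: 0, y: 0, width: 512, height: 512))
let output = source.applyingGaussianBlur(sigma: 8)

do {
    let info = try renderStatistics(for: output, using: context)
    print("passes: \(info.passCount), pixels: \(info.pixelsProcessed), "
        + "kernel time: \(info.kernelExecutionTime) s")
} catch {
    print("render failed: \(error)")
}
```

Comparing these numbers before and after a change to the filter chain gives a rough idea of whether the change helped. It does not replace the per-stage detail in the Core Image graph or an Instruments trace.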

You may also want to consider capturing your GPU workload, i.e. the Core Image kernels. This provides some insight into how well a Core Image kernel is performing on the GPU, but the capture lacks source access, so it is tricky to optimize beyond inferring behavior from GPU counters.

Lastly, ProRAW assets are generally large and time-consuming to process, so anything you can do to stage the work efficiently, e.g. deferring processing instead of running it immediately after capture, is expected to help.
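One possible way to stage the work is to hand the captured data to a lower-priority serial queue so the capture callback returns right away. This is a minimal sketch, assuming the capture delivers encoded image data as `Data`. `DeferredProcessor`, the queue label and the `process` closure are hypothetical names, not part of any Apple API.

```swift
import Foundation

/// Hypothetical helper: queues expensive image processing on a serial,
/// utility-priority queue so the caller is not blocked.
final class DeferredProcessor {
    // Serial queue: items are processed one at a time, in the order received.
    private let queue = DispatchQueue(label: "example.deferred-processing", qos: .utility)

    /// Returns immediately; `process` runs later on the background queue.
    func enqueue(_ imageData: Data, process: @escaping @Sendable (Data) -> Void) {
        queue.async {
            process(imageData)
        }
    }
}

// Usage: stand-in bytes instead of a real ProRAW capture.
let processor = DeferredProcessor()
let finished = DispatchSemaphore(value: 0)

processor.enqueue(Data([0x01, 0x02, 0x03])) { data in
    // The expensive Core Image work would go here.
    print("processing \(data.count) bytes in the background")
    finished.signal()
}

// Only for this standalone example: wait so the script does not exit early.
_ = finished.wait(timeout: .now() + 5)
```

In an app you would typically show a quick preview first and update the UI once the deferred work completes, rather than waiting on a semaphore.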

What do you mean by "calling back to the CPU" exactly?

The main CPU load of Core Image actually comes from the filter graph optimization CI does before the actual rendering. And unfortunately there is not much you can do to speed this up.

Don't use the "comments" feature here in the forums. It hides your posts.

Core Image is designed to be a black box where you don't know the implementation details. The more complex your pipeline, the greater the risk to performance.

Generally speaking, think of how you would implement any given Core Image filter. If it seems like it could be implemented on the GPU in a straightforward way, then it probably was. If not, you had better investigate that filter more closely.

Review Apple's Core Image documentation for Getting the Best Performance. These suggestions are more than just general best practices. I'm sure they reflect the internal limitations of Core Image as well.

These suggestions say to "Make sure images don’t exceed CPU and GPU limits", "User[sic] smaller images when possible", and "Avoid unnecessary texture transfers between the CPU and GPU". Sometimes you have to read between the lines. For example, Core Image has image tiling support that violates all of these principles - and is horribly slow as a result.
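To make the first two suggestions concrete, here is a rough sketch that checks an image against the size limit a context reports and scales it down before any filtering. It assumes a single long-lived `CIContext` is reused for the whole pipeline. `fitsInputLimit`, `downscaled` and the 2000-pixel target are made-up names and values for illustration only.

```swift
import CoreImage

// Assumption: one long-lived context for the whole pipeline, created once and reused.
let sharedContext = CIContext()

/// Hypothetical helper: true when the image is within the input size limit
/// the context reports.
func fitsInputLimit(_ image: CIImage, of context: CIContext) -> Bool {
    let limit = context.inputImageMaximumSize()
    return image.extent.width <= limit.width && image.extent.height <= limit.height
}

/// Hypothetical helper: scales the image down so its longest side is at most
/// `maxDimension`. Images that are already small enough are returned unchanged.
func downscaled(_ image: CIImage, toFit maxDimension: CGFloat) -> CIImage {
    let extent = image.extent
    guard !extent.isInfinite, maxDimension > 0 else { return image }
    let longest = max(extent.width, extent.height)
    guard longest > maxDimension else { return image }
    let scale = maxDimension / longest
    return image.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
}

// Usage with a stand-in image; substitute your own source image.
let large = CIImage(color: .red).cropped(to: CGRect(x: 0, y: 0, width: 8000, height: 6000))
let preview = downscaled(large, toFit: 2000)
print("fits limit: \(fitsInputLimit(preview, of: sharedContext)), size: \(preview.extent.size)")
```

Because Core Image evaluates lazily, placing the downscale at the start of the chain means the later filters operate on fewer pixels.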

@Etresoft thanks for the tips. I think my biggest bottleneck is needing to process a ProRAW image immediately after capture, which seems to take a few seconds to even start the image processing. I think I'll look into deferring the rendering, as this will probably have the biggest impact on the user experience.

I think my biggest bottleneck is needing to process a ProRAW image immediately after capture, which seems to take a few seconds to even start the image processing.

That doesn't sound like it has anything to do with Core Image. That sounds like more of a threading or concurrency problem.

When I talked about Core Image hitting the CPU resulting in "horribly slow" performance, I was talking about a reduction in frames per second, not seconds per frame.
