Hi,
I am working on a large project. We compile each material to its own .metallib; they all include many common headers full of inline functions, and at the end we link everything together with a single big path-trace kernel. Everything works as expected, but the compile times have gotten completely out of hand: compiling to native code at runtime takes multiple minutes. I gather that I can do this offline by using metal-tt, but I am wondering whether there is a way to reduce the compile times in such a scenario, and how to investigate the root cause of the problem. I suspect it could have to do with the fact that every material's metallib contains duplicates of all the inline functions. Any ideas on how to profile and debug this?
Thanks,
Rasmus
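One commonly suggested mitigation for runtime pipeline compilation cost is caching the compiled pipelines in an MTLBinaryArchive, so the slow AIR-to-native step happens once per device rather than on every launch. A minimal sketch, assuming a compute pipeline whose kernel is named pathtrace; the function name and cache path are illustrative, not from the original post:

import Metal

// Hedged sketch: persist compiled pipelines in a binary archive so the
// expensive AIR -> native compile happens once, not on every launch.
func makePathtracePipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLComputePipelineState {
    let cacheURL = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("pipelines.metallib") // illustrative location

    let archiveDesc = MTLBinaryArchiveDescriptor()
    if FileManager.default.fileExists(atPath: cacheURL.path) {
        archiveDesc.url = cacheURL // reuse a previously serialized archive
    }
    let archive = try device.makeBinaryArchive(descriptor: archiveDesc)

    let pipelineDesc = MTLComputePipelineDescriptor()
    pipelineDesc.computeFunction = library.makeFunction(name: "pathtrace")!
    pipelineDesc.binaryArchives = [archive] // look up native code instead of recompiling

    if archiveDesc.url == nil {
        // First run: compile, record into the archive, and persist it.
        try archive.addComputePipelineFunctions(descriptor: pipelineDesc)
        try archive.serialize(to: cacheURL)
    }
    return try device.makeComputePipelineState(descriptor: pipelineDesc,
                                               options: [],
                                               reflection: nil)
}

This does not shrink the first compile, but it turns the multi-minute cost into a one-time event. The system also keeps an implicit shader cache, so timing a second launch is a quick way to tell whether duplicated inline functions or sheer code size is the real bottleneck.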
Hello
I am trying to get threadgroup memory access in a fragment shader. In essence, I would like all the fragments in a tile to bitwise-OR some value. My idea was to use simd_or across the SIMD group, then have thread 0 of each SIMD group atomically OR the value into threadgroup memory. Finally, the very first thread of the tile would be tasked with writing the value down to a texture with write access.
Now, I can declare the threadgroup memory argument to the fragment function all right. MTLRenderCommandEncoder has a setThreadgroupMemoryLength call, which I am using the following way:
[renderEncoder setThreadgroupMemoryLength:16 offset:0 atIndex:0];
Unfortunately, all I am getting is the following error (runtime assertion):
-[MTLDebugRenderCommandEncoder setThreadgroupMemoryLength:offset:atIndex:]:3487: failed assertion `Set Threadgroup Memory Length Validation
offset + length(16) must be <= threadgroupMemoryLength(0).'
What am I doing wrong? How can I get threadgroup memory in the fragment shader? I know I could use tile shading and a compute function, but the problem is that I would really like to stay with fragment shading here. I will be grateful for help.
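For what it's worth, the assertion message (threadgroupMemoryLength(0)) suggests the bound render pipeline declares no threadgroup memory at all; as far as I can tell, setThreadgroupMemoryLength on a render command encoder is validated against a tile function's declared threadgroup argument, not a plain fragment function's. A minimal sketch of the tile-shading route mentioned above, assuming an MSL tile function named tileReduce that declares a threadgroup argument at index 0 (the name is illustrative):

import Metal

// Hedged sketch: only a tile function appears to satisfy the encoder's
// threadgroup-memory validation in a render pass.
func makeTilePipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLRenderPipelineState {
    let desc = MTLTileRenderPipelineDescriptor()
    desc.tileFunction = library.makeFunction(name: "tileReduce")! // assumed MSL tile function
    desc.colorAttachments[0].pixelFormat = .bgra8Unorm
    desc.threadgroupSizeMatchesTileSize = true
    return try device.makeRenderPipelineState(tileDescriptor: desc,
                                              options: [],
                                              reflection: nil)
}

func encodeTileReduce(encoder: MTLRenderCommandEncoder, tilePipeline: MTLRenderPipelineState) {
    encoder.setRenderPipelineState(tilePipeline)
    // Validated against the tile function's threadgroup argument, so the
    // 16-byte length from the post should now be accepted.
    encoder.setThreadgroupMemoryLength(16, offset: 0, index: 0)
    encoder.dispatchThreadsPerTile(MTLSize(width: 16, height: 16, depth: 1))
}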
I'm implementing optimized matmul on metal: https://github.com/crynux-ai/metal-matmul/blob/main/metal/1_shared_mem.metal
I notice that performance differs significantly with different threadgroup memory lengths set in
[computeEncoder setThreadgroupMemoryLength]
All other lines are exactly the same; the only difference is this parameter.
Matmul performance is roughly 250 GFLOPS if I set 32768 (the max bytes allowed on this M1 Max),
but 400 GFLOPS if I set 8192.
Why does this happen? How can I optimize it?
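One plausible explanation (an assumption, not confirmed by the post) is occupancy: threadgroup memory comes out of a fixed per-core budget, so reserving 32 KB per threadgroup lets fewer threadgroups stay resident than 8 KB does, and the GPU hides memory latency less well. A small diagnostic sketch; pipeline is the compute pipeline state from the linked project:

import Metal

// Hedged sketch: print the pipeline limits relevant to occupancy.
func reportOccupancyHints(device: MTLDevice, pipeline: MTLComputePipelineState) {
    print("threadExecutionWidth:", pipeline.threadExecutionWidth)
    print("maxTotalThreadsPerThreadgroup:", pipeline.maxTotalThreadsPerThreadgroup)
    // Memory the compiler reserved statically, on top of the dynamic
    // length passed to setThreadgroupMemoryLength.
    print("staticThreadgroupMemoryLength:", pipeline.staticThreadgroupMemoryLength)
    print("device maxThreadgroupMemoryLength:", device.maxThreadgroupMemoryLength)
}

Xcode's GPU capture shows per-dispatch occupancy directly, which would confirm or rule this out.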
View Layout
Add the following views in a view controller:
Label
View A, with a subview of the same size: MTKView A
View B, with a subview of the same size: MTKView B
Refresh Rates of Each View
The label view refreshes at 60fps (driven by CADisplayLink).
MTKView A and B refresh at 15fps.
MTKView Implementation Details
The corresponding CAMetalLayer's maximumDrawableCount is set to 2, i.e. double buffering.
The scheduling mechanism is modified: drawing is not driven by MTKView's internal loop but is triggered manually, immediately upon receiving a frame.
self.metalView.enableSetNeedsDisplay = NO;
self.metalView.paused = YES;
A new high-priority queue is created for drawing, instead of handling it on the main queue.
MTKView Latency Tracking
The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer.
The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView.
Testing shows that T2 - T1 > 16.6 ms (the vsync period at 60 Hz). This means that after the GPU rendering in the MTKView finishes, the frame is not displayed at the next vsync but only at the vsync after that.
I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism.
Observation from Instruments
From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details.
Questions
To a beginner's understanding, once MTKView's GPU rendering is finished, the next vsync should display the frame (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another vsync cycle to be drawn to the actual display buffer?
The label updates its text at 60 fps, so the entire interface should be displayed at 60 fps. Is the content of the MTKView not synchronized when the display happens?
Explanation of the Reasoning Behind Some MTKView Code Details
Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering.
We trigger the draw method manually instead of using MTKView's own scheduling because MTKView's scheduling is driven by CADisplayLink: if a frame arrives within a vsync window, it has to wait for the next vsync window to trigger the draw operation, which introduces waiting latency.
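For concreteness, a minimal sketch of the T1/T2 measurement described above, assuming a paused MTKView driven manually as in the post (names are illustrative):

import MetalKit

// Hedged sketch: T1 = GPU finish time, T2 = on-glass presentation time
// of the same frame, matching the post's measurement setup.
func drawFrame(view: MTKView, queue: MTLCommandQueue, encode: (MTLCommandBuffer) -> Void) {
    guard let drawable = view.currentDrawable,
          let cb = queue.makeCommandBuffer() else { return }

    encode(cb) // encode this frame's render passes

    drawable.addPresentedHandler { d in
        print("T2 presented:", d.presentedTime)      // seconds
    }
    cb.addCompletedHandler { finished in
        print("T1 gpuEndTime:", finished.gpuEndTime) // seconds
    }
    cb.present(drawable)
    cb.commit()
}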
The code is pretty simple:
kernel void naive(
    constant RunParams *param [[ buffer(0) ]],
    const device float *A [[ buffer(1) ]],  // [N, K]
    device float *output [[ buffer(2) ]],
    uint2 gid [[ thread_position_in_grid ]]) {
  float val = 0.0f;               // accumulator (declaration missing in the original snippet)
  uint a_ptr = gid.x * param->K;
  for (uint i = 0; i < param->K; i++, a_ptr++) {
    val += A[a_ptr];              // the snippet read A[b_ptr]; b_ptr was undefined, a_ptr is meant
  }
  output[gid.x] = val;            // the snippet wrote output[ptr]; using the row index instead
}
When uint a_ptr = gid.x * param->K, the code gets 150 GFLOPS;
when uint a_ptr = gid.y * param->K, the code gets 860 GFLOPS.
param->K = 256;
threads per group: [16, 16]
I'd like to understand why the performance is so different, and how I can profile/diagnose this to help with further optimization.
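One way to dig into a gap like this (a hedged sketch, not from the post) is a programmatic GPU capture around the dispatch, then reading the kernel's memory statistics in Xcode's shader profiler. The usual suspect to check first is coalescing: gid.x is the fastest-varying coordinate across adjacent threads, so a_ptr = gid.x * param->K makes neighboring threads read rows 256 floats apart, while gid.y * param->K lets a whole SIMD group walk the same row together.

import Metal

// Hedged sketch: capture one dispatch into a .gputrace file for Xcode's
// shader profiler (capture must be enabled for the process).
func captureOneDispatch(device: MTLDevice, body: () -> Void) throws {
    let manager = MTLCaptureManager.shared()
    let desc = MTLCaptureDescriptor()
    desc.captureObject = device
    desc.destination = .gpuTraceDocument
    desc.outputURL = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("naive.gputrace")
    try manager.startCapture(with: desc)
    body() // encode, commit, and waitUntilCompleted here
    manager.stopCapture()
}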
The game physics work as expected using GPTK 2.0 with CrossOver 24 or Whisky. However, using GPTK 2.1 with CrossOver 25, the player and camera physics misbehave. See https://www.reddit.com/r/WWEGames/comments/1jx9mph/the_siamese_elbow/ and https://www.reddit.com/r/WWEGames/comments/1jx9ow4/camera_glitch/
Full video also linked in the Reddit post.
I have also submitted this bug via the feedback assistant.
Hi,
It seems MSL is missing support for a clock() shader instruction, which is available in other graphics APIs such as Vulkan or OpenGL.
It would be useful for counting the cost, in clock cycles, of some code inside a shader with much finer granularity than launching a micro-kernel with the same instructions and measuring the cost from the CPU.
It would also be useful for MoltenVK, to support those extensions.
Thanks.
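The closest public alternative I know of is much coarser: GPU timestamp sampling around encoder work with MTLCounterSampleBuffer. A minimal sketch, hedged because counter support varies by device (check device.supportsCounterSampling(.atDispatchBoundary) first):

import Metal

// Hedged sketch: bracket a dispatch with GPU timestamps. Not a per-thread
// clock(), but it is public API.
func attachTimestamps(device: MTLDevice, encoder: MTLComputeCommandEncoder) throws -> MTLCounterSampleBuffer? {
    guard let timestamps = device.counterSets?.first(where: { $0.name == "timestamp" })
    else { return nil } // no timestamp counters on this device

    let desc = MTLCounterSampleBufferDescriptor()
    desc.counterSet = timestamps
    desc.storageMode = .shared
    desc.sampleCount = 2
    let buffer = try device.makeCounterSampleBuffer(descriptor: desc)

    encoder.sampleCounters(sampleBuffer: buffer, sampleIndex: 0, barrier: true)
    // ... setComputePipelineState + dispatchThreadgroups here ...
    encoder.sampleCounters(sampleBuffer: buffer, sampleIndex: 1, barrier: true)
    return buffer // read back later via resolveCounterRange(0..<2)
}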
I have run into an issue where I am trying to use atomic_float in a Swift package, but I cannot get things to compile because it appears that the Swift Package Manager doesn't support Metal 3 (atomic_float is Metal 3 functionality). Is there any way around this? I am using:
// swift-tools-version: 6.1
and my Metal code includes:
#include <metal_stdlib>
#include <metal_geometric>
#include <metal_math>
#include <metal_atomic>
using namespace metal;
kernel void test(device atomic_float* imageBuffer [[buffer(1)]],
                 uint id [[ thread_position_in_grid ]]) {
}
But I get an error on the definition of atomic_float.
Any help, or more importantly, a pointer to where I could have found information about this limitation, would be appreciated.
-RadBobby
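One possible workaround (hedged; I am not aware of an SPM setting that selects the Metal language version) is to ship the shader as source in the package and compile it at runtime with an explicit language version:

import Metal

// Hedged sketch: bypass SPM's default .metal compilation by building the
// library from source with Metal 3 explicitly selected.
func makeLibraryFromSource(device: MTLDevice, source: String) throws -> MTLLibrary {
    let options = MTLCompileOptions()
    options.languageVersion = .version3_0 // atomic_float requires Metal 3
    return try device.makeLibrary(source: source, options: options)
}

The source string could come from Bundle.module if the .metal file is declared as a package resource rather than left to SPM's shader build.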
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command:
cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/
I keep getting an error:
CMake Error at CMakeLists.txt:5 (include):
include could not find requested file:
/Users/macuser/USDInstall/bin/pxrConfig.cmake
I've tried following the instructions in the README.md included with the project files at least five times, as well as moving the pxrConfig.cmake file around and copying it into different folders, then re-executing the command, and I was still unsuccessful in generating the files expected to compile and render the HydraPlayer renderer. How do I get CMake to generate the Xcode project to build the HydraPlayer renderer?
Hello!
I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow) there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what would be the recommendation from Apple's engineering team to get this data via a public API.
Requirements:
Access to GPU utilization, temperature, memory usage, and power usage
A C/C++-based API for querying the metrics, so I can expose the data to JavaScript via a Node addon
No need for compatibility with Intel-based Macs; Apple silicon is fine for now
Plugin GitHub
Thank you!
Noah
Hi,
When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of "Drawable Present" seems to be when the CPU calls [commandBuffer presentDrawable:], rather than when the actual encoding is completed on the GPU. Also, what does "Drawable Presented" specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been rendered is not displayed. Why is this happening?
Hey, I've been struggling with this for some days now.
I am trying to write to a sparse texture in a compute shader. I'm performing the following steps:
Set up a sparse heap and create a texture from it
Map the whole area of the sparse texture using updateTextureMapping(..)
Overwrite every value with the value "4" in a compute shader
Blit the texture to a shared buffer
Assert that the values in the buffer are "4".
I have a minimal example (which is still pretty long unfortunately).
It works perfectly when removing the line heapDesc.type = .sparse.
What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated.
import Metal

func sparseTexture64x64Demo() throws {
    // ── Metal objects
    guard let device = MTLCreateSystemDefaultDevice()
    else { throw NSError(domain: "SparseNotSupported", code: -1) }
    let queue = device.makeCommandQueue()!
    let lib = device.makeDefaultLibrary()!
    let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!)

    // ── Texture descriptor
    let width = 64, height = 64
    let format: MTLPixelFormat = .r32Uint // 4 B per texel
    let desc = MTLTextureDescriptor()
    desc.textureType = .type2D
    desc.pixelFormat = format
    desc.width = width
    desc.height = height
    desc.storageMode = .private
    desc.usage = [.shaderWrite, .shaderRead]

    // ── Sparse heap
    let bytesPerTile = device.sparseTileSizeInBytes
    let meta = device.heapTextureSizeAndAlign(descriptor: desc)
    let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile
    let heapDesc = MTLHeapDescriptor()
    heapDesc.type = .sparse
    heapDesc.storageMode = .private
    heapDesc.size = heapBytes
    let heap = device.makeHeap(descriptor: heapDesc)!
    let tex = heap.makeTexture(descriptor: desc)!

    // ── CPU buffers
    let bytesPerPixel = MemoryLayout<UInt32>.stride
    let rowStride = width * bytesPerPixel
    let totalBytes = rowStride * height
    let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)!

    let cb = queue.makeCommandBuffer()!
    let fence = device.makeFence()!

    // 2. Map the sparse tile, then signal the fence
    let rse = cb.makeResourceStateCommandEncoder()!
    rse.updateTextureMapping(
        tex,
        mode: .map,
        region: MTLRegionMake2D(0, 0, width, height),
        mipLevel: 0,
        slice: 0)
    rse.update(fence) // ← capture all work so far
    rse.endEncoding()

    let ce = cb.makeComputeCommandEncoder()!
    ce.waitForFence(fence)
    ce.setComputePipelineState(pipeline)
    ce.setTexture(tex, index: 0)
    let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1)
    let tgCount = MTLSize(width: (width + 7) / 8,
                          height: (height + 7) / 8,
                          depth: 1)
    ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG)
    ce.updateFence(fence)
    ce.endEncoding()

    // Blit texture into shared buffer
    let blit = cb.makeBlitCommandEncoder()!
    blit.waitForFence(fence)
    blit.copy(
        from: tex,
        sourceSlice: 0,
        sourceLevel: 0,
        sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
        sourceSize: MTLSize(width: width, height: height, depth: 1),
        to: dstBuf,
        destinationOffset: 0,
        destinationBytesPerRow: rowStride,
        destinationBytesPerImage: totalBytes)
    blit.endEncoding()

    cb.commit()
    cb.waitUntilCompleted()
    assert(cb.error == nil, "GPU error: \(String(describing: cb.error))")

    // ── Verify a few texels
    let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height)
    print("first three texels:", out[0], out[1], out[width]) // indices 0, 1, 64
    assert(out[0] == 4 && out[1] == 4 && out[width] == 4)
}
Metal shader:
#include <metal_stdlib>
using namespace metal;

kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]],
                   uint2 gid [[thread_position_in_grid]])
{
    tex.write(4, gid);
}
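Not an answer, but one thing worth checking (a hedged guess): sparse mapping works at tile granularity, and the region passed to updateTextureMapping is in texels, so it should line up with the device's sparse tile size for this format. A small sketch to verify that on the machine in question:

import Metal

// Hedged diagnostic sketch: is a 64x64 r32Uint region tile-aligned here?
func checkTileAlignment(device: MTLDevice) {
    let tile = device.sparseTileSize(with: .type2D,
                                     pixelFormat: .r32Uint,
                                     sampleCount: 1)
    print("sparse tile size (texels):", tile.width, "x", tile.height)
    let aligned = (64 % tile.width == 0) && (64 % tile.height == 0)
    print("64x64 region tile-aligned:", aligned)
}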
I have a Core Image filter in my app that uses Metal. I cannot compile it because the build complains that the executable tool metal is not available, even though I have installed it in Xcode.
If I go to the "Components" section of Xcode's settings, it shows as downloaded. And if I run the suggested command, it also shows as installed. Any advice?
Xcode Version
Version 26.0 beta (17A5241e)
Build Output
Showing All Errors Only
Build target Lessons of project StudyJapanese with configuration Light
RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese')
cd /Users/chris/Code/SerpentiSei/StudyJapanese
/bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"'
'
error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
/Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
Build failed 6/9/25, 8:31 PM 27.1 seconds
Result of xcodebuild -downloadComponent MetalToolchain (after switching to Xcode-beta.app with xcode-select)
xcodebuild -downloadComponent MetalToolchain
Beginning asset download...
Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg
Done downloading: Metal Toolchain (17A5241c).
Screenshots from Xcode
Result of "Copy Information"
Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)
Hello,
Thank you for attending today’s Metal & game technologies group lab at WWDC25!
We were delighted to answer many questions from developers and energized by the community engagement.
We hope you enjoyed it and welcome your feedback.
We invite you to carry on the conversation here, particularly if your question appeared in Slido and we were unable to answer it during the lab.
If your question received feedback, let us know if you need clarification.
You may want to ask your question again in a different lab, e.g. the visionOS lab tomorrow.
(We realize that this can be confusing when frameworks interoperate.)
We have a lot to learn from each other so let’s get to Q&A and make the best of WWDC25! 😃
Looking forward to your questions posted in new threads.
I'm a newbie at Vulkan and Xcode.
I have my project on GitHub: https://github.com/flocela/OrangeSpider/
Whenever I run, two windows open instead of only one.
I added testing, which means I have an OrangeSpider.xctestplan in the OrangeSpider/TestsOrangeSpider/ folder.
This is my first time adding testing to an Xcode project, so I think this may be where the problem is.
I also get this error message:
ViewBridge to RemoteViewService Terminated: Error Domain=com.apple.ViewBridge Code=18 "(null)" UserInfo={com.apple.ViewBridge.error.hint=this process disconnected remote view controller -- benign unless unexpected, com.apple.ViewBridge.error.description=NSViewBridgeErrorCanceled}
I'm running into a persistent visual issue while deploying a floral corridor scene to Apple Vision Pro using Unity 6.0 with URP and Metal. The issue only appears on the Vision Pro device — everything looks fine in the Unity Editor.
Issue Description
When the frame rate drops to around 60–70 FPS, noticeable distortion artifacts appear around the edges of foliage models. It seems like the background meshes (behind the plants) get warped and leak through the edges of the foliage. Although this is most visible around the leaves, even solid objects like standard URP wall or box models show distorted edges when the issue occurs.
All the foliage uses Opaque or Alpha Clipping materials.
Things I've Tried
Changing the foliage materials to Transparent mode: the distortion around the edges disappears, but using Transparent for a large number of foliage assets is not ideal for performance or sorting complexity.
Reducing the number of foliage objects: with only a few plants in the scene and the frame rate staying around 100 FPS, the distortion disappears. However, this isn't a practical solution for a full environment.
Possible Cause?
I came across this note in the Unity documentation:
"Ensure depth-buffer for each pixel is non-zero - on visionOS, the depth buffer is used for reprojection. To ensure visual effects like skyboxes and shaders are displayed beautifully, ensure that some value is written to the depth for each pixel."
Could this be related to the issue? Is it possible that Alpha Clipping with low pixel coverage leads to some pixels not writing to the depth buffer, which then causes problems during Vision Pro’s reprojection or foveated rendering? However, even when I disable Alpha Clipping entirely, the distortion issue still persists, so it may not be solely caused by clipping itself.
Project Setup
Unity 6.0 (URP)
Depth Texture: Enabled
Using Metal as the graphics backend
Running on real Vision Pro hardware (not simulator)
Any advice on how to avoid these distortion issues on Vision Pro would be greatly appreciated.
Thanks!
Hi everyone,
We found a Metal-related crash. The app uses Metal APIs, and during performance testing we enabled Settings > Developer > Show HUD Graphics. After launching the app, the HUD displayed normally, but the app crashed shortly afterwards. The main log information is as follows:
Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B
CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd
Hardware Model: iPhone14,3
Process: waimai [16584]
Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai
Identifier:
Version: 61488 (8.53.0)
Code Type: ARM-64
Parent Process: ? [1]
Date/Time: 2025-06-12 14:41:45.296 +0800
OS Version: iOS 18.0 (22A3354)
Report Version: 104
Monitor Type: Mach Exception
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00
Crashed Thread: 57
Thread 57 Crashed:
0 libMTLHud.dylib esfm_GenerateTriangesForString + 408
1 libMTLHud.dylib esfm_GenerateTriangesForString + 92
2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204
3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656
4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72
5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56
6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104
7 Metal MTLDispatchListApply + 52
8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312
9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136
10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
11 libdispatch.dylib _dispatch_client_callout4 + 20
12 libdispatch.dylib _dispatch_mach_msg_invoke + 464
13 libdispatch.dylib _dispatch_lane_serial_drain + 368
14 libdispatch.dylib _dispatch_mach_invoke + 456
15 libdispatch.dylib _dispatch_lane_serial_drain + 368
16 libdispatch.dylib _dispatch_lane_invoke + 432
17 libdispatch.dylib _dispatch_lane_serial_drain + 368
18 libdispatch.dylib _dispatch_lane_invoke + 380
19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288
20 libdispatch.dylib _dispatch_workloop_worker_thread + 540
21 libsystem_pthread.dylib _pthread_wqthread + 288
We tested several different device models; the crash occurs only on the iPhone 13 Pro Max.
Q1: Why does this crash happen?
Q2: With the same logic, why does the crash occur only on the iPhone 13 Pro Max?
Looking forward to your answers.
Hi there,
I'm wondering if it's possible under the iOS 26 developer beta to enable the MetalFX scaling info in the Metal HUD, via {"MTL_HUD_ENABLED": "1"}, for my app.
This information has been added on Mac, but looks to be absent on iPhone/iPad.
Hi,
What's the best way to handle drastic changes in scene characteristics with the new MTLFXTemporalDenoisedScaler?
Let's say a visible object in the scene radically changes its material properties. I can modify the albedo and roughness textures accordingly, but I suspect the history will be corrupted: blending visual information between the new frame and the previous ones might be nonsense.
I guess the problem is the same when objects appear or disappear instantly.
Does the upscaler manage these events for us (by lowering blending), or should we use the reactive mask, the denoise strength mask, or something like that to handle them?
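For whole-scene discontinuities such as a camera cut, the bluntest tool is the scaler's reset flag, which drops the accumulated history for one frame; per-object material changes sound more like the reactive mask's job. A hedged sketch, assuming MTLFXTemporalDenoisedScaler exposes the same reset property that MTLFXTemporalScaler has (shown here with the latter):

import MetalFX

// Hedged sketch: discard temporal history on a hard scene change.
func encodeUpscale(scaler: MTLFXTemporalScaler,
                   commandBuffer: MTLCommandBuffer,
                   sceneCutThisFrame: Bool) {
    scaler.reset = sceneCutThisFrame // true: ignore history this frame
    scaler.encode(commandBuffer: commandBuffer)
}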
Description:
In the official visionOS 26 Hover Effect sample code project, I encountered an issue where the event.trackingAreaIdentifier returned by onSpatialEvent does not reset as expected.
Steps to Reproduce:
Select an object with trackingAreaID = 6 in the sample app.
Look at a blank space (outside any tracking area) and perform a pinch gesture.
Expected Behavior:
The event.trackingAreaIdentifier should return 0 when interacting with a non-tracking area.
Actual Behavior:
The event.trackingAreaIdentifier still returns 6, even after restarting the app or killing the process. This persists regardless of where the pinch gesture is performed.