I wanted to try the new logging feature for Metal but could not get it to work.
I modified the PerformingCalculationsOnAGPU example by adding os_log_default.log_debug("Hello thread: %d", index); to log the current thread id. But never saw any messages neither in the console nor in Xcode.
I also added the -fmetal-enable-logging flag. I am running the Sequoia release candidate 15.0 (24A335) on M1 Max and Xcode 16.0 (16A242).
What am I missing?
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I am trying to get a little game prototype up and running using Metal using the metal-cpp libraries where I run everything natively at 120Hz with a coupled renderer using Vsync turned on so that I have the absolute physically minimum input to photon latency possible.
// Create the metal view
SDL_MetalView metal_view = SDL_Metal_CreateView(window);
CA::MetalLayer *swap_chain = (CA::MetalLayer *)SDL_Metal_GetLayer(metal_view);
// Set up the Metal device
MTL::Device *device = MTL::CreateSystemDefaultDevice();
swap_chain->setDevice(device);
swap_chain->setPixelFormat(MTL::PixelFormat::PixelFormatBGRA8Unorm);
swap_chain->setDisplaySyncEnabled(true);
swap_chain->setMaximumDrawableCount(2);
I am using SDL3 just for creating the window. Now when I go through my game / render loop - I stall for a long time on getting the next drawable which is understandable - my app runs in about 2-3ms.
m_CurrentContext->m_Drawable = m_SwapChain->nextDrawable();
m_CurrentContext->m_CommandBuffer = m_CommandQueue->commandBuffer()->retain();
char frame_label[32];
snprintf(frame_label, sizeof(frame_label), "Frame %d", m_FrameIndex);
m_CurrentContext->m_CommandBuffer->setLabel(NS::String::string(frame_label, NS::UTF8StringEncoding));
m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal] = MTL::RenderPassDescriptor::alloc()->init();
MTL::RenderPassColorAttachmentDescriptor* cd = m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal]->colorAttachments()->object(0);
cd->setTexture(m_CurrentContext->m_Drawable->texture());
cd->setLoadAction(MTL::LoadActionClear);
cd->setClearColor(MTL::ClearColor( 0.53f, 0.81f, 0.98f, 1.0f ));
cd->setStoreAction(MTL::StoreActionStore);
However my ProMotion display does not reliably run at 120Hz when fullscreen and using the direct to display system - it seems to run faster when windowed in composite which is the opposite of what I would expect. The Metal HUD says 120Hz, but the delay to getting the next drawable and looking at what Instruments is saying tells otherwise.
When I profile it, the game loop has completed and is sitting there waiting for the next drawable, but the screen does not want to complete in 8.33ms, so the whole thing slows down for no discernible reason.
Also as a game developer it is very strange for the command buffer to actually need the drawable texture free to be allowed to encode commands - usually the command buffers and swapping the front and back render buffers are not directly dependent on each other. Usually you only actually need the render buffer texture free when you want to draw to it. I could give myself another drawable, but because I am completing in less than 3ms, all it would do would be to add another frame of latency.
I also looked at the FramePacing example and its behaviour is even worse at having high framerate with low latency - the direct to display is always rejected for some reason.
Is this just a flaw in the Metal API? Or am I missing something important? I hope someone can help - the behaviour of the display is baffling.
Guten Tag,
my project is simple, first I want draw wired Hexa,-Tetra- and Octahedrons.
I draw a cube with Metal but I didn't found rotation, translation and scale.
I have searched help , the examples I found are too complicated for me.
Mit freundlichen Grüßen
VanceRegnet
I am trying to convert a ThreeJS project to Metal for the Vision Pro. The issue is ThreeJS doesn't do any color space conversion (when I output a color in a fragment shader and then read it using the digital color meter in SRGB mode I get the same value I inputed in the fragment shader) This is not the case when using metal. When setting up my LayerRenderer I set the colorFormat to rgba16Unorm since it is the only non srgb color format supported on the vision pro apps. However switching between bgra8Unorm_srgb and rgba16Unorm seems to have no affect.
when I set up the renderPassDescriptor I use the drawable colorTexture
renderPassDescriptor.colorAttachments[0].texture = drawable.colorTextures[0]
and when printing its pixel format it seems to be passed from the configuration.
If there is anyway to disable this behavior or perform an inverse function of such that I get the original value out from the shader, that would be appreciated.
I'm trying to create a custom Metal-based visual effect as a UIView to be used inside an existing UIKit-based interface. (An example might be a view that applies a blur effect to what's behind it.) I need to capture the MTLTexture of what's behind the view so that I can feed it to MTLRenderCommandEncoder.setFragmentTexture(_:index:). Can someone show me how or point me to an example? Thanks!
Greetings! I have been battling with a bit of a tough issue. My use case is running a pixelwise regression model on a 2D array of images using CIImageProcessorKernel and a custom Metal Shader.
It mostly works great, but the issue that arises is that if the regression calculation in Metal takes too long, an error occurs and the resulting output texture has strange artifacts, for example:
The specific error is:
Error excuting command buffer = Error Domain=MTLCommandBufferErrorDomain Code=1 "Internal Error (0000000e:Internal Error)" UserInfo={NSLocalizedDescription=Internal Error (0000000e:Internal Error), NSUnderlyingError=0x60000320ca20 {Error Domain=IOGPUCommandQueueErrorDomain Code=14 "(null)"}} (com.apple.CoreImage)
There are multiple levels of concurrency: Swift Concurrency calling the Core Image code (which shouldn't have an impact) and of course the Metal command buffer.
Is there anyway to ensure the compute command encoder can complete its work?
Here is the full implementation of my CIImageProcessorKernel subclass:
class ParametricKernel: CIImageProcessorKernel {
static let device = MTLCreateSystemDefaultDevice()!
override class var outputFormat: CIFormat {
return .BGRA8
}
override class func formatForInput(at input: Int32) -> CIFormat {
return .BGRA8
}
override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws {
guard
let commandBuffer = output.metalCommandBuffer,
let images = arguments?["images"] as? [CGImage],
let mask = arguments?["mask"] as? CGImage,
let fillTime = arguments?["fillTime"] as? CGFloat,
let betaLimit = arguments?["betaLimit"] as? CGFloat,
let alphaLimit = arguments?["alphaLimit"] as? CGFloat,
let errorScaling = arguments?["errorScaling"] as? CGFloat,
let timing = arguments?["timing"],
let TTRThreshold = arguments?["ttrthreshold"] as? CGFloat,
let input = inputs?.first,
let sourceTexture = input.metalTexture,
let destinationTexture = output.metalTexture
else {
return
}
guard let kernelFunction = device.makeDefaultLibrary()?.makeFunction(name: "parametric") else {
return
}
guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else {
return
}
let imagesTexture = Texture.textureFromImages(images)
let pipelineState = try device.makeComputePipelineState(function: kernelFunction)
commandEncoder.setComputePipelineState(pipelineState)
commandEncoder.setTexture(imagesTexture, index: 0)
let maskTexture = Texture.textureFromImages([mask])
commandEncoder.setTexture(maskTexture, index: 1)
commandEncoder.setTexture(destinationTexture, index: 2)
var errorScalingFloat = Float(errorScaling)
let errorBuffer = device.makeBuffer(bytes: &errorScalingFloat, length: MemoryLayout<Float>.size, options: [])
commandEncoder.setBuffer(errorBuffer, offset: 0, index: 1)
// Other buffers omitted....
let threadsPerThreadgroup = MTLSizeMake(16, 16, 1)
let width = Int(ceil(Float(sourceTexture.width) / Float(threadsPerThreadgroup.width)))
let height = Int(ceil(Float(sourceTexture.height) / Float(threadsPerThreadgroup.height)))
let threadGroupCount = MTLSizeMake(width, height, 1)
commandEncoder.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadsPerThreadgroup)
commandEncoder.endEncoding()
}
}
The Metal feature set tables specifies that beginning with the Apple4 family, the "Maximum threads per threadgroup" is 1024. Given that a single threadgroup is guaranteed to be run on the same GPU shader core, it means that a shader core of any new Apple GPU must be capable of running at least 1024/32 = 32 warps in parallel.
From the WWDC session "Scale compute workloads across Apple GPUs (6:17)":
For relatively complex kernels, 1K to 2K concurrent threads per shader core is considered a very good occupancy.
The cited sentence suggests that a single shader core is capable of running at least 2K (I assume this is meant to be 2048) threads in parallel, so 2048/32 = 64 warps running in parallel.
However, I am curious what is the maximum theoretical amount of warps running in parallel on a single shader core (it sounds like it is more than 64). The WWDC session mentions 2K to be only "very good" occupancy. How many threads would be "the best possible" occupancy?
Our app encountered the following error:
Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself?
Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve maximum possible performance, I do not want to the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space.
To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory?
I searched in the Metal feature set tables, but I could not find any details.
We have a production Metal app with a complex multithreaded Metal pipeline.
When everything is operating smoothly, it works great.
Even when extremely overloaded, it does not crash for days at a time.
This isn't good enough for our users.
Unfortunately, when I have zero visibility into id, I have no way of knowing when metal is "done" with an id.
When overloaded, stale metal render passes need to be 'aborted', which results in metal callbacks not being called.
for example, these callbacks may not be called after an aborted pass:
id<MTLCommandBuffer> m_cmdbuf;
[m_cmdbuf addScheduledHandler:^(id <MTLCommandBuffer> cb) {
cpr->scheduled = MachAbsoluteTime();
}];
[m_cmdbuf addCompletedHandler:^(id <MTLCommandBuffer> cb) {
cpr->completed = MachAbsoluteTime();
}];
For the moment, our workaround is a system which waits a few seconds after we "think" a rendering pass should be done with all its (aborted) resources before releasing buffers. This is not ideal, to say the least.
So, in summary, my question is, it would be nice to be able to 'query' an id to know when metal is done with it, so that we know that its safe to release it along with our own internal resources.
Is there any such (undocumented) mechanism? I have exhaustively read all existing Metal documentation many times.
An idea that I've been toying with... it would be nice to have something akin to Zombie detection running all the time for id only.
In OpenGL, it was OK to use a released texture... you may display a bad frame, but not CRASH!. Is there any similar option for id?
Topic:
Graphics & Games
SubTopic:
Metal
We have been having a mysterious crash in our media server app that I've never seen before. After fixing a number of other rare thread safety crashes relating to Metal buffers, this rare crash happens inside a Metal com.Metal.CompletionQueueDispatch?
I have no clue what is happening here. It looks to me like Metal is specifically calling abort() for some reason.
All of the other threads in the crash log appear to be in a normal state.
Thread 70 Crashed:: updateAllMedia Dispatch queue: com.Metal.CompletionQueueDispatch
0 libsystem_kernel.dylib 0x1af572d38 __pthread_kill + 8
1 libsystem_pthread.dylib 0x1af5a7ee0 pthread_kill + 288
2 libsystem_c.dylib 0x1af4e2330 abort + 168
3 libc++abi.dylib 0x1af562b18 abort_message + 132
4 libc++abi.dylib 0x1af552a3c demangling_terminate_handler() + 312
5 libobjc.A.dylib 0x1af4481c8 _objc_terminate() + 160
6 libc++abi.dylib 0x1af561eb4 std::__terminate(void (*)()) + 20
7 libc++abi.dylib 0x1af561e50 std::terminate() + 64
8 libdispatch.dylib 0x1af3e4288 _dispatch_client_callout4 + 40
9 libdispatch.dylib 0x1af40053c _dispatch_mach_msg_invoke + 464
10 libdispatch.dylib 0x1af3eb784 _dispatch_lane_serial_drain + 376
11 libdispatch.dylib 0x1af40125c _dispatch_mach_invoke + 456
12 libdispatch.dylib 0x1af3eb784 _dispatch_lane_serial_drain + 376
13 libdispatch.dylib 0x1af3ec438 _dispatch_lane_invoke + 444
14 libdispatch.dylib 0x1af3eb784 _dispatch_lane_serial_drain + 376
15 libdispatch.dylib 0x1af3ec404 _dispatch_lane_invoke + 392
16 libdispatch.dylib 0x1af3f6c98 _dispatch_workloop_worker_thread + 648
17 libsystem_pthread.dylib 0x1af5a4360 _pthread_wqthread + 288
18 libsystem_pthread.dylib 0x1af5a3080 start_wqthread + 8
Note that the thread name "updateAllMedia" is a misnomer because this thread appears to be a general Metal dispatch queue. I wish there was a debugging option in Metal that called "setThreadName" to name its internal threads.
VisionOS 2 beta 5 ,unity text shader errors
Topic:
Graphics & Games
SubTopic:
Metal
I have a test application that draws a large number of simple textured polygons (sprites).
Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task.
(In this case, nanosleep is used to adjust the amount of METAL commands per unit time so that they are approximately the same)
This appears to be a drawing-related thread, but there is no overhead when displaySyncEnabled is TRUE.
What are these differences?
A specific application is the SDL test program, SDL/test/testsprite.c.
https://github.com/libsdl-org/SDL/issues/10475
I have a test application that draws a large number of simple textured polygons (sprites).
Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task.
In this case, nanosleep() is used to adjust the amount of METAL commands per unit time so that they are approximately the same.
This appears to be a drawing-related thread, but there is no overhead when displaySyncEnabled is TRUE.
What are these differences?
A specific application is the SDL test program, SDL/test/testsprite.c.
https://github.com/libsdl-org/SDL/issues/10475
I use quad_sum to optimize the lighting grid and shadow filter performance.
Based on Metal Feature Set Tables, Apple Family 4 should support quad group operations like quad_sum and quad_max. However, on the iPhone X and iPhone 8, during creating pipeline states, we have the following error output: Encountered unlowered function call to air.quad_sum.f32.
It works perfectly for iPhone 11 and higher versions. Should I improve my feature-checking logic from Apple Family 4 to Apple Family 5, or do I have other options to fix this unexpected behavior?
Is there new API for generating Indirect Commands for the Metal Shader Converter? Is there any example project? I currently use a shader to copy indirect commands. Is there a way to do that with the new Shader Converter pipeline?
I am trying to work out how to enable the Metal HUD on iOS for App Store games?
I am aware you can go into Developer settings and enable it… but it only appears for some games like HADES or TestFlight apps.
I know the HUD appears for sideloaded games too. With sideloadly, I’ve sideloaded GRID Autosport and Myst - per screenshots. But it’s a very time consuming process, on demand resources usually don’t download… and I’m not sure if its legal.
I tried using Xcode and Attach to Process for a game like Resident Evil 7 or just anything… but it doesn’t work.
I’ve tried restoring a backup and editing the .GlobalPreferences.plist and info.plist file with “MetalForceHudEnabled Boolean Yes”, in a new row… but nothing.
any ideas?
Topic:
Graphics & Games
SubTopic:
Metal
I'm testing on an iPhone 12 Pro, running iOS 17.5.1.
Playing an HDR video with AVPlayer without explicitly specifying a pixel format (but specifying Metal Compatibility as below) gives buffers with the pixel format kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarVideoRange (&xv0).
_videoOutput = [[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:@{ (NSString*)kCVPixelBufferMetalCompatibilityKey: @(YES)
}
I can't find an appropriate metal format to use for these buffers to access the data in a shader. Using MTLPixelFormatR16Unorm for the Y plane and MTLPixelFormatRG16Unorm for UV plane causes GPU command buffer aborts.
My suspicion is that this compressed format isn't actually metal compatible due to the lack of padding bytes between pixels. Explicitly selecting kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange (which uses 16 bits per pixel) for the AVPlayerItemVideoOutput works, but I'd ideally like to use the compressed formats if possible for the bandwidth savings.
With SDR video, the pixel format is the lossless 8-bit one, and there are no problems binding those buffers to metal textures.
I'm just looking for confirmation there's currently no appropriate metal format for binding the packed 10-bit planes. And if that's the case, is it a bug that AVPlayerVideoOutput uses this format despite requesting Metal compatibility?
The title is self-exploratory. I wasn't able to find the CAMetalDisplayLink on the most recent metal-cpp release (metal-cpp_macOS15_iOS18-beta). Are there any plans to include it in the next release?
I have a very simple Mac app with just a MTKView in it which shows a single color. I want to move the rendering code to C++. For this I created a C++ framework target which interoperates with the Swift code - main project target. I am trying to link metal-cpp library to the C++ framework target using these instructions. Approach described in this article works with simple C++ Mac console apps. But in my mixed Swift/C++ project Xcode cannot find Foundation/Foundation.hpp (and probably other headers) to include into the C++ header.
I inserted metal-cpp folder into my project and added it to C++ target's header search paths, as written in the instructions.