From tensorflow-metal example:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
I know that Apple silicon uses UMA, and that memory copies are typical of CUDA, but wouldn't the GPU memory still be faster overall?
I have an iMac Pro with a Radeon Pro Vega 64 16 GB GPU and an Intel iMac with a Radeon Pro 5700 8 GB GPU.
But using tensorflow-metal is still WAY faster than using the CPUs. Thanks for that. I am surprised the 5700 is twice as fast as the Vega though.
Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi everyone! 👋
I'm working on a C++ project using TensorFlow Lite and was wondering if anyone has a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS (Apple Silicon M1/M2) that they’d be willing to share.
I’m looking specifically for the TensorFlow Lite C++ API — something that lets me use tflite::Interpreter, tflite::FlatBufferModel, etc. Building it from source using Bazel on macOS has been quite challenging and time-consuming, so a ready-to-use .dylib or .a build along with the required headers would be incredibly helpful.
TensorFlow Lite version: v2.18.0 preferred
Target: macOS arm64 (Apple Silicon)
What I need:
libtensorflowlite.dylib or .a
Corresponding headers (ideally organized in a clean include/ folder)
If you have one available or know where I can find a reliable prebuilt version, I’d be super grateful. Thanks in advance! 🙏
Hi! I'm trying to use the ImagePlayground API in SwiftUI with the .imagePlaygroundSheet modifier. However, when the sheet is shown (in the preview or in the simulator) it displays the following message: "Image Playground is not available. Image Playground is not available on this iPhone.".
I'm using an iPhone 16 Pro with iOS 18.3.1 in the Xcode (16.2) Simulator.
Anyone else having this problem? How can I fix it?
I’m considering creating an ILMessageFilterExtension using a mini LLM/SLM to detect fraud and I’ve read it has strict memory limits yet I can’t find it in the documentation. What’s the set limit or any other constraints impacting the feasibility of running 100-500mb model?
Topic:
Machine Learning & AI
SubTopic:
Core ML
Hi Apple product owners.
I am missing a unified concept which might be derived from the use cases for mail categories and mail spam for the app "Mail" on Mac.
I need a recommendation on how to use categories in combination with the spam filter to get most out of it.
So I was looking for the use cases for the 2 functionality areas in order to figure out how to organise my mails by using as much automation as possible before I start creating intelligent folders in addition.
What can you recommend where I get this information from? I don't want to guess or read a lot of forum contributions which are based on guesses.
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
I followed below url for converting Llama-3.1-8B-Instruct model but always fails even i have 64GB of free space after downloading model from huggingface.
https://machinelearning.apple.com/research/core-ml-on-device-llama
Also tried with other models Llama-3.1-1B-Instruct & Llama-3.1-3B-Instruct models those are converted but while doing performance test in xcode fails for all compunits.
Is there any source code to run llama models in ios app.
Recently, I'm trying to deploy some third-party LLM to Apple devices.
The methodoloy is similar to https://github.com/Anemll/Anemll.
The biggest issue I'm having now is the runtime memory usage.
When there are multiple functions in a model (mlpackage or mlmodelc), the runtime memory usage for weights is somehow duplicated when I load all of them. Here's the detail:
I created my multifunction mlpackage following https://apple.github.io/coremltools/docs-guides/source/multifunction-models.html
I loaded each of the functions using the generated swift class:
let config = MLModelConfiguration()
config.computeUnits = MLComputeUnits.cpuAndNeuralEngine
config.functionName = "infer_512";
let ffn1_infer_512 = try! mimo_FFN_PF_lut4_chunk_01of02(configuration: config)
config.functionName = "infer_1024";
let ffn1_infer_1024 = try! mimo_FFN_PF_lut4_chunk_01of02(configuration: config)
config.functionName = "infer_2048";
let ffn1_infer_2048 = try! mimo_FFN_PF_lut4_chunk_01of02(configuration: config)
I observed that RAM usage increases linearly as I load each of the functions.
Using instruments, I see that there are multiple HWX files generated and loaded, each of which contains all the weight data.
My understanding of what's happening here:
The CoreML framework did some MIL->MIL preprocessing before further compilation, which includes separating CPU workload from ANE workload.
The ANE part of each function is moved into a separate MIL file then compile separately into a HWX file each.
The problem is that the weight data of these HWX files are duplicated. Since that the weight data of LLMs is huge, it will cause out-of-memory issue on mobile devices.
The improvement I'm hoping from Apple:
I hope we can try to merge the processed MIL files back into one before calling ANECCompile(), so that the weights can be merged. I don't have control over that in user space and I'm not sure if that is feasible. So I'm asking for help here.
Thanks.
Topic:
Machine Learning & AI
SubTopic:
Core ML
Hi, DataScannerViewController does't recognize currencies less than 1.00 (e.g. 0.59 USD, 0.99 EUR, etc.). Why? How to solve the problem?
This feature is not described in Apple documentation, is there a solution?
This is my code:
func makeUIViewController(context: Context) -> DataScannerViewController {
let dataScanner = DataScannerViewController(recognizedDataTypes: [ .text(textContentType: .currency)])
return dataScanner
}
I used Yolo5-11 and while performing great detecting balls lets say 5-10ft away in 1920 resolution and even in 640 it really is taking toll on my app performance.
When I use Create ML it outputs all in 415x which is probably the reason why it does not detect objects from far.
What can I do to preserve some energy ?
My model is used with about 1K pictures 200 each test and validate, and from close up and far.
Topic:
Machine Learning & AI
SubTopic:
Create ML
I'm developing a tennis ball tracking feature using Vision Framework in Swift, specifically utilizing VNDetectedObjectObservation and VNTrackObjectRequest.
Occasionally (but not always), I receive the following runtime error:
Failed to perform SequenceRequest: Error Domain=com.apple.Vision Code=9 "Internal error: unexpected tracked object bounding box size" UserInfo={NSLocalizedDescription=Internal error: unexpected tracked object bounding box size}
From my investigation, I suspect the issue arises when the bounding box from the initial observation (VNDetectedObjectObservation) is too small. However, Apple's documentation doesn't clearly define the minimum bounding box size that's considered valid by VNTrackObjectRequest.
Could someone clarify:
What is the minimum acceptable bounding box width and height (normalized) that Vision Framework's VNTrackObjectRequest expects?
Is there any recommended practice or official guidance for bounding box size validation before creating a tracking request?
This information would be extremely helpful to reliably avoid this internal error.
Thank you!
Hi,
I have been trying to integrate a CoreML model into Xcode. The model was made using tensorflow layers. I have included both the model info and a link to the app repository. I am mainly just really confused on why its not working. It seems to only be printing the result for case 1 (there are 4 cases labled, case 0, case 1, case 2, and case 3).
If someone could help work me through this error that would be great!
here is the link to the repository: https://github.com/ShivenKhurana1/Detect-to-Protect-App
this file with the model code is called SecondView.swift
and here is the model info:
Input: conv2d_input-> image (color 224x224)
Output: Identity -> MultiArray (Float32 1x4)
Hi friends,
I have just found that the inference speed dropped to only 1/10 of the original model.
Had anyone encountered this?
Thank you.
Topic:
Machine Learning & AI
SubTopic:
Core ML
I got 3203.23 GFLOPS (FP16) on the M3 Macbook Pro and only 2833.24 GFLOPS (FP16) on the M4 Macbook Air for 4096x4096 matrix multiplications for a PyTorch MPS FP16 Benchmark. Wasn't the performance supposed to be twice as high on the M4 compared to the M3 even with the termal throtling on the Macbook Air? What went wrong?
I have seen inconsistent results for my Colab machine learning notebooks running locally on a Mac M4, compared to running the same notebook code on either T4 (in Colab) or a RTX3090 locally.
To illustrate the problems I have set up a notebook that implements two simple CNN models that solves the Fashion-MNIST problem. https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing
For the good model with 2M parameters I get the following results:
T4 (Colab, JAX): Test accuracy: 0.925
3090 (Local PC via ssh tunnel, Jax): Test accuracy: 0.925
Mac M4 (Local, JAX): Test accuracy: 0.893
Mac M4 (Local, Tensorflow): Test accuracy: 0.893
That is, I see a significant drop in performance when I run on the Mac M4 compared to the NVIDIA machines, and it seems to be independent of backend. I however do not know how to pinpoint this to either Keras or Apple’s METAL implementation. I have reported this to Keras: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing but as this can be (likely is?) an Apple Metal issue, I wanted to report this here as well.
On the mac I am running the following Python libraries:
keras 3.9.1
tensorflow 2.19.0
tensorflow-metal 1.2.0
jax 0.5.3
jax-metal 0.1.1
jaxlib 0.5.3
Topic:
Machine Learning & AI
SubTopic:
General
Hey dear developers!
This post should be available for the future Siri updates and improvements but also for wishes in this forum so that everyone can share their opinion and idea please stay friendly. have fun! I had already thought about developing a demo app to demonstrate my idea for a better Siri.
My change of many:
Wish Update: Siri's language recognition capabilities have been significantly enhanced. Instead of manually setting the language, Siri can now automatically recognize the language you intend to use, making language switching much more efficient. Simply speak the language you want to communicate in, and Siri will automatically recognize it and respond accordingly. Whether you speak English, German, or Japanese, Siri will respond in the language you choose.
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Tags:
iPhone
Siri Event Suggestions Markup
Siri and Voice
Apple Intelligence
In my quantization code, the line:
compressed_model_a8 = cto.coreml.experimental.linear_quantize_activations(
model, activation_config, [{'img':np.random.randn(1,13,1024,1024)}]
)
has taken 90 minutes to run so far and is still not completed. From debugging, I can see that the line it's stuck on is line 261 in _model_debugger.py:
model = ct.models.MLModel(
cloned_spec,
weights_dir=self.weights_dir,
compute_units=compute_units,
skip_model_load=False, # Don't skip model load as we need model prediction to get activations range.
)
Is this expected behaviour? Would it be quicker to run on another computer with more RAM?
Hi,
I'm testing DockKit with a very simple setup:
I use VNDetectFaceRectanglesRequest to detect a face and then call dockAccessory.track(...) using the detected bounding box.
The stand is correctly docked (state == .docked) and dockAccessory is valid.
I'm calling .track(...) with a single observation and valid CameraInformation (including size, device, orientation, etc.). No errors are thrown.
To monitor this, I added a logging utility – track(...) is being called 10–30 times per second, as recommended in the documentation.
However: the stand does not move at all.
There is no visible reaction to the tracking calls.
Is there anything I'm missing or doing wrong?
Is VNDetectFaceRectanglesRequest supported for DockKit tracking, or are there hidden requirements?
Would really appreciate any help or pointers – thanks!
That's my complete code:
extension VideoFeedViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
return
}
detectFace(image: frame)
func detectFace(image: CVPixelBuffer) {
let faceDetectionRequest = VNDetectFaceRectanglesRequest() { vnRequest, error in
guard let results = vnRequest.results as? [VNFaceObservation] else {
return
}
guard let observation = results.first else {
return
}
let boundingBoxHeight = observation.boundingBox.size.height * 100
#if canImport(DockKit)
if let dockAccessory = self.dockAccessory {
Task {
try? await trackRider(
observation.boundingBox,
dockAccessory,
frame,
sampleBuffer
)
}
}
#endif
}
let imageResultHandler = VNImageRequestHandler(cvPixelBuffer: image, orientation: .up)
try? imageResultHandler.perform([faceDetectionRequest])
func combineBoundingBoxes(_ box1: CGRect, _ box2: CGRect) -> CGRect {
let minX = min(box1.minX, box2.minX)
let minY = min(box1.minY, box2.minY)
let maxX = max(box1.maxX, box2.maxX)
let maxY = max(box1.maxY, box2.maxY)
let combinedWidth = maxX - minX
let combinedHeight = maxY - minY
return CGRect(x: minX, y: minY, width: combinedWidth, height: combinedHeight)
}
#if canImport(DockKit)
func trackObservation(_ boundingBox: CGRect, _ dockAccessory: DockAccessory, _ pixelBuffer: CVPixelBuffer, _ cmSampelBuffer: CMSampleBuffer) throws {
// Zähle den Aufruf
TrackMonitor.shared.trackCalled()
let invertedBoundingBox = CGRect(
x: boundingBox.origin.x,
y: 1.0 - boundingBox.origin.y - boundingBox.height,
width: boundingBox.width,
height: boundingBox.height
)
guard let device = captureDevice else {
fatalError("Kamera nicht verfügbar")
}
let size = CGSize(width: Double(CVPixelBufferGetWidth(pixelBuffer)),
height: Double(CVPixelBufferGetHeight(pixelBuffer)))
var cameraIntrinsics: matrix_float3x3? = nil
if let cameraIntrinsicsUnwrapped = CMGetAttachment(
sampleBuffer,
key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix,
attachmentModeOut: nil
) as? Data {
cameraIntrinsics = cameraIntrinsicsUnwrapped.withUnsafeBytes { $0.load(as: matrix_float3x3.self) }
}
Task {
let orientation = getCameraOrientation()
let cameraInfo = DockAccessory.CameraInformation(
captureDevice: device.deviceType,
cameraPosition: device.position,
orientation: orientation,
cameraIntrinsics: cameraIntrinsics,
referenceDimensions: size
)
let observation = DockAccessory.Observation(
identifier: 0,
type: .object,
rect: invertedBoundingBox
)
let observations = [observation]
guard let image = CMSampleBufferGetImageBuffer(sampleBuffer) else {
print("no image")
return
}
do {
try await dockAccessory.track(observations, cameraInformation: cameraInfo)
} catch {
print(error)
}
}
}
#endif
func clearDrawings() {
boundingBoxLayer?.removeFromSuperlayer()
boundingBoxSizeLayer?.removeFromSuperlayer()
}
}
}
}
@MainActor
private func getCameraOrientation() -> DockAccessory.CameraOrientation {
switch UIDevice.current.orientation {
case .portrait:
return .portrait
case .portraitUpsideDown:
return .portraitUpsideDown
case .landscapeRight:
return .landscapeRight
case .landscapeLeft:
return .landscapeLeft
case .faceDown:
return .faceDown
case .faceUp:
return .faceUp
default:
return .corrected
}
}
Is it possible to train a model using CreateML to infer a relevance numeric score of a news article based on similar trained data, something like a sentiment score ? I created a Text Classifier that assigns a category label which works perfect but I would like a solution that calculates a numeric value, not a label.
Topic:
Machine Learning & AI
SubTopic:
Create ML
Can't import data in create ML word tagging project
training data is 100% correct I guarantee it:
I mean look this one has one entry in it.
[
{
"tokens": [
"a", "august", "gruters"
],
"labels": [
"BUILDER", "BUILDER", "BUILDER"
]
}
]
Topic:
Machine Learning & AI
SubTopic:
Create ML
Incident Identifier: 4C22F586-71FB-4644-B823-A4B52D158057
CrashReporter Key: adc89b7506c09c2a6b3a9099cc85531bdaba9156
Hardware Model: Mac16,10
Process: PRISMLensCore [16561]
Path: /Applications/PRISMLens.app/Contents/Resources/app.asar.unpacked/node_modules/core-node/PRISMLensCore.app/PRISMLensCore
Identifier: com.prismlive.camstudio
Version: (null) ((null))
Code Type: ARM-64
Parent Process: ? [16560]
Date/Time: (null)
OS Version: macOS 15.4 (24E5228e)
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread: 34
Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSArrayM insertObject:atIndex:]: object cannot be nil'
Thread 34 Crashed:
0 CoreFoundation 0x000000018ba4dde4 0x18b960000 + 974308 (__exceptionPreprocess + 164)
1 libobjc.A.dylib 0x000000018b512b60 0x18b4f8000 + 109408 (objc_exception_throw + 88)
2 CoreFoundation 0x000000018b97e69c 0x18b960000 + 124572 (-[__NSArrayM insertObject:atIndex:] + 1276)
3 Portrait 0x0000000257e16a94 0x257da3000 + 473748 (-[PTMSRResize addAdditionalOutput:] + 604)
4 Portrait 0x0000000257de91c0 0x257da3000 + 287168 (-[PTEffectRenderer initWithDescriptor:metalContext:useHighResNetwork:faceAttributesNetwork:humanDetections:prevTemporalState:asyncInitQueue:sharedResources:] + 6204)
5 Portrait 0x0000000257dab21c 0x257da3000 + 33308 (__33-[PTEffect updateEffectDelegate:]_block_invoke.241 + 164)
6 libdispatch.dylib 0x000000018b739b2c 0x18b738000 + 6956 (_dispatch_call_block_and_release + 32)
7 libdispatch.dylib 0x000000018b75385c 0x18b738000 + 112732 (_dispatch_client_callout + 16)
8 libdispatch.dylib 0x000000018b742350 0x18b738000 + 41808 (_dispatch_lane_serial_drain + 740)
9 libdispatch.dylib 0x000000018b742e2c 0x18b738000 + 44588 (_dispatch_lane_invoke + 388)
10 libdispatch.dylib 0x000000018b74d264 0x18b738000 + 86628 (_dispatch_root_queue_drain_deferred_wlh + 292)
11 libdispatch.dylib 0x000000018b74cae8 0x18b738000 + 84712 (_dispatch_workloop_worker_thread + 540)
12 libsystem_pthread.dylib 0x000000018b8ede64 0x18b8eb000 + 11876 (_pthread_wqthread + 292)
13 libsystem_pthread.dylib 0x000000018b8ecb74 0x18b8eb000 + 7028 (start_wqthread + 8)
Topic:
Machine Learning & AI
SubTopic:
General