I see the following questions come up a lot on Stack Overflow, the Apple developer forums, and various Slack groups:
- My neural network works on images but the Core ML model expects an MLMultiArray object. How do I convert my UIImage to an MLMultiArray?
- My neural network outputs an MLMultiArray but how do I convert this back into a UIImage?
- My model outputs an image but the UIImage is all black.
People run into these issues because most training tools, such as Keras or PyTorch, treat images just like any other data — as an n-dimensional array.
But in Core ML images are special!
A common issue is that, after converting a model to Core ML, the data type of the input is “multi-array” and not “image”. This is often easily fixed, but sometimes it requires a bit of model surgery.
How to properly handle images with your Core ML models is described in detail in my e-book Core ML Survival Guide — along with many other handy tips & tricks! Because the topic comes up so often, I decided to write about it on my blog as well.
If you’re having problems using images in Core ML, read on!
Image as input
The situation: You have a Core ML model with an input that is a multi-array of shape (3, height, width).
But in your app you have an image object, such as a UIImage. How do you convert the UIImage to an MLMultiArray object that you can pass to the model?
The answer is: You probably shouldn’t do this!
A better solution is to change the model to expect an image instead of a multi-array. Now Core ML will directly accept your UIImage as the input — no need to convert it to an MLMultiArray first.
If a model works on images, it should accept images as input! Multi-arrays are useful for other kinds of data, but are not intended for images.
You can probably do this during conversion
This problem is easy to fix if you have access to the original model.
Most Core ML converters have an image_input_names option that tells the converter which inputs should be treated as images.

In your conversion script, simply provide the additional argument image_input_names="your_input" to the convert function, where "your_input" is the name of the input that should take an image.

For a model with multiple image inputs, you can supply a list of names: image_input_names=["first_input", "second_input"].
Note that this only works if the shape of the input is (3, height, width) for color images, or just (height, width) for grayscale images.
Note: Core ML doesn’t currently support 4-channel images; the alpha channel in such images is simply ignored.
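As a concrete illustration, here is roughly what this looks like with the Keras converter from coremltools 3.x. This is only a minimal sketch: "model.h5" and "your_input" are placeholder names, and other converters (TensorFlow, ONNX, and so on) have a similar option.

import coremltools

# Minimal sketch using the Keras converter (coremltools 3.x).
# "model.h5" and "your_input" are placeholder names.
coreml_model = coremltools.converters.keras.convert(
    "model.h5",
    input_names=["your_input"],
    image_input_names=["your_input"],
)
coreml_model.save("YourModel.mlmodel")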
Changing the mlmodel file afterwards
If you only have the .mlmodel file and not the original model or the conversion script — or if the converter does not have the image_input_names option — you can still fix this using a bit of Python.
import coremltools
import coremltools.proto.FeatureTypes_pb2 as ft
spec = coremltools.utils.load_spec("YourModel.mlmodel")
input = spec.description.input[0]
input.type.imageType.colorSpace = ft.ImageFeatureType.RGB
input.type.imageType.height = 224
input.type.imageType.width = 224
coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")
This script changes the data type of the first input, input[0], to expect an image.

This assumes that the image is RGB. If necessary, change colorSpace to GRAYSCALE or BGR. You should use BGR if the model was trained with Caffe or if OpenCV was used to load the training images. Core ML will automatically convert the pixel order of your input images to whatever format the model requires, but it can only do this if you set colorSpace correctly.
Also make sure that the width and height are correct for your model!
Tip: To support flexible image sizes, see the chapter Size Flexibility in the Core ML Survival Guide.
Don’t forget preprocessing
A Core ML model includes a special preprocessing stage for image inputs.
Usually the original model normalizes the image tensor, for example by converting the pixels from the range [0, 255] to [-1, +1].
You need to set this up in your Core ML model as well. Getting the preprocessing wrong is the number one reason why people get incorrect predictions out of their Core ML models!
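If you're converting the model yourself, the converters have arguments for this (the Keras and Caffe converters, for example, take image_scale along with red_bias, green_bias, and blue_bias). If you're patching an existing mlmodel file instead, here is a minimal sketch of adding the preprocessing directly to the spec. It assumes a plain neural network model (spec.neuralNetwork rather than a classifier or regressor), no preprocessing set up during conversion, and the placeholder input name "your_input":

import coremltools

spec = coremltools.utils.load_spec("YourModel.mlmodel")

# Add an image scaler that maps pixels from [0, 255] to [-1, +1].
# Core ML computes channelScale * pixel + bias for each channel.
preprocessing = spec.neuralNetwork.preprocessing.add()
preprocessing.featureName = "your_input"
preprocessing.scaler.channelScale = 2 / 255.0
preprocessing.scaler.redBias = -1.0
preprocessing.scaler.greenBias = -1.0
preprocessing.scaler.blueBias = -1.0

coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")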
To learn more about this, see my blog post Help!? The output of my Core ML model is wrong… And of course you can read all about this in the Core ML Survival Guide.
You need a CVPixelBuffer
OK, I lied a little when I said that Core ML could directly use UIImage objects. 😰

The Core ML API requires images to be CVPixelBuffer objects. But in the app you probably have the image as a UIImage, a CGImage, a CIImage, or an MTLTexture.

In that case, you will still need to convert your image to a CVPixelBuffer object. Fortunately, this is easy:
- On iOS 13 and macOS 10.15 you can use the new MLFeatureValue(cgImage:) and MLFeatureValue(imageAt:) APIs.
- You can use VNCoreMLRequest from the Vision framework, which lets you pass in a CGImage object that you can easily obtain from your UIImage.
- Or you can use the helper code from CoreMLHelpers.
Converting UIImage → MLMultiArray
If — for some strange reason — you really must convert the UIImage into an MLMultiArray anyway, here’s how to do it:
- Create an MLMultiArray of type .double with shape (3, height, width).
- Loop through the pixels in the image and copy the color values into the rows and columns of this new MLMultiArray object.
- Don’t forget the preprocessing! You can do this inside the loop. For example, to convert a pixel from [0, 255] to [-1, 1], first divide the pixel value by 127.5, then subtract 1, and put the resulting value into the MLMultiArray.
I don’t have any code for this (because it’s a bad idea) but it’s basically the opposite of what happens in MLMultiArray+Image.swift in CoreMLHelpers.
More trouble than it’s worth, really — it’s much smarter to let Core ML handle this for you. 😁
Images as output
What if your model outputs an image?
As described above, the converters provided with coremltools have an option image_input_names that tells the converter which of the inputs should be treated as images, so that Core ML lets you pass in a CVPixelBuffer object.

However, there is no image_output_names. So if you have an image-to-image model, any image outputs will become multi-array outputs in the mlmodel. That’s not handy.

If a model predicts an image, it should output some kind of image object too! In Core ML, that would be a CVPixelBuffer object.

There are two things you can do to get a CVPixelBuffer as output from Core ML:

- convert the MLMultiArray to an image yourself, or
- change the mlmodel so that it knows the output should be an image.
I recommend against using option 1. It is slow and unnecessary because you can let Core ML do it for you (option 2). But in case you want to do the conversion yourself, check out the MLMultiArray+Image extension in CoreMLHelpers.
Changing the output type to image
Even though the converter doesn’t let you specify that an output should be an image, you can always change this in the mlmodel file afterwards.
Load the mlmodel into a spec object:
import coremltools
import coremltools.proto.FeatureTypes_pb2 as ft
spec = coremltools.utils.load_spec("YourModel.mlmodel")
Let’s say that print(spec.description) gives the following:
output {
  name: "generatedImage"
  type {
    multiArrayType {
      shape: 3
      shape: 300
      shape: 150
      dataType: DOUBLE
    }
  }
}
Then you can turn this output description into an image by writing:
output = spec.description.output[0]
import coremltools.proto.FeatureTypes_pb2 as ft
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = 300
output.type.imageType.width = 150
coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")
And now print(spec.description) shows the output is an RGB image:
output {
  name: "generatedImage"
  type {
    imageType {
      width: 150
      height: 300
      colorSpace: RGB
    }
  }
}
If you want the image to have BGR pixel order, write colorSpace = ft.ImageFeatureType.BGR instead of RGB.
Important: You can’t just turn any multi-array into an image. Pay attention to the shape properties in the multiArrayType structure:

- the first shape is the number of channels
- the second is the image’s height
- the third is the image’s width
You can only turn a multi-array into an RGB or BGR image if the number of channels is 3.
If the number of channels is 1, or if there are only two shape values — height and width — then you must use colorSpace = ft.ImageFeatureType.GRAYSCALE.

If there are more than three shape values listed, and the first or last of these is 1, then you can delete these unused dimensions. For example, if you have:
multiArrayType {
  shape: 1
  shape: 3
  shape: 300
  shape: 150
  dataType: DOUBLE
}
Then do the following to get rid of that first dimension with size 1:
del output.type.multiArrayType.shape[0]
Removing a dimension of size 1 doesn’t change the actual data, but it does let Core ML interpret the data as an image.
Don’t forget the postprocessing
If your Core ML model includes preprocessing, for example to convert the pixels from the range [0, 255] to [-1, +1], it’s likely that the output of your model is also in that [-1, +1] range.
For Core ML to turn the output data into a CVPixelBuffer, it must be a tensor with values in the range 0 – 255. Larger values or negative values will be clipped to this range.
This means that, if your model doesn’t already output pixels from 0 – 255, you’ll need to add some postprocessing layers.
For outputs in the range [-1, +1], you’ll need to add two operations:
- add 1 to the tensor to put it in the range [0, 2]
- multiply the tensor by 127.5
This is easiest if you add these operations in the original model and then convert it to Core ML.
In case you don’t have access to the original model, you can also change this directly in the mlmodel file by adding new layers at the end. For more info on how to insert new layers into an existing mlmodel file, see the Core ML Survival Guide.
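For what it’s worth, here is a rough sketch of that kind of surgery. It assumes a plain neural network model (spec.neuralNetwork), that the last layer in the layers list is the one producing the image output, and it uses made-up layer names. A single linear activation with alpha = 127.5 and beta = 127.5 performs both postprocessing steps at once, since 127.5 * x + 127.5 maps [-1, +1] to [0, 255]:

import coremltools

spec = coremltools.utils.load_spec("YourModel.mlmodel")
nn = spec.neuralNetwork

# Rename the blob written by the last layer so we can squeeze
# a new layer in between it and the model's output.
last_layer = nn.layers[-1]
output_name = last_layer.output[0]
last_layer.output[0] = output_name + "_raw"

# Append a linear activation: f(x) = alpha*x + beta = 127.5*x + 127.5,
# turning [-1, +1] values into [0, 255] pixel values.
new_layer = nn.layers.add()
new_layer.name = "scale_to_pixels"
new_layer.input.append(output_name + "_raw")
new_layer.output.append(output_name)
new_layer.activation.linear.alpha = 127.5
new_layer.activation.linear.beta = 127.5

coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")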
If you forget to do this postprocessing, your image will be black or the colors will be wrong in some other weird way. Usually the postprocessing needs to be the inverse of the preprocessing.
You get a CVPixelBuffer
For image outputs, Core ML gives you a CVPixelBuffer object.

If you used Vision, you’ll get a VNPixelBufferObservation object that contains a CVPixelBuffer.

Fortunately, a CVPixelBuffer is easy enough to convert into a UIImage. Here is one way, using Core Image:
let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
let resultImage = UIImage(ciImage: ciImage)
Conclusion
Core ML can work with image inputs and outputs just fine, but remember:
- you need to set the type of the inputs and/or outputs to accept images instead of multi-arrays
- you need to add preprocessing to normalize the pixel values
- for models with image outputs, you probably need to add postprocessing too
Letting Core ML handle the conversion from CVPixelBuffers to tensors and back is a lot easier — and more efficient — than converting your images to/from MLMultiArray objects by hand!
For more tricks like this, check out my book Core ML Survival Guide. It has a ton of info on how to get the most out of Core ML.
First published on Monday, 9 December 2019.
If you liked this post, say hi on Twitter @mhollemans or LinkedIn.
Find the source code on my GitHub.