I see the following questions come up a lot on Stack Overflow, the Apple developer forums, and various Slack groups:
- My neural network works on images but the Core ML model expects an MLMultiArray object. How do I convert my UIImage to an MLMultiArray?
- My neural network outputs an MLMultiArray but how do I convert this back into a UIImage?
- My model outputs an image but the UIImage is all black.
People run into these issues because most training tools, such as Keras or PyTorch, treat images just like any other data — as an n-dimensional array.
But in Core ML images are special!
A common issue is that, after converting a model to Core ML, the data type of the input is “multi-array” and not “image”. This is often easily fixed, but sometimes it requires a bit of model surgery.
How to properly handle images with your Core ML models is described in detail in my e-book Core ML Survival Guide — along with many other handy tips & tricks! Because the topic comes up so often, I decided to write about it on my blog as well.
If you’re having problems using images in Core ML, read on!
Image as input
The situation: You have a Core ML model with an input that is a multi-array of shape (3, height, width).
But in your app you have an image object, such as a UIImage. How do you convert the UIImage to an MLMultiArray object that you can pass to the model?
The answer is: You probably shouldn’t do this!
A better solution is to change the model to expect an image instead of a multi-array. Now Core ML will directly accept your UIImage as the input — no need to convert it to an MLMultiArray first.
If a model works on images, it should accept images as input! Multi-arrays are useful for other kinds of data, but are not intended for images.
You can probably do this during conversion
This problem is easy to fix if you have access to the original model.
Most Core ML converters have an image_input_names option that tells the converter which inputs should be treated as images.

In your conversion script, simply provide the additional argument image_input_names="your_input" to the convert function, where "your_input" is the name of the input that should take an image.

For a model with multiple image inputs, you can supply a list of names: image_input_names=["first_input", "second_input"].
Note that this only works if the shape of the input is (3, height, width) for color images, or just (height, width) for grayscale images.
Note: Core ML doesn’t currently support 4-channel images; the alpha channel in such images is simply ignored.
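As a concrete illustration, here is roughly what this looks like with the Keras converter from coremltools 3.x. This is only a minimal sketch: "model.h5" and "your_input" are placeholder names, and other converters (TensorFlow, ONNX, and so on) have a similar option.

import coremltools

# Minimal sketch using the Keras converter (coremltools 3.x).
# "model.h5" and "your_input" are placeholder names.
coreml_model = coremltools.converters.keras.convert(
    "model.h5",
    input_names=["your_input"],
    image_input_names=["your_input"],
)
coreml_model.save("YourModel.mlmodel")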
Changing the mlmodel file afterwards
If you only have the .mlmodel file and not the original model or the conversion script — or if the converter does not have the image_input_names option — you can still fix this using a bit of Python.
import coremltools
import coremltools.proto.FeatureTypes_pb2 as ft
spec = coremltools.utils.load_spec("YourModel.mlmodel")
input = spec.description.input[0]
input.type.imageType.colorSpace = ft.ImageFeatureType.RGB
input.type.imageType.height = 224
input.type.imageType.width = 224
coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")
This script changes the data type of the first input, input[0], to expect an image.

This assumes that the image is RGB. If necessary, change colorSpace to GRAYSCALE or BGR. You should use BGR if the model was trained with Caffe or if OpenCV was used to load the training images. Core ML will automatically convert the pixel order of your input images to whatever format the model requires, but it can only do this if you set colorSpace correctly.
Also make sure that the width and height are correct for your model!
Tip: To support flexible image sizes, see the chapter Size Flexibility in the Core ML Survival Guide.
Don’t forget preprocessing
A Core ML model includes a special preprocessing stage for image inputs.
Usually the original model normalizes the image tensor, for example by converting the pixels from the range [0, 255] to [-1, +1].
You need to set this up in your Core ML model as well. Getting the preprocessing wrong is the number one reason why people get incorrect predictions out of their Core ML models!
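If you're converting the model yourself, the converters have arguments for this (the Keras and Caffe converters, for example, take image_scale along with red_bias, green_bias, and blue_bias). If you're patching an existing mlmodel file instead, here is a minimal sketch of adding the preprocessing directly to the spec. It assumes a plain neural network model (spec.neuralNetwork rather than a classifier or regressor), no preprocessing set up during conversion, and the placeholder input name "your_input":

import coremltools

spec = coremltools.utils.load_spec("YourModel.mlmodel")

# Add an image scaler that maps pixels from [0, 255] to [-1, +1].
# Core ML computes channelScale * pixel + bias for each channel.
preprocessing = spec.neuralNetwork.preprocessing.add()
preprocessing.featureName = "your_input"
preprocessing.scaler.channelScale = 2 / 255.0
preprocessing.scaler.redBias = -1.0
preprocessing.scaler.greenBias = -1.0
preprocessing.scaler.blueBias = -1.0

coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")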
To learn more about this, see my blog post Help!? The output of my Core ML model is wrong… And of course you can read all about this in the Core ML Survival Guide.
You need a CVPixelBuffer
OK, I lied a little when I said that Core ML could directly use UIImage objects. 😰

The Core ML API requires images to be CVPixelBuffer objects. But in the app you probably have the image as a UIImage, a CGImage, a CIImage, or an MTLTexture.

In that case, you will still need to convert your image to a CVPixelBuffer object. Fortunately, this is easy:
- On iOS 13 and macOS 10.15 you can use the new MLFeatureValue(cgImage:) and MLFeatureValue(imageAt:) APIs.
- You can use VNCoreMLRequest from the Vision framework, which lets you pass in a CGImage object that you can easily obtain from your UIImage.
- Or you can use the helper code from CoreMLHelpers.
Converting UIImage → MLMultiArray
If — for some strange reason — you really must convert the UIImage into an MLMultiArray anyway, here’s how to do it:
- Create an MLMultiArray of type .double with shape (3, height, width).
- Loop through the pixels in the image and copy the color values into the rows and columns of this new MLMultiArray object.
- Don’t forget the preprocessing! You can do this inside the loop. For example, to convert a pixel from [0, 255] to [-1, 1], first divide the pixel value by 127.5, then subtract 1, and put the resulting value into the MLMultiArray.
I don’t have any code for this (because it’s a bad idea) but it’s basically the opposite of what happens in MLMultiArray+Image.swift in CoreMLHelpers.
More trouble than it’s worth, really — it’s much smarter to let Core ML handle this for you. 😁
Images as output
What if your model outputs an image?
As described above, the converters provided with coremltools have an option image_input_names that tells the converter which of the inputs should be treated as images, so that Core ML lets you pass in a CVPixelBuffer object.

However, there is no image_output_names. So if you have an image-to-image model, any image outputs will become multi-array outputs in the mlmodel. That’s not handy.

If a model predicts an image, it should output some kind of image object too! In Core ML, that would be a CVPixelBuffer object.

There are two things you can do to get a CVPixelBuffer as output from Core ML:

- convert the MLMultiArray to an image yourself, or
- change the mlmodel so that it knows the output should be an image.
I recommend against using option 1. It is slow and unnecessary because you can let Core ML do it for you (option 2). But in case you want to do the conversion yourself, check out the MLMultiArray+Image extension in CoreMLHelpers.
Changing the output type to image
Even though the converter doesn’t let you specify that an output should be an image, you can always change this in the mlmodel file afterwards.
Load the mlmodel into a spec object:
import coremltools
import coremltools.proto.FeatureTypes_pb2 as ft
spec = coremltools.utils.load_spec("YourModel.mlmodel")
Let’s say that print(spec.description) gives the following:
output {
  name: "generatedImage"
  type {
    multiArrayType {
      shape: 3
      shape: 300
      shape: 150
      dataType: DOUBLE
    }
  }
}
Then you can turn this output description into an image by writing:
output = spec.description.output[0]
import coremltools.proto.FeatureTypes_pb2 as ft
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = 300
output.type.imageType.width = 150
coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")
And now print(spec.description) shows the output is an RGB image:
output {
  name: "generatedImage"
  type {
    imageType {
      width: 150
      height: 300
      colorSpace: RGB
    }
  }
}
If you want the image to have BGR pixel order, write colorSpace = ft.ImageFeatureType.BGR instead of RGB.
Important: You can’t just turn any multi-array into an image. Pay attention to the shape properties in the multiArrayType structure:

- the first shape is the number of channels
- the second is the image’s height
- the third is the image’s width
You can only turn a multi-array into an RGB or BGR image if the number of channels is 3.
If the number of channels is 1, or if there are only two shape values — height and width — then you must use colorSpace = ft.ImageFeatureType.GRAYSCALE.

If there are more than three shape values listed, and the first or last of these is 1, then you can delete these unused dimensions. For example, if you have:
multiArrayType {
  shape: 1
  shape: 3
  shape: 300
  shape: 150
  dataType: DOUBLE
}
Then do the following to get rid of that first dimension with size 1:
del output.type.multiArrayType.shape[0]
Removing a dimension of size 1 doesn’t change the actual data, but it does let Core ML interpret the data as an image.
Don’t forget the postprocessing
If your Core ML model includes preprocessing, for example to convert the pixels from the range [0, 255] to [-1, +1], it’s likely that the output of your model is also in that [-1, +1] range.
For Core ML to turn the output data into a CVPixelBuffer, it must be a tensor with values in the range 0 – 255. Larger values or negative values will be clipped to this range.
This means that, if your model doesn’t already output pixels from 0 – 255, you’ll need to add some postprocessing layers.
For outputs in the range [-1, +1], you’ll need to add two operations:
- add 1 to the tensor to put it in the range [0, 2]
- multiply the tensor by 127.5
This is easiest if you add these operations in the original model and then convert it to Core ML.
In case you don’t have access to the original model, you can also change this directly in the mlmodel file by adding new layers at the end. For more info on how to insert new layers into an existing mlmodel file, see the Core ML Survival Guide.
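For what it’s worth, here is a rough sketch of that kind of surgery. It assumes a plain neural network model (spec.neuralNetwork), that the last layer in the layers list is the one producing the image output, and it uses made-up layer names. A single linear activation with alpha = 127.5 and beta = 127.5 performs both postprocessing steps at once, since 127.5 * x + 127.5 maps [-1, +1] to [0, 255]:

import coremltools

spec = coremltools.utils.load_spec("YourModel.mlmodel")
nn = spec.neuralNetwork

# Rename the blob written by the last layer so we can squeeze
# a new layer in between it and the model's output.
last_layer = nn.layers[-1]
output_name = last_layer.output[0]
last_layer.output[0] = output_name + "_raw"

# Append a linear activation: f(x) = alpha*x + beta = 127.5*x + 127.5,
# turning [-1, +1] values into [0, 255] pixel values.
new_layer = nn.layers.add()
new_layer.name = "scale_to_pixels"
new_layer.input.append(output_name + "_raw")
new_layer.output.append(output_name)
new_layer.activation.linear.alpha = 127.5
new_layer.activation.linear.beta = 127.5

coremltools.utils.save_spec(spec, "YourNewModel.mlmodel")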
If you forget to do this postprocessing, your image will be black or the colors will be wrong in some other weird way. Usually the postprocessing needs to be the inverse of the preprocessing.
You get a CVPixelBuffer
For image outputs, Core ML gives you a CVPixelBuffer object.

If you used Vision, you’ll get a VNPixelBufferObservation object that contains a CVPixelBuffer.

Fortunately, a CVPixelBuffer is easy enough to convert into a UIImage. Here is one way, using Core Image:
let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
let resultImage = UIImage(ciImage: ciImage)
Conclusion
Core ML can work with image inputs and outputs just fine, but remember:
- you need to set the type of the inputs and/or outputs to accept images instead of multi-arrays
- you need to add preprocessing to normalize the pixel values
- for models with image outputs, you probably need to add postprocessing too
Letting Core ML handle the conversion from CVPixelBuffers to tensors and back is a lot easier — and more efficient — than converting your images to/from MLMultiArray objects by hand!
For more tricks like this, check out my book Core ML Survival Guide. It has a ton of info on how to get the most out of Core ML.
First published on Monday, 9 December 2019.
If you liked this post, say hi on Twitter @mhollemans or LinkedIn.
Find the source code on my GitHub.