Help!? The output of my Core ML model is wrong…

by Matthijs Hollemans
26 July 2017


This is a question that I’ve seen asked multiple times over the past weeks on Stack Overflow, the Apple Developer Forums, and various Slack groups.

Usually it involves Core ML models that take images as input.

When you’re using Core ML, it is often not enough to just put your image into a CVPixelBuffer object. Even using Vision to drive Core ML won’t fix this issue.

The problem is that there is no standard format in which deep learning models expect their images. So you need to tell Core ML how to preprocess the image into the format that your model understands.

A CVPixelBuffer usually contains pixels in RGBA format where each color channel is 8 bits. That means the pixel values in this image are between 0 and 255.

Note: You can construct CVPixelBuffers using different pixel formats too, but RGBA is the most common. And by that I also mean ARGB, BGRA, and ABGR. These are all 32-bit formats where each color channel takes up 8 bits. If you’re using grayscale images, you need a CVPixelBuffer with format kCVPixelFormatType_OneComponent8.

But your model may not actually expect pixel values between 0 and 255. Here are some common options:

- pixel values between 0 and 1
- pixel values between -1 and +1
- pixel values with the mean RGB value of the training set subtracted (common for models trained on ImageNet)

For grayscale images, it’s important to know what value is considered black and what value is considered white. I’ve seen models where 0 is black and 1 is white, and others where 1 is black and 0 is white.
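
For example, with an 8-bit pixel value of 200 (a light gray, picked arbitrarily for illustration), the two conventions give very different numbers. How to configure this during conversion is covered below; this is just the arithmetic:

pixel = 200                   # a light gray in a 0 - 255 image
print(pixel / 255.0)          # 0 is black, 1 is white  -> about 0.78
print(1.0 - pixel / 255.0)    # 1 is black, 0 is white  -> about 0.22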

You need to tell Core ML about the pixel values used by your model.

If your model expects pixel values in a different range than 0 – 255, then you need to tell Core ML so it can convert the CVPixelBuffer into the right format.

You do this in the Python script that converts the model.

For example, when you convert from Caffe or Keras you can specify the following options for coremltools.converters.caffe.convert() and coremltools.converters.keras.convert():

- image_scale: every pixel value gets multiplied by this number (default 1.0)
- red_bias, green_bias, blue_bias: these numbers get added to the red, green, and blue channels after scaling (default 0.0)
- gray_bias: the same, but for the single channel of a grayscale image
- is_bgr: set this to True if your model expects the color channels in BGR order instead of RGB
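
To make this concrete, here is a sketch of a conversion script for a Keras model. The file name my_model.h5, the input/output names, and the choice of a 0 - 1 pixel range are assumptions made up for this example; see below for other common settings.

import coremltools

# Sketch: converting a hypothetical Keras model whose input image is
# expected in the range 0 - 1. File name and input/output names are placeholders.
coreml_model = coremltools.converters.keras.convert(
    'my_model.h5',
    input_names='image',
    image_input_names='image',   # treat this input as an image, not a multi-array
    output_names='output',
    image_scale=1/255.0,         # multiply every pixel by 1/255
    red_bias=0.0,                # added to each channel after scaling
    green_bias=0.0,              # (0.0 is also the default value)
    blue_bias=0.0)

coreml_model.save('MyModel.mlmodel')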

It’s very important that you pass in appropriate values for these options! With the wrong settings, coremltools will create a .mlmodel file that will interpret your input images wrongly. And then the model will produce outputs that don’t make sense.

Some examples:

If your model expects values in the range 0 – 1, you should set:

image_scale=1/255.0

If your model expects values in the range -1 to +1, you should set:

image_scale=2/255.0
red_bias=-1
green_bias=-1
blue_bias=-1

If your model was trained on the ImageNet dataset, you will probably need to subtract the mean RGB values:

red_bias=-123.68
green_bias=-116.78
blue_bias=-103.94
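
You can sanity-check settings like these with a few lines of Python. Core ML computes image_scale * pixel + bias for every pixel value, so the endpoints 0 and 255 should end up exactly where your model expects them:

def preprocess(pixel, image_scale=1.0, bias=0.0):
    # This mirrors what Core ML does to each pixel value.
    return image_scale * pixel + bias

# Range 0 - 1:
print(preprocess(0, image_scale=1/255.0))               # -> 0.0
print(preprocess(255, image_scale=1/255.0))             # -> 1.0

# Range -1 to +1:
print(preprocess(0, image_scale=2/255.0, bias=-1))      # -> -1.0
print(preprocess(255, image_scale=2/255.0, bias=-1))    # -> 1.0

# ImageNet mean subtraction (red channel), image_scale left at 1.0:
print(preprocess(0, bias=-123.68))                      # -> -123.68
print(preprocess(255, bias=-123.68))                    # -> roughly 131.32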

Core ML applies the scale first and then adds the bias. So if your model's preprocessing subtracts a mean from the raw 0 - 255 pixel values and then scales them, you need to multiply your red/green/blue_bias (and gray_bias) values by image_scale as well.
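
For example, suppose (hypothetically) that your model's training code computed (pixel - mean) * 0.017 for each channel, using the ImageNet means from above. Then image_scale is 0.017 and each bias is the corresponding mean multiplied by that scale:

# Hypothetical preprocessing: value = (pixel - mean) * 0.017 per channel.
# Rewritten the way Core ML sees it: value = 0.017 * pixel + (-mean * 0.017).
scale = 0.017
for channel, mean in [('red', 123.68), ('green', 116.78), ('blue', 103.94)]:
    print('%s_bias = %.5f' % (channel, -mean * scale))
# -> red_bias = -2.10256, green_bias = -1.98526, blue_bias = -1.76698
# and image_scale = 0.017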

For Caffe models you can also specify the path to your mean.binaryproto file (if you have one); it contains the mean image that gets subtracted from the input as a preprocessing step. You would use this instead of red/green/blue_bias.
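
Here is a sketch of that. The file names are placeholders, and I'm recalling the converter's API from memory, so double-check the coremltools documentation for the exact form:

import coremltools

# Sketch: Caffe conversion with a mean image. If I recall correctly, the mean
# .binaryproto file is passed as the third element of the model tuple.
coreml_model = coremltools.converters.caffe.convert(
    ('my_model.caffemodel', 'deploy.prototxt', 'mean.binaryproto'),
    image_input_names='data')    # 'data' is the usual Caffe input blob name

coreml_model.save('MyModel.mlmodel')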

Conclusion: If you did not train the model yourself, but you’re using a pretrained model that you downloaded from the web, you should try to find out what sort of preprocessing is done on the images before they go into the first neural network layer. You need to make Core ML do the exact same preprocessing, otherwise the model will be working on data it does not understand — and that results in wrong predictions.

