Usually it involves Core ML models that take images as input.
When you’re using Core ML, it is often not enough to just put your image into a CVPixelBuffer object. Even using Vision to drive Core ML won’t fix this issue.
What happens is that there is no standard format that deep learning models expect their images in. So you need to tell Core ML how to preprocess the image to convert it into the format that your model understands.
CVPixelBuffer usually contains pixels in RGBA format where each color channel is 8 bits. That means the pixel values in this image are between 0 and 255.
Note: You can construct CVPixelBuffers using different pixel formats too, but RGBA is the most common. And by that I also mean ARGB, BGRA, and ABGR. These are all 32-bit formats where each color channel takes up 8 bits. If you’re using grayscale images, you need a CVPixelBuffer with format
But your model may not actually expect pixel values between 0 and 255. Here are some common options:
- between 0 and 1
- between -1 and +1
- between -255 and +255 with the average values of R, G, and B subtracted
- the color channels in BGR order instead of RGB
For grayscale images, it’s important to know what value is considered black and what value is considered white. I’ve seen models where 0 is black and 1 is white, and others where 1 is black and 0 is white.
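To make these conventions concrete, here is a minimal Python sketch that pushes the same 8-bit pixel through each of them. The helper names are made up for illustration, and the mean values are only an example; your model's documentation or training script is the real authority on which convention applies.

```python
# Hypothetical helpers, one per input convention. Not part of Core ML or coremltools.
EXAMPLE_MEAN_RGB = (123.68, 116.78, 103.94)  # assumed per-channel means, for illustration

def to_unit_range(rgb):
    """Map 0..255 values to 0..1."""
    return tuple(c / 255.0 for c in rgb)

def to_signed_unit_range(rgb):
    """Map 0..255 values to -1..+1."""
    return tuple(c / 127.5 - 1.0 for c in rgb)

def subtract_mean(rgb, mean=EXAMPLE_MEAN_RGB):
    """Subtract the per-channel average from each color value."""
    return tuple(c - m for c, m in zip(rgb, mean))

def to_bgr(rgb):
    """Reorder the channels from RGB to BGR."""
    r, g, b = rgb
    return (b, g, r)

def flip_gray_polarity(value):
    """For a gray value in 0..1, swap which end means black and which means white."""
    return 1.0 - value

pixel = (255, 128, 0)  # an orange pixel as it comes out of an 8-bit buffer
print("0..1:            ", to_unit_range(pixel))
print("-1..+1:          ", to_signed_unit_range(pixel))
print("mean subtracted: ", subtract_mean(pixel))
print("BGR order:       ", to_bgr(pixel))
print("gray 0.0 flipped:", flip_gray_polarity(0.0))
```

The point is that one and the same pixel turns into very different numbers depending on the convention, which is why the model only works if it gets the convention it was trained with.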
You need to tell Core ML about the pixel values used by your model.
If your model expects pixel values in a different range than 0 – 255, then you need to tell Core ML so it can convert the CVPixelBuffer into the right format.
You do this in the Python script that converts the model.
For example, when you convert from Caffe or Keras you can specify the following options for
- is_bgr: you typically need to set this to True for Caffe models
- red_bias, green_bias, blue_bias: these will be added to the R, G, and B color values of each pixel
- gray_bias: like the RGB biases but for grayscale images
- image_scale: the pixel values will be multiplied by this number
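It can help to see how these options combine. The following is a plain-Python model of the per-pixel arithmetic as this article describes it (multiply by the scale, add the bias, optionally swap the channel order). The function names are hypothetical and this is not coremltools code, just a sketch you can use to sanity-check your settings before converting.

```python
def preprocess_pixel(rgb, image_scale=1.0, red_bias=0.0, green_bias=0.0,
                     blue_bias=0.0, is_bgr=False):
    """Model of the RGB preprocessing: scale each channel, then add its bias.

    The defaults leave the pixel values unchanged.
    """
    r, g, b = rgb
    r = image_scale * r + red_bias
    g = image_scale * g + green_bias
    b = image_scale * b + blue_bias
    # is_bgr only changes the order in which the channels are handed to the model.
    return (b, g, r) if is_bgr else (r, g, b)

def preprocess_gray(value, image_scale=1.0, gray_bias=0.0):
    """Same idea for a single-channel (grayscale) pixel."""
    return image_scale * value + gray_bias

# A Caffe-style model that wants BGR channel order and untouched 0..255 values:
print(preprocess_pixel((255, 128, 0), is_bgr=True))

# A grayscale model that wants values in 0..1:
print(preprocess_gray(255, image_scale=1 / 255.0))
```

If the numbers this prints do not look like what your model saw during training, the converted .mlmodel will have the same problem.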
It’s very important that you pass in appropriate values for these options! With the wrong settings, coremltools will create a .mlmodel file that will interpret your input images wrongly. And then the model will produce outputs that don’t make sense.
If your model expects values in the range 0 – 1, you should set:
image_scale=1/255.0
If your model expects values in the range -1 to +1, you should set:
image_scale=2/255.0 red_bias=-1 green_bias=-1 blue_bias=-1
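These settings follow from a simple rule: pixel value 0 has to land on the low end of the range and 255 on the high end. Here is a small sketch of that rule, using a hypothetical helper name, in case your model wants a range other than the two above. It assumes the preprocessing is a multiply followed by an add, as described in this article.

```python
def scale_and_bias_for_range(low, high, max_pixel=255.0):
    """Return (image_scale, bias) so that 0 maps to `low` and `max_pixel` maps to `high`."""
    image_scale = (high - low) / max_pixel
    bias = low  # the same bias goes into red_bias, green_bias, and blue_bias
    return image_scale, bias

for low, high in [(0.0, 1.0), (-1.0, 1.0)]:
    scale, bias = scale_and_bias_for_range(low, high)
    darkest = scale * 0 + bias
    brightest = scale * 255 + bias
    print(f"target {low}..{high}: image_scale={scale:.6f}, bias={bias}, "
          f"0 -> {darkest:.3f}, 255 -> {brightest:.3f}")
```

For the range -1 to +1 this gives a scale of 2/255 and a bias of -1, matching the settings above.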
If your model was trained on the ImageNet dataset, you will probably need to subtract the mean RGB values:
red_bias=-123.68 green_bias=-116.78 blue_bias=-103.94
Scaling happens before the bias is added, so if you set an image_scale you will need to multiply your red/green/blue_bias etc. by this scale as well.
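This trips people up, so here is a worked sketch. Suppose the original training code computed (pixel - mean) * scale. Rewritten as a multiply followed by an add, that is scale * pixel + (-mean * scale), which is why the bias has to be multiplied by the scale. The helper name and the scale of 0.017 are assumptions chosen for illustration; use the values from your own model.

```python
def fold_mean_and_scale(mean_rgb, scale):
    """Turn '(pixel - mean) * scale' into image_scale and per-channel bias settings."""
    mean_r, mean_g, mean_b = mean_rgb
    return {
        "image_scale": scale,
        "red_bias": -mean_r * scale,
        "green_bias": -mean_g * scale,
        "blue_bias": -mean_b * scale,
    }

mean_rgb = (123.68, 116.78, 103.94)  # the mean RGB values used in this article
scale = 0.017                        # assumed example scale, not taken from a specific model

options = fold_mean_and_scale(mean_rgb, scale)
for name, value in options.items():
    print(f"{name}={value:.5f}")

# Check on one red value that both formulations agree.
red = 200
training_style = (red - mean_rgb[0]) * scale
converter_style = options["image_scale"] * red + options["red_bias"]
print(abs(training_style - converter_style) < 1e-9)
```

If you passed the unscaled means (for example red_bias=-123.68) together with this image_scale, every pixel would end up far too negative and the predictions would be garbage.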
For Caffe models you can also specify the path to your 'mean.binaryproto' file (if you have one of those) that contains the average RGB values. You would use this instead of
Conclusion: If you did not train the model yourself, but you’re using a pretrained model that you downloaded from the web, you should try to find out what sort of preprocessing is done on the images before they go into the first neural network layer. You need to make Core ML do the exact same preprocessing, otherwise the model will be working on data it does not understand — and that results in wrong predictions.