An in-depth look at Core ML 3

by Matthijs Hollemans
8 June 2019

As you may have seen in the WWDC 2019 videos, Core ML 3 adds a lot of new stuff to machine learning on iOS. The new killer feature is on-device training of models, but it can now also run many advanced model architectures — and thanks to the addition of many new layer types, it should even be able to run new architectures that haven’t been invented yet!

This is by far the biggest update to Core ML since its original release in 2017.

Core ML is not perfect, but with these new additions — as well as the A12 chip’s Neural Engine — Apple is definitely way ahead of everyone else when it comes to machine learning on mobile.

In this blog post I’ll describe in excruciating detail what Core ML 3’s new features are, all the way down to the individual layer types. We’re mostly going to look at the mlmodel format here, not the API from CoreML.framework (which, except for adding training functionality, didn’t really change much).

Do you need to know this stuff if you’re just looking to convert an existing model and use it in your app? Probably not. But it’s definitely good to have as a reference for when you’re designing your own machine learning models that you intend to use with Core ML 3, or when you’re not sure an existing model can be converted.

It’s all in the proto files

So, what is new? If you look at the API documentation for CoreML.framework, you won’t get much wiser. The nitty-gritty isn’t in the API docs but in the Core ML model format specification.

This specification consists of a number of .proto files containing protobuf message definitions. Protobuf, or “protocol buffers”, is the serialization format used by Core ML’s mlmodel files. It’s a common serialization technology, also used by TensorFlow and ONNX. The proto files describe the different objects that can be found in an mlmodel file.

You can find the Core ML model specification here but this website isn’t always up-to-date. It’s better to look directly at the proto files. You can find those inside the coremltools repo. They’re regular text files so you can open them with any editor.

Note: If you’re serious about using Core ML, I suggest getting familiar with the proto files and the protobuf format in general. This is the only place where the capabilities and limitations of Core ML are documented, and you’ll learn a ton by reading through them. You can read more about the mlmodel internals in my book, which I’m updating to include Core ML 3 as we speak.

The main file in the format specification is Model.proto. This defines what a model is, what kind of inputs and outputs a model can have, and what different types of models exist.

An important property of the Model definition is the specification version. This version number determines which functionality is available in the mlmodel file, and which operating system can run the model.
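
If you want to check this for yourself, coremltools can load the raw spec so you can look at these fields directly. A minimal sketch (the filename is a placeholder):

```python
import coremltools

# Load the raw protobuf spec instead of the compiled model.
# "YourModel.mlmodel" is a placeholder — use your own file here.
spec = coremltools.utils.load_spec("YourModel.mlmodel")

# The specification version determines which OS versions can run the model.
print("Specification version:", spec.specificationVersion)

# The model type is a protobuf "oneof" field on the Model message.
print("Model type:", spec.WhichOneof("Type"))

# The inputs and outputs are listed in the model's description.
for inp in spec.description.input:
    print("Input:", inp.name)
```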

The new specification version is 4, not 3 as you might expect. There have been three major releases of Core ML, but there was also a small update for iOS 11.2 that bumped the version number.

Core ML models with specification version 4 can only run on iOS 13 and macOS 10.15 (Catalina) or better. If you’re targeting iOS 12 or even 11, forget about using any of the new features shown in this blog post.

Note: When you convert a model, coremltools will choose the lowest possible specification version that your model is compatible with. v3 models can run on iOS 12, v2 models on iOS 11.2, and v1 models on iOS 11.0. Of course, if your model uses any of the newly introduced features, it’s iOS 13 or later only.

New model types

Core ML has always supported the following model types (spec v1):

Specification version 2 was only a small update that added support for 16-bit floating-point weights. Enabling this makes your mlmodel files about 2× smaller but, contrary to what some people expect, it does not make the model run any faster.
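
coremltools has a helper for converting an existing model's weights to 16-bit floats. A minimal sketch, assuming a neural network model (the filename is a placeholder):

```python
import coremltools

# Load a full-precision (32-bit float) model. The filename is a placeholder.
model = coremltools.models.MLModel("YourModel.mlmodel")

# Convert the weights to 16-bit floats. This roughly halves the file size
# but does not change inference speed.
model_fp16 = coremltools.utils.convert_neural_network_weights_to_fp16(model)
model_fp16.save("YourModel_fp16.mlmodel")
```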

In Core ML 2 (spec v3), the following model types were added:

Other features added by v3 were weight quantization for even smaller mlmodel files (but still no change in inference speed) and flexible input sizes. The API added batch predictions and better support for dealing with sequential data.
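
Weight quantization is also driven from coremltools. A sketch, assuming the quantization_utils helper from coremltools 2.0 and later (the filename and bit width are just examples):

```python
import coremltools
from coremltools.models.neural_network import quantization_utils

# Load the full-precision model. The filename is a placeholder.
model = coremltools.models.MLModel("YourModel.mlmodel")

# Quantize the weights to 8 bits using linear quantization.
# This shrinks the file but, again, does not speed up inference.
model_8bit = quantization_utils.quantize_weights(model, nbits=8,
                                                 quantization_mode="linear")
model_8bit.save("YourModel_8bit.mlmodel")
```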

OK, that was the story until now. As of Core ML 3 (spec v4), the following model types can also be described by your mlmodel files:

The Model object now has an isUpdatable property. If this is true, the model can be trained on-device with new data. This currently only works for neural networks and k-Nearest Neighbors (either as a standalone model or inside a Pipeline).

k-Nearest Neighbors is a simple algorithm, but that simplicity makes it quite suitable for on-device training. A common method is to have a fixed neural network, such as VisionFeaturePrint, extract the features from the input data, and then use k-NN to classify those feature vectors. Such a model is really fast to “train” because k-NN simply memorizes any examples you give it — it doesn’t do any actual learning.

One downside of k-NN is that making predictions becomes slow when you have a lot of examples memorized, but Core ML supports a K-D tree variant that should be quite efficient.
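
You can see this structure by poking at the spec of such a pipeline. A rough sketch, assuming the field names from Model.proto (pipelineClassifier, pipeline, models) and a placeholder filename:

```python
import coremltools

# A pipeline that combines a feature extractor with a k-NN classifier.
# The filename is a placeholder.
spec = coremltools.utils.load_spec("FeatureExtractorPlusKNN.mlmodel")

# True if the model can be trained on-device.
print("Updatable:", spec.isUpdatable)

# A classifier pipeline keeps its sub-models in pipelineClassifier.pipeline.models,
# for example a visionFeaturePrint model followed by a kNearestNeighborsClassifier.
for i, sub_model in enumerate(spec.pipelineClassifier.pipeline.models):
    print(i, sub_model.WhichOneof("Type"))
```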

What’s new in neural networks

This is where it gets interesting… the vast majority of changes in Core ML 3 are related to neural networks.

Where Core ML 2 supported “only” about 40 different layer types, Core ML 3 adds over 100 new ones. But let’s not get carried away: some of these new layers are merely refinements of older layer types to make them suitable for handling flexible tensor shapes.

For Core ML 2 and earlier, the data that flowed through the neural network was always a tensor of rank 5. That means each tensor was made up of the following five dimensions, in this order:

(sequence length, batch size, channels, height, width)

This choice makes a lot of sense when the input to your neural network is mostly images, but it’s not very accommodating to other types of data.

For example, in a neural network that processes 1-dimensional vectors, you were supposed to use the “channels” dimension to describe the size of the vector and set the other dimensions to size 1. In that case, the shape of the input tensor would be (1, batch size, number of elements, 1, 1). That’s just awkward.
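
To make that concrete, here is a small numpy sketch of the padding-with-ones you had to do for a batch of plain vectors:

```python
import numpy as np

# A batch of 8 plain vectors with 300 elements each.
batch = np.random.rand(8, 300).astype(np.float32)

# Core ML 2 wants rank-5 tensors laid out as
# (sequence length, batch size, channels, height, width),
# so the vector length goes into the "channels" dimension
# and the unused dimensions are padded with 1s.
rank5 = batch.reshape(1, 8, 300, 1, 1)

print(rank5.shape)   # (1, 8, 300, 1, 1)
```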

Many of the new layers that were added in Core ML 3 support tensors of arbitrary rank and shape, making Core ML much more suitable for data other than images.

Tip: This is why it’s important to know how to read the proto files, as they explain for each layer how tensors of different ranks are handled. This kind of thing isn’t documented anywhere else!

All the neural network stuff is described in NeuralNetwork.proto. It’s a big file at almost 5000 lines…

The main object is NeuralNetwork, although there are two other variations: NeuralNetworkClassifier and NeuralNetworkRegressor. The difference is that a plain neural network outputs MultiArray objects or images, while the classifier variant outputs a dictionary with the classes and their predicted probabilities, and the regressor simply outputs a numeric value. Other than that small difference in how the output is interpreted, these three model types all work the same.

The NeuralNetwork object has a list of layers, as well as a list of preprocessing options for any image inputs. Core ML 3 adds a few new properties that describe:

I plan to write a detailed blog post about on-device training soon, but just to give you an idea of what is involved:

In addition, the NetworkUpdateParameters object describes:

Update: New in beta 3 are shuffle and seed parameters that tell Core ML to randomly shuffle the training data before each epoch.

Note that hyperparameters such as the number of epochs, the learning rate, and so on, can be overridden inside the app. The values inside the mlmodel should be set to reasonable defaults but you’re not stuck with them if you don’t like them.
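
To give you an idea of where those defaults live in the spec, here is a rough sketch that tweaks them from Python. I’m assuming a NeuralNetworkClassifier model and the field names from NeuralNetwork.proto (updateParams, epochs, sgdOptimizer, learningRate); double-check these against your copy of the proto files:

```python
import coremltools

# An updatable neural network classifier. The filename is a placeholder.
spec = coremltools.utils.load_spec("UpdatableClassifier.mlmodel")

# The training defaults live in the network's updateParams
# (assuming a NeuralNetworkClassifier model here).
params = spec.neuralNetworkClassifier.updateParams

# Set reasonable defaults; the app can still override these at training time.
params.epochs.defaultValue = 10
params.optimizer.sgdOptimizer.learningRate.defaultValue = 0.001

coremltools.utils.save_spec(spec, "UpdatableClassifier.mlmodel")
```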

Neural network layers in Core ML 2

Now let’s get to the good stuff: the neural network layers. The first versions of Core ML supported only the following layer types:

Specification version 2 added support for custom layers in neural networks. This was a very welcome addition, as now it became possible to convert many more models.

Inside the mlmodel file, a custom layer is simply a placeholder, possibly with trained weights and configuration parameters. In your app, you’re supposed to provide a Swift or Objective-C implementation of the layer’s functionality, and possibly a Metal version as well for running it on the GPU. (Unfortunately, the Neural Engine isn’t currently an option for custom layers.)

For example, if a model requires an activation function that is not in the list above, it can be implemented as a custom layer. However, you can also do this by cleverly combining some of the other layer types. For example, a ReLU6 can be made by first doing a regular ReLU, then multiplying the data by -1, thresholding to -6, and finally multiplying by -1 again. That requires 4 different layers but — in theory — the Core ML framework could optimize this away at runtime.
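
Here is what that trick could look like with the NeuralNetworkBuilder from coremltools. This is just a sketch; the input name and size are made up:

```python
from coremltools.models import datatypes
from coremltools.models.neural_network import NeuralNetworkBuilder

# A made-up 64-element input and output, just for illustration.
input_features = [("data", datatypes.Array(64))]
output_features = [("relu6_out", datatypes.Array(64))]
builder = NeuralNetworkBuilder(input_features, output_features)

# ReLU6(x) = min(max(x, 0), 6), built from Core ML 2 primitives:
builder.add_activation("relu", non_linearity="RELU",
                       input_name="data", output_name="relu_out")
builder.add_elementwise("negate1", input_names=["relu_out"],
                        output_name="neg1_out", mode="MULTIPLY", alpha=-1.0)
builder.add_unary("clip", input_name="neg1_out", output_name="thresh_out",
                  mode="threshold", alpha=-6.0)   # computes max(x, -6)
builder.add_elementwise("negate2", input_names=["thresh_out"],
                        output_name="relu6_out", mode="MULTIPLY", alpha=-1.0)
```

Whether Core ML actually fuses these four layers into a single clipped ReLU at runtime is up to the framework.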

In Core ML 2 (spec v3), the following layer types were added:

As you can tell, even though there are some layer types for dealing with vectors or sequence data, most of the layers are very focused on convolutional neural networks for dealing with images. In Core ML 2, all the layers expect tensors of shape (sequence length, batch size, channels, height, width) even if your data is only one-dimensional.

Core ML 3 (spec v4) relaxes that requirement a little bit for these existing layer types. For example, an inner product layer can now work on input tensors from rank 1 to rank 5. So in addition to adding a whole bunch of new layers, Core ML 3 also made the existing layer types more flexible.

Note: The doc comments inside NeuralNetwork.proto explain for each of these layers how tensors of different ranks are handled. If you’re ever wondering what the right tensor shape is for a layer, that’s the place to look.

Now let’s finally look at the new stuff!

The new neural network layers

Over 100 new layers… phew! It’s a lot but I’m going to list them all because it is useful to have this blog post as a reference. (Of course, the proto files are the authoritative source.)

Keep in mind that in the previous version of the spec, certain operations were combined into a single layer. For example, all the unary tensor operations were part of the layer type UnaryFunctionLayer. But with Core ML 3, a whole bunch of new unary operations were added and they all have their own layer type, which obviously inflates the total count.

Note: In the proto files, the name of every layer type ends with Params, so the unary function layer type is really named UnaryFunctionLayerParams. For the sake of readability, I’m dropping the “params” from the layer names here.

Core ML 3 adds the following layers for element-wise unary operations:

This seriously expands the number of math primitives supported by Core ML. Unlike the math functions it already had, these can deal with tensors of any rank.

There is only one new activation function:

Of course, you can use any unary function as an activation function, or create one by combining different math layers.

There are also new layer types for comparing tensors:

These output a new tensor that is 1.0 where the condition is true, and 0.0 where the condition is false (also known as a tensor “mask”). These layer types support broadcasting, so you can compare tensors of different ranks. You can also compare a tensor with a (hardcoded) scalar value.
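
The semantics are the same as a broadcasted comparison in numpy, followed by turning the boolean result into 0.0 and 1.0 values:

```python
import numpy as np

a = np.array([[1.0, 5.0, 3.0],
              [4.0, 2.0, 6.0]])

# Compare against a hardcoded scalar; broadcasting handles the shapes.
mask = (a > 3.0).astype(np.float32)

print(mask)
# [[0. 1. 0.]
#  [1. 0. 1.]]
```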

One place where these layer types come in useful is with the new control flow operations (see below), so that you can branch based on the outcome of a comparison, or create a loop that keeps repeating until a certain condition becomes false.

Previously, there were a handful of layers for element-wise operations between two or more tensors. Core ML 3 adds a few new types, and — as you can tell from the name — these are now much more flexible because they fully support NumPy-style broadcasting:

Reduction operations have now moved into their own layers. Core ML already supported most or all of these, but these new versions can work with arbitrary tensors, not just images. You can now do the reduction over one or more axes.

Speaking of math stuff, Core ML 3 also adds the following:

A number of other existing operations have been extended to use arbitrary size tensors, also known as rank-N tensors or N-dimensional tensors. You can recognize such layer types by the “ND” in their name:

Slicing lets you keep only a part of the original tensor and throw away the rest. The old slicing layer could slice the input tensor across the width, height, or channel axis. Core ML 3 gives us two new slicing layers that support slicing across any axis:

I’m not 100% sure yet how these layers work as the documentation isn’t very helpful, but it looks like they can slice by indices or by a mask. In any case, these layers slice and dice!

Why two different versions? You’ll actually see this distinction between static and dynamic in some of the upcoming layer types too.

Static basically means, “everything is known about this operation beforehand” while dynamic means “the arguments of this operation can change between runs”. For example, the static version of a layer may have a hardcoded outputShape property while the dynamic version can use a different output shape every time.
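
As an example of the difference, here is a sketch using the two reshape variants. I’m assuming the coremltools builder methods add_reshape_static and add_reshape_dynamic and made-up tensor names; check them against your version of coremltools:

```python
from coremltools.models import datatypes
from coremltools.models.neural_network import NeuralNetworkBuilder

# Made-up inputs and outputs, just for illustration.
input_features = [("x", datatypes.Array(64)),
                  ("shape_tensor", datatypes.Array(3))]
output_features = [("x_reshaped", None), ("x_reshaped_dyn", None)]
builder = NeuralNetworkBuilder(input_features, output_features,
                               disable_rank5_shape_mapping=True)

# Static: the output shape is baked into the mlmodel file.
builder.add_reshape_static("reshape_s", input_name="x",
                           output_name="x_reshaped",
                           output_shape=(1, 4, 16))

# Dynamic: the target shape is itself a tensor computed at runtime,
# so it can be different on every prediction.
builder.add_reshape_dynamic("reshape_d",
                            input_names=["x", "shape_tensor"],
                            output_name="x_reshaped_dyn")
```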

Note: In the first version of Core ML, the size of the input was hardcoded — for example, only 224×224 images. Since version 2, Core ML has supported flexible input shapes, where you can tell the mlmodel that a given input can accept tensors between a minimum and maximum size, or from a list of predefined sizes. That sounds pretty dynamic! However, by dynamic operations in Core ML 3 we mean something slightly different…

Here “dynamic” means that inside the graph itself, from one run to the next, the shapes of the intermediate tensors may be different, even if the input tensor is always of the same size.

For example, if the first part of your model predicts a bounding box that then gets cropped and fed to the second part of your model, it’s likely that this bounding box and the resulting cropped tensor will have a different size every time. Therefore, the layers in the second part of the model cannot make any assumptions about the shape of that cropped tensor.

Because Core ML is no longer limited to static image-based models but now also contains methods for control flow and other dynamic operations, it has to be able to manipulate tensors in all kinds of fancy ways. Let’s look at those functions.

Notice how some of these layer types come in three different variants: Like, Static, and Dynamic. What do these mean?

Note: Static / dynamic isn’t always about the output shape, it depends on the layer. For example, in the random distribution layers (see next), you can set the random seed dynamically too. Some of the dynamic layers have several different inputs that let you override their default properties.

Core ML 3 also lets you create new tensors by sampling from random distributions:

There were already layers for reshaping and flattening, but more variants have been added:

Besides concat and split operations for arbitrary tensors, Core ML 3 also adds the following tensor manipulation operations:

Also new is support for gather and scatter:

Speaking of selecting elements based on some condition, here are a few more layer types for dealing with masks:

Beta 3 of coremltools 3.0 snuck in a few new layer types:

Finally — and perhaps most excitingly — Core ML 3 adds layers for control flow such as decision making and loops.

Previously, Core ML would run the neural network graph from top to bottom just once for each prediction. But now it can repeat certain parts of the graph and skip others. Exactly which parts of the neural network get executed by Core ML can vary between one run of the model and the next — this depends purely on the contents of your input data.

The control flow layers are:

Note that the BranchLayer and LoopLayer do not have outputs. They always pass control to one of their child NeuralNetwork objects, which will have an output of some kind. (I haven’t tried it, but it seems reasonable to assume you can nest these loops and branches too.)

For an example of how to use these new control flow layers, check out this Jupyter notebook from the coremltools repo. It shows how to implement a simple iterative process inside the Core ML model and uses many of the new layer types.

The example works like this:

  1. use a LoadConstantND layer to load the value 0 into the output named iteration_count
  2. add a LoopLayer that will loop for a certain maximum number of iterations
  3. inside the loop, add a new neural network that performs some kind of computation
  4. at the end of the computation, use an arithmetic layer to increment the current value from iteration_count, and then a CopyLayer to overwrite the value inside the iteration_count output
  5. use another CopyLayer to copy the result of the computation back into the original tensor, so that the next iteration of the loop can use this new value
  6. add a LessThanLayer to compare the output of the computation to some convergence threshold, and feed this yes/no result into a BranchLayer
  7. add a new neural network to the BranchLayer that just has a LoopBreakLayer inside it. In other words, if the branch happens — because the output of the computation was small enough to go under the convergence threshold — then we’ll break out of the loop.

It’s a little weird perhaps, but very flexible! The key point is to remember to use the CopyLayer to overwrite existing tensors with new values, much like an assignment statement in Swift. After you run the model, the iteration_count output will have counted how many times the loop was repeated. Of course, this count may be different every time, depending on the values of the inputs to the model, as some will converge quicker than others. Pretty cool!
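
To give you a flavor of what building such a graph looks like, here is a heavily condensed sketch based on that notebook. The tensor names are made up, the actual computation is left out, and I’m assuming the coremltools 3 builder methods add_loop, add_branch, add_copy, add_loop_break and add_less_than behave as they do in the notebook:

```python
import numpy as np
from coremltools.models import datatypes
from coremltools.models.neural_network import NeuralNetworkBuilder

# Made-up input and output, just to show the overall structure.
input_features = [("x", datatypes.Array(4))]
output_features = [("iteration_count", None)]
builder = NeuralNetworkBuilder(input_features, output_features,
                               disable_rank5_shape_mapping=True)

# Step 1: start the iteration counter at 0.
builder.add_load_constant_nd("init_count", "iteration_count",
                             constant_value=np.zeros((1,)), shape=(1,))

# Step 2: a loop with an upper bound on the number of iterations.
loop = builder.add_loop("loop", max_iterations=100)

# Step 3: the loop body is a nested NeuralNetwork object.
body = NeuralNetworkBuilder(nn_spec=loop.loop.bodyNetwork)

# ...the actual computation goes here, producing "x_new" and "diff"...

# Steps 4 and 5: increment the counter and copy results back in place.
body.add_elementwise("inc", input_names=["iteration_count"],
                     output_name="count_new", mode="ADD", alpha=1.0)
body.add_copy("copy_count", input_name="count_new",
              output_name="iteration_count")
body.add_copy("copy_x", input_name="x_new", output_name="x")

# Step 6: compare against the convergence threshold...
body.add_less_than("converged", input_names=["diff"],
                   output_name="cond", alpha=1e-5)

# Step 7: ...and break out of the loop if we went below it.
branch = body.add_branch("branch", "cond")
if_branch = NeuralNetworkBuilder(nn_spec=branch.branch.ifBranch)
if_branch.add_loop_break("break")
```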

Thanks to these control flow layers, Core ML 3 graphs can go way beyond the traditional acyclic graphs. However, you only get branches and loops — there is currently no such thing as a “goto”. Core ML is not Turing-complete quite yet. 😁

At the very bottom of NeuralNetwork.proto are the layer definitions for on-device training. We already briefly looked at those, but here they are again:

Note: I find it a little odd that the loss function is defined in the mlmodel file. This makes it impossible to train with other loss functions. Likewise for the optimizers. Perhaps a future version of Core ML will allow us to provide custom implementations of these.

And that’s it, those are all the new layer types in Core ML 3!

Most of the new layer types are for creating, shaping, and manipulating tensors. There are also many new mathematics primitives. Not a whole lot of “real” neural network stuff has been added. But having these low-level operations will make it a lot easier to support all kinds of new, still unimagined, layer types.

Then again, if implementing a new layer type requires adding 20 different math layers to your Core ML mlmodel, it might be faster to write a custom layer… 😉

Written by Matthijs Hollemans.
First published on Saturday, 8 June 2019.
If you liked this post, say hi on Twitter @mhollemans or LinkedIn.
Find the source code on my GitHub.
