If Apple’s announcements at WWDC have got you excited about adding machine learning to your iOS apps, you’re probably thinking, “I should use Core ML for this.”
However, Core ML is not the only choice. In this blog post I will list the possible ways you can add machine learning to your iOS apps.
These are the APIs you can choose from:
- using a cloud service
- Core ML
- high-level Metal with the MPS graph API
- low-level Metal Performance Shaders
- third-party APIs such as TensorFlow and Caffe
- rolling your own
Note that none of these APIs offer training on the device — they are for making predictions only. For the time being, training is still something you’ll need to do offline or in the cloud. (Interestingly, Clarifai recently announced a mobile SDK that allows for training on the device.)
Before I go into the pros and cons of these APIs, I want to make an important point about Core ML first…
Warning: Core ML is not magic
Even though it’s now easier than ever to include machine learning models in your apps, you still need to know a thing or two about machine learning.
With Core ML it’s literally possible to drag-and-drop a pre-trained model such as Inception-v3 into your app and then use it to make predictions — without having any further knowledge about machine learning. However, that limits you to only ever using other people’s models.
Most app developers will want to use their own models, or use existing models that work on their own data. If that’s you, this means you’ll have to learn how to build and train your own models.
And for that, you need to understand how machine learning works. If you’re new to the field, get ready to do some studying.
Note: Despite the hype, mobile devices are still quite limited in what they can do when it comes to machine learning. Don’t expect to see the latest deep learning research running in real-time on an iPhone or iPad. We’re at the point where deep learning is only just starting to become feasible on mobile devices. No doubt machine learning on mobile will be ubiquitous in two or three years’ time, but right now you’ll need to work within, or around, some very real limitations.
With that out of the way, let’s look at the possible machine learning APIs that are currently available for iOS.
Cloud services
By far the easiest way to add machine learning into your apps is by using a cloud service such as Clarifai, Google Cloud Vision, IBM Watson, Microsoft Azure Cognitive Services, Amazon Rekognition, and many others.
With these kinds of cloud services, machine learning is simply a matter of calling a web API, which any iOS developer will know how to do. You can also host your own models in the cloud.
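To make "calling a web API" concrete, here is a minimal sketch of the round trip, written in Python for brevity (the same steps apply from Swift with URLSession). Everything service-specific is an assumption: the endpoint URL, the bearer-token header, and the JSON request and response shapes are hypothetical, so check your provider's documentation for the real ones.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint and schema: every real service defines its own.
API_URL = "https://api.example.com/v1/classify"


def build_request(image_bytes, api_key):
    """Package an image as a JSON POST request for the (hypothetical) service."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def parse_response(body):
    """Return (label, confidence) pairs, best first, from an assumed response shape."""
    predictions = json.loads(body)["predictions"]
    ranked = sorted(predictions, key=lambda p: p["confidence"], reverse=True)
    return [(p["label"], p["confidence"]) for p in ranked]


def classify(image_bytes, api_key):
    """Send the image over the network and return the ranked predictions."""
    request = build_request(image_bytes, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_response(response.read())
```

Note that `classify` blocks until the server answers. That network round trip is the source of the latency discussed in this section.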
Pros of cloud services:
- Really easy to add into your app.
Cons of cloud services:
- Expensive. You’ll need to pay your cloud provider for using their services.
- Slow. Every request you make needs to go over the network. There’s no way you can use a cloud service to do deep learning on live video.
- Possible privacy issues. You’ll need to send the user’s data to the cloud service.
- Vendor lock-in. The success of your app will now (partially) depend on an outside party that you have little control over.
For an in-depth look at the tradeoffs between cloud and mobile machine learning, see my article Machine learning on mobile: on the device or in the cloud?
Core ML (iOS 11)
If you’d rather not use the cloud and want to run machine learning algorithms directly on the user’s device, then Core ML is the easiest solution. Just take a model and drop it into your Xcode project. Done.
Well, that’s not entirely true… For many machine learning models you’ll still need to process the data that goes into the model and the results that come out of it.
Models are very particular about the inputs they accept, so you’ll need to massage your data into a format that the model understands. Likewise, you have to convert the model’s output into something that your app can use.
For image data this is easy (use the Vision framework); for other data you’re on your own. But at least you won’t need to worry about doing any of the machine learning computations — that’s Core ML’s job.
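As an illustration of what this massaging can look like, here is a small sketch in plain Python. The scaling constants, the label list, and the assumption that the model returns one raw score per class are all hypothetical. Every model documents its own input range and output format, and those are what you need to match.

```python
import math

# Hypothetical label list: a real model ships with its own.
LABELS = ["cat", "dog", "bird"]


def preprocess(pixels, mean=127.5, scale=1 / 127.5):
    """Map 0-255 pixel values to roughly [-1, 1], as some image models expect."""
    return [(p - mean) * scale for p in pixels]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(scores, labels=LABELS, top_k=2):
    """Return the top_k (label, probability) pairs, most likely first."""
    ranked = sorted(
        zip(labels, softmax(scores)), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:top_k]
```

The model itself sits between `preprocess` and `postprocess`. Those two functions are the parts you have to write and get right yourself.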
Note: If you’re using Core ML or any of the other on-device solutions, you will need to obtain a model that is suitable for your app. The models that Apple makes available for download are only capable of a limited number of tasks. For most apps you’ll have to train your own models, which requires special expertise. And machine learning experts are in high demand these days, so they don’t come cheap. 💰
Pros of Core ML:
- Really easy to add into your app.
- Not just for deep learning: also does logistic regression, decision trees, and other “classic” machine learning models.
- Comes with a handy converter tool that supports several different training packages (Keras, Caffe, scikit-learn, and others).
Cons of Core ML:
- Core ML only supports a limited number of model types. If you trained a model that does something Core ML does not support, then you cannot use Core ML.
- The conversion tools currently support only a few training packages. A notable omission is TensorFlow, arguably the most popular machine learning tool out there. You can write your own converters, but this isn’t a job for a novice. (The reason TensorFlow is not supported is that it is a low-level package for making general computational graphs, while Core ML works at a much higher level of abstraction.)
- No flexibility, little control. The Core ML API is very basic: it only lets you load a model and run it. There is no way to add custom code to your models.
- iOS 11 and later only.
See also Alex Sosnovshchenko’s critique of Core ML: Why Core ML will not work for your app (most likely).
* * *
Regardless of whether you’re training your own model or you’re using a freely available one, you’re still going to have to convert the model to Core ML. And to do this you need to understand a little about how the model was trained, and the software that was used for training.
To convert models you don’t need to be an expert, but you do need a grasp of the fundamentals of machine learning. The most common mistake I see is that someone converts a model but skips the step that converts the input (such as an image) into the format the model expects, so the model does not make the correct predictions. As they say, garbage in equals garbage out.
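A toy example shows how quietly this goes wrong. The "model" below is a hypothetical stand-in written in Python, not a real network: it assumes its inputs were normalized to the range [-1, 1]. Feed it raw pixel values and it still returns an answer without complaint, but the answer is wrong.

```python
def normalize(pixels):
    """Map 0-255 pixel values to [-1, 1], the range this toy model expects."""
    return [p / 127.5 - 1.0 for p in pixels]


def toy_model(inputs):
    """Hypothetical stand-in for a trained model: positive average means 'bright'."""
    average = sum(inputs) / len(inputs)
    return "bright" if average > 0 else "dark"


dark_image = [10, 25, 40, 15]  # raw pixel values of a mostly dark image

correct = toy_model(normalize(dark_image))  # "dark": input prepared as expected
wrong = toy_model(dark_image)               # "bright": preprocessing was skipped
```

Nothing crashes and no error is raised, which is why this kind of mistake is so easy to miss.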
It’s possible that we’re going to see online marketplaces for pre-trained Core ML models that are ready to use — but if you have a unique set of data that you want to train on, or a unique problem you want to solve, you’re going to have to get your hands dirty.
If you’re just getting your feet wet with machine learning on mobile, then Core ML is the way to go. But chances are you’ll run into some of its limitations before long. If it works for you, great. If not, then you’ll need to use one of the APIs I’ll describe next.
MPS graph API (iOS 11)
MPS stands for Metal Performance Shaders and is a collection of classes that let you unleash the power of Metal without having to write any GPU code — or really without having to know much about Metal at all.
As of iOS 11, you can create neural networks using the new graph API. You simply add the layers in your neural net to this graph and then MPS does the rest.
Note: I’m making a distinction here between the higher-level MPS graph API (MPSNNGraph and related classes) and the lower-level MPS kernels, such as MPSCNNConvolution, which are covered in the next section.
Pros of the graph API:
- If your app wants to draw stuff based on the output of your model, then you can easily integrate the machine learning with your Metal rendering pipeline and run the whole thing on the GPU.
- When Core ML runs models on the GPU, it actually uses MPS but it hides this from you. By using MPS directly, you have more control over what happens than with Core ML (but only a little).
- Playing with Metal is fun.
Cons of the graph API:
- You do need to learn a bit about Metal before this API will make sense to you. For example, the inputs and outputs to your model need to be MPSImage objects and you need to understand how to convert your data to these objects. You don’t need to become an expert on GPU programming, but you also shouldn’t be scared of the idea.
- There is no nice converter tool like Core ML has. You have to implement the model yourself using the building blocks provided by the MPS graph API. This means you also need to convert the model from your training package (say Keras, TensorFlow, or Caffe) to the format that MPS understands. This can be tricky and requires a good understanding of Metal and the training package.
- You cannot create your own node types to add to the graph. So if you want to support a type of layer that MPS does not have, you cannot use the MPS graph API. Because of this, the MPS graph API is not much of an upgrade over Core ML in my opinion. If Core ML does not support your model, then chances are the MPS graph API also doesn’t.
- iOS 11 and later only.
I like the idea of the MPS graph API but in practice I’m just not feeling it. The world of deep learning moves fast. By using the graph API you’re limiting yourself to the operations that Apple has decided to implement. If next week a new paper comes out with a new layer type or a new activation function, then you cannot implement this with the MPS graph API. And even if Apple does make an effort to support new layer types in the future, you’ll always have to wait until the next OS upgrade to get them.
Note: I think we as developers would benefit from frameworks such as Core ML and the MPS graph API being decoupled from OS updates — and preferably even being open source. Every other machine learning framework is open source and they all have thriving communities of contributors. I can understand that Apple wants to keep some portions of their frameworks secret, and running an open source project takes up time and resources, but by keeping their frameworks behind closed doors Apple will always be a step behind everyone else. </rant>
Low-level MPS (iOS 10 and 11)
For full control over your GPU compute pipeline, Metal Performance Shaders (MPS) are the answer. Didn’t we just talk about MPS? Yes and no.
MPS is a framework that contains a lot of powerful functionality that runs on the GPU. This includes:
- image processing kernels
- matrix multiplication
- neural network layers
- the graph API that we just talked about
When you use the graph API, you take the MPS building blocks — known as kernels — and stick them into graph nodes. Then as you execute the graph, MPS will make all these nodes run on the GPU automatically and in the correct order.
But what I’m talking about here is using the MPS kernels by hand.
Instead of building a graph, where you have to play by the rules of the graph, you now instantiate the layer objects you need, such as MPSCNNConvolution, and you encode them yourself into a Metal command buffer. It’s less convenient than building the graph, but in return you get full control.
Often you’ll combine MPS with your own Metal compute kernels. You use MPS to handle things like convolutional layers, but you’ll use your own Metal code for layer types that MPS does not support.
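The difference between the two styles can be sketched without any GPU code. The Python below is a conceptual stand-in only: none of these names are MPS or Metal APIs, the "kernels" are ordinary functions on lists, and the toy graph handles a simple chain of nodes. It shows the trade-off: the graph works out the execution order for you but only accepts its built-in kernels, while the by-hand version leaves the sequencing to you and lets you slot in your own code.

```python
def double(values):
    """Stand-in for a built-in kernel (think: a layer the framework ships)."""
    return [2.0 * v for v in values]


def relu(values):
    """Another stand-in for a built-in kernel."""
    return [max(0.0, v) for v in values]


def leaky_relu(values, slope=0.1):
    """Stand-in for a custom kernel you wrote yourself."""
    return [v if v > 0 else slope * v for v in values]


BUILT_IN = {double, relu}


class Node:
    """Graph style: a node wraps one built-in kernel and names its source node."""

    def __init__(self, kernel, source=None):
        if kernel not in BUILT_IN:
            raise ValueError("this toy graph only accepts built-in kernels")
        self.kernel = kernel
        self.source = source  # None means the node reads the graph's input


def run_graph(output_node, graph_input):
    """Work out the execution order from the links, then run every kernel."""
    chain = []
    node = output_node
    while node is not None:
        chain.append(node)
        node = node.source
    data = graph_input
    for node in reversed(chain):  # sources first, output node last
        data = node.kernel(data)
    return data


def run_by_hand(graph_input):
    """By-hand style: you sequence the calls and may mix in custom kernels."""
    data = double(graph_input)
    data = leaky_relu(data)  # not possible in the toy graph above
    return data


graph_result = run_graph(Node(relu, source=Node(double)), [-1.0, 3.0])
hand_result = run_by_hand([-1.0, 3.0])
```

In real Metal code the by-hand version involves much more ceremony (devices, command queues, command buffers, textures), but the shape of the decision is the same.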
If Core ML doesn’t support what you want to do, and the MPS graph API isn’t helping out either, then these low-level compute kernels are what you turn to.
Pros of low-level MPS:
- You can do anything you want. If MPS does not support a certain layer type or activation function, you can simply write your own GPU compute kernel.
- Full control. Did I mention full control? You get full control.
Cons of low-level MPS:
- You need to know what you’re doing. This is no longer novice territory. You may have to write your own code to supplement what MPS has to offer. This involves writing Metal Shading Language code, which is a dialect of C++.
- It can take a while to implement a model. With Core ML you can run the conversion tool and be up and running in seconds. With MPS / Metal, you’re completely on your own. It can take several days to implement a model by hand from scratch. (Then again, you only go here if Core ML can’t handle your model in the first place, so you still come out ahead.)
- Programming in Metal can break your brain.
Even though it’s more work than Core ML or the graph API, using the MPS kernels in combination with your own compute shaders is where you’ll need to go if the other options don’t support the functionality you need.
Note: To make it easier to implement neural networks with Metal, I wrote an open source library called Forge. It lets you use the MPS kernels without most of the boilerplate. It also comes with a bunch of cool demos.
Third-party APIs
There are several non-Apple APIs for machine learning. I’ll highlight the two most popular: TensorFlow and Caffe.
You’ve probably heard about TensorFlow. It’s the most popular tool for building machine learning models and there’s a version that runs natively on iOS.
To get a better idea of what’s involved in using TensorFlow on iOS, check out the blog post I wrote about it.
Pros of TensorFlow:
- Fairly easy to export your model and load it into your app. It’s similar to using Core ML but you have to write a bit more boilerplate code on the iOS side.
- TensorFlow is the number one machine learning tool out there. If you want to find a pretrained model you can probably find a TensorFlow version. And if you hire someone to build a model for you, they probably know how to use TF.
Cons of TensorFlow:
- It’s kind of slow. TensorFlow on iOS does not use the GPU, only the CPU. For fairly basic models this is fine but deep learning will be s-l-o-w.
- The iOS version of TensorFlow does not support all operations, so your graph may not actually work on iOS. (TF comes with a tool to optimize the graph for mobile usage, which strips out all the operations for training etc.)
- The TensorFlow API is written in C++ so you’ll have to write some Objective-C++ code to use it from your iOS projects. There’s no Swift API (yet).
- The shared library is quite big, so it will add 10 to 40 MB to your app bundle.
Even though I don’t like how slow TensorFlow is, it’s a great option while you’re still iterating on your model. It only takes a few lines of code to load the model into your app, and this can give you a quick way to evaluate how well your model is doing in practice, without having to spend days implementing it from scratch in Metal first. Once you have a model that works, you can decide to speed it up by converting to Metal.
Some time this fall Google is planning to release TensorFlow Lite, which supposedly is optimized for mobile devices. I’m curious to see how fast this will run on iOS, or if Google’s priority is Android first.
Note: An interesting alternative to using the TensorFlow library is Bender. This open source project lets you load TensorFlow models into your app but uses Metal Performance Shaders to actually run the models. This makes it a lot faster than using the TensorFlow library. Of course, this approach only supports a subset of the full TensorFlow functionality, but it’s worth checking out if you’ve got a TensorFlow model!
Caffe is one of the original deep learning tools and it’s still being used by many researchers. Many pretrained models are available in Caffe format.
The pros and cons are similar to those of TensorFlow: loading your models is fairly easy, but you have to use C++, and it tends to be slower than re-implementing the model yourself from scratch using Metal Performance Shaders.
Roll your own
As a last-ditch option, you can skip the Apple APIs altogether and implement your own machine learning algorithms from scratch.
Why do this? At the moment, Core ML does not support Naive Bayes classification, for example. Neither does MPS.
Does that mean you can’t use a Naive Bayes model on iOS? Of course you can… but you’ll have to program it yourself. (Tip: search GitHub first.)
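To give an idea of how little code a basic model needs, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is written in Python for brevity and uses only the standard library, and the logic ports almost line for line to Swift. The class name, the whitespace tokenizer, and the tiny spam/ham training set are illustrative assumptions, not a recommended design.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (text, label) pairs."""
        for text, label in examples:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes()
model.train([
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
])
```

In an app you would typically compute the counts offline and ship them with the bundle, so that only `predict` runs on the device.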
For models that are not deep neural networks, I recommend using the Accelerate framework. Accelerate has loads of functionality for writing really fast applications. It’s one of those obscure libraries that tend to scare people away (some of the naming conventions go back to 1960s FORTRAN libraries), but it’s a great tool to have in your arsenal.
And if all else fails, you can write many machine learning algorithms directly in Swift or Objective-C. For basic models, straight Swift code will be fast enough. And more often than not, a basic model is all you need.
Pros of the DIY approach:
- You can do anything you want.
Cons of the DIY approach:
- You’ll need to learn how machine learning algorithms work internally.
- You may need to learn low-level programming, such as with the Accelerate framework.
- It might take a while to implement your own algorithms.
If Core ML does not support your particular model type and it’s not a deep neural network, then implementing the algorithms yourself is a valid strategy. There are plenty of books and online courses where you can learn how to do this.
As you can see, there are a lot of ways to do machine learning on iOS. Core ML is definitely the first thing you should try. And if your model is not supported by Core ML, it is usually possible to get it working using TensorFlow or Metal. 🤘