Machine learning on mobile: on the device or in the cloud?

Matthijs Hollemans
by Matthijs Hollemans
16 February 2017

Table of contents

So you’ve decided it’s time to add some of this hot new machine learning or deep learning stuff into your app… Great! But what are your options?

For many machine learning solutions, the approach is as follows:

  1. gather data
  2. use that data to train a model
  3. use the model to make predictions

Let’s say you want to make a “celebrity match” app that tells people which famous person they most look alike.

You need to first gather a lot of photos of celebrities’ faces. Then you train a deep learning network on these photos to teach it what each celebrity looks like.

The model you’re using would be some kind of convolutional neural network and you’ve trained it for the specific purpose of comparing people’s faces with the faces of celebrities.

Training is a difficult and expensive process. But once you have the trained model, performing “inference” — or in other words, making predictions — is easy.

You can give the model a selfie and it will instantly say, “You look 85% like George Clooney but you have Lady Gaga’s eyes!”

I know someone who looks just like this!

Exactly what data you need, what sort of model you should design, and how you should train this model all depend on the particular application you have in mind.

But you also need to make some choices about how to put your machine learning system into production. That’s what this blog post is about.

The main things you need to decide are:

In other words, should you use a cloud service to do the deep learning, or maybe roll your own? Let’s find out!

The quick & easy option

The first question to ask is: Do you even need your own model? By far the easiest way to get started is to use someone else’s model.

There is a growing number of companies that provide machine learning services tailored for specific purposes, such as speech recognition, text analysis, or image classification.

You don’t get access to their models directly — that’s their secret sauce — but you can use these models through an API.

Examples of such machine learning as-a-service providers:

and many others… these kinds of services are popping up all over.

If your app needs to perform one of the specific tasks offered by these services, you should definitely consider using them.

Fully managed machine learning

How this works: Your mobile app simply sends an HTTPS request to a web service along with any required data, such as a photo taken by the device’s camera, and within seconds the service replies with the prediction results.

Of course, you need to pay for this privilege, usually by the request. But you have nothing else to worry about: the only thing you need to do is talk to the service’s API endpoint from within your app. Often there is an SDK that makes the service easy to integrate.

The service provider keeps their model up-to-date by re-training it (using their own data), but this all happens behind the scenes. Any time they improve their model, you automatically benefit from it. You don’t have to know anything about machine learning in order to use these services.

Upsides of using such a “canned” machine learning service:

Downsides of this approach:

Note: Some of these services may actually allow a limited kind of training. Clarifai, for example, lets you create your own model by uploading custom training images. This way you can augment their existing model to get better predictions for your own particular kind of images.

Using a fully-managed machine learning service is a great option if an existing model suits all your needs. For a lot of mobile apps this is the absolute right thing to do!

Training your own model

If your data is unique in some sense or existing solutions are not satisfactory, you’ll need to train your own model.

Data is key to doing successful machine learning. It’s the quality and quantity of the data that makes all the difference. If you’re going to do your own training, you need lots of data.

Once you have gathered your training data, the next decision is where and how to train. It depends on the complexity of your model and the amount of training data that you have.

Unless you have your own data center or are rolling in cash, it makes most sense to rent computer power. There are many cloud platforms that are happy to take your money. These days you can even rent GPUs-in-the-cloud for training deep learning systems.

So it’s for you to decide: is renting cheaper than buying? However, there are other considerations besides price. Let’s look at some of those.

What about training on the device? If all the data you need to make predictions is available on the user’s device — and you don’t need data from other sources — then it might be an option to do the training on the device itself and not use the cloud at all. However, this will only work for small datasets and basic machine learning algorithms. It’s not something you can realistically do with deep learning.

Training in the cloud

There are two options here:

We’ll look at general purpose cloud computing first.

How this works: You rent one or more computers in someone else’s data center. Whatever you do on these computers is completely up to you. To train your model, you give the cloud computer access to your training data and run your favorite training software on it.

When you’re done training, you download the learned parameters for the model and delete the compute instance. You only pay for the compute hours used to train the model. Now you have a trained model that you can use anywhere you like.

Training with cloud computing

Examples of this kind of service are Amazon EC2 and Azure Virtual Machines. For deep learning you can even rent instances with fast GPUs.



Note: The above discussion only pertains to training the machine learning model, not to inference. Once you’ve trained your model you’ll need to make it available to the app somehow, so it can use the model for making predictions. If you decide to do this locally on the device then you just need to embed the model into your mobile app. But if you want to do inference in the cloud, you’ll still need to set up your own web service for inference. That comes with its own set of considerations (more about this below).

Hosted machine learning in the cloud

The other cloud option is hosted machine learning.

Several companies, such as Amazon, Microsoft, and Google, now offer machine learning as a service on top of their existing cloud services.

How this works: You don’t need to have the expertise to train models. You simply upload your data, choose the kind of model you want to use, and let the machine learning service take care of all the details.

Hosted machine learning in the cloud

This option is the step between using a fully managed service and doing everything yourself. It’s definitely a lot easier than doing your own training, especially if you’re not really comfortable training your own models.

However… most of these services do not let you download your trained models. So for the inference part of your app you have no choice but to use their platform as well. You can’t take the trained model and load it onto a mobile device and run the predictions on the device itself. You always need to call their API and send along the user’s data for doing inference.

This may not be a problem for your app but it is something to be aware of before you get started. For example, when using Microsoft Azure Machine Learning you’re basically locked into using Azure forever. If you want to switch to another service, you can’t take your trained model with you — you’ll have to train from scratch on the new platform and eat the cost for that again.

These kinds of services charge you for the compute hours you use during training, plus any storage for your training data. Since the service provides the API that your app uses to request predictions, you also pay for each prediction request.

Upsides of using a hosted service:


Note: A pleasant exception here seems to be Google’s new Cloud Machine Learning platform (currently in beta). Like its competition, this cloud service lets you train your models (and if you want deploy them). But you can also export your trained model, which lets you do predictions offline. If you’re a TensorFlow fan this service seems like a great option. Then again, TensorFlow also runs on Amazon and Azure compute instances, and it’s always worth comparing prices.

Training on your own computers

How this works: This is really no different from training in the cloud, except that you use one or more computers that you own. You load your favorite machine learning package on the computers, give them access to your data, and set them off training.

If you’re really serious about deep learning, or if you happen to have a number of spare computers lying around anyway, then this option might be cheaper (in the long run) than renting someone else’s computer.

Tip: Even if you want to train in the cloud, it’s smart to start out with a reduced dataset on your own computer to make sure the model works correctly. Once you’re satisfied the model will give useful predictions, move on to training with the full dataset on a more powerful computer.

One issue with training in the cloud is that you need to upload your data to the cloud service. Since digital storage is their business, these cloud companies usually provide good security for your data. However, your data may be so sensitive that you don’t want it to leave your premises. In that case, you’ll need to do the training in-house as well.



If your model is small enough, training on your own hardware is a sensible choice. However, for big models with lots of training data, using a cloud service makes it easier to scale up quickly should you need more resources.

Recap: training options

Using a “hosted” machine learning service — where you provide your own data but the service takes care of training — has the big disadvantage that you don’t own the trained model. If you use such as service, you’ll also have to use their web API for doing predictions.

So if you want to train on your own data and be able use the trained model offline, then you are limited to these options:

You’ll definitely want to compare pricing between the different service providers. Most cloud services are very similar in the features they offer, so you might as well pay as little as possible. 💰💰💰

Note: Another thing to consider is how often you need to re-train. Only occasionally, in which case you just go through the same process again, or will you be retraining the model constantly? Not all hosted machine learning providers support online learning. And if you do online learning in the cloud, you’ll be renting those computers for a long time. If you run the numbers for this scenario, it might turn out to be less expensive to train on your own hardware after all.

Inference: making predictions

For training it’s pretty obvious that it should happen off-device, whether it’s on your own computers or on computers that you rent.

But for inference you have the choice of doing it on the device, without requiring a network connection.

Let’s look at the options:

As with everything, whether it’s better to use a server or to do inference locally on the device, depends on a few tradeoffs.

One tradeoff could be speed: is it faster to run inference on the (relatively slow) mobile device, or is it quicker to send a network request to a fast and beefy server, do the inference there, and then send a reply back?

And some inference tasks are just going to be impossible on the device — it simply may not have enough processing power or RAM, or it may be limited by some other constraint.

Which option is more practical totally depends on your use case.

Inference on the server

How this works: You set up a server — either your own machine or one that you rent in the cloud — and load your trained model onto it. The server publishes the endpoint for a web API that your app talks to over the internet.

Performing inference on a server

Suppose you have an app that turns a photo into digital art using deep learning. Users can choose different effects for their photos. The app sends the photo to the server, the server feeds the photo through the deep learning network to apply the desired effect, and a few seconds later it sends the modified image back to the app.

Using a server for inference keeps the mobile app simple. All the complexity is on the server, which is always under your control. You can improve the model or add new features any time you want. To deploy the improved model, simply update the server — you don’t have to update the app itself.



With a hosted machine learning service it only takes a click on a button to deploy the trained model to a web API. By creating and hosting your own API, you get more flexibility but the obvious downside is… well, that you need to do everything yourself.

If your app is a big success and has millions of (paying) users then it might be worth running your own inference backend. For many apps it’s probably cheaper and a lot less hassle to use a full-service machine learning in-the-cloud solution.

Note: Instead of rolling your own API from scratch you can also use existing tools, such as TensorFlow Serving.

Inference on the device

How this works: You load the model’s learned parameters into the app. To make a prediction the app runs all the inference computations locally on the device, on its own CPU or GPU — it does not need to talk to a server.

This is the domain of frameworks such as BNNS and Metal CNN on iOS, but some machine learning packages such as TensorFlow and Caffe also run on the device.

Inference on the device

Speed is the major reason for doing inference directly on the device. You don’t need to send a request over the internet and wait for the reply — instead, the prediction happens (almost) instantaneously.

Take that same example of turning photos into “deep art”: what if you wanted to do this to a live camera feed in realtime? There is no way you can do this by sending network requests — it must be done directly on the device.

Note: To be fair, this is not a very realistic example. A lot of deep learning isn’t fast enough to do in realtime yet. But you get the point: for speed, you can’t beat local processing.

One of the big benefits of doing inference on a server is that you can put improved models into action immediately: upload the new model to the server and you’re done. With a mobile app this is not as straightforward, since you need to push out the improved model to all existing app installs somehow.

If you re-train the model often then you may need to set up some infrastructure for distributing the updated model parameters to the users’ devices. So you may need to keep a server around after all.

Upsides of doing inference on the device:

Note: Users now pay for the cost of doing inference with their own battery power. This may actually be a reason to not do inference on the device if the amount of battery power used is unreasonable and creates a bad user experience.


There is another potential issue you need to be aware of: other developers can dig around inside your app bundle. It’s easy to copy over the learned parameters, and if you’re including a TensorFlow graph definition or caffemodel file it’s very simple for unscrupulous people to steal your entire model. You may want to obfuscate this data if having it gives you a competitive advantage.


As you can tell, there are lots of options to choose from!

And no doubt we’ll see many more machine learning services appear on the market in the coming months and years.

What’s best for your app and your business — and your users! — really depends on the kind of machine learning you’re doing, so it’s impossible to give tailored advice without knowing the exact details.

But at least I hope this blog post has given you an idea of the possibilities!

If you’re interested in learning more about doing deep learning on iOS devices, then get in touch and let’s chat. 😄

Written by Matthijs Hollemans.
First published on Thursday, 16 February 2017.
If you liked this post, say hi on Twitter @mhollemans or LinkedIn.
Find the source code on my GitHub.

Code Your Own Synth Plug-Ins With C++ and JUCENew e-book: Code Your Own Synth Plug-Ins With C++ and JUCE
Interested in how computers make sound? Learn the fundamentals of audio programming by building a fully-featured software synthesizer plug-in, with every step explained in detail. Not too much math, lots of in-depth information! Get the book at