Azure ML Thursday 5: trained Python models


Last week, we stepped out of Azure ML to look at building ML models in Python using scikit-learn. Today, we focus on getting the trained model back into Azure ML – the place where my ML solutions live in a managed, enterprise environment.

Bringing a trained model from the local Python/Anaconda environment to Azure ML in the cloud roughly takes these steps:

  1. Export the trained model
  2. Zip the exported files
  3. Upload to the Azure ML environment
  4. Embed in your Azure ML solution

Sounds simple, and it indeed isn’t too hard. What gets in the way of “just” doing it is mainly a lack of Python / scikit-learn knowledge (“how do you export a trained model in the first place?”) and a general lack of ML experience (remember that every transformation you applied to the training data must be applied in exactly the same way in production!). Once you’ve learned how to tackle the first hurdle and seen the trick of importing models into Azure ML Studio, hardly anything holds you back from deploying your locally developed masterpieces to production.

Step 1: Export the trained model

Remember that your trained model in Python is stored in “just” another variable, as you’re used to in almost any object-oriented language. Python can export the contents of (almost) any variable using a process called pickling[ref]For non-native speakers: that’s a verb – to pickle. Gotta be a pun on pickling herring for preservation[/ref]. When you pickle an object, it is serialized into a stream of bytes that is written to a file, from which the object can be loaded again later.

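It’s actually quite easy: open a file in binary write mode (“wb”) and pass it to pickle.dump to save the object; to load it again, open the file in binary read mode (“rb”) and pass it to pickle.load.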
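The pickle round trip can be sketched as follows. This is a minimal example, assuming a stand-in RandomForestRegressor trained on synthetic data from make_regression (in practice this would be last week’s trained model); the filename model.pkl is hypothetical.

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in for your trained model: a small forest fitted on synthetic data
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Pickle: serialize the model into a file opened in binary write mode ("wb")
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Unpickle: read it back from a file opened in binary read mode ("rb")
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored model predicts exactly like the original one
print((restored.predict(X[:5]) == model.predict(X[:5])).all())  # True
```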

For scikit-learn, it’s recommended that you use joblib’s replacement for pickle[ref]joblib is more efficient at saving objects that contain large NumPy arrays[/ref]. You don’t have to use joblib (plain pickle works too), but it’s more efficient. Plus, it’s even easier to write: you don’t have to worry about file-opening modes like “wb” and “rb”.
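The same export with joblib might look like the sketch below, again with a stand-in model and a hypothetical filename. Note that older scikit-learn versions exposed joblib as sklearn.externals.joblib; that path was removed in scikit-learn 0.23, so current code imports the standalone joblib package instead.

```python
import joblib  # older scikit-learn: from sklearn.externals import joblib

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in for your trained model
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# joblib.dump takes a filename directly: no open() call or "wb" mode needed.
# It returns the list of files it actually wrote.
written = joblib.dump(model, "model.pkl")
print(written)

# joblib.load only needs the filename you passed to dump
restored = joblib.load("model.pkl")
```

You only ever pass joblib the filename you chose yourself, even when it writes extra files next to it.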


For large objects, the joblib version used here often saves the contents in multiple files, whose filenames get _(counter).npy appended (joblib 0.10 and later write a single file instead).


You must keep all files representing a single object together in one folder when loading it, but you don’t have to interact with any of the ‘.npy’ files: you only interact with the file you saved explicitly.


Step 2: Zip the exported files

To use pickled objects inside Azure ML’s Execute Python Script module, we need to zip everything and upload the zip as a dataset. Inside the zip file, all pickled objects should sit in the root, not in a subfolder.
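One way to build such a zip from Python is sketched below, assuming all exported files live in a single folder. The folder name export, the zip name model_bundle.zip and the placeholder files are hypothetical; the key detail is passing arcname so each file lands in the root of the archive rather than under export/. Zipping the whole folder also picks up any .npy companion files joblib wrote.

```python
import os
import zipfile


def zip_folder_flat(folder, zip_path):
    """Zip every file in `folder` into the root of `zip_path` (no subfolders)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                # arcname=name stores "export/model.pkl" as plain "model.pkl"
                zf.write(path, arcname=name)


# Hypothetical export folder with placeholder files standing in for step 1's output
os.makedirs("export", exist_ok=True)
for name in ("model.pkl", "model.pkl_01.npy", "load_model.py"):
    with open(os.path.join("export", name), "wb") as f:
        f.write(b"placeholder")

zip_folder_flat("export", "model_bundle.zip")

with zipfile.ZipFile("model_bundle.zip") as zf:
    names = zf.namelist()
print(names)
```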


Besides pickled objects, you can include Python scripts in the zip file too. For example, you could add a Python script that unpickles the objects you need for a particular ML model, so you don’t have to remember the syntax and the exact paths where Azure ML stores the contents. The Execute Python Script module can consume these scripts easily, as I’ll show in step 4.
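A helper script of that kind could look like the sketch below. The module name load_model.py and the filenames imputer.pkl and model.pkl are hypothetical; the only assumption about Azure ML is the “Script Bundle” extraction folder described in step 4.

```python
# load_model.py: hypothetical helper shipped inside the zip bundle
import os

import joblib

# Folder into which Azure ML Studio extracts the zip (relative to the working dir)
BUNDLE_DIR = "Script Bundle"


def load(name, bundle_dir=BUNDLE_DIR):
    """Unpickle one exported object from the bundle folder by filename."""
    return joblib.load(os.path.join(bundle_dir, name))


def load_all():
    """Return every object this particular model needs, keyed by role."""
    return {
        "imputer": load("imputer.pkl"),
        "model": load("model.pkl"),
    }
```

Keeping the paths in one place means the experiment itself never has to know where Azure ML puts the files.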


Step 3: Upload to the Azure ML environment

Azure ML has no separate way to upload “just” libraries: all files are treated equally. The zip file must therefore be uploaded as a dataset.


Step 4: Embed in Azure ML experiment

With the zipped file available as a “dataset”, we can embed it inside an experiment. Inside your experiment, the zip we just uploaded is available under My Datasets. To use it, drag it onto the canvas and connect it to the right-hand (as opposed to the center or left) input port of an Execute Python Script module, the Script Bundle port.


When the experiment runs, Azure ML Studio extracts the files inside the zip dataset to the folder “Script Bundle”. From within Python, you can access the files through that relative path.
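Inside the module, that could look like the sketch below. azureml_main(dataframe1, dataframe2) is the entry point the Execute Python Script module calls in Azure ML Studio (classic), and it must return a sequence holding one pandas DataFrame. The filename model.pkl and the output column prediction are hypothetical, and the sketch assumes dataframe1 contains exactly the feature columns the model was trained on, already transformed, in training order.

```python
import os

import joblib


def azureml_main(dataframe1=None, dataframe2=None):
    # Files from the zip on the Script Bundle port are extracted to this folder
    model = joblib.load(os.path.join("Script Bundle", "model.pkl"))

    # Assumption: dataframe1 holds exactly the (transformed) training features
    result = dataframe1.copy()
    result["prediction"] = model.predict(dataframe1.values)

    # Execute Python Script expects a sequence containing a single DataFrame
    return (result,)
```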

As described under step 2, you could also include helper scripts. To use a helper script, you don’t have to memorize the path: Azure ML Studio makes the extracted scripts importable, so a regular Python import statement is enough.
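With a helper like the hypothetical load_model.py from step 2 in the zip, the module body shrinks to a few lines. This is a sketch: load_all() and its "imputer" and "model" keys are assumptions carried over from that helper, and the import sits inside the function only so this file can also be loaded outside Azure ML, where load_model is not on the path.

```python
def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical helper from the zip bundle; importable because Azure ML
    # Studio makes the extracted Script Bundle files available to Python.
    import load_model

    objects = load_model.load_all()

    # Apply the exported transformation first, then the trained model
    features = objects["imputer"].transform(dataframe1)
    result = dataframe1.copy()
    result["prediction"] = objects["model"].predict(features)
    return (result,)
```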

Using a helper script keeps the amount of code inside the Execute Python Script module to a minimum, which makes your solution more portable and easier to maintain.

One More Thing: Including all transformations

To repeat in production every transformation you applied to the training set, exporting only the trained ML model is not enough: all fitted transformations need to be exported too. With last week’s sample code, that means four objects to export (Imputer, Religion-mapper, One-hot encoder and the actual trained model).

If you only use scikit-learn’s built-in transformers, you can make your life a lot easier with pipelines: a pipeline chains the transformers and the model into a single object, so there is only one thing to export. Check out the pipeline documentation in the scikit-learn docs, as well as an example of a pipeline constructed from one Imputer and one RandomForestRegressor, if you want to know how. Don’t worry, you’ll find it’s pretty easy :).
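A minimal pipeline of that shape is sketched below, assuming synthetic training data with some missing values. It swaps in SimpleImputer, which replaced the old sklearn.preprocessing.Imputer class (removed in scikit-learn 0.22); the step names and the filename pipeline.pkl are hypothetical.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Hypothetical training data with roughly 10% missing values
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X.sum(axis=1)
X[rng.rand(*X.shape) < 0.1] = np.nan

# Imputer and model chained into one estimator
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("forest", RandomForestRegressor(n_estimators=10, random_state=0)),
])
pipeline.fit(X, y)

# A single file now holds both the fitted imputer and the fitted forest
joblib.dump(pipeline, "pipeline.pkl")

# In production, predict() applies the same imputation before the forest
restored = joblib.load("pipeline.pkl")
X_new = np.array([[0.5, np.nan, 0.2]])
print(restored.predict(X_new))
```

Because the imputation statistics learned during training travel inside the pipeline, production data gets exactly the same treatment as the training set.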


Last week, I showed you a brief summary of using Python with scikit-learn to train your ML models, which vastly expands the ML techniques available to you. Today was the follow-up: how to use your trained Python ML models again within Azure ML.

With today’s knowledge, it’s perfectly doable to enter an Azure ML competition using your enhanced Python ML models, or, with the help of your personal data scientist, to port their ingenious models to the managed Azure ML environment.
