Registering Models in Gradient

Objectives

  • Understand the workflow involved in registering models

  • Pass environment variables through the Gradient CLI

  • Persist model files in Gradient Storage

  • Register TensorFlow models in Gradient

Introduction

Experiments in Gradient can generate machine learning models, which can be interpreted and stored in your Project's Models list. This list holds references to the model and checkpoint files generated during the training period as well as summary metrics associated with the model's performance, such as accuracy and loss.

In this tutorial, we will create an experiment to generate a Keras model based on the Fashion MNIST dataset. Along the way we will pass environment variables to jobs, specify the right container image, and set the path where the model artifacts are stored.

The model is trained in Keras, but it is finally exported as a TensorFlow model through the tf.saved_model.simple_save method. This approach serializes the Keras session into a TensorFlow .pb file.

Start by cloning the repo https://github.com/janakiramm/fashionmnist that contains the code for training and inferencing the model.

Create a Project for Fashion MNIST

We will start by creating a project that can contain multiple experiments we may run during the training.

gradient projects create --name Fashion

Create an Experiment to Train the Model

We will now start an experiment within the project created above. Make a note of the project id before proceeding further.

Switch to the train directory of the cloned GitHub repo.

The command that launches the experiment takes several switches that are important to the job. Let’s understand each of them.

The singlenode parameter runs the job on a single host.

--name assigns a friendly name to the experiment.

--projectId associates the experiment with an existing project.

--experimentEnv passes environment variables to the script. In our code, we decide the number of epochs based on the value defined in the EPOCHS environment variable.
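As a minimal sketch of how a training script might pick up this value, the snippet below reads EPOCHS from the environment and falls back to a default when it is not set. The function name read_epochs and the default of 5 are assumptions for illustration; the actual logic in train.py may differ.

```python
import os


def read_epochs(default=5):
    """Return the epoch count from the EPOCHS environment variable.

    The default value is an assumption for illustration only.
    """
    raw = os.environ.get("EPOCHS")
    if raw is None:
        return default
    # Environment variables are always strings, so convert explicitly.
    return int(raw)


if __name__ == "__main__":
    print(f"Training for {read_epochs()} epochs")
```

Reading the value at runtime keeps the script free of Gradient-specific code: the same file runs locally with the default and on Gradient with whatever --experimentEnv supplies.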

--container points the job to the container image used for the training job. Notice that we are passing an image that can take advantage of a GPU-based machine.

--machineType schedules the job on the preferred instance type. In our case, we are using the K80 machine type, which comes with an NVIDIA K80 GPU. Since both the container and the machine type are GPU-based, the job uses CUDA and cuDNN for accelerated training.

--command instructs the job to execute the script along with the passed parameters. The script expects the path to store the final model artifacts along with the version number. Since we are using a sub-directory under the /storage directory, the files stored there are persisted across experiments. The model files stored here are used to register the TensorFlow model with Gradient. Feel free to explore train.py to understand how environment variables and command line parameters can be used to target Gradient-specific features while keeping the code independent.
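The way a script might accept the output path and version number can be sketched with argparse. The flag names -o/--output and -v/--version, the defaults, and the path/version directory layout are hypothetical choices for this example and are not taken from train.py.

```python
import argparse
import posixpath


def parse_args(argv=None):
    """Parse the output path and model version (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Train and export a model")
    parser.add_argument("-o", "--output", default="/storage/model",
                        help="directory where model artifacts are written")
    parser.add_argument("-v", "--version", default="1",
                        help="model version, used as a sub-directory name")
    return parser.parse_args(argv)


def export_dir(output, version):
    """Join the output path and version into one export directory.

    posixpath is used because the path refers to the Linux container
    filesystem, regardless of where this helper runs.
    """
    return posixpath.join(output, str(version))


# A real script would call parse_args() with no argument to read sys.argv;
# an explicit list is passed here so the example is self-contained.
args = parse_args(["-o", "/storage/model", "-v", "1"])
print(export_dir(args.output, args.version))
```

Keeping the path as a parameter means the script does not hard-code /storage, so it can write to a local directory during development.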

--modelType Tensorflow indicates that the job generates a valid TensorFlow model which can be managed and served by Gradient. Support for other model types, such as ONNX and Custom, is planned for the near future.

--modelPath tells Gradient where to look for the model artifacts. This is typically a location under /artifacts or /storage. We are passing the /storage/model directory, which was used within the code.

--workspace . tells Gradient to upload your current directory (.) to the experiment. This directory becomes the working directory of your experiment.
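To see how the switches described above fit together, here is a rough sketch that assembles them into a single argument list and prints the resulting command line. Only the switch names come from the text; the gradient experiments run singlenode spelling, the experiment name, the container image tag, the train.py flags, and the project id placeholder are assumptions to replace with your own values.

```python
import json
import shlex


def build_command(project_id, epochs=5, version=1):
    """Assemble the experiment command as an argument list.

    All values below are illustrative assumptions, not the tutorial's
    exact command.
    """
    model_path = "/storage/model"
    # -o and -v are hypothetical train.py flags for output path and version.
    train_cmd = f"python train.py -o {model_path} -v {version}"
    return [
        "gradient", "experiments", "run", "singlenode",
        "--name", "fashion",
        "--projectId", project_id,
        # Environment variables are passed as a JSON object.
        "--experimentEnv", json.dumps({"EPOCHS": epochs}),
        # Assumed GPU-enabled image; substitute the image you actually use.
        "--container", "tensorflow/tensorflow:1.13.1-gpu-py3",
        "--machineType", "K80",
        "--command", train_cmd,
        "--modelType", "Tensorflow",
        "--modelPath", model_path,
        "--workspace", ".",
    ]


# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(build_command("<your-project-id>")))
```

Building the list in code avoids hand-escaping the JSON passed to --experimentEnv, which is easy to get wrong when typing the command directly.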

Within a few seconds of running the command, you should see the logs displayed on the screen.

Verifying the Creation of the Model

We can check whether the output of the job is registered as a valid TensorFlow model by listing the registered models with the gradient models list command, which prints a table like the following.

+------+-----------------+------------+------------+----------------+
| Name | ID              | Model Type | Project ID | Experiment ID  |
+------+-----------------+------------+------------+----------------+
| None | mosdnkkv1o1xuem | Tensorflow | prioax2c4  | e720893n7f5vx  |
+------+-----------------+------------+------------+----------------+

The project id prioax2c4 and experiment id e720893n7f5vx confirm that it is the model associated with the latest experiment.
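If you want to script this check, one possible approach is to parse the table text and compare the ids. The sketch below assumes the output keeps the pipe-delimited layout shown in this tutorial; parse_table is a hypothetical helper, not part of the Gradient CLI.

```python
SAMPLE = """
+------+-----------------+------------+------------+----------------+
| Name | ID              | Model Type | Project ID | Experiment ID  |
+------+-----------------+------------+------------+----------------+
| None | mosdnkkv1o1xuem | Tensorflow | prioax2c4  | e720893n7f5vx  |
+------+-----------------+------------+------------+----------------+
"""


def parse_table(text):
    """Turn a pipe-delimited ASCII table into a list of row dictionaries."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


models = parse_table(SAMPLE)
# Keep only the models produced by the experiment we just ran.
matches = [m for m in models if m["Experiment ID"] == "e720893n7f5vx"]
print(matches)
```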

You can also visit the Models section of the Gradient UI to see a list of registered models.

Summary

After registering the model, we can turn it into a deployment to perform inference.
