Gradient Docs
Gradient HomeHelp DeskCommunitySign up free
1.0.0
1.0.0
  • About Paperspace Gradient
  • Get Started
    • Quick Start
    • Core Concepts
    • Install the Gradient CLI
    • Common Errors
  • Tutorials
    • Tutorials List
      • Getting Started with Notebooks
      • Train a Model with the Web UI
      • Train a Model with the CLI
      • Advanced: Distributed training sample project
      • Registering Models in Gradient
      • Using Gradient Deployments
      • Using Custom Containers
  • Notebooks
    • Overview
    • Using Notebooks
      • The Notebook interface
      • Notebook metrics
      • Share a Notebook
      • Fork a Notebook
      • Notebook Directories
      • Notebook Containers
        • Building a Custom Container
      • Notebook Workspace Include Files
      • Community (Public) Notebooks
    • ML Showcase
    • Run on Gradient (GitHub badge)
  • Projects
    • Overview
    • Managing Projects
    • GradientCI
      • GradientCI V1 (Deprecated)
  • Workflows
    • Overview
      • Getting Started with Workflows
      • Workflow Spec
      • Gradient Actions
  • Experiments
    • Overview
    • Using Experiments
      • Containers
      • Single-node & multi-node CLI options
      • Experiment options
      • Gradient Config File
      • Environment variables
      • Experiment datasets
      • Git Commit Tracking
      • Experiment metrics
        • System Metrics
        • Custom Metrics
      • Experiment Logs
      • Experiment Ports
      • GradientCI Experiments
      • Diff Viewer
      • Hyperparameter Tuning
    • Distributed Training
      • Distributed Machine Learning with Tensorflow
      • Distributed Machine Learning with MPI
        • Distributed Training using Horovod
        • Distributed Training Using ChainerMN
  • Jobs
    • Overview
    • Using Jobs
      • Stop a Job
      • Delete a Job
      • List Jobs
      • Job Logs
      • Job Metrics
        • System Metrics
        • Custom Metrics
      • Job Artifacts
      • Public Jobs
      • Building Docker Containers with Jobs
  • Models
    • Overview
    • Managing Models
      • Example: Prepare a TensorFlow Model for Deployments
      • Model Path, Parameters, & Metadata
    • Public Models
  • Deployments
    • Overview
    • Managing Deployments
      • Deployment Containers
        • Custom Deployment Containers
      • Deployment States
      • Deployment Logs
      • Deployment Metrics
      • A Deployed Model's API Endpoint
        • Gradient + TensorFlow Serving
      • Deployment Autoscaling
      • Optimize Models for Inference
  • Data
    • Types of Storage
      • Managing Data in Gradient
        • Managing Persistent Storage with VMs
    • Storage Providers
    • Versioned Datasets
    • Public Datasets Repository
  • TensorBoards
    • Overview
    • Using Tensorboards
      • TensorBoards getting started with Tensorflow
  • Metrics
    • Metrics Overview
    • View and Query Metrics
    • Push Metrics
  • Secrets
    • Overview
    • Using Secrets
  • Gradient SDK
    • Gradient SDK Overview
      • Projects Client
      • Experiments Client
      • Models Client
      • Deployments Client
      • Jobs Client
    • End to end tutorial
    • Full SDK Reference
  • Instances
    • Instance Types
      • Free Instances (Free Tier)
      • Instance Tiers
  • Gradient Cluster
    • Overview
    • Setup
      • Managed Private Clusters
      • Self-Hosted Clusters
        • Pre-installation steps
        • Gradient Installer CLI
        • Terraform
          • Pre-installation steps
          • Install on AWS
          • Install on bare metal / VMs
          • Install on NVIDIA DGX
        • Let's Encrypt DNS Providers
        • Updating your cluster
    • Usage
  • Tags
    • Overview
    • Using Tags
  • Machines (Paperspace CORE)
    • Overview
    • Using Machines
      • Start a Machine
      • Stop a Machine
      • Restart a Machine
      • Update a Machine
      • Destroy a Machine
      • List Machines
      • Show a Machine
      • Wait For a Machine
      • Check a Machine's utilization
      • Check availability
  • Paperspace Account
    • Overview
    • Public Profiles
    • Billing & Subscriptions
    • Hotkeys
    • Teams
      • Creating a Team
      • Upgrading to a Team Plan
  • Release Notes
    • Product release notes
    • CLI/SDK Release notes
Powered by GitBook
On this page
  • Objectives
  • Creating a Gradient Experiment to Train the Model
  1. Tutorials
  2. Tutorials List

Train a Model with the CLI

PreviousTrain a Model with the Web UINextAdvanced: Distributed training sample project

Last updated 4 years ago

The steps covered in the tutorial video are also explained below.

Objectives

Prerequisites:

Steps

  • Learn the workflow involved in training a model

  • Use custom containers with experiments

  • Create an experiment that trains a Scikit-learn model

  • Share files and models across experiments

The experiment generates a Python pickle file that gets stored in the shared storage service of Gradient.

Creating a Gradient Experiment to Train the Model

In this step, we will generate a fully-trained model by submitting the dataset and Python code to Gradient. The output from this experiment is stored in a centrally accessible storage location that will be used in the next step of this tutorial.

Let’s take a minute to get familiar with the dataset. To keep this really simple, we are using one feature (x) that represents the years of experience of a candidate and a label (y) associated with the salary.

In the train directory, you’ll find train.py, which is responsible for generating the model by applying linear regression to the dataset.

import numpy as np
import pandas as pd
import os
from argparse import ArgumentParser
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

parser = ArgumentParser()
parser.add_argument("-i", "--in", dest="input",
                    help="location of input dataset")
parser.add_argument("-o", "--out",dest="output",
                    help="location of model"
                    )

dataset = parser.parse_args().input
model_dir = parser.parse_args().output

sal = pd.read_csv(dataset,header=0, index_col=None)
X = sal[['x']]
y = sal['y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

lm = LinearRegression() 
lm.fit(X_train,y_train) 

print('Intercept :', round(lm.intercept_,2))
print('Slope :', round(lm.coef_[0],2))

from sklearn.metrics import mean_squared_error
y_predict= lm.predict(X_test)
mse = mean_squared_error(y_predict,y_test)
print('MSE :', round(mse,2))

from sklearn.externals import joblib

if not os.path.exists(model_dir):
    os.makedirs(model_dir)
filename = model_dir+'/model.pkl'

joblib.dump(lm, filename)

Apart from applying linear regression, the code prints parameters such as intercept, slope, and MSE to stdout. The trained model is stored in the directory named salary under /storage as a .pkl file.

We are ready to kick off the training experiment on Gradient 🚀 Run the below command to submit the experiment.

The location /storage maps to Gradient persistent storage location. Anything stored at this location will be available even after the experiment is terminated. By storing the .pkl file at /storage, we will be able to access it from the model serving experiment that exposes the REST endpoint.

gradient experiments run singlenode \
--name train \
--projectId prj0ztwij \
--container janakiramm/python:3 \
--machineType C2 \
--command 'python train/train.py -i ./data/sal.csv -o /storage/salary' \
--workspace https://github.com/janakiramm/Salary

This command does quite a bit of heavy lifting for us. It schedules the training experiment in one of the chosen machine types and kicks off the training process.

Let’s analyze the steps taken by Gradient to finish the experiment.

  1. The CLI downloads all the files (including the dataset) and uploads it to Gradient

    1. You can also run the experiment from your local system by not specifying the --workspace parameter! This compress all the files into a .zip file and upload it to Gradient.

  2. Gradient pulls the container image instructed in the CLI (--container janakiramm/python:3) from the registry, which is Docker Hub for this scenario

  3. Before running the container, Gradient maps the directory with the uploaded files to the container’s working directory

  4. Gradient maps the command sent via the CLI parameter (--command 'python train/train.py -i ./data/sal.csv -o /storage/salary') to the docker run command

  5. Gradient schedules the container in one of the machines that match the type mentioned in the CLI (--machineType C2)

Once the experiment's status moves into run mode, it simply executes the code in the Python file. In train.py that we uploaded, we do two things - print the coefficients like intercept, slope, MSE and copy the model into the /storage location.

The output from CLI confirms that the experiment has been successfully executed. If you navigate to the experiment in the UI, you will see the logs printed the coefficients used in the script along with the message PSEOF which is a healthy sign.

We are using a custom Docker container image with prerequisites such as NumPy, Scipy, Pandas, and Scikit-learn. This image was built from the official Python 3 Docker image.

The dataset, sal.csv, is available in the data folder of the repo. Clone the repo to your local machine and open it in your favorite text editor to explore the data.

Install Gradient CLI
GitHub
Obtain an API Key
Learn how to train a simple model from the Gradient CLI. 1m51s.