Train a Model with the Web UI

This section of the documentation covers the previous Experiments interface for training. Experiments have been replaced by a more general framework called Workflows. For the latest information on training in Gradient, see Workflows.

Objectives

  • Explore Gradient Projects

  • Learn the workflow involved in training a model

  • Use custom containers with jobs

  • Create an Experiment that trains a Scikit-learn model

  • Share files and models across jobs

Introduction

In this tutorial, we'll show you how to train a linear regression model that predicts the salary of a software engineer based on their experience. The dataset is partially based on the Stack Overflow salary calculator, and we'll use it to build a univariate linear regression model with one feature (experience) and one label (salary).

The Experiment generates a Python pickle file that is stored in Gradient's shared persistent storage.
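
To make the model concrete: training learns an intercept and a slope such that salary ≈ intercept + slope × experience. With a hypothetical intercept of 50,000 and slope of 8,000, for example, an engineer with 5 years of experience would be predicted to earn 50,000 + 8,000 × 5 = 90,000. The actual fitted values depend on the dataset.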

Create a Project and an Experiment for Training the model

After signing in, click Gradient in the left navigation bar and choose Projects. Click the Create Project button and select Create Standalone Project. Enter a name for the project when prompted.

Creating the Experiment

Within this project, we will create an experiment that trains the model.

The Experiment Builder is a wizard-style UI tool for submitting jobs.

The first step is to choose a machine type for scheduling the Experiment. Gradient takes advantage of Google Compute Engine's preemptible instances to provide low-cost infrastructure, so make sure you check Enable low-cost instances. Choose the G1 machine type, which comes with 1 CPU core, 1.7GB RAM, and a 250GB SSD. This configuration is sufficient for the Scikit-learn training job; for TensorFlow and PyTorch, you can select a GPU-based machine type to accelerate the job.

In Gradient, Experiments are based on a container image that provides the runtime and dependencies. For this tutorial, we are using an image that contains a Python 3 runtime with the Scikit-learn framework.

As a side note, if you want to use your own container image, feel free to use or modify the Dockerfile below and push the resulting image to a public container registry such as Docker Hub.

FROM alpine:latest

LABEL MAINTAINER="Janakiram MSV <janakiramm@gmail.com>"

# locale.h is linked as xlocale.h (see the RUN instruction below) to ensure
# a successful install of the Python numpy package.
# See https://forum.alpinelinux.org/comment/690#comment-690 for more information.

WORKDIR /var/www/

ENV PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "

ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    pandas \
    nltk \
    "

# Install the Python 3 runtime and a temporary build toolchain, compile the
# Python packages with pip, then swap the toolchain for the native runtime libraries.
RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
    && rm -rf /var/cache/apk/*
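
If you customize this Dockerfile, you would build and publish the image with the standard Docker CLI before referencing it in the Experiment Builder, e.g. docker build -t <your-username>/sklearn-alpine . followed by docker push <your-username>/sklearn-alpine (the image name sklearn-alpine here is just a placeholder).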

With the runtime container in place, we now need to point Gradient to the dataset and the training script. This is done through Gradient's integration with GitHub: Gradient pulls the GitHub repo into the experiment workspace and uses its assets for the training job.

Let’s point Gradient to the tutorial repo on GitHub.

Finally, let's define the command for the job, which is the Python script that executes within the context of the container's runtime. When the script exits gracefully, the job is marked as complete.

The command for this job is python train/train.py -i ./data/sal.csv -o /storage/salary. The script, train.py, takes the source location of the dataset (sal.csv) and the target location for saving the trained model (/storage/salary).

In Gradient, any file saved to the /storage location becomes available to other jobs and experiments. By persisting the model to this location, we will be able to access it later.
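
The actual training script ships with the tutorial repo, but a minimal sketch of what such a script does, assuming hypothetical column names x (experience) and y (salary) in sal.csv, looks like this:

import argparse
import os
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Parse the -i (input dataset) and -o (output directory) arguments
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True)
parser.add_argument('-o', '--output', required=True)
args = parser.parse_args()

# Assumed layout: one feature column (experience) and one label column (salary)
df = pd.read_csv(args.input)
X = df[['x']].values
y = df['y'].values

model = LinearRegression().fit(X, y)

# Print training metrics to stdout so they show up in the job logs
print('MSE:', mean_squared_error(y, model.predict(X)))
print('Intercept:', model.intercept_)
print('Slope:', model.coef_[0])

# Persist the pickled model under the output directory on /storage
os.makedirs(args.output, exist_ok=True)
with open(os.path.join(args.output, 'model.pkl'), 'wb') as f:
    pickle.dump(model, f)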

We are now ready to kick off the training job by clicking on the Submit Experiment button.

Gradient adds the job to the queue and schedules it on the chosen machine type. In a few minutes, the job execution completes.

You can verify the logs, which show the Mean Squared Error (MSE) along with the model's intercept and slope, all printed to stdout by the training code.

Feel free to explore the Environment and Files sections of the job.

The fully trained and pickled model (model.pkl) is now available at /storage/salary. You can now safely delete the job to clean up the project.
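
Any subsequent job or notebook on the same account can load the pickled model straight from shared storage; for example:

import pickle

# /storage is mounted into every job on the same Gradient account
with open('/storage/salary/model.pkl', 'rb') as f:
    model = pickle.load(f)

# Predict the salary for an engineer with 5 years of experience
print(model.predict([[5]]))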

Feel free to explore the GitHub repo, which contains the dataset along with the code for training and deployment.
