Gradient Docs

Using Experiments

Last updated 4 years ago

Experiment Modes

There are three modes for Experiments:

Single-node: An Experiment that runs on a single compute instance. This option is very simple and is available in the web UI, CLI, and SDK.

Multi-node: A distributed training Experiment that runs on more than one compute instance. This option is more advanced and is available in the CLI and SDK only.

Hyperparameter Search: A search that runs using multiple instances. This is an advanced option and is available in the CLI and SDK only.

Creating Experiments

Experiment Builder: An interface for Running Single-Node Experiments

You can run Experiments in Gradient without ever leaving your web browser! The Experiment Builder is a great way to learn more about how Experiments are structured, and you can easily run your first GPU-based Experiment on Gradient without writing a single line of code!

The Experiment Builder is very similar to the Job Builder, which you may already be familiar with, but it lets you create Experiments in the context of a Project. Experiments created using the Builder are currently limited to single-node jobs.

Run an Experiment Using the Builder

To run an Experiment using the Builder:

  1. In the Project Details view, click "Create Experiment".

You will arrive at the Experiment Builder, where you can click the "Fast Style Transfer" example experiment. The parameters below are filled in automatically; review them to familiarize yourself with the defaults:

  • Machine Type. What type of instance to run your Experiment's job on. We recommend starting with a GPU+. Many Experiments benefit from a machine with a GPU, but some can run just using a CPU.

  • Workspace. The workspace is the collection of code that is run. It can be a Git repository (public or private), your local working directory (if you are using the CLI, it is uploaded to the Docker container when the job runs), or none (the default value).

  • Command. The command is the entry point to the container. This is the line of code that kicks off your Experiment's job. For example, it could be a bash script such as ./run.sh, or python main.py.

  • Custom Metrics. Enter a list of custom metrics to use with Gradient's statsd client, such as percent_failure or percent_success.

Once you have examined or specified the parameters, hit "Submit Experiment" and watch the Experiment run!

When working with the Gradient CLI, you can use the --help option at any time to show information in your terminal about the command you want to use. Alternatively, if you run an incomplete command, the CLI prompts you with the subcommands you may have intended, as well as any required options that are missing.

Usage: gradient [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  apiKey           Save your api key
  deployments      Manage deployments
  experiments      Manage experiments
  hyperparameters  Manage hyperparameters
  jobs             Manage gradient jobs
  logout           Log out / remove apiKey from config file
  machines         Manage machines
  models           Manage models
  projects         Manage projects
  run              Run script or command on remote cluster
  version          Show the version and exit

Running experiments

For programmatic use of the CLI, there is the create command, which simply creates an experiment in a target project, with the specified options.

Alternatively, for more interactive use of the CLI, there is the run command, which creates and automatically starts an experiment in one step. With this command, logs stream automatically once the experiment has been created and started.

There are separate subcommands for singlenode and multinode experiments.

gradient experiments run singlenode --help
Usage: gradient experiments create singlenode [OPTIONS]

gradient experiments run multinode --help
Usage: gradient experiments create multinode [OPTIONS]

Creating a single-node experiment using the CLI

The following command creates and starts a single-node experiment called singleEx and places it within the Gradient Project identified by the --projectId option.

gradient experiments run singlenode \
  --projectId <your-project-id> \
  --name singleEx \
  --experimentEnv "{\"EPOCHS_EVAL\":5,\"TRAIN_EPOCHS\":10,\"MAX_STEPS\":1000,\"EVAL_SECS\":10}" \
  --container tensorflow/tensorflow:1.13.1-gpu-py3 \
  --machineType K80 \
  --command "python mnist.py" \
  --workspace https://github.com/Paperspace/mnist-sample.git \
  --modelType Tensorflow \
  --modelPath /artifacts
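
The escaped JSON passed to --experimentEnv is easy to mistype. As a minimal sketch, assuming a POSIX-compatible shell, the same arguments can be assembled with the JSON held in single quotes and printed as a dry run before anything is submitted. The variable names PROJECT_ID and EXPERIMENT_ENV are illustrative only and are not part of the CLI.

```shell
# Assumption: PROJECT_ID is a placeholder; replace it with a real project ID.
PROJECT_ID="<your-project-id>"

# Single quotes keep the JSON literal, so the inner double quotes need no backslashes.
EXPERIMENT_ENV='{"EPOCHS_EVAL":5,"TRAIN_EPOCHS":10,"MAX_STEPS":1000,"EVAL_SECS":10}'

# Collect the arguments as positional parameters so each value stays a single word.
set -- experiments run singlenode \
  --projectId "$PROJECT_ID" \
  --name singleEx \
  --experimentEnv "$EXPERIMENT_ENV" \
  --container tensorflow/tensorflow:1.13.1-gpu-py3 \
  --machineType K80 \
  --command "python mnist.py" \
  --workspace https://github.com/Paperspace/mnist-sample.git \
  --modelType Tensorflow \
  --modelPath /artifacts

# Dry run: print one argument per line instead of calling the CLI.
printf '%s\n' "$@"

# To submit the experiment for real, uncomment the next line.
# gradient "$@"
```

Because each argument prints on its own line, a stray quote or a missing value is easy to spot before the experiment is created.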

Creating a multi-node experiment using the CLI

The following command creates and starts a multi-node experiment called multiEx and places it within the Gradient Project identified by the --projectId option.

gradient experiments run multinode \
  --name multiEx \
  --projectId <your-project-id> \
  --experimentType GRPC \
  --workerContainer tensorflow/tensorflow:1.13.1-gpu-py3 \
  --workerMachineType K80 \
  --workerCommand "python mnist.py" \
  --workerCount 2 \
  --parameterServerContainer tensorflow/tensorflow:1.13.1-gpu-py3 \
  --parameterServerMachineType K80 \
  --parameterServerCommand "python mnist.py" \
  --parameterServerCount 1 \
  --workspace https://github.com/Paperspace/mnist-sample.git \
  --modelType Tensorflow

The command above specifies gRPC as the experiment type and uses the same Docker container, machine type, and command for both the 2 workers and the 1 parameter server.

Finally, the command specifies the workspace to pull the Python script from as a public GitHub repository.
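
Since the workers and the parameter server share the same container, machine type, and command, those values can be defined once and reused. The following is a minimal sketch, assuming a POSIX-compatible shell; the variable names (PROJECT_ID, CONTAINER, MACHINE_TYPE, TRAIN_COMMAND) are illustrative only and are not part of the CLI. It prints the arguments as a dry run rather than submitting anything.

```shell
# Assumption: PROJECT_ID is a placeholder; replace it with a real project ID.
PROJECT_ID="<your-project-id>"

# Values shared by the workers and the parameter server.
CONTAINER="tensorflow/tensorflow:1.13.1-gpu-py3"
MACHINE_TYPE="K80"
TRAIN_COMMAND="python mnist.py"

# Collect the arguments as positional parameters so each value stays a single word.
set -- experiments run multinode \
  --name multiEx \
  --projectId "$PROJECT_ID" \
  --experimentType GRPC \
  --workerContainer "$CONTAINER" \
  --workerMachineType "$MACHINE_TYPE" \
  --workerCommand "$TRAIN_COMMAND" \
  --workerCount 2 \
  --parameterServerContainer "$CONTAINER" \
  --parameterServerMachineType "$MACHINE_TYPE" \
  --parameterServerCommand "$TRAIN_COMMAND" \
  --parameterServerCount 1 \
  --workspace https://github.com/Paperspace/mnist-sample.git \
  --modelType Tensorflow

# Dry run: print one argument per line instead of calling the CLI.
printf '%s\n' "$@"

# To submit the experiment for real, uncomment the next line.
# gradient "$@"
```

Changing CONTAINER or MACHINE_TYPE in one place keeps the worker and parameter server settings in sync.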

Viewing an Experiment

Open the Project that contains the Experiment:

Then click on the Experiment to view information about it:

To view an Experiment's details from the CLI, use the following command:

gradient experiments details

Stop or Cancel an Experiment

To cancel an Experiment, click the Cancel button below the state indicator:

To stop an Experiment, click the Stop button below the state indicator:

To stop an Experiment from the CLI, use the following command:

gradient experiments stop

Delete an Experiment

To delete an Experiment, click the Delete button below the state indicator:

To delete an Experiment from the CLI, use the following command:

gradient experiments delete

To reach the Experiment Builder: once logged in, navigate to Projects at https://www.paperspace.com/console/projects, then select an existing Project or create a new Project. The Experiment Builder is available by clicking Create Experiment within a Project.

Container. Experiments are run within a Docker container. You can run a public or private container.

The Gradient CLI enables you to run experiments manually and programmatically from your command line for maximum flexibility. Once you have the CLI installed, use the alias gradient plus any further commands you wish to run.

See the Model Path, Parameters, & Metadata page for more info about model paths and their default values, including for models you want to deploy via Gradient Deployments.

To run the single-node or multi-node command above, substitute an existing project ID for <your-project-id>. You can get a project ID by going to your projects list (https://www.paperspace.com/console/projects) and creating a new Project, or by opening an existing Project and copying the Project ID value. You can also list existing projects and their IDs from the command line with gradient projects list.

For more information about this sample experiment, see the README in the mnist-sample GitHub repo: https://github.com/Paperspace/mnist-sample. Note: the code for this experiment can be run in both single-node and multi-node training modes.

Note: --modelType Tensorflow will automatically parse and store the model's performance metrics and prepare the model for Deployment with TensorFlow Serving.