Using Experiments
There are three modes for Experiments:
Single-node: An Experiment that runs on a single compute instance. This option is very simple and is available in the web UI, CLI, and SDK.
Multi-node: Run a distributed training Experiment on more than one compute instance. This option is more advanced and is available in the CLI and SDK only.
Hyperparameter search: Run a hyperparameter search using multiple instances. This is an advanced option and is available in the CLI and SDK only.
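As a rough orientation, the advanced modes correspond to subcommands of the Gradient CLI. The subcommand names below reflect the CLI at the time of writing and may differ between versions, so confirm them with `gradient --help`:

```shell
# Single-node: one instance (also available in the web UI).
gradient experiments run singlenode --help

# Multi-node: distributed training across several instances.
gradient experiments run multinode --help

# Hyperparameter search across multiple instances.
gradient hyperparameters --help
```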
You can run Experiments in Gradient without ever leaving your web browser! The Experiment Builder is a great way to learn more about how Experiments are structured, and you can easily run your first GPU-based Experiment on Gradient without writing a single line of code!
The Experiment Builder is very similar to our Job Builder, which you may already be familiar with, but it allows you to create Experiments in the context of a Project. Experiments created with the Builder are currently limited to single-node jobs.
To run an Experiment using the Builder:
In the Project Details view, click the "Create Experiment" button.
You'll arrive at the Experiment Builder, where you can click the "Fast Style Transfer" example experiment. The default parameters are filled in automatically; review them to familiarize yourself with each one:
Machine Type. What type of instance to run your Experiment's job on. We recommend starting with a GPU+. Many Experiments benefit from a machine with a GPU, but some can run just using a CPU.
Workspace. The workspace is the collection of code that is run. It can be a Git repository (public or private), your local working directory (if you are using the CLI), which is uploaded to the Docker container while the job runs, or none (the default).
Command. The command is the entry point to the container: the line of code that kicks off your Experiment's job. It could be a bash script such as `./run.sh`, or `python main.py`, to name just two examples.
Custom Metrics. A list of custom metrics to track with Gradient's statsd client, such as `percent_failure` or `percent_success`.
Once you have examined or specified the parameters, hit "Submit Experiment" and watch the Experiment run!
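The builder parameters above map onto flags of the CLI's single-node run command. The sketch below is illustrative only: the project ID, container image, and repository are placeholder values, and flag spellings can vary between CLI versions, so verify them with `gradient experiments run singlenode --help`.

```shell
# Sketch of a single-node run roughly equivalent to the builder form.
# All values are placeholders; confirm flag names with --help.
gradient experiments run singlenode \
  --projectId <your-project-id> \
  --name my-first-experiment \
  --machineType GPU+ \
  --container tensorflow/tensorflow:1.13.1-gpu-py3 \
  --workspace https://github.com/Paperspace/fast-style-transfer.git \
  --command "python main.py"
```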
Open the Project that contains the Experiment:
Then click on the Experiment to view information about it:
To cancel an Experiment, click the Cancel button below the state indicator:
To stop an Experiment, click the Stop button below the state indicator:
To delete an Experiment, click the Delete button below the state indicator:
Once logged in, navigate to the Projects page.
Select an existing Project or create a new one.
Container. Experiments run within a Docker container. You can use a public or private container image.
The Gradient CLI enables you to run experiments manually and programmatically from your command line for maximum flexibility. Once you have installed the CLI, use the command `gradient` plus any further subcommands you wish to run.
See more info about experiment parameters and their default values, including those required if you want to deploy your models via Gradient Deployments.
To run this command, substitute an existing project ID for <your-project-id>. You can get a project ID by going to the Projects page and creating a new project, or by opening an existing project and copying the Project ID value. You can also get a list of existing projects and their IDs from the command line with `gradient projects list`.
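From the terminal, the steps above look roughly like the following. The project name is a hypothetical placeholder, and the exact output format of these commands depends on your CLI version:

```shell
# List existing projects and their IDs; copy the ID you want.
gradient projects list

# Or create a new project (placeholder name shown).
gradient projects create --name my-new-project
```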
For more information about this sample experiment, see the README in the mnist-sample GitHub repo. Note: the code for this experiment can be run in both single-node and multi-node training modes.
Note: `--modelType Tensorflow` will automatically parse and store the model's performance metrics and prepare it for deployment with TensorFlow Serving.
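Putting the pieces together, a single-node run of the mnist-sample repo with the model-type flag might look like the sketch below. The machine type, container image, and flag spellings are illustrative assumptions; confirm them with `gradient experiments run singlenode --help` before running.

```shell
# Illustrative single-node run of mnist-sample with model parsing enabled.
# Substitute your own project ID; flag names may differ by CLI version.
gradient experiments run singlenode \
  --projectId <your-project-id> \
  --name mnist \
  --machineType GPU+ \
  --container tensorflow/tensorflow:1.13.1-gpu-py3 \
  --workspace https://github.com/Paperspace/mnist-sample.git \
  --command "python mnist.py" \
  --modelType Tensorflow
```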