Gradient Docs
1.0.0

Containers

Last updated 5 years ago

About

When you run a Gradient Experiment, you must provide a Docker container in which your code will execute.

There are two types of containers you can provide: public or private.

Public Containers

The CLI expects you to specify a container using the standard Docker image syntax. Any container on the Docker Hub registry will work out of the box. For example, to use the official TensorFlow container, look it up on Docker Hub: https://hub.docker.com/r/tensorflow/tensorflow/

Your container would then take the form tensorflow/tensorflow:1.5.1-gpu, where 1.5.1-gpu is the "tag", taken from the list at https://hub.docker.com/r/tensorflow/tensorflow/tags/

Example use:

gradient experiments run ... --container tensorflow/tensorflow:1.5.1-gpu
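As a rough illustration of the name:tag syntax described above, the following sketch splits a container reference into its two parts. The function names image_name and image_tag are hypothetical helpers, not part of the Gradient CLI, and the sketch assumes the reference carries no digest (@sha256:...). Falling back to "latest" mirrors Docker's own default when no tag is given.

```shell
#!/bin/sh
# Hypothetical helpers: split <name>[:<tag>] into its parts.
# Only a colon in the last path component counts as a tag separator,
# so a registry port such as localhost:5000/myimage is left alone.

image_name() {
    ref=$1
    last=${ref##*/}                         # e.g. "tensorflow:1.5.1-gpu"
    case $last in
        *:*) printf '%s\n' "${ref%:*}" ;;   # strip the trailing ":<tag>"
        *)   printf '%s\n' "$ref" ;;        # no tag present
    esac
}

image_tag() {
    ref=$1
    last=${ref##*/}
    case $last in
        *:*) printf '%s\n' "${last##*:}" ;; # text after the last colon
        *)   printf '%s\n' "latest" ;;      # Docker's default tag
    esac
}

image_name "tensorflow/tensorflow:1.5.1-gpu"   # prints tensorflow/tensorflow
image_tag  "tensorflow/tensorflow:1.5.1-gpu"   # prints 1.5.1-gpu
```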

Private Containers

If you specify a container that is hosted on a private registry, then you will need to authenticate using the optional experiment parameters: registryUsername and registryPassword.

Here are some details on pulling from private Docker repositories:

The gradient experiments run and gradient experiments create commands accept two options for specifying a container (i.e., a Docker image) from a private Docker repository: --registryUsername and --registryPassword.

Example use:

gradient experiments run ... --container "docker.io/paperspace/tom_test_private" --registryUsername "tompaperspace" --registryPassword "xxxxxxxx"

Notes: You need to specify the username and password each time you reference a private Docker repository in an experiment submission. The credentials are transmitted over SSL and cached for the lifetime of the experiment. If you clone the experiment, the credentials are automatically reused.
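Because the credentials must be supplied on every submission, it can be convenient to keep them out of scripts and shell history. The sketch below is one possible approach, not an official feature: with_registry_credentials is a hypothetical wrapper, and REGISTRY_USERNAME and REGISTRY_PASSWORD are illustrative variable names that the Gradient CLI itself does not read. The wrapper simply appends the two documented options to whatever command it is given.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command with the registry credential
# options appended, taking the values from environment variables.
with_registry_credentials() {
    if [ -z "${REGISTRY_USERNAME:-}" ] || [ -z "${REGISTRY_PASSWORD:-}" ]; then
        echo "REGISTRY_USERNAME and REGISTRY_PASSWORD must both be set" >&2
        return 1
    fi
    "$@" --registryUsername "$REGISTRY_USERNAME" \
         --registryPassword "$REGISTRY_PASSWORD"
}

# Dry run: "echo" stands in for the real invocation, so nothing is submitted.
# The literal "..." stands for the options elided in the examples above.
REGISTRY_USERNAME="myusername"
REGISTRY_PASSWORD="mypassword"
with_registry_credentials echo gradient experiments run ... \
    --container "docker.io/myorganization/my_private_repo:latest"
```

Dropping the leading echo would run the command for real. Note that the password still appears in the process arguments while the command runs, since the options above are the only mechanism this page documents.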

The credential options --registryUsername and --registryPassword should be sufficient for a number of private Docker registry services, including:

  • Docker Hub
  • Amazon Elastic Container Registry (ECR)
  • Google Container Registry
  • JFrog/Bintray.io
  • Quay.io
  • Custom registries running the Docker Registry v2 protocol, as long as they are publicly accessible on the internet.

The form of the repository link and the format of the username and password differ for some of these registries, but each is publicly documented.
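As a quick orientation, the username conventions for the three registries covered in the examples that follow can be summarized by looking at the registry host in the container link. This is only an illustrative sketch: registry_username_hint is a hypothetical helper, and it recognizes just the host patterns that appear on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the username convention for a container link,
# based on the registry host (the text before the first "/").
registry_username_hint() {
    host=${1%%/*}
    case $host in
        docker.io)
            echo "<your Docker ID>" ;;
        *.dkr.ecr.*.amazonaws.com)
            echo "AWS" ;;
        gcr.io|us.gcr.io|eu.gcr.io|asia.gcr.io)
            echo "_json_key" ;;
        *)
            echo "unknown" ;;   # registry not covered on this page
    esac
}

registry_username_hint "gcr.io/myproject/my_private_repo:latest"   # prints _json_key
```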

Examples using common container registries

The following provides a summary for Docker Hub, AWS, and GCP:

1. Docker Hub

Username: <your Docker ID>
Password: <your Docker ID password>

Container link format: "docker.io/<organization-or-username>/<repository-name>[:<tag>]"

Example:

gradient experiments run ... --container "docker.io/myorganization/my_private_repo:latest" \
--registryUsername "myusername" --registryPassword "mypassword"

2. Amazon Elastic Container Registry (ECR)

Username: "AWS"
Password: <value of the '-p' option from the output of the AWS CLI command "aws ecr get-login --no-include-email">

Container link format: "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/[<namespace>/]<repository-name>" (us-east-1 is an example; use the region that hosts your registry)

Example:

Install and configure the AWS CLI on your local computer, then execute:

aws ecr get-login --no-include-email

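Note: if --no-include-email is reported as not supported, you may need to update your AWS CLI using pip3. The get-login command exists only in AWS CLI version 1; version 2 replaces it with aws ecr get-login-password, which prints the password by itself.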

It will produce output similar to the following:

"docker login -u AWS -p sb2FkIjSIsImV4calksdhwTTyslia189231qwdgqer3kIjSIsImV4calksd123eiadfviuaigq2 \
9438hvehraw8437rgfq4398yq934ghalibrla478tqkIjSIsImV4calksd34ariIsImV4calksdhliarhg9q34yt9huargq934t \
123jAUhi37rgfq4398yq934ghalibrla478tq348437rgfqIsImV4calkIjSIsImV4calksdksdhlia4398yq934ghlibrla478 \
brla478tq348437rgfqIsImV123jAUhi37rgfq4398yq9sImV4calksdksghlib34ghali4calkIjSIrla478dhlia4398yq934 \
q34t843q98iOjE1MTYyNjUx https://XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com"

Use the value of the '-p' option from the actual output as the password value in the gradient experiments command. Note: the value is a single long token with no spaces; it is split across lines below only for readability:

gradient experiments run ... --container "XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/mynamespace/tom_test_private" \
--registryUsername "AWS" --registryPassword sb2FkIjSIsImV4calksdhwTTyslia189231qwdgqer3kIjSIsImV4calksd123eiadfviuaigq2\
9438hvehraw8437rgfq4398yq934ghalibrla478tqkIjSIsImV4calksd34ariIsImV4calksdhliarhg9q34yt9huargq934t\
123jAUhi37rgfq4398yq934ghalibrla478tq348437rgfqIsImV4calkIjSIsImV4calksdksdhlia4398yq934ghlibrla478\
brla478tq348437rgfqIsImV123jAUhi37rgfq4398yq9sImV4calksdksghlib34ghali4calkIjSIrla478dhlia4398yq934\
q34t843q98iOjE1MTYyNjUx
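Copying the token by hand is error-prone, so the extraction step can be scripted. The following is a minimal sketch that assumes the login command arrives on a single line, as AWS CLI version 1 emits it; extract_login_password is a hypothetical helper name. It reads the "docker login ..." text from standard input and prints the token that follows -p.

```shell
#!/bin/sh
# Hypothetical helper: print the argument that follows "-p" in a
# "docker login ..." command read from standard input.
extract_login_password() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "-p") { print $(i + 1); exit }
        }
    }'
}

# Demonstration with a shortened, made-up token:
sample='docker login -u AWS -p EXAMPLETOKEN https://XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com'
printf '%s\n' "$sample" | extract_login_password   # prints EXAMPLETOKEN

# Intended use (not run here, since it needs configured AWS credentials):
#   password=$(aws ecr get-login --no-include-email | extract_login_password)
```

The captured value can then be passed as --registryPassword "$password" in the command shown above.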

3. Google Container Registry

Username: "_json_key"
Password: <service account JSON keyfile contents>

Container link format: "[us.|eu.|asia.]gcr.io/<project-id>/<repository-name>[:<tag>]"

Example:

Use the contents of your service account JSON key file (named keyfile.json in this example) as the password value in the gradient experiments command:

gradient experiments run ... --container "gcr.io/myproject/my_private_repo:latest" \
--registryUsername "_json_key" --registryPassword "$(cat keyfile.json)"

Note: do not base64 encode the contents of the keyfile. It will be encoded by the Gradient Experiments job runner at runtime.
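Since the keyfile must be passed as raw JSON, a small pre-flight check can catch an accidentally encoded or missing file before submission. The sketch below is illustrative only: read_keyfile is a hypothetical helper, and its test is a simple heuristic (raw JSON objects begin with "{", whereas base64 text does not), not a full JSON validation.

```shell
#!/bin/sh
# Hypothetical helper: print the keyfile contents if the file is readable
# and looks like raw JSON; otherwise report the problem and fail.
read_keyfile() {
    keyfile=$1
    if [ ! -r "$keyfile" ]; then
        echo "cannot read $keyfile" >&2
        return 1
    fi
    # First non-whitespace character of the file.
    first=$(tr -d ' \t\r\n' < "$keyfile" | cut -c1)
    if [ "$first" != "{" ]; then
        echo "$keyfile does not look like raw JSON (is it base64 encoded?)" >&2
        return 1
    fi
    cat "$keyfile"
}

# Intended use in place of "$(cat keyfile.json)":
#   --registryPassword "$(read_keyfile keyfile.json)"
```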

Other Considerations

If you are running on a GPU node, your container will need to include the relevant NVIDIA CUDA libraries. The easiest way to do this is to base your container on one of the official NVIDIA Docker containers. For example, the top line of your Dockerfile would be:

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04

Check out the nvidia/cuda repository on Docker Hub for more possible tags: https://hub.docker.com/r/nvidia/cuda/tags/

To obtain the Google Container Registry password described above, create a service account JSON key file for your project, following these instructions (https://support.google.com/cloud/answer/6158849#serviceaccounts). Once you have downloaded your JSON key file, rename it to "keyfile.json".

For more details and other options, see:

  • Amazon ECR registries: https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html
  • Google Container Registry advanced authentication: https://cloud.google.com/container-registry/docs/advanced-authentication
  • TensorFlow on Docker Hub: https://hub.docker.com/r/tensorflow/tensorflow/
  • TensorFlow tags on Docker Hub: https://hub.docker.com/r/tensorflow/tensorflow/tags/