Gradient Docs
1.0.0

GradientCI

Set up continuous integration between your GitHub repository and Gradient


Last updated 4 years ago

Set Up GradientCI

Creating a GradientCI Project via the Paperspace Console

To create a Gradient Project with continuous integration powered by GradientCI and GitHub:

  1. Install the GradientCI GitHub app on your repository. (GitHub Admin privilege is required.)

  2. Navigate to the Projects page and click Create Project.

  3. Select GitHub Project.

  4. Grant Paperspace access to your GitHub repos via OAuth.

  5. Choose and confirm a repository with GradientCI installed for your new Gradient° Project.

Configure GradientCI Settings

To set up GradientCI, our continuous integration service, add a directory named .ps_project to your GitHub repository containing a configuration file named config.yaml. Examples are shown below.

Building Branches and Tags

You can also disable builds of pull requests, which are enabled by default, or enable builds of pull requests that originate from forked repositories, which are disabled by default to prevent unauthorized use of Gradient resources. Each of these options allows configuration to be sourced from the relevant Git branch.
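As a rough illustration, the two pull-request settings above can be modeled as a pair of flags with the stated defaults. This is a minimal Python sketch, not GradientCI's implementation: the names `BuildSettings` and `should_build_pull_request` are hypothetical, and it assumes that building pull requests from forks also requires pull-request builds to be enabled.

```python
from dataclasses import dataclass


# Hypothetical model of the pull-request build settings; not a GradientCI API.
@dataclass
class BuildSettings:
    build_pull_requests: bool = True           # enabled by default
    build_forked_pull_requests: bool = False   # disabled by default


def should_build_pull_request(settings: BuildSettings, from_fork: bool) -> bool:
    """Return True if a pull request would be built under these settings."""
    if not settings.build_pull_requests:
        # Assumption: disabling pull-request builds also disables fork builds.
        return False
    if from_fork:
        return settings.build_forked_pull_requests
    return True


defaults = BuildSettings()
print(should_build_pull_request(defaults, from_fork=False))  # True
print(should_build_pull_request(defaults, from_fork=True))   # False
```

With the defaults, a pull request from the same repository is built and one from a fork is not.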

Template

# .ps_project/config.yaml:

version: 2

workflows:
  experiment-workflow:
    steps:
      -
        name: "my-experiment"
        command: experiment.run_single_node
        params:
          command: python mnist.py
          container: tensorflow/tensorflow:1.13.1-gpu-py3
          experimentEnv:
            EPOCHS_EVAL: 5
            EVAL_SECS: 10
            MAX_STEPS: 1000
            TRAIN_EPOCHS: 10
          machineType: P4000
          modelPath: /artifacts
          modelType: Tensorflow
          name: mnist-cli-config-yaml
        checks: #[optional]
          onnx:loss:
            target: "0.0..0.5"
            aggregate: "mean"
          defaults: #[optional]
            precision: 3
    triggers:
      branches:
        ignore: irrelevant-branch
      tags:
        only:
          - v.*
          - latest
  param-file-experiment-workflow:
    steps:
      -
        name: "param-file-experiment"
        command: experiment.run_single_node
        paramsFile: .ps_project/param-file-experiment.yml
        checks: #[optional]
          custom:loss:
            target: "0.0..0.5"
            aggregate: "/mean/0"
    triggers:
      branches:
        only: master

Workflows

Steps

Required Properties

  • name: a string identifier for the step, often used when reporting results.

    This name must be unique among the steps of a workflow.

  • command: the type of step to execute.

    Command names match the qualified names in the Paperspace SDK.

    Supported commands:

    • experiment.run_single_node

    • experiment.run_multi_node

    • experiment.run_mpi_multi_node

  • One of params or paramsFile:

    • paramsFile: a path to a file in your project repository that contains the YAML arguments for the step.

    • params: an object containing the parameters for the step.

      These are the same as the contents of a paramsFile, inlined into your .ps_project/config.yaml.
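The required properties can be checked locally before you push a change. The Python sketch below assumes the steps of one workflow have already been parsed from config.yaml into dictionaries (for example with a YAML library); the function name `validate_workflow_steps` is hypothetical, and it reads "one of params or paramsFile" as exactly one of the two.

```python
from __future__ import annotations

# Step commands listed as supported in this document.
SUPPORTED_COMMANDS = {
    "experiment.run_single_node",
    "experiment.run_multi_node",
    "experiment.run_mpi_multi_node",
}


def validate_workflow_steps(steps: list[dict]) -> list[str]:
    """Return a list of problems in one workflow's steps (empty if valid)."""
    problems = []
    seen_names = set()
    for index, step in enumerate(steps):
        name = step.get("name")
        if not name:
            problems.append(f"step {index}: missing name")
        elif name in seen_names:
            problems.append(f"step {index}: duplicate name {name!r}")
        else:
            seen_names.add(name)
        command = step.get("command")
        if command not in SUPPORTED_COMMANDS:
            problems.append(f"step {index}: unsupported command {command!r}")
        # Assumption: exactly one of params / paramsFile must be present.
        if ("params" in step) == ("paramsFile" in step):
            problems.append(f"step {index}: set exactly one of params or paramsFile")
    return problems


# The step from param-file-experiment-workflow in the template above.
steps = [{
    "name": "param-file-experiment",
    "command": "experiment.run_single_node",
    "paramsFile": ".ps_project/param-file-experiment.yml",
}]
print(validate_workflow_steps(steps))  # []
```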

Optional Properties

  • checks: these configure status checks on your GitHub pull requests.

Triggers

Branches and Tags
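GradientCI filters branches and tags with either an only or an ignore field, each holding one regex or an array of regexes: an only rule builds a ref that matches at least one pattern, and an ignore rule skips a ref that matches any pattern. The Python sketch below applies one such rule to a ref name. It is illustrative only: the function name is hypothetical, it uses Python's re module rather than a POSIX regex engine (the two agree on simple patterns like these), and it assumes a pattern must match the whole name.

```python
import re
from typing import List, Union


def _patterns(value: Union[str, List[str]]) -> List[str]:
    # A single regex string and an array of regexes are both accepted.
    return [value] if isinstance(value, str) else list(value)


def ref_passes(ref: str, rule: dict) -> bool:
    """Apply one branches/tags rule ({"only": ...} or {"ignore": ...}) to a ref."""
    if "only" in rule and "ignore" in rule:
        raise ValueError("use either 'only' or 'ignore', not both")
    if "only" in rule:
        return any(re.fullmatch(p, ref) for p in _patterns(rule["only"]))
    return not any(re.fullmatch(p, ref) for p in _patterns(rule.get("ignore", [])))


# The triggers block of experiment-workflow in the template above.
triggers = {"branches": {"ignore": "irrelevant-branch"},
            "tags": {"only": ["v.*", "latest"]}}
print(ref_passes("master", triggers["branches"]))             # True
print(ref_passes("irrelevant-branch", triggers["branches"]))  # False
print(ref_passes("v1.2.0", triggers["tags"]))                 # True
print(ref_passes("nightly", triggers["tags"]))                # False
```

The sketch does not model the project-level "Build Branches" setting, which decides which refs are candidates for a build in the first place.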

Metrics Checks

GradientCI can help prevent changes that degrade your model. When you run an experiment based on a code change, GradientCI can automatically check properties of the experiment and forward the results to GitHub, which can use these statuses to block merges of pull requests that degrade your model. GradientCI automatically reports whether the experiment ran without errors and which checks failed, if any. You can configure additional checks for metrics generated by your experiment under the checks key in your config.yaml. Currently, only scalar metrics from TensorFlow-generated summaries are supported.

In addition to the status checks, GradientCI writes a detailed summary of the experiment to a comment on the pull request so you have your critical data at a glance while reviewing code.

GitHub Configuration

Checks Schema

# ...
checks:
  <identifier>:
    target: <range> #[required]
    aggregate: mean #[required]
    round: down #[optional, default: down, up|down]
    precision: 2 #[optional, default: 2]
    only-pulls: false #[optional, default: false]
    if-not-found: failure #[optional, default: "failure", success|failure]
    comment-on-pr: true #[optional, default: true]
  defaults:
    round: down #[optional, default: down, up|down]
    precision: 2 #[optional, default: 2]
    only-pulls: false #[optional, default: false]
    if-not-found: failure #[optional, default: "failure", success|failure]
    comment-on-pr: true #[optional, default: true]

<identifier>s:

  • Each <identifier> has two parts separated by a colon: (1) the source of the metric and (2) the name of the metric, e.g. tensorflow:loss.

    Supported metric sources are:

    • tensorflow

    • onnx

    • custom

  • <identifier>s are case-sensitive.

  • defaults: a reserved key for default settings that apply to every <identifier>.

    Any of the keys above are valid within this block and set the default behavior for all other <identifier> blocks.

    For example, if round is set to down in the defaults block, all metrics are evaluated with round-down behavior.

    A key specified under an individual <identifier> overrides the value in the defaults block.

    For instance, round: up under tensorflow:loss overrides the round: down behavior in the defaults block.
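To make the override order concrete, the Python sketch below splits each <identifier> at its first colon and layers settings in this order: built-in defaults, then the defaults block, then the identifier's own keys. The built-in values come from the checks schema above; the function and constant names are hypothetical, and this is not GradientCI's own code.

```python
# Defaults taken from the checks schema above.
BUILT_IN_DEFAULTS = {
    "round": "down",
    "precision": 2,
    "only-pulls": False,
    "if-not-found": "failure",
    "comment-on-pr": True,
}
SUPPORTED_SOURCES = {"tensorflow", "onnx", "custom"}


def resolve_checks(checks: dict) -> dict:
    """Return each check's effective settings after applying defaults."""
    defaults = {**BUILT_IN_DEFAULTS, **checks.get("defaults", {})}
    resolved = {}
    for identifier, settings in checks.items():
        if identifier == "defaults":
            continue
        # "tensorflow:loss" -> source "tensorflow", metric "loss"
        source, _, metric = identifier.partition(":")
        if source not in SUPPORTED_SOURCES:  # case-sensitive comparison
            raise ValueError(f"unsupported metric source in {identifier!r}")
        # Later entries win: identifier keys override the defaults block.
        resolved[identifier] = {"source": source, "metric": metric,
                                **defaults, **settings}
    return resolved


checks = {
    "defaults": {"round": "down", "precision": 3},
    "tensorflow:loss": {"target": "..0.5", "aggregate": "mean", "round": "up"},
}
loss = resolve_checks(checks)["tensorflow:loss"]
print(loss["round"], loss["precision"], loss["if-not-found"])  # up 3 failure
```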

<range>

Ranges can appear in the following forms:

  1. <number> or <number>..: the metric must be greater than <number>.

  2. ..<number>: the metric must be less than <number>.

  3. <left>..<right>: the metric must be greater than <left> and less than <right>.

Note that these numbers are parsed as floats, so relying on exact equality with either end of the range is not recommended.
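A minimal Python sketch of how a <range> string could be turned into numeric bounds and tested, assuming both ends are exclusive (matching "greater than" and "less than" above). The function names are hypothetical.

```python
import math


def parse_range(text) -> tuple:
    """Parse a <range> such as "0.0..0.5", "..0.025" or "0.7.." into bounds."""
    text = str(text).strip()  # YAML may hand over a bare number, e.g. 0.7
    if ".." not in text:
        return float(text), math.inf  # "<number>": greater than <number>
    left, right = text.split("..", 1)
    lower = float(left) if left else -math.inf  # "..<number>"
    upper = float(right) if right else math.inf  # "<number>.."
    return lower, upper


def in_range(value: float, bounds: tuple) -> bool:
    lower, upper = bounds
    # Exclusive on both ends; do not rely on exact equality with a bound.
    return lower < value < upper


print(parse_range("0.0..0.5"))                 # (0.0, 0.5)
print(in_range(0.3, parse_range("0.0..0.5")))  # True
print(in_range(0.01, parse_range("..0.025")))  # True
print(in_range(0.65, parse_range("0.7..")))    # False
```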

Required Properties

  • target: the <range> the metric must fall within.

  • aggregate: the aggregating function used to evaluate the metric.

    • For tensorflow identifiers, aggregates are values emitted from your model summary. Valid aggregates are:

      • mean

      • stddev

      • max

      • min

      • var

      • median

    • For custom identifiers, the aggregate is a path into the metric data (a JSON pointer).

      • These paths must begin with /

      • For example, the mean of loss in {"loss": { "mean": 0.5 }} would be accessed with /loss/mean

      • For example, the 2nd element of the loss array in {"loss": [0.0, 0.5] } would be accessed with /loss/1
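Paths of this form are JSON Pointers (RFC 6901). The Python sketch below resolves one against metric data already parsed from JSON into dicts and lists, using only the standard library. The function name is hypothetical, and the shape of the metric data is taken only from the two examples above, not from GradientCI's internals.

```python
def resolve_pointer(document, pointer: str):
    """Resolve a JSON Pointer such as "/loss/mean" against parsed JSON data."""
    if not pointer.startswith("/"):
        raise ValueError("aggregate paths must begin with /")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" stands for "/" and "~0" for "~" (decode ~1 first).
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


print(resolve_pointer({"loss": {"mean": 0.5}}, "/loss/mean"))  # 0.5
print(resolve_pointer({"loss": [0.0, 0.5]}, "/loss/1"))        # 0.5
```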

Optional Properties

  • precision: how many decimal places to keep (default 2)

  • round: the rounding direction, up or down (default down)

  • only-pulls: only perform this check on pull requests (default false)

  • if-not-found: the status to report, success or failure, if the job has no data for the <identifier> (default failure)

  • comment-on-pr: include this metric in the summary comment when the metrics were generated from a pull request (default true)
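The Python sketch below shows one plausible reading of precision, round and if-not-found: the metric is rounded to precision decimal places in the given direction, compared with exclusive range bounds, and a missing metric reports the if-not-found status. That rounding happens before the comparison, and that "down" means toward negative infinity, are assumptions; only-pulls and comment-on-pr are not modeled, and the function names are hypothetical.

```python
from decimal import ROUND_CEILING, ROUND_FLOOR, Decimal
from typing import Optional


def round_metric(value: float, precision: int = 2, direction: str = "down") -> float:
    """Round value to `precision` decimal places, rounding up or down."""
    mode = ROUND_FLOOR if direction == "down" else ROUND_CEILING
    quantum = Decimal(1).scaleb(-precision)  # precision 2 -> 0.01
    # str() avoids carrying binary floating-point noise into Decimal.
    return float(Decimal(str(value)).quantize(quantum, rounding=mode))


def check_status(value: Optional[float], lower: float, upper: float,
                 precision: int = 2, direction: str = "down",
                 if_not_found: str = "failure") -> str:
    """Return "success" or "failure" for one metric check."""
    if value is None:
        return if_not_found  # the job produced no data for this identifier
    rounded = round_metric(value, precision, direction)
    return "success" if lower < rounded < upper else "failure"


print(round_metric(0.4567))                            # 0.45
print(round_metric(0.4567, direction="up"))            # 0.46
print(check_status(0.4999, 0.0, 0.5))                  # success
print(check_status(0.4999, 0.0, 0.5, direction="up"))  # failure
print(check_status(None, 0.0, 0.5))                    # failure
```

Under these assumptions, round: up turns a loss of 0.4999 into 0.5, which fails a 0.0..0.5 target, while round: down keeps it at 0.49, which passes.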

Job Environment

For some steps in your job, GradientCI automatically sets environment variables so that you can determine the triggering condition. These variables are currently set only for experiment.* steps.

  • PS_GRADIENT_CI_GIT_REF: the branch or tag that triggered the build

  • PS_GRADIENT_CI_GIT_SHA: the Git commit currently being built

  • PS_GRADIENT_CI_GIT_REPO_URL: the repository that received the trigger

    Note that builds from forks appear to come from the target repository, not the fork.
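For example, a training script such as mnist.py could read these variables to label its output with the commit being built. The Python sketch below uses only the variable names listed above; the helper name and output format are hypothetical. Outside GradientCI the variables are unset, so the helper falls back to a plain message.

```python
import os
from typing import Mapping


def describe_trigger(environ: Mapping[str, str] = os.environ) -> str:
    """Summarize the GradientCI trigger from the step's environment."""
    ref = environ.get("PS_GRADIENT_CI_GIT_REF")
    if ref is None:
        return "not running under GradientCI"
    sha = environ.get("PS_GRADIENT_CI_GIT_SHA", "")
    repo = environ.get("PS_GRADIENT_CI_GIT_REPO_URL", "unknown repository")
    return f"{repo} @ {ref} ({sha[:7] or 'unknown commit'})"


print(describe_trigger())
```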

S3 Datasets

Examples

Repositories

  • MNIST in TensorFlow

  • SDK Pipeline

Examples with only required fields

Single-node

version: 2

workflows:
  single-node:
    steps:
      -
        name: "single-node"
        command: experiment.run_single_node
        params:
          command: nvidia-smi
          container: tensorflow/tensorflow:1.13.1-gpu-py3
          machineType: "K80"

Multi-Node

version: 2

workflows:
  multi-node:
    steps:
      -
        name: "multi-node"
        command: experiment.run_multi_node
        params:
          workerCommand: nvidia-smi
          workerContainer: tensorflow/tensorflow:1.13.1-gpu-py3
          workerMachineType: "K80"
          workerCount: 2
          parameterServerCommand: nvidia-smi
          parameterServerContainer: tensorflow/tensorflow:1.13.1-gpu-py3
          parameterServerMachineType: "K80"
          parameterServerCount: 1

Examples with (some) optional fields included

Single-node

version: 2

workflows:
  single-node:
    steps:
      -
        name: "single-node"
        command: experiment.run_single_node
        params:
          command: nvidia-smi
          machineType: "K80"
          ports: "5000"
          workingDirectory: "/home/playground"
          artifactDirectory: "/artifacts"
          container: tensorflow/tensorflow:1.13.1-gpu-py3
          datasetUri: "https://some_other_uri/uri"
          datasetAwsAccessKeyId: "secret:<some_secret_name>"
          datasetAwsSecretAccessKey: "secret:<some_other_secret_name>"
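Values of the form secret:<name>, as in datasetAwsAccessKeyId above, refer to Gradient secrets instead of plain-text credentials. Secrets can be set at the Cluster, Project, and Team levels, and when the same name exists in several scopes the precedence is Cluster > Project > Team. The Python sketch below only illustrates that precedence rule; the function name and the shape of secrets_by_scope are hypothetical, and it is not how Gradient itself resolves secrets.

```python
from typing import Mapping

SECRET_PREFIX = "secret:"
SCOPE_PRECEDENCE = ("cluster", "project", "team")  # highest priority first


def resolve_value(value: str, secrets_by_scope: Mapping[str, Mapping[str, str]]) -> str:
    """Return a plain value unchanged, or look up a secret:<name> reference."""
    if not value.startswith(SECRET_PREFIX):
        return value
    name = value[len(SECRET_PREFIX):]
    for scope in SCOPE_PRECEDENCE:
        scope_secrets = secrets_by_scope.get(scope, {})
        if name in scope_secrets:
            return scope_secrets[name]
    raise KeyError(f"secret {name!r} is not defined in any scope")


secrets = {
    "team": {"aws_key_id": "team-value"},
    "project": {"aws_key_id": "project-value"},
}
print(resolve_value("secret:aws_key_id", secrets))  # project-value
```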

Multi-Node

version: 2

workflows:
  multi-node:
    steps:
      -
        name: "multi-node"
        command: experiment.run_multi_node
        params:
          workerCommand: nvidia-smi
          workerMachineType: "K80"
          workerCount: 2
          workerWorkingDirectory: "/home/playground"
          workerArtifactDirectory: "/artifacts"
          workerContainer: tensorflow/tensorflow:1.13.1-gpu-py3
          parameterServerCommand: nvidia-smi
          parameterServerMachineType: "K80"
          parameterServerCount: 1
          parameterServerWorkingDirectory: "/home/playground"
          parameterServerArtifactDirectory: "/artifacts"
          parameterServerContainer: tensorflow/tensorflow:1.13.1-gpu-py3

TensorFlow Model Summary Checks

version: 2

workflows:
  single-node:
    steps:
      -
        name: "single-node"
        command: experiment.run_single_node
        params:
          command: nvidia-smi
          machineType: "K80"
          ports: "5000"
          workingDirectory: "/home/playground"
          artifactDirectory: "/artifacts"
          container: tensorflow/tensorflow:1.13.1-gpu-py3
          modelType: "Tensorflow"
          modelPath: "/storage/models"
        checks:
          defaults:
            precision: 3
            round: up
            only-pulls: true
          tensorflow:accuracy:
            target: "0.7.."
            aggregate: mean
          tensorflow:loss:
            target: "..0.025"
            aggregate: max

Uninstalling GradientCI

Note: this can only be done by an organization-level administrator, or by you on your personal repositories.

  1. Navigate to the repository or organization that you wish to remove GradientCI from.

  2. Click the "Settings" tab in the top row.

  3. Select "Integration & services" from the left menu. You should see a list that includes "GradientCI".

  4. Select "Configure" next to "GradientCI". You will be prompted to enter your password.

  5. Then either:

    a. Uninstall the application from all repositories on the organization or personal account by clicking the red "Uninstall" button, or

    b. Select "Only select repositories" and choose which repositories should have GradientCI from the dropdown, or remove them by clicking the "x" next to their names.

Troubleshooting

Project Setup Incomplete

This error occurs when GradientCI is unable to find the Paperspace project attached to your GitHub repository. The Paperspace project may have been deleted, or you may have installed the app only through GitHub's interface. Follow the GradientCI project setup instructions to make sure your Paperspace project is properly configured:

  1. Install the GradientCI GitHub app on your repository.

  2. Navigate to the Projects page.

Notes on earlier sections

Building Branches and Tags: GradientCI supports building and sourcing project configuration from arbitrary branches or tags. By default, it builds only configuration sourced from your default branch (typically master). If you only need the configuration from one branch, you can change your repository's default branch from within GitHub. You can relax or tighten this rule by selecting "All" or "None" from the "Build Branches" dropdown in the project settings pane of the Gradient console. To build tags, or a subset of branches other than the default branch, select "All" from this menu and provide filters in your config.yaml. To list the specific patterns of tags and branches to build, see branch and tag triggers.

Workflows: Your CI configuration is divided into individual workflows. Workflows are named units of work that run in parallel. A workflow consists of a triggers block and a series of steps to execute.

Steps: Steps are a series of sequentially executed actions that generally correspond to actions available in the Paperspace CLI/SDK. Currently, only one step per workflow is supported. For complex pipelines, you can run a single-node experiment containing a Python script that orchestrates your pipeline through a series of Paperspace SDK calls. This SDK Pipeline style offers more flexibility than the YAML pipeline configuration, because you can express complex logic in plain Python code. SDK pipelines also get the latest SDK functionality as soon as it becomes available, often before it is integrated into the YAML pipeline syntax. For an example, see SDK Pipeline.

paramsFile: These files are the same as those generated by the CLI; see the experiments config documentation for an example of one of these files and further documentation.

checks: See Metrics Checks for the full schema.

Triggers: By default, GradientCI builds only the default branch and no tags. To build additional non-pull-request branches or tags, you must select "All" from the "Build Branches" project configuration dropdown, which builds all branches and no tags. You can then filter branches and tags in your config.yaml with a triggers section containing a branches key, a tags key, or both. Under each key, provide either an only field or an ignore field, but not both, containing a POSIX-compatible regex or an array of them. A branch or tag filtered by only must match at least one of the regexes provided; a branch or tag filtered by ignore is skipped if it matches any of them.

GitHub Configuration: For best results, especially on repositories with many contributors, we recommend configuring GitHub branch protection to prevent accidental merges of unintended pull requests. After your first build, statuses for the metrics are reported back, and you can require passing statuses before a pull request can be merged. To do this, follow GitHub's documentation.

aggregate: If the <identifier> is custom, the aggregate is evaluated as a JSON pointer.

S3 Datasets: We highly recommend using secrets for dataset credentials, because these values are otherwise stored as plain text. You can set secrets at the Cluster, Project, and/or Team level. If the same secret name exists in more than one scope, only one is applied, with the precedence Cluster > Project > Team. You can learn more about setting and using secrets in the Secrets documentation.

Uninstalling GradientCI: step 5 is performed from the "GradientCI" settings menu.