
Workflow Spec

This page describes in more detail the main components of a Gradient Workflow, as defined in the Workflow's YAML spec file.

Key Concepts

defaults

At the top of the YAML Workflow file, you can specify default parameters to be used throughout the entire Workflow. These include environment variables and the default machine instance configuration. Instances can also be specified per job.
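
For example, a minimal defaults block might look like the following sketch (the environment variable name and value are illustrative):

defaults:
  env:
    # Illustrative variable made available to every job in the Workflow
    GREETING: hello
  resources:
    # Default machine type for all jobs; individual jobs can override this
    instance-type: P4000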

inputs

The inputs block allows you to specify named inputs (e.g., a versioned dataset) to be referenced and consumed by your jobs.

Note: you can also collect inputs in a separate YAML file and reference that file as an inputPath when creating a Workflow run.
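
For instance, the run-time inputs for the sample below could live in their own file, say inputs.yaml (the file name is illustrative); its structure mirrors the JSON form shown in the sample's comments:

inputs:
  data:
    id: test-one
  echo:
    value: hello world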

Workflow- and job-level inputs can be of type: dataset (a persistent, versioned collection of data), string (e.g., a generated value or ID that may be output from another job), or volume (a temporary workspace mounted onto a job's container), as in the sketch below.

Note: datasets must be defined in advance of being referenced in a Workflow. See Create Datasets for the Workflow for more information.
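
As a sketch, an inputs block using all three types might look like this (the names are illustrative, and the dataset ref must point to a dataset you have already created):

inputs:
  my-data:
    type: dataset
    with:
      ref: my-dataset       # must reference an existing dataset
  my-string:
    type: string
    with:
      value: "some value"
  my-volume:
    # A temporary workspace mounted onto the job's container
    type: volume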

jobs

Jobs are also sometimes referred to as "steps" within the Gradient Workflow. A job is an individual task that executes code (such as training a machine learning model) and can consume inputs and produce outputs.
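
A minimal job runs a single Gradient action, such as container@v1 with an image and arguments, as in this sketch (the job name and command are illustrative); the full sample below adds inputs, outputs, and dependencies between jobs:

jobs:
  hello:
    # Per-job instance type, overriding the Workflow default
    resources:
      instance-type: P4000
    uses: container@v1
    with:
      image: bash:5
      args:
      - bash
      - -c
      - echo "job step goes here"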

Sample Workflow Spec

To run this Workflow, first define datasets named test-one, test-two, and test-three as described in Create Datasets for the Workflow. Also, to make use of the secret named hello in the defaults section, define a secret as described in Secrets.

defaults:
  # clusterId defaults to the NY2 public cluster; setting this parameter is
  # equivalent to using the `--clusterId` flag on the command line.
  # This parameter is often used for GitHub-triggered Workflows running on private clusters.
  clusterId: clusterId
  # Default environment variables for all jobs. Can use any supported
  # substitution syntax (named secrets, ephemeral secrets, etc.).
  env:
    # This environment variable uses a Gradient secret called "hello".
    HELLO: secret:hello
  # Default instance type for all jobs
  resources:
    instance-type: P4000
    container-registries: # optional
      - my-registry

# The Workflow takes two inputs, neither of which has a default. This means
# that when the Workflow is run, the corresponding values for these inputs
# are required, for example:
#
# {"inputs": {"data": {"id": "test-one"}, "echo": {"value": "hello world"}}}
#
inputs:
  data:
    type: dataset
    with:
      ref: test-one
  echo:
    type: string
    with:
      value: "hello world"
jobs:
  job-1:
    # These are inputs for the "job-1" job; they are "aliases" to the
    # Workflow inputs.
    #
    # All inputs are placed in the "/inputs/<name>" path of the run
    # containers. So for this job we would have the paths "/inputs/data"
    # and "/inputs/echo".
    inputs:
      # The "/inputs/data" directory would contain the contents for the dataset
      # version. ID here refers to the name of the dataset, not its dataset ID.
      data: workflow.inputs.data
      # The "/inputs/echo" file would contain the string of the Workflow input
      # "echo".
      echo: workflow.inputs.echo
    # These are outputs for the "job-1" job.
    #
    # All outputs are read from the "/outputs/<name>" path.
    outputs:
      # A directory will automatically be created for output datasets and
      # any content written to that directory will be committed to a newly
      # created dataset version when the jobs completes.
      data2:
        type: dataset
        with:
          id: test-two
      # The container is responsible for creating the file "/outputs/<name>";
      # its content should be a small-ish utf-8 encoded string.
      echo2:
        type: string
    # Set job-specific environment variables
    env:
      TSTVAR: test
    # Set action
    uses: container@v1
    # Set action arguments
    with:
      args:
      - bash
      - -c
      - find /inputs/data > /outputs/data2/list.txt; echo ENV $HELLO $TSTVAR > /outputs/echo2; cat /inputs/echo; echo; cat /outputs/data2/list.txt /outputs/echo2
      image: bash:5
  job-2:
    inputs:
      # These inputs use job-1 outputs instead of Workflow inputs. You must
      # specify job-1 in the needs section to reference them here.
      data2: job-1.outputs.data2
      echo2: job-1.outputs.echo2
    outputs:
      data3:
        type: dataset
        with:
          ref: test-three
    # List of job IDs that must complete before this job runs
    needs:
    - job-1
    uses: container@v1
    with:
      args:
      - bash
      - -c
      - wc -l /inputs/data2/list.txt > /outputs/data3/summary.txt; cat /outputs/data3/summary.txt /inputs/echo2
      image: bash:5
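
To run a spec like this with the Gradient CLI, save it as a YAML file and create a Workflow run from it. The command typically looks like the following (fill in your own Workflow ID; flags may vary by CLI version):

gradient workflows run --id <workflow-id> --path ./workflow.yaml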