To view metrics for a Job, open its "Metrics" tab. To view metrics for an Experiment, click the individual Jobs that represent its worker nodes.
Deployments
To view Deployment metrics, navigate to the Metrics tab of the individual deployment.
Notebooks
Notebook metrics appear as a minimal version in the Notebook header; you can also open a more robust display with detailed information.
To query the metrics for a given workload, use the CLI. Here's an example using Experiments (the syntax for Deployments and Notebooks is the same).
Usage: gradient experiments metrics [OPTIONS] COMMAND [ARGS]...
Read experiment metrics
Options:
--help Show this message and exit.
Commands:
get Get experiment metrics
stream Watch live experiment metrics
Get a single value for a given metric:
Syntax
gradient experiments metrics get --id <experiment id>
Parameters
Usage: gradient experiments metrics get [OPTIONS]
Get experiment metrics. Shows CPU and RAM usage by default
Options:
--id TEXT ID of the experiment [required]
--metric [cpuPercentage|memoryUsage|gpuMemoryFree|gpuMemoryUsed|gpuPowerDraw|gpuTemp|gpuUtilization|gpuMemoryUtilization]
One or more metrics that you want to read.
Defaults to cpuPercentage and memoryUsage
--interval TEXT Interval
--start [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]
Timestamp of first time series metric to
collect
--end [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]
Timestamp of last time series metric to
collect
--apiKey TEXT API key to use this time only
--optionsFile PATH Path to YAML file with predefined options
--createOptionsFile PATH Generate template options file
--help Show this message and exit.
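The --start and --end options above list three accepted timestamp formats. As a minimal sketch, a script could check a value against those formats before passing it to the CLI. The helper name `parse_metric_timestamp` is hypothetical and is not part of the Gradient CLI.

```python
from datetime import datetime

# The three formats listed in the --start / --end help text above.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def parse_metric_timestamp(value: str) -> datetime:
    """Return a datetime if `value` matches one of the accepted formats.

    Hypothetical helper for local validation only; the CLI performs
    its own parsing.
    """
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"{value!r} does not match any of {ACCEPTED_FORMATS}")
```

Validating locally gives an early, readable error instead of a failed CLI call.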
Example command to get CPU usage in %:
gradient experiments metrics get --id <experiment id> --metric cpuPercentage
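When scripting this command, it can help to assemble the argument list in code. The sketch below only builds the command and does not run it. It assumes that several metrics are requested by repeating the --metric flag (the help text says "one or more" but does not show the syntax). The function name `build_get_command` and the ID `my-experiment-id` are placeholders.

```python
import shlex
from typing import List, Optional, Sequence

# Metric names copied from the --metric choices in the help text above.
KNOWN_METRICS = {
    "cpuPercentage", "memoryUsage", "gpuMemoryFree", "gpuMemoryUsed",
    "gpuPowerDraw", "gpuTemp", "gpuUtilization", "gpuMemoryUtilization",
}


def build_get_command(
    experiment_id: str,
    metrics: Sequence[str] = (),
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> List[str]:
    """Build the argv for `gradient experiments metrics get` (hypothetical helper)."""
    argv = ["gradient", "experiments", "metrics", "get", "--id", experiment_id]
    for metric in metrics:
        if metric not in KNOWN_METRICS:
            raise ValueError(f"unknown metric: {metric}")
        # Assumption: each metric is passed with its own --metric flag.
        argv += ["--metric", metric]
    if start is not None:
        argv += ["--start", start]
    if end is not None:
        argv += ["--end", end]
    return argv


# Print a shell-ready command line for a placeholder experiment ID.
print(shlex.join(build_get_command("my-experiment-id", ["cpuPercentage", "gpuUtilization"])))
```

Leaving `metrics` empty omits the flag, so the CLI falls back to its documented defaults of cpuPercentage and memoryUsage.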
Watch live metrics:
Usage: gradient experiments metrics stream [OPTIONS]
Watch live experiment metrics. Shows CPU and RAM usage by default
Options:
--id TEXT ID of the experiment [required]
--metric [cpuPercentage|memoryUsage|gpuMemoryFree|gpuMemoryUsed|gpuPowerDraw|gpuTemp|gpuUtilization|gpuMemoryUtilization]
One or more metrics that you want to read.
Defaults to cpuPercentage and memoryUsage
--interval TEXT Interval
--start [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]
Timestamp of first time series metric to
collect
--end [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]
Timestamp of last time series metric to
collect
--apiKey TEXT API key to use this time only
--optionsFile PATH Path to YAML file with predefined options
--createOptionsFile PATH Generate template options file
--help Show this message and exit.
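A long-running command such as stream can be consumed from a script by reading its output as it arrives. The sketch below is one way to do that with the standard library. How the CLI renders streamed output is not documented here, so reading it line by line is an assumption, as is repeating --metric for several metrics. Both helper names are hypothetical.

```python
import subprocess
from typing import Iterator, List, Sequence


def build_stream_command(experiment_id: str, metrics: Sequence[str] = ()) -> List[str]:
    """Build the argv for `gradient experiments metrics stream` (hypothetical helper)."""
    argv = ["gradient", "experiments", "metrics", "stream", "--id", experiment_id]
    for metric in metrics:
        # Assumption: each metric is passed with its own --metric flag.
        argv += ["--metric", metric]
    return argv


def iter_output_lines(argv: List[str]) -> Iterator[str]:
    """Run `argv` and yield each line of its standard output as it is produced."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None  # guaranteed by stdout=PIPE
        for line in proc.stdout:
            yield line.rstrip("\n")


# Example usage (requires the gradient CLI and a real experiment ID):
# for line in iter_output_lines(build_stream_command("my-experiment-id", ["gpuUtilization"])):
#     print(line)
```

Using a generator keeps the caller in control: it can stop iterating at any point, for example once a metric crosses a threshold.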
Deployment metrics are a Gradient Private Cluster feature.
Notebook metrics are a Gradient Private Cluster feature.