Environment Variables
Information can be passed to and from Workflows, and between the jobs within them.
One form of this is our Workflow input/output, which can be datasets, volumes or strings.
Another is environment variables. These are used when we want to pass information in the Workflow YAML to some other computation being called, such as authenticating to a private GitHub repository, or running a Python .py script.
Why use environment variables and not just arguments to a script?
Some reasons are:
Environment variables can hold other information, like secrets, that is harder to pass securely otherwise.
They can be defined as applying to all Workflow jobs (under defaults:), or per-job, under a job's name.
Sets of arguments may be used several times. For example, the potentially large number of hyperparameters for training an ML model may need to be made explicit in a script. We may then want to use them to train more than one model, varying some parameters but leaving the rest unchanged.
Using environment variables makes larger setups like this easier to handle than passing large argument lists more than once, and helps ensure that different model invocations get the same settings when they should.
Specifying environment variables in Workflows
An environment variable that applies to every job in the Workflow goes under env: in the defaults: field. A commonly used example is a secret that holds the user's API key. The code block containing it, defining the environment variable HELLO, might look like:
Job-specific environment variables commonly carry information to be passed to a script. For example, the model hyperparameters from our Deep Learning Recommender tutorial appear in this code:
Here, the number of epochs to train the final model, HP_FINAL_EPOCHS: '50', and the model's learning rate, HP_FINAL_LR: '0.1', are used by the workflow_train_model.py Python script that the job calls.
Using values of Workflow environment variables in scripts
To utilize the values of the Workflow environment variables in a script, the user parses them as part of their code. In the recommender case here we pass them to variables in the code:
Python's os.environ reads the values. Because environment variable values are always strings, we need to cast them to the correct data type (integer, float, etc.) for the model to understand them.
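The read-and-cast step described above can be sketched as follows. This is a minimal illustration rather than the tutorial's actual code: the function name read_hyperparameters and the fallback defaults are assumptions made for the example, and only the variable names HP_FINAL_EPOCHS and HP_FINAL_LR come from the Workflow shown earlier.

```python
import os


def read_hyperparameters(environ=os.environ):
    """Read the two hyperparameters and cast them from strings.

    `environ` defaults to the real process environment; passing a plain
    dict instead makes the function easy to try out locally.
    """
    # Values in os.environ are always strings, so cast explicitly.
    # The fallback defaults ("50", "0.1") are assumptions for illustration.
    epochs = int(environ.get("HP_FINAL_EPOCHS", "50"))
    learning_rate = float(environ.get("HP_FINAL_LR", "0.1"))
    return epochs, learning_rate
```

Inside a job, calling read_hyperparameters() with no arguments would pick up whatever the Workflow set. If a value cannot be cast, for example HP_FINAL_EPOCHS: 'fifty', int() raises a ValueError, which surfaces the misconfiguration early instead of partway through training.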
Advanced Usage
For more advanced situations, Python has libraries such as env_config. This overlaps somewhat with what Workflows can do, because it allows you to declare environment variables as well. But it has clearer handling of issues like data types and errors, and can handle variables that are lists (useful for hyperparameter tuning), or more complex structures.
It can read declared variables from a file, e.g., test.sh, from env_config's GitHub page:
This could potentially be integrated with Workflows to similarly handle a set of Workflow environment variables.
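To give a sense of what handling list-valued variables with explicit types and errors involves, here is a minimal sketch that uses only the Python standard library. It does not use env_config's API. The helper name get_env_list, the comma-separated format, and the variable name HP_LR_GRID are all hypothetical choices made for this illustration.

```python
import os


def get_env_list(name, cast=str, sep=",", environ=os.environ):
    """Parse a separator-delimited environment variable into a typed list."""
    raw = environ.get(name)
    if raw is None:
        raise KeyError(f"Environment variable {name} is not set")
    # Split on the separator and drop surrounding whitespace and empty items.
    items = [item.strip() for item in raw.split(sep) if item.strip()]
    try:
        return [cast(item) for item in items]
    except ValueError as err:
        # Report which variable failed, not just which value.
        raise ValueError(
            f"{name}={raw!r} cannot be parsed as a list of {cast.__name__}"
        ) from err
```

With a job-level variable such as HP_LR_GRID: '0.1,0.01,0.001' (a hypothetical name), get_env_list("HP_LR_GRID", float) would return a list of floats that a tuning loop could iterate over.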