Experiment datasets
When executing an experiment in Gradient, you may optionally supply one or more datasets that will be downloaded into your experiment's environment prior to execution. Each dataset can be downloaded from an S3 object or folder (including a full bucket). Gradient lets teams run reproducible machine learning experiments by taking advantage of S3 ETags and Version IDs: together they let you confirm that a dataset matches exactly across training runs and know exactly which version of a dataset you are using.
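To illustrate why an ETag is useful for reproducibility, here is a minimal sketch of checking a local copy of a dataset against an expected ETag. It is not how Gradient performs the check internally. It assumes the object was uploaded in a single part without SSE-KMS or SSE-C encryption, in which case S3 reports the hex MD5 digest of the object as its ETag. The function names `md5_hex` and `matches_etag` are hypothetical helpers introduced for this example.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_etag(path, expected_etag):
    """Compare a local file with an S3 ETag (hypothetical helper).

    Assumption: the object was a single-part upload without SSE-KMS or
    SSE-C, so its ETag is the plain MD5 of the object data.
    """
    # S3 returns ETags wrapped in double quotes; strip them before comparing.
    normalized = expected_etag.strip().strip('"').lower()
    # Multipart ETags look like "<hex>-<part count>" and are not a plain MD5.
    if "-" in normalized:
        raise ValueError("multipart ETag cannot be checked with a plain MD5")
    return md5_hex(path) == normalized
```

An ETag only tells you whether the bytes match. To pin which revision of an object you are reading, you would also record its Version ID, which requires versioning to be enabled on the bucket.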
Datasets are downloaded and mounted read-only at /data/DATASET within your experiment jobs using the supplied AWS credentials. Credentials are optional for public buckets. The name of the dataset is the basename of the last item in the S3 path; for example, s3://my-bucket/mnist.zip would have the name mnist and s3://my-bucket would have the name my-bucket. The name may be overridden with the optional name parameter.
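The naming rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not Gradient's actual implementation. The helpers `dataset_name` and `mount_path` are hypothetical, and stripping only the final file extension is an assumption inferred from the mnist.zip example.

```python
import posixpath
from urllib.parse import urlparse

MOUNT_ROOT = "/data"  # datasets are mounted read-only under this directory


def dataset_name(s3_uri, name=None):
    """Derive a dataset name from an S3 URI (hypothetical helper)."""
    # An explicit name overrides the derived one.
    if name:
        return name
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected a URI of the form s3://bucket[/key]")
    key = parsed.path.strip("/")
    # With no key, the whole bucket is the dataset, named after the bucket.
    if not key:
        return parsed.netloc
    last = posixpath.basename(key)
    # Assumption: only the final extension is dropped (mnist.zip -> mnist).
    stem, _ext = posixpath.splitext(last)
    return stem or last


def mount_path(s3_uri, name=None):
    """Return the path where the dataset would appear inside the job."""
    return posixpath.join(MOUNT_ROOT, dataset_name(s3_uri, name))
```

For example, `mount_path("s3://my-bucket/mnist.zip")` gives `/data/mnist`, and passing `name="digits"` gives `/data/digits` instead.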
You can launch an experiment and specify the desired S3 dataset, along with its ETag, using the CLI as follows.
When launching an experiment using config.yaml, pass in multiple datasets using the following structure.
The datasets will show up in the web interface in the environment tab of the experiment you launch.