Install on AWS
For AWS, the Gradient installer uses Terraform to provision an Elastic Kubernetes Service (EKS) cluster. Complete the pre-installation steps before continuing.
Requirements
Terraform can receive your AWS credentials in several ways. Most likely, you already have credentials loaded through the AWS CLI, and Terraform will detect them automatically during initialization. See configuring the AWS CLI for more information on setting up credentials and user profiles. The AWS user that installs Gradient must have broad read/write privileges across services, ideally administrative privileges in the account.
Do not remove the user later or you will lose access to the cluster.
You will also need aws-iam-authenticator installed on the computer or instance where you plan to run the installer: https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html
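A quick preflight check can confirm the requirements above before you go further. This is a minimal sketch: check_tools is a hypothetical helper name, and the tool list (terraform, aws, aws-iam-authenticator) is inferred from this page.

```shell
# Print one "missing: <tool>" line for each tool that is not on PATH.
# The return status is the number of missing tools (0 means all present).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_tools terraform aws aws-iam-authenticator \
  || echo "Install the tools listed above before continuing."

# If the AWS CLI is present, show the identity its current credentials resolve to.
if command -v aws >/dev/null 2>&1; then
  aws sts get-caller-identity || echo "AWS credentials are not configured."
fi
```

The identity printed by aws sts get-caller-identity should be the user you intend to keep for the lifetime of the cluster.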
Configuration
Next, create a main.tf file in the local gradient-cluster directory that you created; main.tf will be a sibling of the backend.tf file that you may have created already. Note: name this file main.tf. Terraform loads every .tf file in the working directory, and the rest of this guide refers to the file by this name.
Copy and paste the Terraform configuration below into main.tf. Be sure to follow the value replacement instructions further below as well.
SSL Configuration
The Gradient installer can use Let's Encrypt to create an SSL certificate, verify it by making entries with your DNS provider, and install the certificate on your cluster to secure access to notebooks, model deployments, etc. For this to work, your domain's DNS provider must be on the supported list. To use this functionality, create a block in your main.tf file similar to the one in the example below. Use the letsencrypt_dns_name that matches your provider in the list, and provide the required authentication field(s) as specified in the letsencrypt_dns_settings column.
If you don't want to use automatic SSL, use tls_cert and tls_key entries and be sure the SSL certificate files are located in the directory and filenames specified (or change them in the main.tf file).
You can use either the Let's Encrypt block OR the manual certificate block, but not both.
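If you take the manual certificate route, it can help to confirm that the certificate and the private key belong together before running the installer. The sketch below compares the public key embedded in each file using openssl; cert_matches_key is a hypothetical helper name, and it assumes PEM files with the leaf certificate first in any bundle.

```shell
# Succeed only if the certificate ($1) and private key ($2) share a public key.
# Returns 2 if either file cannot be read by openssl, 1 on a mismatch.
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

For example, cert_matches_key path/to/cert.pem path/to/key.pem && echo "certificate and key match", where both paths are placeholders for the files referenced by tls_cert and tls_key.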
Replace the following fields in the configuration above with the appropriate values:
name (the same name used when registering the new cluster in the Paperspace web console)
aws_region (your preferred AWS region)
artifacts_access_key_id (the key for the bucket that was set up for artifacts storage)
artifacts_path (the full s3 path to the bucket)
artifacts_secret_access_key
cluster_apikey (provided during registration of the new cluster)
cluster_handle (provided during registration of the new cluster)
domain (same as what was entered during cluster registration)
Also, either use automatic SSL or be sure the SSL certificate files are located in your gradient-cluster directory, and replace the filenames in your main.tf configuration to match them as needed.
Installation
Next, install Gradient using Terraform:
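The two steps are terraform init followed by terraform apply, run from the gradient-cluster directory. A minimal sketch of a wrapper that refuses to start without a main.tf; install_gradient is a hypothetical helper name and not part of the installer:

```shell
# Run `terraform init` followed by `terraform apply` in a cluster directory.
# $1 is the directory (defaults to the current one).
install_gradient() {
  local dir="${1:-.}"
  if [ ! -f "$dir/main.tf" ]; then
    echo "no main.tf in $dir" >&2
    return 1
  fi
  # The subshell keeps the caller's working directory unchanged.
  (cd "$dir" && terraform init && terraform apply)
}
```

Calling install_gradient with the path to your gradient-cluster directory is equivalent to changing into that directory and running the two commands by hand. terraform apply prints its plan and asks for confirmation before it creates anything.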
The init step should take less than a minute, and the apply step may take 15 minutes or more. At the end of the apply step, the installer will return the AWS hostname of the load balancer in your new cluster.
Gradient requires two DNS CNAME records to make external services accessible. Use the hostname of the load balancer as the target for these records, as shown below.
Example:
*.gradient.mycompany.com [ELB_HOSTNAME]
gradient.mycompany.com [ELB_HOSTNAME]
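The records above follow directly from your domain and the load balancer hostname, and they can be checked once they have propagated. A small sketch, assuming dig is installed for the lookup; gradient_dns_records and cname_target are hypothetical helper names:

```shell
# Print the two CNAME records Gradient needs.
# $1 is the cluster domain, $2 is the load balancer hostname.
gradient_dns_records() {
  printf '*.%s CNAME %s\n' "$1" "$2"
  printf '%s CNAME %s\n' "$1" "$2"
}

# Print the CNAME target of a name, without the trailing dot.
# Requires dig and network access.
cname_target() {
  dig +short "$1" CNAME | sed 's/\.$//'
}
```

After the records are live, cname_target gradient.mycompany.com should print the load balancer hostname, and so should a lookup of any name under the wildcard, such as test.gradient.mycompany.com.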
Hot nodes
By default, hot nodes are set up for experiments, model deployments, notebooks, and tensorboards on one c5.xlarge instance each.
Hot nodes can be reconfigured by setting k8s_node_asg_min_sizes in the main.tf file similar to the example below.
Managing the Kubernetes cluster with KUBECONFIG
For those familiar with Kubernetes, the installer generates a kubeconfig file in the gradient-cluster folder. To use the generated kubeconfig, AWS requires aws-iam-authenticator to be installed: https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html
Managing the Kubernetes cluster manually is not required to use Gradient.
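If you do want to inspect the cluster, point kubectl at the generated file through the KUBECONFIG environment variable. This page does not state the name of the generated file, so the sketch below takes its path as an argument; use_kubeconfig is a hypothetical helper name.

```shell
# Export KUBECONFIG for the current shell if the given file exists.
# Pass an absolute path so the setting still works after a later cd.
use_kubeconfig() {
  if [ ! -f "$1" ]; then
    echo "kubeconfig not found: $1" >&2
    return 1
  fi
  export KUBECONFIG="$1"
}
```

After use_kubeconfig succeeds, a command such as kubectl get nodes should list the cluster's nodes, provided kubectl and aws-iam-authenticator are installed.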
Updating the Gradient cluster
To update Gradient, run terraform apply from the gradient-cluster folder.
Uninstalling Gradient
To uninstall Gradient, run terraform destroy from the gradient-cluster folder.