Managing Deployments
To create a Deployment via the UI, there are two pathways to start the Create Deployment flow: a) from a Model on the Models page; b) from the Deployments page. Both of these pathways are shown below, after which we will step through the Create Deployment flow.
Navigate to your list of trained Models by clicking Models in the side nav.
Find the model you want to deploy, and click Deploy Model.
Navigate to your list of Deployments by clicking Deployments in the side nav.
Click Create Deployment +.
Now that you've started the Create Deployment flow, let's walk through the various options and deploy your Model!
If you started the flow via the Models page (pathway a above), you'll skip this step since you've already chosen a Model to deploy.
If you started the flow via the Deployments page (pathway b above), you have the option to choose a Model by clicking the Model selector dropdown and selecting the Model you want to deploy.
Select the base container that will serve your trained Model as a continuous web service. Both CPU and GPU serving are available, so be sure to select the container that matches your chosen machine type and what your Model was optimized for.
Select the GPU or CPU machine type to run your Deployment.
Select the number of instances to run the Deployment on. Below we chose 3, meaning there will be 3x K80 GPU instances backing this Deployment. Automatic load balancing is provided for all multi-instance deployments.
If applicable, choose a command to run at container launch.
Create Active Deployment (selected by default) means that the Deployment will be created and then automatically run:
Alternatively, if you don't want your Deployment to run automatically after it is created, you can toggle Create Inactive Deployment:
Since your Deployment will run as a continuous web service on the public internet, you may wish to require basic authentication on any requests to it. If so, be sure that Enable Basic Authentication is toggled on and then enter a username and password:
Finally, now that your Deployment is configured, click Create Deployment to create it:
Since Deployments are continuous web services, they can be in multiple states, including Provisioning, Provisioned, Running, Stopped, and Error.
Navigate to the Deployments page in the side nav to see your list of Deployments:
Each Deployment has: a Name and a unique ID; links (by ID) to the Experiment and Model it was created from; its Container Type; Date Created; Status; and Actions you can perform.
To start a Stopped Deployment, click Start from among that Deployment's Actions. The Status will change to Provisioning and, if all goes smoothly, will soon say Running.
You can edit a Deployment's attributes, such as the underlying model, the Deployment's name, instance count, etc.
To edit a Deployment, navigate to the Deployments page, find the Deployment you want to edit, and click Edit in the Actions column:
This will launch the Edit Deployment flow, which is nearly the same as the Create Deployment flow. The differences are that the Edit Deployment flow displays the Deployment ID and the Deployment Endpoint, always allows you to Choose a Model, and does not display the Create Active Deployment toggle. (If you want to edit and start a stopped Deployment, save your changes and then click Start back on the Deployments page.)
Besides those differences, you can edit any of the other values of your Deployment just like you did in the Create Deployment flow.
When you are done and want to save your changes, click the confirmation button at the end of the Edit Deployment flow.
The client sends the username and password as unencrypted base64-encoded text. Most web browsers will display a login dialog when they receive the authentication challenge, allowing the user to enter a username and password.
To add HTTP authentication to a Deployment in the GUI, check Enable Basic Authentication at the bottom of the creation page.
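Note that base64 encoding is not encryption, so basic authentication only protects your Deployment when combined with HTTPS. As a rough illustration (using the placeholder credentials testuser/test that also appear in the CLI example later on this page), this is how the value sent in the Authorization header is derived:
# Base64-encode "username:password"; clients send this value as
# "Authorization: Basic <encoded value>" on each request.
echo -n 'testuser:test' | base64
# dGVzdHVzZXI6dGVzdA==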
Each Deployment has its own unique RESTful API. Inference can be performed via the shown endpoint: https://services.paperspace.io/model-serving/<your-model-id>:predict. The number of running instances and the instance count are visible as well.
Congrats, you've created a Deployment and can perform inference!
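As a quick sketch of what an inference request can look like for a TensorFlow Serving based Deployment (the model ID and input values below are placeholders, and the shape of "instances" depends entirely on your model):
# Send a JSON prediction request to the Deployment's :predict endpoint.
curl -X POST https://services.paperspace.io/model-serving/<your-model-id>:predict \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[1.0, 2.0, 5.0]]}'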
To start a previously created but Stopped Deployment by ID, use the start subcommand:
gradient deployments start --id <your-deployment-id>
To stop a Running Deployment by ID, use the stop subcommand:
gradient deployments stop --id <your-deployment-id>
To see a list of your Deployments from the CLI, optionally filtered by state, project, or model, use the list subcommand:
gradient deployments list
--state [BUILDING|PROVISIONING|STARTING|RUNNING|STOPPING|STOPPED|ERROR]
Filter by deployment state
--projectId TEXT Use to filter by project ID
--modelId TEXT Use to filter by model ID
--apiKey TEXT API key to use this time only
--optionsFile PATH Path to YAML file with predefined options
--createOptionsFile PATH Generate template options file
--help Show this message and exit.
For example, to show only Running Deployments:
gradient deployments list --state RUNNING
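Filters can also be combined; for example, to list only Stopped Deployments in a particular project (the project ID is a placeholder):
gradient deployments list --projectId <your-project-id> --state STOPPED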
To modify an existing Deployment from the CLI, use the update subcommand, which accepts the options shown below:
gradient deployments update --id <your-deployment-id>
Usage: gradient deployments update [OPTIONS]
Modify existing deployment
Options:
--id TEXT ID of existing deployment
[required]
--deploymentType [TFServing|ONNX|Custom|Flask|TensorRT]
Model deployment type
--projectId TEXT Project ID
--modelId TEXT ID of a trained model
--name TEXT Human-friendly name for new model deployment
--machineType TEXT Type of machine for new deployment
--imageUrl TEXT Docker image for model serving
--instanceCount INTEGER Number of machine instances
--command TEXT Deployment command
--containerModelPath TEXT Container model path
--imageUsername TEXT Username used to access docker image
--imagePassword TEXT Password used to access docker image
--imageServer TEXT Docker image server
--containerUrlPath TEXT Container URL path
--method TEXT Method
--dockerArgs JSON_STRING JSON-style list of docker args
--env JSON_STRING JSON-style environmental variables map
--apiType TEXT Type of API
--ports TEXT Ports
--authUsername TEXT Username
--authPassword TEXT Password
--clusterId TEXT Cluster ID
--workspace TEXT Path to workspace directory, archive, S3 or
git repository
--workspaceRef TEXT Git commit hash, branch name or tag
--workspaceUsername <username> Workspace username
--workspacePassword TEXT Workspace password
--minInstanceCount TEXT Minimal instance count
--maxInstanceCount TEXT Maximal instance count
--scaleCooldownPeriod TEXT Scale cooldown period
--metric TEXT Autoscaling metrics. Example:
my_metric/targetAverage:21.37
--resource TEXT Autoscaling resources. Example:
cpu/target:60
--apiKey TEXT API key to use this time only
--optionsFile PATH Path to YAML file with predefined options
--createOptionsFile PATH Generate template options file
--help Show this message and exit.
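For example, a minimal update that scales an existing Deployment to three instances might look like this (the Deployment ID is a placeholder):
# Change only the instance count; other settings remain as configured.
gradient deployments update --id <your-deployment-id> --instanceCount 3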
To enable basic authentication from the CLI, pass the --authUsername and --authPassword options when creating (or updating) a Deployment:
gradient deployments create \
  --deploymentType TFServing \
  --name "authtest" \
  --modelId <model id> \
  --authUsername 'testuser' \
  --authPassword 'test' \
  --machineType c5.xlarge \
  --clusterId clu7jqg9j \
  --imageUrl 'tensorflow/serving:latest' \
  --instanceCount 1
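Once basic authentication is enabled, requests to the Deployment's endpoint must supply those credentials. For example, with curl and the placeholder credentials from the example above (again assuming a TensorFlow Serving style :predict request):
# Pass the basic auth credentials along with the prediction request.
curl -u testuser:test \
  -X POST https://services.paperspace.io/model-serving/<your-model-id>:predict \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[1.0, 2.0, 5.0]]}'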