Gradient + TensorFlow Serving

How to perform inference using a Deployment's TF Serving RESTful API

Last updated 5 years ago

When you specify the default Deployment type via --deploymentType TFServing, Gradient deploys your Model using a TensorFlow ModelServer. This Deployment comes with a built-in RESTful API.

Request and Response Formats

The request and response to/from a Deployment's RESTful API is a JSON object.

The composition of this object depends on the request type or verb.

In case of an error, all API endpoints will return a JSON object in the response body with error as the key and the error message as the value:

{
  "error": <error message string>
}

Model Status API

Returns the status of a model in the ModelServer.

URL

GET https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}

/versions/${MODEL_VERSION} is optional. If omitted, status for all versions is returned in the response.
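
As a quick sketch, this URL (with its optional version suffix) can be assembled in Python before issuing the GET request; the deployment id below is a placeholder, not a real deployment:

```python
# Assemble the Model Status URL; the /versions/<v> suffix is optional.
BASE_URL = "https://services.paperspace.io/model-serving"

def status_url(deployment_id, model_version=None):
    url = f"{BASE_URL}/{deployment_id}"
    if model_version is not None:
        url += f"/versions/{model_version}"
    return url

# GET this URL (e.g. requests.get(status_url("your-deployment-id")))
# to receive the status JSON described below.
```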

Response format

If successful, returns a JSON representation of a GetModelStatusResponse protobuf.

Model Metadata API

Returns the metadata of a Model in the ModelServer.

URL

GET https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}/metadata

/versions/${MODEL_VERSION} is optional. If omitted, the Model metadata for the latest version is returned in the response.

Response format

If successful, returns a JSON representation of a GetModelMetadataResponse protobuf.

Classify and Regress API

This API closely follows the Classify and Regress methods of the PredictionService gRPC API.

URL

POST https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}:(classify|regress)

/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.

Request format

The request body for the classify and regress APIs must be a JSON object formatted as follows:

{
  // Optional: serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Optional: Common context shared by all examples.
  // Features that appear here MUST NOT appear in examples (below).
  "context": {
    "<feature_name3>": <value>|<list>
    "<feature_name4>": <value>|<list>
  },

  // List of Example objects
  "examples": [
    {
      // Example 1
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>,
      ...
    },
    {
      // Example 2
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>,
      ...
    }
    ...
  ]
}

<value> is a JSON number (whole or decimal) or string, and <list> is a list of such values. See the Encoding binary values section of the TensorFlow Serving REST API documentation for details on how to represent a binary (stream of bytes) value. This format is similar to gRPC's ClassificationRequest and RegressionRequest protos. Both versions accept a list of Example objects.

Response format

A classify request returns a JSON object in the response body, formatted as follows:

{
  "result": [
    // List of class label/score pairs for first Example (in request)
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],

    // List of class label/score pairs for next Example (in request)
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],
    ...
  ]
}

<label> is a string (which can be an empty string "" if the model does not have a label associated with the score). <score> is a decimal (floating point) number.
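
For illustration, the top-scoring label for each Example can be pulled out of such a response; the response below is hypothetical sample data, not output from a real model:

```python
# Hypothetical classify response containing two Examples.
response = {
    "result": [
        [["cat", 0.9], ["dog", 0.1]],
        [["dog", 0.7], ["cat", 0.3]],
    ]
}

# Keep the highest-scoring label for each Example.
top_labels = [max(pairs, key=lambda pair: pair[1])[0]
              for pairs in response["result"]]
# top_labels == ["cat", "dog"]
```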

The regress request returns a JSON object in the response body, formatted as follows:

{
  // One regression value for each example in the request in the same order.
  "result": [ <value1>, <value2>, <value3>, ...]
}

<value> is a decimal number.

Users of the gRPC API will notice the similarity of this format with the ClassificationResponse and RegressionResponse protos.
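
Putting the pieces together, building a classify request body might look like the following sketch; the feature names and deployment id are illustrative, not part of the API:

```python
import json

# Illustrative feature names; your model's serving signature
# defines the real ones.
classify_request = {
    "signature_name": "serving_default",  # optional
    "examples": [
        {"petal_length": 1.4, "petal_width": 0.2},
        {"petal_length": 4.7, "petal_width": 1.4},
    ],
}
body = json.dumps(classify_request)

# POST `body` to
#   https://services.paperspace.io/model-serving/<deployment-id>:classify
# (e.g. with requests.post(url, data=body)), then read the "result"
# key of the JSON response as described above.
```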

Predict API

This API closely follows the PredictionService.Predict gRPC API.

URL

POST https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}:predict

/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.

Request format

The request body for the predict API must be a JSON object formatted as follows:

{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input Tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}
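
A minimal helper for assembling either variant might look like this; predict_payload is our own name for illustration, not part of the API:

```python
import json

def predict_payload(data, columnar=False, signature_name=None):
    """Wrap `data` under "inputs" (columnar) or "instances" (row format).

    A request carries exactly one of the two keys, never both.
    """
    payload = {("inputs" if columnar else "instances"): data}
    if signature_name is not None:
        payload["signature_name"] = signature_name
    return json.dumps(payload)

row_body = predict_payload([[1.0, 2.0], [3.0, 4.0]])
columnar_body = predict_payload({"x": [1.0, 3.0]}, columnar=True)
```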

Specifying input tensors in row format

This format is similar to the PredictRequest proto of the gRPC API and the CMLE predict API. Use this format if all named input tensors have the same 0-th dimension. If they don't, use the columnar format described later below.

In the row format, inputs are keyed to the instances key in the JSON request.

When there is only one named input, specify the value of the instances key to be the value of the input:

{
  // List of 3 scalar tensors.
  "instances": [ "foo", "bar", "baz" ]
}

{
  // List of 2 tensors each of [1, 2] shape
  "instances": [ [[1, 2]], [[3, 4]] ]
}

Tensors are expressed naturally in nested notation since there is no need to manually flatten the list.

For multiple named inputs, each item is expected to be an object containing input name/tensor key-value pairs, one for each named input. As an example, the following is a request with two instances, each with a set of three named input tensors:

{
 "instances": [
   {
     "tag": "foo",
     "signal": [1, 2, 3, 4, 5],
     "sensor": [[1, 2], [3, 4]]
   },
   {
     "tag": "bar",
     "signal": [3, 4, 1, 2, 5]],
     "sensor": [[4, 5], [6, 8]]
   }
 ]
}

Note, each named input ("tag", "signal", "sensor") is implicitly assumed to have the same 0-th dimension (two in the above example, as there are two objects in the instances list). If you have named inputs that have different 0-th dimensions, use the columnar format described below.

Specifying input tensors in column format

Use this format to specify your input tensors if individual named inputs do not have the same 0-th dimension, or if you want a more compact representation. This format is similar to the inputs field of the gRPC Predict request.

In the columnar format, inputs are keyed to the inputs key in the JSON request.

The value for the inputs key can either be a single input tensor or a map of input name/tensor key-value pairs (listed in their natural nested form). Each input can have an arbitrary shape and does not need to share the same 0-th dimension (aka batch size), as required by the row format described above.

Columnar representation of the previous example is as follows:

{
 "inputs": {
   "tag": ["foo", "bar"],
   "signal": [[1, 2, 3, 4, 5], [3, 4, 1, 2, 5]],
   "sensor": [[[1, 2], [3, 4]], [[4, 5], [6, 8]]]
 }
}

Note, inputs is a JSON object and not a list like instances (used in the row representation). Also, all the named inputs are specified together, as opposed to being unrolled into individual rows as in the row format described previously. This makes the representation compact (though perhaps less readable).
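
The relationship between the two formats can be seen in a small converter, a hypothetical helper that assumes every instance carries the same named inputs:

```python
def rows_to_columns(instances):
    """Regroup row-format `instances` into a columnar `inputs` mapping."""
    names = instances[0].keys()
    return {name: [instance[name] for instance in instances] for name in names}

# The row-format example from above...
instances = [
    {"tag": "foo", "signal": [1, 2, 3, 4, 5], "sensor": [[1, 2], [3, 4]]},
    {"tag": "bar", "signal": [3, 4, 1, 2, 5], "sensor": [[4, 5], [6, 8]]},
]

# ...regroups into exactly the columnar representation shown above.
inputs = rows_to_columns(instances)
```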

Response format

The predict request returns a JSON object in the response body.

A request in row format has a response formatted as follows:

{
  "predictions": <value>|<(nested)list>|<list-of-objects>
}

If the output of the model contains only one named tensor, we omit the name and have the predictions key map to a list of scalar or list values. If the model outputs multiple named tensors, we output a list of objects instead, similar to the request in the row format mentioned above.

A request in columnar format has a response formatted as follows:

{
  "outputs": <value>|<(nested)list>|<object>
}

If the output of the model contains only one named tensor, we omit the name and have the outputs key map to a list of scalar or list values. If the model outputs multiple named tensors, we output an object instead. Each key of this object corresponds to a named output tensor. The format is similar to the request in column format mentioned above.

Authentication

Basic Authentication

You can secure your deployment with basic authentication, i.e. a username and password. You can add basic authentication to any deployment when you create or update it, via the Web UI or the CLI.

To do so via the CLI, simply append the authUsername and authPassword parameters, e.g. --authUsername <username> --authPassword <password> to your gradient deployments create ... or gradient deployments update ... command.

Then, to authenticate against a secured deployment's API, each request must contain the header Authorization: Basic <base64-encoded-username:password>, where you supply the Base64-encoded version of the string <username>:<password>. Note that Base64 is an encoding, not encryption: it represents the credentials using a 64-character alphabet so they can be carried safely in an HTTP header, but it does not hide them, so requests should always be sent over HTTPS.

Basic authentication does not require cookies, session IDs, login pages, or other special machinery; because the credentials travel in the HTTP request header itself, no handshake or other multi-step exchange is needed.

For example, if your username:password is the string literal pineapple:fanatic, then you would supply the request header as follows:

Authorization: Basic cGluZWFwcGxlOmZhbmF0aWM=
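
This header value can be produced in a few lines of Python (shown with the example credentials above); with the requests library you can alternatively pass auth=(username, password) and let it build the header for you:

```python
import base64

username, password = "pineapple", "fanatic"
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
auth_header = {"Authorization": f"Basic {token}"}
# auth_header["Authorization"] == "Basic cGluZWFwcGxlOmZhbmF0aWM="
```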

Example Python REST API Client

Our mnist-sample repository has an example of a REST client that serves as a quick showcase of using a prediction endpoint (https://github.com/Paperspace/mnist-sample/blob/master/serving_rest_client_test.py), reproduced below:

import requests


def make_vector(image):
    # Flatten the 2-D image array into a single list of pixel values.
    vector = []
    for item in image.tolist():
        vector.extend(item)
    return vector


def make_prediction_request(image, prediction_url):
    vector = make_vector(image)
    json = {
        "inputs": [vector]
    }
    response = requests.post(prediction_url, json=json)

    print('HTTP Response %s' % response.status_code)
    print(response.text)
