Gradient + TensorFlow Serving

How to perform inference using a Deployment's TF Serving RESTful API

When you specify the default Deployment type via --deploymentType TFServing, Gradient deploys your Model using a TensorFlow ModelServer. This Deployment comes with a built-in RESTful API.

Request and Response Formats

The request and response bodies of a Deployment's RESTful API are JSON objects.

The composition of this object depends on the request type or verb.

In case of an error, all API endpoints will return a JSON object in the response body with error as the key and the error message as the value:

{
  "error": <error message string>
}
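
As a rough sketch of how a client might detect this error shape, the helper below checks a response body for the error key. The name extract_error and the sample error message are illustrative assumptions, not part of any Gradient library.

```python
import json


def extract_error(body_text):
    """Return the error message from a response body, or None if there is none."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # The body was not valid JSON, so it cannot carry the documented error object.
        return None
    if isinstance(body, dict) and "error" in body:
        return body["error"]
    return None


# Illustrative bodies only; real messages come from the ModelServer.
print(extract_error('{"error": "Malformed request"}'))
print(extract_error('{"predictions": [1.0]}'))
```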

Model Status API

Returns the status of a model in the ModelServer.

URL

GET https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}

/versions/${MODEL_VERSION} is optional. If omitted, status for all versions is returned in the response.

Response format

If successful, returns a JSON representation of a GetModelStatusResponse protobuf.
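
A minimal sketch of building and calling this URL is shown below. It assumes only the Python standard library. The helper names model_status_url and fetch_model_status are hypothetical, and the deployment ID is a placeholder you would replace with your own.

```python
import json
import urllib.request

BASE_URL = "https://services.paperspace.io/model-serving"


def model_status_url(deployment_id, version=None):
    """Build the Model Status URL; the /versions/... segment is optional."""
    url = "%s/%s" % (BASE_URL, deployment_id)
    if version is not None:
        url += "/versions/%s" % version
    return url


def fetch_model_status(deployment_id, version=None):
    """GET the status and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(model_status_url(deployment_id, version)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Only the URLs are printed here; no request is sent.
print(model_status_url("your-deployment-id"))
print(model_status_url("your-deployment-id", version=2))
```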

Model Metadata API

Returns the metadata of a Model in the ModelServer.

URL

GET https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}/metadata

/versions/${MODEL_VERSION} is optional. If omitted, the Model metadata for the latest version is returned in the response.

Response format

If successful, returns a JSON representation of a GetModelMetadataResponse protobuf.

Classify and Regress API

This API closely follows the Classify and Regress methods of the PredictionService gRPC API.

URL

POST https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}:(classify|regress)

/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.

Request format

The request body for the classify and regress APIs must be a JSON object formatted as follows:

{
  // Optional: serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Optional: Common context shared by all examples.
  // Features that appear here MUST NOT appear in examples (below).
  "context": {
    "<feature_name3>": <value>|<list>,
    "<feature_name4>": <value>|<list>
  },
  },

  // List of Example objects
  "examples": [
    {
      // Example 1
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>,
      ...
    },
    {
      // Example 2
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>,
      ...
    }
    ...
  ]
}

<value> is a JSON number (whole or decimal) or string, and <list> is a list of such values. Binary (stream of bytes) values need a special encoding, which is described in the Encoding binary values section of the TensorFlow Serving REST API documentation. This format is similar to gRPC's ClassificationRequest and RegressionRequest protos. Both versions accept a list of Example objects.
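
To make the request shape concrete, here is a small sketch that assembles a classify or regress body and enforces the rule that context features must not reappear in the examples. The function name build_classify_request and the feature names ("age", "country") are made up for illustration.

```python
import json


def build_classify_request(examples, context=None, signature_name=None):
    """Assemble a classify/regress request body as a plain dict."""
    if context:
        for index, example in enumerate(examples):
            # Features in the shared context MUST NOT also appear in an example.
            overlap = set(context) & set(example)
            if overlap:
                raise ValueError(
                    "example %d repeats context features: %s" % (index, sorted(overlap))
                )
    body = {"examples": list(examples)}
    if context:
        body["context"] = dict(context)
    if signature_name is not None:
        body["signature_name"] = signature_name
    return body


# Hypothetical feature names; use the ones your model's signature expects.
body = build_classify_request(
    examples=[{"age": 30}, {"age": 41}],
    context={"country": "US"},
)
print(json.dumps(body, indent=2))
```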

Response format

A classify request returns a JSON object in the response body, formatted as follows:

{
  "result": [
    // List of class label/score pairs for first Example (in request)
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],

    // List of class label/score pairs for next Example (in request)
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],
    ...
  ]
}

<label> is a string (which can be an empty string "" if the model does not have a label associated with the score). <score> is a decimal (floating point) number.
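
As one way to consume a classify response of this shape, the sketch below picks the highest-scoring label for each example. The helper name top_labels and the sample labels and scores are invented for illustration.

```python
def top_labels(response_body):
    """Return the (label, score) pair with the highest score for each example."""
    best = []
    for pairs in response_body["result"]:
        # Each entry in pairs is a two-item list: [label, score].
        label, score = max(pairs, key=lambda pair: pair[1])
        best.append((label, score))
    return best


# Made-up response for a request that contained two examples.
sample = {
    "result": [
        [["cat", 0.9], ["dog", 0.1]],
        [["cat", 0.2], ["dog", 0.8]],
    ]
}
print(top_labels(sample))
```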

The regress request returns a JSON object in the response body, formatted as follows:

{
  // One regression value for each example in the request in the same order.
  "result": [ <value1>, <value2>, <value3>, ...]
}

<value> is a decimal number.

Users of the gRPC API will notice the similarity of this format to the ClassificationResponse and RegressionResponse protos.

Predict API

This API closely follows the PredictionService.Predict gRPC API.

URL

POST https://services.paperspace.io/model-serving/your-deployment-id/versions/${MODEL_VERSION}:predict

/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.

Request format

The request body for the predict API must be a JSON object formatted as follows:

{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input Tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}
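
The "either of them but NOT both" rule can be sketched as a small builder. The function name build_predict_body is hypothetical; it simply refuses to produce a body unless exactly one of the two keys is supplied.

```python
def build_predict_body(instances=None, inputs=None, signature_name=None):
    """Build a predict request body with exactly one of instances or inputs."""
    if (instances is None) == (inputs is None):
        # Either both were given or neither was; the API accepts exactly one.
        raise ValueError("provide exactly one of 'instances' or 'inputs'")
    if instances is not None:
        body = {"instances": instances}
    else:
        body = {"inputs": inputs}
    if signature_name is not None:
        body["signature_name"] = signature_name
    return body


print(build_predict_body(instances=["foo", "bar", "baz"]))
print(build_predict_body(inputs=[[1, 2], [3, 4]]))
```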

Specifying input tensors in row format

This format is similar to the PredictRequest proto of the gRPC API and the CMLE predict API. Use this format if all named input tensors have the same 0-th dimension. If they don't, use the columnar format described below.

In the row format, inputs are keyed to the instances key in the JSON request.

When there is only one named input, specify the value of the instances key to be the value of the input:

{
  // List of 3 scalar tensors.
  "instances": [ "foo", "bar", "baz" ]
}

{
  // List of 2 tensors each of [1, 2] shape
  "instances": [ [[1, 2]], [[3, 4]] ]
}

Tensors are expressed naturally in nested notation since there is no need to manually flatten the list.

For multiple named inputs, each item is expected to be an object containing input name/tensor key-value pairs, one for each named input. As an example, the following is a request with two instances, each with a set of three named input tensors:

{
 "instances": [
   {
     "tag": "foo",
     "signal": [1, 2, 3, 4, 5],
     "sensor": [[1, 2], [3, 4]]
   },
   {
     "tag": "bar",
     "signal": [3, 4, 1, 2, 5],
     "sensor": [[4, 5], [6, 8]]
   }
 ]
}

Note that each named input ("tag", "signal", "sensor") is implicitly assumed to have the same 0-th dimension (two in the above example, as there are two objects in the instances list). If you have named inputs with different 0-th dimensions, use the columnar format described below.
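
A sketch of a client-side check for the row format follows. It assumes each instance is a dict of named inputs and verifies that every instance names the same inputs before wrapping them under the instances key. The helper name build_row_request is hypothetical.

```python
def build_row_request(instances):
    """Wrap per-instance dicts in row format after checking their input names agree."""
    if not instances:
        raise ValueError("at least one instance is required")
    expected = set(instances[0])
    for index, instance in enumerate(instances):
        if set(instance) != expected:
            raise ValueError(
                "instance %d has inputs %s, expected %s"
                % (index, sorted(instance), sorted(expected))
            )
    return {"instances": list(instances)}


request_body = build_row_request([
    {"tag": "foo", "signal": [1, 2, 3, 4, 5], "sensor": [[1, 2], [3, 4]]},
    {"tag": "bar", "signal": [3, 4, 1, 2, 5], "sensor": [[4, 5], [6, 8]]},
])
print(len(request_body["instances"]))
```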

Specifying input tensors in column format

Use this format to specify your input tensors if individual named inputs do not have the same 0-th dimension, or if you want a more compact representation. This format is similar to the inputs field of the gRPC Predict request.

In the columnar format, inputs are keyed to the inputs key in the JSON request.

The value of the inputs key can be either a single input tensor or a map of input name/tensor key-value pairs (listed in their natural nested form). Each input can have an arbitrary shape and does not need to share the same 0-th dimension (aka batch size) as required by the row format described above.

Columnar representation of the previous example is as follows:

{
 "inputs": {
   "tag": ["foo", "bar"],
   "signal": [[1, 2, 3, 4, 5], [3, 4, 1, 2, 5]],
   "sensor": [[[1, 2], [3, 4]], [[4, 5], [6, 8]]]
 }
}

Note that inputs is a JSON object, not a list like instances (used in the row representation). Also, all the named inputs are specified together rather than being unrolled into individual rows as in the row format described previously. This makes the representation compact, though possibly less readable.
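
The relationship between the two representations can be sketched as a conversion from rows to columns. The helper rows_to_columns is hypothetical and assumes every instance is an object with the same input names.

```python
def rows_to_columns(instances):
    """Convert a row-format instances list into a columnar inputs object."""
    names = list(instances[0])
    return {
        "inputs": {name: [instance[name] for instance in instances] for name in names}
    }


rows = [
    {"tag": "foo", "signal": [1, 2, 3, 4, 5], "sensor": [[1, 2], [3, 4]]},
    {"tag": "bar", "signal": [3, 4, 1, 2, 5], "sensor": [[4, 5], [6, 8]]},
]
columnar = rows_to_columns(rows)
print(columnar["inputs"]["tag"])
```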

Response format

The predict request returns a JSON object in the response body.

A request in row format has a response formatted as follows:

{
  "predictions": <value>|<(nested)list>|<list-of-objects>
}

If the output of the model contains only one named tensor, we omit the name and have the predictions key map to a list of scalar or list values. If the model outputs multiple named tensors, we output a list of objects instead, similar to the request in the row format mentioned above.

A request in columnar format has a response formatted as follows:

{
  "outputs": <value>|<(nested)list>|<object>
}

If the output of the model contains only one named tensor, we omit the name and have the outputs key map to a list of scalar or list values. If the model outputs multiple named tensors, we output an object instead. Each key of this object corresponds to a named output tensor. The format is similar to the request in column format mentioned above.
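
Because the response key depends on the request format, a client may want to handle both. The sketch below uses a hypothetical helper, unpack_predict_response, and made-up sample values. It returns the format it detected along with the payload, and raises if the body carries the error key instead.

```python
def unpack_predict_response(body):
    """Return (format, payload) for a decoded predict response body."""
    if "predictions" in body:
        return "row", body["predictions"]
    if "outputs" in body:
        return "columnar", body["outputs"]
    if "error" in body:
        raise RuntimeError(body["error"])
    raise ValueError("response has none of: predictions, outputs, error")


# Made-up values; real shapes depend on the model's output tensors.
print(unpack_predict_response({"predictions": [[0.1, 0.9]]}))
print(unpack_predict_response({"outputs": {"scores": [[0.1, 0.9]]}}))
```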

Authentication

Basic Authentication

You can secure your deployment with basic authentication, i.e. a username and password. You can add basic authentication to any deployment when you create or update it, via the Web UI or the CLI.

To do so via the CLI, simply append the authUsername and authPassword parameters, e.g. --authUsername <username> --authPassword <password> to your gradient deployments create ... or gradient deployments update ... command.

Then, to authenticate against a secured deployment's API, the request must contain the following header: Authorization: Basic <base64-encoded-username:password>, where you supply the Base64-encoded version of the string <username>:<password>. Base64 is an encoding that represents the bytes of that string using a 64-character alphabet so that they can be carried safely in an HTTP header. It is not encryption, so the credentials are only as private as the connection that carries them.

Basic authentication does not require cookies, session IDs, login pages, or other special solutions; because the credentials travel in the HTTP request header itself, there’s no need for handshakes or other complex response systems.

For example, if your username:password is the string literal pineapple:fanatic, then you would supply the request header as follows:

Authorization: Basic cGluZWFwcGxlOmZhbmF0aWM=
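
The header value can be produced with Python's standard base64 module. The sketch below uses a hypothetical helper, basic_auth_header, which returns a dict that could be passed as request headers by an HTTP client.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP basic authentication."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("pineapple", "fanatic"))
# → {'Authorization': 'Basic cGluZWFwcGxlOmZhbmF0aWM='}
```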

Example Python REST API Client

Our mnist-sample repository has an example of a REST client that serves as a quick showcase of using a prediction endpoint, reproduced below:

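import requests


def make_vector(image):
    """Flatten an image array (anything with tolist(), such as a NumPy array) into one list."""
    vector = []
    for row in image.tolist():
        vector.extend(row)
    return vector


def make_prediction_request(image, prediction_url):
    """POST one image to the predict endpoint in columnar ("inputs") format."""
    vector = make_vector(image)
    # Named payload rather than json to avoid shadowing the standard json module.
    payload = {
        "inputs": [vector]
    }
    response = requests.post(prediction_url, json=payload)

    print('HTTP Response %s' % response.status_code)
    print(response.text)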
