Gradient + TensorFlow Serving
How to perform inference using a Deployment's TF Serving RESTful API
When you specify the default Deployment type via --deploymentType TFServing, Gradient deploys your Model using a TensorFlow ModelServer. This Deployment comes with a built-in RESTful API.
Request and Response Formats
Requests to and responses from a Deployment's RESTful API are JSON objects. The composition of the object depends on the request type (verb).
In case of an error, all API endpoints return a JSON object in the response body with error as the key and the error message as the value:
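```
{
  "error": <error message string>
}
```
This error shape comes from the underlying TensorFlow ModelServer and applies to all of the endpoints described below.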
Model Status API
Returns the status of a model in the ModelServer.
URL
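The endpoint follows the standard TensorFlow Serving URL scheme; here <your-deployment-endpoint> is a placeholder for your Deployment's endpoint host:

```
GET https://<your-deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]
```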
/versions/${MODEL_VERSION} is optional. If omitted, status for all versions is returned in the response.
Response format
If successful, returns a JSON representation of a GetModelStatusResponse protobuf.
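For example, the response for a model with a single available version looks roughly like this (field values are illustrative):

```
{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
```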
Model Metadata API
Returns the metadata of a Model in the ModelServer.
URL
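As with the Model Status API, <your-deployment-endpoint> stands in for your Deployment's endpoint host:

```
GET https://<your-deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/metadata
```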
/versions/${MODEL_VERSION} is optional. If omitted, the Model metadata for the latest version is returned in the response.
Response format
If successful, returns a JSON representation of a GetModelMetadataResponse protobuf.
Classify and Regress API
This API closely follows the Classify and Regress methods of the PredictionService gRPC API.
URL
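The classify and regress endpoints follow the same URL scheme as the APIs above:

```
POST https://<your-deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:classify
POST https://<your-deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:regress
```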
/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.
Request format
The request body for the classify and regress APIs must be a JSON object formatted as follows:
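```
{
  // Optional: serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Optional: common context shared by all examples.
  // Features that appear here must not appear in the examples below.
  "context": {
    "<feature_name3>": <value>|<list>,
    "<feature_name4>": <value>|<list>
  },

  // List of Example objects
  "examples": [
    {
      // Example 1
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>
    },
    {
      // Example 2
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>
    }
  ]
}
```
This mirrors the standard TensorFlow Serving classify/regress request schema; the feature names above are placeholders.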
<value> is a JSON number (whole or decimal) or string, and <list> is a list of such values. See the Encoding binary values section of the TensorFlow Serving REST API documentation for details on how to represent a binary (stream of bytes) value. This format is similar to gRPC's ClassificationRequest and RegressionRequest protos. Both versions accept a list of Example objects.
Response format
A classify request returns a JSON object in the response body, formatted as follows:
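```
{
  "result": [
    // List of class label/score pairs for the first Example (in the request)
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],

    // List of class label/score pairs for the next Example (in the request)
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],
    ...
  ]
}
```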
<label> is a string (which can be an empty string "" if the model does not have a label associated with the score), and <score> is a decimal (floating point) number.
The regress request returns a JSON object in the response body, formatted as follows:
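```
{
  // One regression value for each example in the request, in the same order.
  "result": [ <value1>, <value2>, <value3>, ... ]
}
```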
<value> is a decimal number.
Users of the gRPC API will notice the similarity of this format to the ClassificationResponse and RegressionResponse protos.
Predict API
This API closely follows the PredictionService.Predict
gRPC API.
URL
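The predict endpoint follows the same URL scheme as the APIs above:

```
POST https://<your-deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict
```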
/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.
Request format
The request body for the predict API must be a JSON object formatted as follows:
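```
{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them, but not both.
  "instances": <value>|<(nested)list>|<list-of-objects>,
  "inputs": <value>|<(nested)list>|<object>
}
```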
Specifying input tensors in row format
This format is similar to the PredictRequest proto of the gRPC API and the CMLE predict API. Use this format if all named input tensors have the same 0-th dimension. If they don't, use the columnar format described below.
In the row format, inputs are keyed to the instances key in the JSON request.
When there is only one named input, specify the value of the instances key to be the value of the input:
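```
{
  // List of 3 scalar tensors (values are illustrative).
  "instances": [ "foo", "bar", "baz" ]
}

{
  // List of 2 tensors, each of shape [1, 2].
  "instances": [ [[1, 2]], [[3, 4]] ]
}
```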
Tensors are expressed naturally in nested notation since there is no need to manually flatten the list.
For multiple named inputs, each item is expected to be an object containing input name/tensor key-value pairs, one for each named input. As an example, the following is a request with two instances, each with a set of three named input tensors:
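```
{
  "instances": [
    {
      "tag": "foo",
      "signal": [1, 2, 3, 4, 5],
      "sensor": [[1, 2], [3, 4]]
    },
    {
      "tag": "bar",
      "signal": [3, 4, 1, 2, 5],
      "sensor": [[4, 5], [6, 8]]
    }
  ]
}
```
The tensor values shown here are illustrative.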
Note that each named input ("tag", "signal", "sensor") is implicitly assumed to have the same 0-th dimension (two in the above example, as there are two objects in the instances list). If you have named inputs that have different 0-th dimensions, use the columnar format described below.
Specifying input tensors in column format
Use this format to specify your input tensors if individual named inputs do not have the same 0-th dimension or you want a more compact representation. This format is similar to the inputs field of the gRPC Predict request.
In the columnar format, inputs are keyed to the inputs key in the JSON request.
The value of the inputs key can be either a single input tensor or a map of input name/tensor key-value pairs (listed in their natural nested form). Each input can have an arbitrary shape and does not need to share the same 0-th dimension (aka batch size), as the row format described above requires.
The columnar representation of the previous example is as follows:
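```
{
  "inputs": {
    "tag": ["foo", "bar"],
    "signal": [[1, 2, 3, 4, 5], [3, 4, 1, 2, 5]],
    "sensor": [[[1, 2], [3, 4]], [[4, 5], [6, 8]]]
  }
}
```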
Note that inputs is a JSON object and not a list like instances (used in the row representation). Also, all the named inputs are specified together, rather than being unrolled into individual rows as in the row format described previously. This makes the representation compact (but perhaps less readable).
Response format
The predict request returns a JSON object in the response body.
A request in row format has a response formatted as follows:
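```
{
  "predictions": <value>|<(nested)list>|<list-of-objects>
}
```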
If the output of the model contains only one named tensor, we omit the name and have the predictions key map to a list of scalar or list values. If the model outputs multiple named tensors, we output a list of objects instead, similar to the request in the row format mentioned above.
A request in columnar format has a response formatted as follows:
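```
{
  "outputs": <value>|<(nested)list>|<object>
}
```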
If the output of the model contains only one named tensor, we omit the name and have the outputs key map to a list of scalar or list values. If the model outputs multiple named tensors, we output an object instead. Each key of this object corresponds to a named output tensor. The format is similar to the request in columnar format mentioned above.
Authentication
Basic Authentication
You can secure your deployment with basic authentication, i.e. a username and password. You can add basic authentication to any deployment when you create or update it, via the Web UI or the CLI.
To do so via the CLI, simply append the authUsername and authPassword parameters, e.g. --authUsername <username> --authPassword <password>, to your gradient deployments create ... or gradient deployments update ... command.
Then, to authenticate against a secured deployment's API, the request must contain the following header: Authorization: Basic <base64-encoded-username:password>, where you supply the Base64-encoded version of the string <username>:<password>. Base64 is an encoding scheme that represents the username:password string using a 64-character alphabet so that it can be transmitted safely in an HTTP header.
Basic authentication does not require cookies, session IDs, login pages, or other special machinery; because the credentials travel in the HTTP request header itself, there is no need for handshakes or other complex exchanges.
For example, if your username:password is the string literal pineapple:fanatic, then you would supply the request header as follows:
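```
Authorization: Basic cGluZWFwcGxlOmZhbmF0aWM=
```
Here cGluZWFwcGxlOmZhbmF0aWM= is the Base64 encoding of the string pineapple:fanatic.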
Example Python REST API Client
Our mnist-sample repository has an example of a REST client that serves as a quick showcase of calling a prediction endpoint.
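The following is an illustrative sketch of such a client rather than the repository's exact code; the endpoint URL, model name, credentials, and input shape are placeholders that depend on your Deployment and your model's serving signature.

```python
# Illustrative sketch only: a minimal Predict-API client, not the actual
# client from the mnist-sample repository. Assumes the requests and numpy
# libraries, an MNIST-style model whose serving signature accepts one
# 28x28 grayscale image per instance, and basic authentication as
# described above.
import numpy as np
import requests

# Placeholder values -- replace with your Deployment's endpoint, model name,
# and credentials.
ENDPOINT = "https://<your-deployment-endpoint>/v1/models/mnist:predict"
USERNAME = "<username>"
PASSWORD = "<password>"


def predict(image: np.ndarray) -> list:
    """Send one 28x28 image to the Predict API and return the predictions list."""
    # Row ("instances") format: a single instance holding the nested image tensor.
    payload = {"instances": [image.tolist()]}
    response = requests.post(
        ENDPOINT,
        json=payload,
        auth=(USERNAME, PASSWORD),  # requests adds the Basic auth header for us
    )
    response.raise_for_status()
    return response.json()["predictions"]


if __name__ == "__main__":
    # A blank (all-zero) image, just to exercise the endpoint.
    dummy_image = np.zeros((28, 28), dtype=np.float32)
    print(predict(dummy_image))
```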