Gradient + TensorFlow Serving
How to perform inference using a Deployment's TF Serving RESTful API
When you specify the default Deployment type via --deploymentType TFServing, Gradient deploys your Model using a TensorFlow Serving ModelServer. This Deployment comes with a built-in RESTful API.
Requests to and responses from a Deployment's RESTful API are JSON objects. The composition of these objects depends on the request type or verb.
In case of an error, all API endpoints return a JSON object in the response body with error as the key and the error message as the value:
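A sketch of the error body, matching the upstream TensorFlow Serving REST API convention:

```
{
  "error": <error message string>
}
```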
Returns the status of a model in the ModelServer. /versions/${MODEL_VERSION} is optional. If omitted, status for all versions is returned in the response.
If successful, returns a JSON representation of a protobuf.
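A sketch of the status request and a typical successful response, with the deployment host shown as a placeholder; the response mirrors TensorFlow Serving's ModelVersionStatus schema:

```
GET https://<deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]

{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": { "error_code": "OK", "error_message": "" }
    }
  ]
}
```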
Returns the metadata of a Model in the ModelServer. /versions/${MODEL_VERSION} is optional. If omitted, the Model metadata for the latest version is returned in the response.
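With the same placeholder host, the metadata request might look like:

```
GET https://<deployment-endpoint>/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/metadata
```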
/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.
The request body for the classify and regress APIs must be a JSON object formatted as follows:
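A sketch of the request schema, following the upstream TensorFlow Serving REST API; the angle-bracket placeholders stand in for your model's feature names and values:

```
{
  // Optional: serving signature to use.
  "signature_name": <string>,

  // Optional: context shared by all examples.
  "context": {
    "<feature_name3>": <value>|<list>
  },

  // List of Example objects, each holding feature name/value pairs.
  "examples": [
    {
      "<feature_name1>": <value>|<list>,
      "<feature_name2>": <value>|<list>,
      ...
    },
    ...
  ]
}
```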
A classify request returns a JSON object in the response body, formatted as follows:
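A sketch of the response schema, following the upstream TensorFlow Serving REST API:

```
{
  "result": [
    // List of (label, score) pairs for the first Example in the request,
    // followed by one such list per remaining Example.
    [ [<label1>, <score1>], [<label2>, <score2>], ... ],
    ...
  ]
}
```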
<label> is a string (which can be an empty string "" if the model does not have a label associated with the score). <score> is a decimal (floating point) number.
The regress request returns a JSON object in the response body, formatted as follows:
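A sketch of the response schema, following the upstream TensorFlow Serving REST API:

```
{
  "result": [
    // One regressed value per Example in the request, in the same order.
    <value1>,
    <value2>,
    ...
  ]
}
```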
<value> is a decimal number.
Users of the gRPC API will notice the similarity of this format to the ClassificationResponse and RegressionResponse protos.
/versions/${MODEL_VERSION} is optional. If omitted, the latest version is used.
The request body for the predict API must be a JSON object formatted as follows:
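A sketch of the request schema, following the upstream TensorFlow Serving REST API; a request should specify exactly one of instances or inputs:

```
{
  // Optional: serving signature to use.
  "signature_name": <string>,

  // Input tensors in row ("instances") or columnar ("inputs") format.
  "instances": <value>|<(nested)list>|<list-of-objects>,
  "inputs": <value>|<(nested)list>|<object>
}
```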
Specifying input tensors in row format
In the row format, inputs are keyed to the instances key in the JSON request.
When there is only one named input, specify the value of the instances key to be the value of the input:
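For example, a model whose single input takes a tensor of shape [1, 2] might be called with a request like the following (the values are illustrative):

```
{
  // Two instances, each a tensor of shape [1, 2].
  "instances": [ [[1, 2]], [[3, 4]] ]
}
```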
Tensors are expressed naturally in nested notation since there is no need to manually flatten the list.
For multiple named inputs, each item is expected to be an object containing input name/tensor key-value pairs, one for each named input. As an example, the following is a request with two instances, each with a set of three named input tensors:
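A sketch of such a request, using illustrative input names and values:

```
{
  "instances": [
    {
      "tag": "foo",
      "signal": [1, 2, 3, 4, 5],
      "sensor": [[1, 2], [3, 4]]
    },
    {
      "tag": "bar",
      "signal": [3, 4, 1, 2, 5],
      "sensor": [[4, 5], [6, 8]]
    }
  ]
}
```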
Note that each named input ("tag", "signal", "sensor") is implicitly assumed to have the same 0-th dimension (two in the above example, as there are two objects in the instances list). If you have named inputs with different 0-th dimensions, use the columnar format described below.
Specifying input tensors in column format
In the columnar format, inputs are keyed to the inputs key in the JSON request.
The value for the inputs key can be either a single input tensor or a map of input name/tensor key-value pairs (listed in their natural nested form). Each input can have an arbitrary shape and does not need to share the same 0-th dimension (aka batch size), as required by the row format described above.
Columnar representation of the previous example is as follows:
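A columnar sketch of the same illustrative inputs:

```
{
  "inputs": {
    "tag": ["foo", "bar"],
    "signal": [[1, 2, 3, 4, 5], [3, 4, 1, 2, 5]],
    "sensor": [[[1, 2], [3, 4]], [[4, 5], [6, 8]]]
  }
}
```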
Note that inputs is a JSON object, not a list like instances (used in the row representation). Also, all the named inputs are specified together, as opposed to being unrolled into individual rows as in the row format described previously. This makes the representation compact (but perhaps less readable).
The predict request returns a JSON object in the response body.
If the output of the model contains only one named tensor, we omit the name and have the predictions key map to a list of scalar or list values. If the model outputs multiple named tensors, we output a list of objects instead, similar to the request in the row format mentioned above.
If the output of the model contains only one named tensor, we omit the name and have the outputs key map to a list of scalar or list values. If the model outputs multiple named tensors, we output an object instead. Each key of this object corresponds to a named output tensor. The format is similar to the request in the columnar format mentioned above.
Basic Authentication
You can secure your deployment with basic authentication, i.e. a username and password. You can add basic authentication to any deployment when you create or update it, via the Web UI or the CLI.
To do so via the CLI, simply append the authUsername and authPassword parameters, e.g. --authUsername <username> --authPassword <password>, to your gradient deployments create ... or gradient deployments update ... command.
Then, to authenticate against a secured deployment's API, the request must contain the following header: Authorization: Basic <base64-encoded-username:password>, where you supply the Base64-encoded version of the string <username>:<password>. Base64 is an encoding technique that converts the username and password into a set of 64 characters to ensure safe transmission.
The Base64 encoding method does not require cookies, session IDs, login pages, or other special solutions; because it uses the HTTP request header itself, there’s no need for handshakes or other complex response systems.
For example, if your username:password is the string literal pineapple:fanatic, then you would supply the request header as follows:
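A minimal sketch of producing this header value, using only Python's standard library:

```python
import base64

# Base64-encode the example credentials "pineapple:fanatic".
credentials = "pineapple:fanatic"
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

print("Authorization: Basic " + token)
# Authorization: Basic cGluZWFwcGxlOmZhbmF0aWM=
```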
This API closely follows the Classify and Regress methods of the gRPC API.
<value> is a JSON number (whole or decimal) or string, and <list> is a list of such values. See the section below for details on how to represent a binary (stream of bytes) value. This format is similar to gRPC's ClassificationRequest and RegressionRequest protos. Both versions accept a list of objects.
This API closely follows the gRPC API.
This format is similar to the PredictRequest proto of the gRPC API. Use this format if all named input tensors have the same 0-th dimension. If they don't, use the columnar format described later below.
Use this format to specify your input tensors if individual named inputs do not have the same 0-th dimension, or if you want a more compact representation. This format is similar to the inputs field of the gRPC request.
A request in row format has a response formatted as follows:
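A sketch of the row-format response schema, following the upstream TensorFlow Serving REST API:

```
{
  "predictions": <value>|<(nested)list>|<list-of-objects>
}
```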
A request in columnar format has a response formatted as follows:
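A sketch of the columnar-format response schema, following the upstream TensorFlow Serving REST API:

```
{
  "outputs": <value>|<(nested)list>|<object>
}
```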
An example of a REST client that serves as a quick showcase of using a prediction endpoint is reproduced below:
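A minimal client sketch using only Python's standard library; the endpoint URL and model name are placeholders, and the credentials reuse the pineapple:fanatic example from the Basic Authentication section:

```python
import base64
import json
from urllib.request import Request, urlopen

# Hypothetical values -- substitute your own deployment endpoint, model
# name, and basic-auth credentials.
ENDPOINT = "https://<deployment-endpoint>/v1/models/my-model:predict"
USERNAME, PASSWORD = "pineapple", "fanatic"

def build_predict_request(instances):
    """Build a row-format predict request with a Basic auth header."""
    payload = json.dumps({"instances": instances}).encode("utf-8")
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )

req = build_predict_request([[1.0, 2.0], [3.0, 4.0]])
# Against a live endpoint, send the request and read the predictions:
# response = json.loads(urlopen(req).read())
# print(response["predictions"])
```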