Overview

Gradient Deployments provide a hassle-free, automatic “push to deploy” workflow for any trained model, allowing ML practitioners to quickly validate end-to-end services from R&D to production.

This section of the documentation covers our previous generation of Gradient. For the current version, go to Gradient Next.

Deploy any model as a high-performance, low-latency microservice with a RESTful API. Easily monitor, scale, and version deployments. A deployment takes a trained model and exposes it as a persistent service at a known, secure URL endpoint.

Gradient makes it easy to deploy your trained model into production so that you can start generating predictions for real-time or batch data. Just specify the instance type and autoscaling behavior, and Gradient takes care of the rest: it launches the instances, deploys your model, and sets up a secure HTTPS endpoint for your application.
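Once a deployment is live, you can generate predictions by POSTing to its endpoint with any HTTP client. The sketch below assumes a TensorFlow Serving deployment and the standard TensorFlow Serving REST payload; the endpoint URL and input shape are placeholders, so substitute the secure URL and model signature of your own deployment.

```python
import requests

# Placeholder: use the secure endpoint URL Gradient assigns to your deployment
# (shown in the web UI or CLI once the deployment is running).
ENDPOINT = "https://<your-deployment-endpoint>/v1/models/<your-model>:predict"

# Standard TensorFlow Serving REST payload: a batch of input instances.
# The shape and contents depend entirely on your model's signature.
payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}

response = requests.post(ENDPOINT, json=payload, timeout=30)
response.raise_for_status()

# TensorFlow Serving returns results under the "predictions" key.
print(response.json()["predictions"])
```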

Capabilities

  • Out-of-the-box integration with TensorFlow, ONNX, and TensorRT, as well as Flask for custom models

  • A variety of GPU & CPU types to deploy on

  • Per-second pay-as-you-go billing

  • Multi-instance deployments with automatic load balancing

  • A dedicated, secure endpoint URL per deployment

  • REST and gRPC endpoint options

  • Optimizations for low-latency, high-throughput inference

  • Accessible via the Gradient CLI, Web UI, or API, or from your own custom applications (see the sketch after this list)
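As a minimal sketch of programmatic access, the example below creates a deployment with the legacy Gradient Python SDK (the `gradient` package). The client class, parameter names, and values shown (model ID, machine type, serving image) are assumptions based on that SDK and are placeholders; check your installed SDK version for the exact interface.

```python
from gradient import DeploymentsClient  # assumed import from the legacy `gradient` SDK

# Authenticate with your Paperspace API key (placeholder value).
deployment_client = DeploymentsClient(api_key="your-api-key")

# Parameter names below are assumptions; adjust to match your SDK version.
deployment_id = deployment_client.create(
    deployment_type="TFServing",           # out-of-the-box TensorFlow Serving integration
    model_id="<your-trained-model-id>",    # a model previously trained or uploaded on Gradient
    name="example-deployment",
    machine_type="K80",                    # any supported GPU or CPU instance type
    image_url="tensorflow/serving:latest-gpu",
    instance_count=2,                      # multi-instance deployment with load balancing
)

print(f"Created deployment: {deployment_id}")
```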
