To push metrics from your Experiment or Deployment code, install the gradient-utils package and import the metric classes from gradient_utils.metrics.
Installing Gradient Utils
pip install gradient-utils
Instrumenting
Four types of metrics are offered: Counter, Gauge, Summary, and Histogram.
Counter
Counters go up, and reset when the process restarts.
from gradient_utils.metrics import Counter
c = Counter('my_failures', 'Description of counter')
c.inc() # Increment by 1
c.inc(1.6) # Increment by given value
If the metric name already ends in _total, that suffix is removed. When the counter's time series is exposed, a _total suffix is added back. This keeps the OpenMetrics and Prometheus text formats compatible, since OpenMetrics requires the _total suffix.
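The suffix rule can be sketched in plain Python. This only illustrates the naming behaviour described above and is not the library's code; `exposed_counter_name` is a hypothetical helper introduced for this example.

```python
def exposed_counter_name(name: str) -> str:
    """Return the series name a counter would be exposed under (illustrative only)."""
    suffix = "_total"
    # Strip a trailing _total if the caller supplied one...
    base = name[:-len(suffix)] if name.endswith(suffix) else name
    # ...then add it back for the exposed time series.
    return base + suffix


# Both spellings end up with the same exposed name pattern.
print(exposed_counter_name("my_failures"))
print(exposed_counter_name("my_requests_total"))
```

Under this rule, naming a counter with or without the suffix makes no difference to the series that is exposed.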
There are utilities to count exceptions raised:
@c.count_exceptions()
def f():
    pass

with c.count_exceptions():
    pass

# Count only one type of exception
with c.count_exceptions(ValueError):
    pass
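To make the counting behaviour concrete, here is a minimal stand-alone sketch. `SketchCounter` is a hypothetical stand-in written for this example, not the gradient_utils class, and it assumes that a counted exception is re-raised rather than swallowed.

```python
from contextlib import contextmanager


class SketchCounter:
    """Hypothetical stand-in for a counter; not the gradient_utils class."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1):
        self.value += amount

    @contextmanager
    def count_exceptions(self, exception=Exception):
        try:
            yield
        except exception:
            # Count the failure, then let the exception propagate.
            self.inc()
            raise


c = SketchCounter()

# A matching exception is counted and still reaches the caller.
try:
    with c.count_exceptions(ValueError):
        raise ValueError("bad input")
except ValueError:
    pass

# A non-matching exception passes through without being counted.
try:
    with c.count_exceptions(ValueError):
        raise KeyError("missing")
except KeyError:
    pass

print(c.value)
```

Only the ValueError is counted here, because the second block filters on ValueError and a KeyError is raised.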
Gauge
Gauges can go up and down.
from gradient_utils.metrics import Gauge
g = Gauge('my_inprogress_requests', 'Description of gauge')
g.inc() # Increment by 1
g.dec(10) # Decrement by given value
g.set(4.2) # Set to a given value
There are utilities for common use cases:
g.set_to_current_time() # Set to current unixtime

# Increment when entered, decrement when exited.
@g.track_inprogress()
def f():
    pass

with g.track_inprogress():
    pass
A Gauge can also take its value from a callback:
d = Gauge('data_objects', 'Number of objects')
my_dict = {}
d.set_function(lambda: len(my_dict))
Summary
Summaries track the size and number of events.
from gradient_utils.metrics import Summary
s = Summary('request_latency_seconds', 'Description of summary')
s.observe(4.7) # Observe 4.7 (seconds in this case)
There are utilities for timing code:
@s.time()
def f():
    pass

with s.time():
    pass
The Python client doesn't store or expose quantile information at this time.
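As a rough illustration of what "size and number" gives you, assume a summary keeps only a running count and a running sum of the observed values. The observations below are made up. An average can be derived from those two figures, but a percentile cannot, which is consistent with the note about quantiles.

```python
# Made-up latency observations, in seconds.
observations = [4.7, 1.2, 0.3]

count = len(observations)  # number of events
total = sum(observations)  # combined size of the events
mean = total / count       # the average is recoverable from count and sum alone

print(count, round(total, 2), round(mean, 3))
```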
Histogram
Histograms track the size and number of events in buckets. This allows for aggregatable calculation of quantiles.
from gradient_utils.metrics import Histogram
h = Histogram('request_latency_seconds', 'Description of histogram')
h.observe(4.7) # Observe 4.7 (seconds in this case)
The default buckets are intended to cover a typical web/RPC request, from milliseconds to seconds. They can be overridden by passing the buckets keyword argument to Histogram.
There are utilities for timing code:
@h.time()
def f():
    pass

with h.time():
    pass
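The claim that buckets allow aggregatable quantile calculation can be sketched in plain Python. Everything here is an assumption made for illustration: the bucket bounds are made up (they are not the library defaults), `bucket_counts` and `estimate_quantile` are hypothetical helpers, and the linear interpolation is in the style of Prometheus's histogram_quantile rather than anything gradient_utils computes itself.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds; the last bucket is +Inf.
BOUNDS = [0.1, 0.5, 1.0, 5.0, float("inf")]


def bucket_counts(observations):
    """Cumulative counts: counts[i] is the number of observations <= BOUNDS[i]."""
    counts = [0] * len(BOUNDS)
    for value in observations:
        first = bisect_left(BOUNDS, value)  # first bucket whose bound >= value
        for i in range(first, len(BOUNDS)):
            counts[i] += 1
    return counts


def estimate_quantile(q, counts):
    """Estimate quantile q (0 < q <= 1) by interpolating inside the target bucket."""
    total = counts[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    for i, cumulative in enumerate(counts):
        if cumulative >= rank:
            if BOUNDS[i] == float("inf"):
                return BOUNDS[i - 1]  # cannot interpolate into the +Inf bucket
            lower = BOUNDS[i - 1] if i > 0 else 0.0
            below = counts[i - 1] if i > 0 else 0
            in_bucket = cumulative - below
            if in_bucket == 0:
                return lower
            return lower + (BOUNDS[i] - lower) * (rank - below) / in_bucket
    return BOUNDS[-2]


worker_a = bucket_counts([0.05, 0.3, 0.4, 0.8])
worker_b = bucket_counts([0.2, 0.7, 2.0, 4.7])

# Bucket counts from separate processes add up element-wise,
# so a quantile can be estimated over the combined traffic.
combined = [a + b for a, b in zip(worker_a, worker_b)]
print(combined)
print(estimate_quantile(0.5, combined))
```

The estimate is only as precise as the bucket boundaries, which is why choosing buckets that match the expected latency range matters.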
Labels
All metrics can have labels, allowing grouping of related time series.
Taking a counter as an example:
from gradient_utils.metrics import Counter
c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])
c.labels('get', '/').inc()
c.labels('post', '/submit').inc()
Labels can also be passed as keyword arguments:
from gradient_utils.metrics import Counter
c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])
c.labels(method='get', endpoint='/').inc()
c.labels(method='post', endpoint='/submit').inc()
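One way to picture labels is as a key that selects one time series per combination of label values. The sketch below is a stand-alone illustration under that assumption; `SketchLabeledCounter` and `_Child` are hypothetical classes written for this example, not part of gradient_utils.

```python
class _Child:
    """One time series, identified by a tuple of label values."""

    def __init__(self, series, key):
        self._series = series
        self._key = key

    def inc(self, amount=1):
        self._series[self._key] = self._series.get(self._key, 0) + amount


class SketchLabeledCounter:
    """Hypothetical stand-in showing one child series per label combination."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self.series = {}  # label-value tuple -> running total

    def labels(self, *values, **named):
        if values and named:
            raise ValueError("pass labels positionally or by keyword, not both")
        if named:
            # Reorder keyword values to match the declared label order.
            values = tuple(named[name] for name in self.label_names)
        if len(values) != len(self.label_names):
            raise ValueError("wrong number of label values")
        return _Child(self.series, tuple(str(v) for v in values))


c = SketchLabeledCounter(["method", "endpoint"])
c.labels("get", "/").inc()
c.labels(method="get", endpoint="/").inc()  # same series as the line above
c.labels("post", "/submit").inc()

print(c.series)
```

In this sketch the positional and keyword forms resolve to the same key, so the two "get" calls increment a single series while "post" gets its own.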