Push Metrics
How to push metrics into the Gradient Metrics system
To push metrics from your Experiment or Deployment code, first install the gradient-utils package:
pip install gradient-utils
Four types of metrics are offered: Counter, Gauge, Summary, and Histogram.
Counters go up, and reset when the process restarts.
from gradient_utils.metrics import Counter
c = Counter('my_failures', 'Description of counter')
c.inc() # Increment by 1
c.inc(1.6) # Increment by given value
If the metric name has a _total suffix, it will be removed. When the time series for a counter is exposed, a _total suffix will be added. This is for compatibility between OpenMetrics and the Prometheus text format, as OpenMetrics requires the _total suffix.
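A short sketch of this behavior (the metric name is illustrative): the counter below is stored internally as my_requests, but its exposed time series keeps the name my_requests_total.
from gradient_utils.metrics import Counter

# '_total' is stripped from the stored metric name, and re-added when the
# time series is exposed, so the exposed name is still my_requests_total.
c = Counter('my_requests_total', 'Requests served')
c.inc()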
There are utilities to count exceptions raised:
@c.count_exceptions()
def f():
pass
with c.count_exceptions():
pass
# Count only one type of exception
with c.count_exceptions(ValueError):
pass
Gauges can go up and down.
from gradient_utils.metrics import Gauge
g = Gauge('my_inprogress_requests', 'Description of gauge')
g.inc() # Increment by 1
g.dec(10) # Decrement by given value
g.set(4.2) # Set to a given value
There are utilities for common use cases:
g.set_to_current_time() # Set to current unixtime
# Increment when entered, decrement when exited.
@g.track_inprogress()
def f():
pass
with g.track_inprogress():
pass
A Gauge can also take its value from a callback:
d = Gauge('data_objects', 'Number of objects')
my_dict = {}
d.set_function(lambda: len(my_dict))
Summaries track the size and number of events.
from gradient_utils.metrics import Summary
s = Summary('request_latency_seconds', 'Description of summary')
s.observe(4.7) # Observe 4.7 (seconds in this case)
There are utilities for timing code:
@s.time()
def f():
pass
with s.time():
pass
The Python client doesn't store or expose quantile information at this time.
Histograms track the size and number of events in buckets. This allows for aggregatable calculation of quantiles.
from gradient_utils.metrics import Histogram
h = Histogram('request_latency_seconds', 'Description of histogram')
h.observe(4.7) # Observe 4.7 (seconds in this case)
The default buckets are intended to cover a typical web/RPC request, from milliseconds to seconds. They can be overridden by passing a buckets keyword argument to Histogram.
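For example, custom bucket boundaries for latencies measured in seconds might look like this (the metric name and boundary values are illustrative):
from gradient_utils.metrics import Histogram

# Custom bucket upper bounds, in seconds; the +Inf bucket is added automatically.
h = Histogram('batch_latency_seconds', 'Latency of one training batch',
              buckets=(0.1, 0.5, 1.0, 2.5, 5.0, 10.0))
h.observe(0.42)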
There are utilities for timing code:
@h.time()
def f():
pass
with h.time():
pass
All metrics can have labels, allowing grouping of related time series.
Taking a counter as an example:
from gradient_utils.metrics import Counter
c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])
c.labels('get', '/').inc()
c.labels('post', '/submit').inc()
Labels can also be passed as keyword-arguments:
from gradient_utils.metrics import Counter
c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])
c.labels(method='get', endpoint='/').inc()
c.labels(method='post', endpoint='/submit').inc()
To push the collected metrics to Gradient, use MetricsLogger:
from datetime import datetime, timedelta
from random import randint

from gradient_utils.metrics import MetricsLogger

logger = MetricsLogger(grouping_key={'ProjectA': 'SomeLabel'})
logger.add_gauge("Gauge")
logger.add_counter("Counter")

# Push metrics for 30 seconds (example duration)
endAt = datetime.now() + timedelta(seconds=30)
while datetime.now() <= endAt:
    randNum = randint(1, 100)
    logger["Gauge"].set(randNum)
    logger["Counter"].inc()
    logger.push_metrics()
Remember to import MetricsLogger from gradient_utils.metrics before using it.
Gradient uses Prometheus behind the scenes. See the Prometheus documentation on metric types and instrumentation best practices, as well as the best practices on naming and labels, for guidance on how to use them.