The cortex get and cortex get API_NAME commands display the request time (averaged over the past 2 weeks) and response code counts (summed over the past 2 weeks) for your APIs:
cortex get​env api status up-to-date requested last update avg request 2XXcortex iris-classifier live 1 1 17m 24ms 1223cortex text-generator live 1 1 8m 180ms 433cortex image-classifier-resnet50 live 2 2 1h 32ms 1121126
The cortex get API_NAME command also provides a link to a Grafana dashboard:
Panel | Description | Note |
Request Rate | Request rate, computed over every minute, of an API | ​ |
In Flight Request | Active in-flight requests for an API. | In-flight requests are recorded every 10 seconds, which will correspond to the minimum resolution. |
Active Replicas | Active replicas for an API | ​ |
2XX Responses | Request rate, computed over a minute, for responses with status code 2XX of an API | ​ |
4XX Responses | Request rate, computed over a minute, for responses with status code 4XX of an API | ​ |
5XX Responses | Request rate, computed over a minute, for responses with status code 5XX of an API | ​ |
p99 Latency | 99th percentile latency, computed over a minute, for an API | Value might not be accurate because the histogram buckets are not dynamically set. |
p90 Latency | 90th percentile latency, computed over a minute, for an API | Value might not be accurate because the histogram buckets are not dynamically set. |
p50 Latency | 50th percentile latency, computed over a minute, for an API | Value might not be accurate because the histogram buckets are not dynamically set. |
Average Latency | Average latency, computed over a minute, for an API | ​ |
It is possible to export custom user metrics by adding the metrics_client argument to the handler constructor. Below there is an example of how to use the metrics client. The implementation is similar to all handler types.
class Handler:def __init__(self, config, metrics_client):self.metrics = metrics_client​def handle_post(self, payload):# --- my handler code here ---result = ...​# increment a counter with name "my_metric" and tags model:v1self.metrics.increment(metric="my_counter", value=1, tags={"model": "v1"})​# set the value for a gauge with name "my_gauge" and tags model:v1self.metrics.gauge(metric="my_gauge", value=42, tags={"model": "v1"})​# set the value for an histogram with name "my_histogram" and tags model:v1self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"})
Note: The metrics client uses the UDP protocol to push metrics, so if it fails during a metrics push, no exception is thrown.