Autoscaling

Cortex autoscales your web services based on your configuration.
Autoscaling Replicas
Cortex adjusts the number of replicas that are serving predictions by monitoring the compute resource usage of each API. The number of replicas will be at least min_replicas and no more than max_replicas.
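
As a rough illustration of how the min_replicas and max_replicas bounds constrain scaling, here is a minimal Python sketch. It assumes a utilization-ratio rule in the style of the Kubernetes Horizontal Pod Autoscaler; this page does not state the formula Cortex uses, and the function name and the utilization parameters are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Illustrative only: scale replicas by the ratio of observed to target
    utilization, then clamp the result to [min_replicas, max_replicas].

    The ratio rule is an assumption borrowed from the Kubernetes Horizontal
    Pod Autoscaler; it is not taken from the Cortex documentation.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    # Unclamped estimate of how many replicas would bring usage back to target.
    raw = math.ceil(current_replicas * current_utilization / target_utilization)

    # The configured bounds always win over the estimate.
    return max(min_replicas, min(max_replicas, raw))
```

Whatever the underlying metric, the clamping step is what guarantees that the replica count stays within the configured bounds, both under heavy load and when the API is idle.
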
Autoscaling Nodes
Cortex adds and removes nodes based on the aggregate resource requests of all APIs. The number of nodes will be at least min_instances and no more than max_instances (both are configured during installation and can be changed via cortex cluster update or the AWS console).
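
The relationship between aggregate resource requests and node count can be sketched in the same way. This is a simplified Python model under stated assumptions: it considers a single resource (CPU), treats every node as having the same capacity, and ignores how replicas are actually packed onto nodes. The function name and its cpu_requests and cpu_per_node parameters are hypothetical.

```python
import math


def desired_nodes(cpu_requests, cpu_per_node, min_instances, max_instances):
    """Illustrative only: estimate the node count needed to satisfy the
    summed CPU requests, clamped to [min_instances, max_instances].

    Real cluster autoscaling also accounts for memory, GPUs, and per-node
    scheduling constraints, none of which are modeled here.
    """
    if cpu_per_node <= 0:
        raise ValueError("cpu_per_node must be positive")
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")

    # Aggregate request across all replicas of all APIs.
    total_cpu = sum(cpu_requests)

    # Lower bound on nodes needed if requests packed perfectly.
    raw = math.ceil(total_cpu / cpu_per_node)

    # The cluster never shrinks below min_instances or grows past max_instances.
    return max(min_instances, min(max_instances, raw))
```

For example, five CPUs of total requests on four-CPU nodes would call for two nodes, while an empty cluster would still keep min_instances nodes running.
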
