Cortex autoscales your web services based on your configuration.
Cortex adjusts the number of replicas serving predictions by monitoring the compute resource usage of each API. The number of replicas will be at least min_replicas and no more than max_replicas.
Cortex spins nodes up and down based on the aggregate resource requests of all APIs. The number of nodes will be at least min_instances and no more than max_instances (configured during installation and modifiable via cortex cluster update or the AWS console).
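
The bounding behavior described above for both replicas and nodes can be pictured as a simple clamp. The sketch below is only an illustration, not Cortex's implementation: the function name clamp_count and all of the numeric values are hypothetical, and how Cortex arrives at the desired count from resource usage is not modeled here.

```python
def clamp_count(desired: int, minimum: int, maximum: int) -> int:
    """Bound a desired count to the inclusive range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(desired, maximum))


# Hypothetical values, for illustration only.
# Usage suggests 12 replicas, but the configured upper bound is 5.
replicas = clamp_count(desired=12, minimum=1, maximum=5)

# Aggregate requests would fit on 0 nodes, but min_instances is 2.
nodes = clamp_count(desired=0, minimum=2, maximum=10)

print(replicas, nodes)
```

In this sketch a desired count inside the configured range is returned unchanged, so the limits only take effect at the extremes.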