cortex get), there are a few possible causes. Here are some things to check:

Check cortex logs API_NAME

Check max_instances for your cluster

When you created your cluster, you configured max_instances (either from the command prompts or via a cluster configuration file, e.g. cluster.yaml). If your cluster already has max_instances running instances, additional instances cannot be created and APIs may not be able to deploy, scale, or update.

You can check the current value of max_instances by running cortex cluster info (or cortex cluster info --config cluster.yaml if you have a cluster configuration file).

You can update max_instances by running cortex cluster configure (or by modifying max_instances in your cluster configuration file and running cortex cluster configure --config cluster.yaml).
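As a rough illustration of the setting discussed above, a cluster configuration file might contain something like the following fragment. This is a sketch only: it assumes that min_instances and max_instances are top-level keys in cluster.yaml for your Cortex version, and the values shown are placeholders rather than recommendations.

```yaml
# cluster.yaml (fragment; key placement assumed, values illustrative)
min_instances: 1
max_instances: 5   # raise this if the cluster is already at its instance cap
```

After editing the file, the change would be applied with cortex cluster configure --config cluster.yaml, as described above.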

If you use spot instances and do not have on_demand_backup set to true, it is also possible that AWS has run out of spot instances for your requested instance type and region. You can enable on_demand_backup to allow Cortex to fall back to on-demand instances when spot instances are unavailable, or you can try adding additional alternative instance types in instance_distribution. See our spot documentation.

An update can also get stuck when there is no room for the new version. For example, you set max_instances to 1, or your AWS account limits you to a single g4dn.xlarge instance (i.e. your G instance vCPU limit is 4). You have an API running which requested 1 GPU. When you update your API via cortex deploy, Cortex attempts to deploy the updated version, and will only take down the old version once the new one is running. Since there is no GPU available on the single running instance (it's taken by the old version of your API), the new version of your API requests a new instance to run on.

Normally this is fine, although it may take a few minutes while the new instance spins up: the new instance becomes live, the new API replica runs on it, the old replica is terminated once the new one starts up successfully, and eventually the old instance spins down. In this case, however, the new version gets stuck because the second instance cannot be created, and the first instance cannot be freed up until the new version is running.

To avoid this, set max_surge to 0 in the update_strategy section of your API configuration (e.g. cortex.yaml). E.g.:
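A minimal sketch of such an API configuration is shown here. It assumes that update_strategy sits at the top level of the API definition in your Cortex version; the API name my-api is hypothetical, and all other fields of the API are omitted.

```yaml
# cortex.yaml (fragment; "my-api" is a placeholder name, other fields omitted)
- name: my-api
  update_strategy:
    max_surge: 0   # do not start a new replica before the old one is taken down
```

With max_surge set to 0, the update no longer needs a second instance, so the old replica can release its GPU before the new version starts.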