Compute resource requests in Cortex follow the syntax and meaning of compute resources in Kubernetes.
```yaml
- kind: api
  ...
  compute:
    cpu: 2
    mem: "1Gi"
    gpu: 1
```
CPU and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the API will only be scheduled once 2 CPUs, 1Gi of memory, and 1 GPU are available on a single instance, and the deployment is guaranteed access to those resources throughout its execution. In some cases, resource requests can be (or may default to)
One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed and can be specified either as a floating-point number or with the "m" (milli-CPU) suffix (0.2 and 200m are equivalent).
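To illustrate how the two CPU notations relate, the following Python sketch converts a CPU request into a number of CPUs. The function name parse_cpu is hypothetical; it handles only plain numbers and the "m" suffix and is not Cortex's or Kubernetes' own parser.

```python
from decimal import Decimal
from typing import Union


def parse_cpu(quantity: Union[str, int, float]) -> Decimal:
    """Convert a CPU request such as 2, "0.2", or "200m" into CPU units.

    Hypothetical helper: only plain numbers and the "m" (milli-CPU) suffix
    are handled; this is not Cortex's or Kubernetes' own parser.
    """
    text = str(quantity).strip()
    if text.endswith("m"):
        # "m" means thousandths of a CPU, so "200m" is 0.2 CPU.
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


if __name__ == "__main__":
    print(parse_cpu("0.2") == parse_cpu("200m"))  # True
    print(parse_cpu("1500m"))  # 1.5
    print(parse_cpu(2))  # 2
```

Decimal is used instead of float so that equivalent requests such as 0.2 and 200m compare exactly.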
One unit of memory is one byte. Memory can be expressed as an integer or with a suffix such as M, G, or T (or their power-of-two counterparts, such as Mi, Gi, or Ti). For example, 128974848, 129M, and 123Mi all represent roughly the same amount of memory.
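The difference between decimal and power-of-two memory suffixes can be shown with a short Python sketch that converts a request to bytes. parse_memory is a hypothetical helper, and it assumes the Kubernetes suffix set (k, M, G, T, P, E and Ki, Mi, Gi, Ti, Pi, Ei); Kubernetes itself parses quantities with the Go package k8s.io/apimachinery/pkg/api/resource, which accepts more forms than this sketch does.

```python
from decimal import Decimal
from typing import Union

# Decimal (SI) suffixes and their power-of-two counterparts, as in Kubernetes.
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}


def parse_memory(quantity: Union[str, int]) -> int:
    """Convert a memory request such as "1Gi", "129M", or 128974848 to bytes.

    Hypothetical helper; fractional results are truncated to whole bytes.
    """
    text = str(quantity).strip()
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if text.endswith(suffix):
                return int(Decimal(text[: -len(suffix)]) * factor)
    # No suffix: the value is already a number of bytes.
    return int(Decimal(text))


if __name__ == "__main__":
    print(parse_memory("1Gi"))  # 1073741824
    print(parse_memory("129M"))  # 129000000
    print(parse_memory("123Mi"))  # 128974848
    print(parse_memory(128974848))  # 128974848
```

The output shows why 129M and 123Mi are only roughly equal: M multiplies by 10^6, while Mi multiplies by 2^20.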
To request GPUs:

1. Make sure your AWS account is subscribed to the EKS-optimized AMI with GPU Support.
2. You may need to file an AWS support ticket to increase the limit for your desired instance type.
3. Set the instance type to an AWS GPU instance (e.g. p2.xlarge) when installing Cortex.

One unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed.
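Because fractional GPU requests are not allowed, a configuration check can reject them before deployment. The following Python sketch shows one way to do that; parse_gpu is a hypothetical helper (not part of Cortex) and assumes the gpu value arrives from the YAML config as a string or number.

```python
from decimal import Decimal, InvalidOperation
from typing import Union


def parse_gpu(quantity: Union[str, int, float]) -> int:
    """Return the number of GPUs requested, rejecting fractional values.

    Hypothetical helper: one unit is one virtual GPU, so only
    non-negative whole numbers are accepted.
    """
    try:
        value = Decimal(str(quantity).strip())
    except InvalidOperation:
        raise ValueError(f"invalid GPU request: {quantity!r}") from None
    if not value.is_finite() or value < 0 or value != value.to_integral_value():
        raise ValueError(f"GPU requests must be non-negative whole numbers, got {quantity!r}")
    return int(value)


if __name__ == "__main__":
    print(parse_gpu(1))  # 1
    try:
        parse_gpu("0.5")
    except ValueError as err:
        print(err)
```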