Compute resource requests in Cortex follow the syntax and meaning of compute resources in Kubernetes.
- kind: api
  ...
  compute:
    cpu: 2
    mem: "1Gi"
    gpu: 1
CPU and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the API service will only be scheduled once 2 CPUs, 1Gi of memory, and 1 GPU are available, and it is guaranteed access to those resources throughout its execution. In some cases, a Cortex compute resource request can be (or may default to)
One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (0.2 and 200m are equivalent).
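To make the "m" (milli-CPU) notation concrete, here is a minimal Python sketch that normalizes a CPU request to a number of CPUs. The helper name `parse_cpu` is hypothetical and is not part of Cortex; it only illustrates the arithmetic, assuming that 1000m equals one CPU as in Kubernetes.

```python
from fractions import Fraction


def parse_cpu(value):
    """Convert a CPU request such as 2, 0.2, "0.2" or "200m" to a number of CPUs.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    text = str(value).strip()
    if text.endswith("m"):
        # The "m" suffix means milli-CPU: 1000m is one full CPU.
        return Fraction(int(text[:-1]), 1000)
    # Plain integers and decimal strings are taken as whole or fractional CPUs.
    return Fraction(text)


# A fractional number and its milli-CPU spelling describe the same request.
assert parse_cpu("200m") == parse_cpu(0.2)
```

Using `Fraction` rather than `float` keeps the comparison exact, so equivalent spellings of the same request compare equal.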
One unit of memory is one byte. Memory can be expressed as an integer or by using one of these suffixes: K, M, G, T (or their power-of-two counterparts: Ki, Mi, Gi, Ti). For example, 129M and 123Mi represent roughly the same amount of memory.
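The difference between decimal and power-of-two suffixes can be checked with a short Python sketch. The helper `parse_memory` and its suffix tables are assumptions made for illustration, modeled on Kubernetes quantity notation (which spells the decimal kilo suffix as lowercase `k`, so both spellings are accepted here); this is not how Cortex itself parses the field.

```python
# Assumed suffix tables, based on Kubernetes-style quantity notation.
DECIMAL = {"k": 10**3, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(value):
    """Convert a memory request such as 128974848, "129e6", "129M" or "123Mi" to bytes.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    text = str(value).strip()
    for suffix, factor in BINARY.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    for suffix, factor in DECIMAL.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    # No suffix: the value is already a number of bytes.
    return int(float(text))


decimal_bytes = parse_memory("129M")   # 129 * 10**6
binary_bytes = parse_memory("123Mi")   # 123 * 2**20
# The two spellings differ by well under 0.1%.
assert abs(decimal_bytes - binary_bytes) / decimal_bytes < 0.001
```

Under these assumptions, `"1Gi"` from the example configuration works out to 1073741824 bytes, slightly more than the 1000000000 bytes of `"1G"`.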
Make sure your AWS account is subscribed to the EKS-optimized AMI with GPU Support.
You may need to file an AWS support ticket to increase the limit for your desired instance type.
Set CORTEX_NODE_TYPE to an AWS GPU instance (e.g. p2.xlarge) before installing Cortex.
Note that one unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed.
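Because GPU requests must be whole numbers, a configuration can be checked before it is deployed. The following is a minimal sketch with a hypothetical `validate_gpu` helper; it is not part of Cortex, and Cortex's own validation and error messages may differ.

```python
def validate_gpu(value):
    """Return the GPU request as an int, rejecting fractional or negative values.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"gpu must be a whole number, got {value!r}")
    if value < 0:
        raise ValueError(f"gpu must not be negative, got {value!r}")
    return value


assert validate_gpu(1) == 1

try:
    validate_gpu(0.5)  # fractional GPU requests are not allowed
except ValueError as err:
    print(err)
```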