Compute resource requests in Cortex follow the syntax and meaning of compute resources in Kubernetes.

For example:

- kind: api
cpu: 2
mem: "1Gi"
gpu: 1

CPU and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the API service will only be scheduled once 2 CPUs, 1Gi of memory, and 1 GPU are available, and the job will be guaranteed to have access to those resources throughout it's execution. In some cases, a Cortex compute resource request can be (or may default to) Null.


One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (0.2 and 200m are equivalent).


One unit of memory is one byte. Memory can be expressed as an integer or by using one of these suffixes: K, M, G, T (or their power-of two counterparts: Ki, Mi, Gi, Ti). For example, the following values represent roughly the same memory: 128974848, 129e6, 129M, 123Mi.


  1. Make sure your AWS account is subscribed to the EKS-optimized AMI with GPU Support.

  2. You may need to file an AWS support ticket to incease the limit for your desired instance type.

  3. Set CORTEX_NODE_TYPE to an AWS GPU instance (e.g. p2.xlarge) before installing Cortex.

  4. Note that one unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed.