Model serving at scale

Cortex is a platform for deploying, managing, and scaling machine learning models in production.

Realtime and batch workloads

Run realtime inference, batch inference, and training jobs.

Any pipeline

Configure inference endpoints, model retraining, and A/B testing programmatically with Python or interactively with a CLI.

Any model

Cortex supports TensorFlow, PyTorch, ONNX, and other models.

Built to scale

Cortex is built on top of Kubernetes to support large-scale machine learning workloads.

Easy to integrate

Export metrics and logs to your favorite monitoring tools.

Cortex Core

Model serving on your cloud infrastructure
Launch on AWS or GCP
Pay only for AWS or GCP resources
Cluster management
Realtime and batch workloads
Workload and cluster autoscaling
Spot or preemptible instances
Traffic splitting and A/B testing
Community support

Cortex Enterprise

Fully managed model serving platform
Contact us for details
Everything in Core
User management
Admin dashboard
Multi-cloud clusters
Uptime SLAs
Enterprise security
Priority support