Model serving for developers

Cortex makes it simple to deploy machine learning models to production

What you can do with Cortex

Write APIs in Python

Define any real-time or batch inference pipeline as simple Python APIs, regardless of framework or hardware.
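As a rough illustration of what "inference as a Python API" can look like, here is a minimal sketch. The class name, the `config` argument, and the `predict(payload)` method are assumptions made for this example and may not match Cortex's actual interface; a toy keyword lookup stands in for a real model.

```python
# Hypothetical sketch: the class shape and method names are assumptions,
# not a confirmed Cortex interface.
class SentimentPredictor:
    def __init__(self, config):
        # Load the "model" once at startup. A toy keyword set stands in
        # for a real model loaded from disk or object storage.
        self.positive_words = set(
            config.get("positive_words", ["good", "great", "love"])
        )

    def predict(self, payload):
        # payload is assumed to be the parsed JSON body of a request.
        words = payload["text"].lower().split()
        hits = sum(word in self.positive_words for word in words)
        return {"label": "positive" if hits > 0 else "negative", "hits": hits}


# Local usage, without any serving layer.
predictor = SentimentPredictor(config={})
result = predictor.predict({"text": "I love this great product"})
print(result)
```

Keeping model loading in the constructor and per-request work in `predict` means the expensive setup runs once per replica rather than once per request.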

Configure infrastructure in YAML

Define autoscaling behavior, monitoring, cluster resources, update strategies, and more—or use Cortex's defaults.

Worry less about DevOps

You don't need to bring your own cluster or containerize your models. Cortex automates your cloud infrastructure.

Scale to handle production traffic

Handle traffic with request-based autoscaling. Limit production costs with spot instances and multi-model APIs.
Case study: How AI Dungeon scaled to 1,000,000 users with Cortex
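To make "request-based autoscaling" concrete, the sketch below shows one common way to turn request load into a replica count: divide the number of in-flight requests by a per-replica target, round up, and clamp the result. The function name, parameters, and defaults are hypothetical, and this is not necessarily the rule Cortex applies.

```python
import math


def desired_replicas(in_flight_requests, target_per_replica,
                     min_replicas=1, max_replicas=10):
    """Return a replica count for the current load (illustrative only)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Replicas needed so that each handles at most target_per_replica requests.
    wanted = math.ceil(in_flight_requests / target_per_replica)
    # Clamp to the configured bounds so the API never scales to zero
    # or beyond the cost ceiling.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(9, 4))     # 9 requests, 4 per replica
print(desired_replicas(0, 4))     # idle: falls back to min_replicas
print(desired_replicas(1000, 4))  # spike: capped at max_replicas
```

The upper bound is what keeps a traffic spike from translating directly into an unbounded cloud bill.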

Integrate with your stack

Integrate Cortex with any data science platform and CI/CD tooling, without changing your workflow.

Run on your AWS account

Run Cortex on your AWS account (GCP coming soon), maintaining control over resource utilization and data access.

Get started

Open Source

Free
Run Cortex on your AWS account
Install
Autoscaling
Spot instances
Rolling updates
Log streaming
Prediction monitoring
Support for all ML frameworks

Managed

Coming soon
Get Cortex as a service
Join the waitlist
Fully managed
Team workflows
Admin dashboard
Advanced security
Customer success