Model serving infrastructure

Supports deploying TensorFlow, PyTorch, sklearn, and other models
Ensures high availability with multi-zone deployments and automated instance restarts
Autoscales to handle production workloads with support for overprovisioning
Runs inference workloads on on-demand or spot instances
Supports real-time and batch workloads
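As a rough sketch of how one framework-agnostic serving layer can cover the features above, model servers commonly wrap each model behind a load/predict contract so TensorFlow, PyTorch, and sklearn artifacts are all served the same way, for both real-time and batch requests. The class and method names below are illustrative assumptions, not the platform's actual API:

```python
class Predictor:
    """Hypothetical predictor interface used by a model server."""

    def __init__(self, model_path):
        # In a real deployment this would load a TensorFlow / PyTorch /
        # sklearn artifact from model_path; here a stub "model" doubles
        # its input so the sketch is runnable.
        self.model = lambda x: [2 * v for v in x]

    def predict(self, payload):
        # Real-time path: one request in, one response out.
        return self.model(payload["features"])

    def predict_batch(self, payloads):
        # Batch path: the same model applied across many requests.
        return [self.predict(p) for p in payloads]

predictor = Predictor("s3://bucket/model")  # path is illustrative
print(predictor.predict({"features": [1, 2, 3]}))    # [2, 4, 6]
print(predictor.predict_batch([{"features": [4]}]))  # [[8]]
```

The same object can back an on-demand replica or a spot-instance batch worker; only the calling pattern differs.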

Reproducible deployments

Package dependencies, code, and configuration for reproducible deployments
Configure compute, autoscaling, and networking for each API
Deploy custom Docker images or use the pre-built defaults
Configure traffic splitting for A/B testing
Test locally before deploying to a cluster
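A deployment becomes reproducible when code, dependencies, compute, autoscaling, and traffic splitting are captured in one declarative spec. The field names and image URL below are assumptions for illustration, not the platform's real schema:

```python
# Illustrative API spec: everything needed to recreate a deployment
# lives in one object that can be versioned alongside the code.
api_spec = {
    "name": "text-classifier",
    "predictor": {
        "type": "python",
        "path": "predictor.py",
        # Custom Docker image (hypothetical URL); omit to use a
        # pre-built default.
        "image": "quay.io/example/serve:latest",
    },
    "compute": {"cpu": "1", "mem": "2Gi", "gpu": 0},
    "autoscaling": {"min_replicas": 1, "max_replicas": 10},
    # A/B traffic split between two model versions, in percent.
    "traffic": {"v1": 90, "v2": 10},
}

# The kind of sanity check worth running locally before deploying
# to a cluster: the traffic split must cover 100% of requests.
assert sum(api_spec["traffic"].values()) == 100
```

Because the spec is plain data, the same checks can run in local tests and in a cluster-side admission step.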

Scalable machine learning APIs

Scale to handle production workloads with request-based autoscaling
Stream performance metrics and logs to any monitoring tool
Serve many models efficiently with multi-model caching
Update APIs without downtime using rolling updates
Integrate with your data science platform or CI/CD system
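Multi-model caching is the idea behind "serve many models efficiently": keep at most N models resident in memory and evict the least recently used one when a new model is requested. The sketch below is a minimal LRU cache under that assumption, not the platform's actual implementation:

```python
from collections import OrderedDict

class ModelCache:
    """Keep up to `capacity` loaded models, evicting the least
    recently used, so one server can serve many models."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._models = OrderedDict()  # insertion order = recency order

    def _load(self, name):
        # Stand-in for loading a real model artifact from disk or S3.
        return lambda x: f"{name}:{x}"

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as recently used
        else:
            if len(self._models) >= self.capacity:
                self._models.popitem(last=False)  # evict the LRU model
            self._models[name] = self._load(name)
        return self._models[name]

cache = ModelCache(capacity=2)
cache.get("a"); cache.get("b"); cache.get("c")  # "a" gets evicted
print(list(cache._models))  # ['b', 'c']
```

Cache hit/miss and eviction counts are exactly the kind of performance metrics worth streaming to a monitoring tool.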

Subscribe for updates