You can deploy ONNX models as web services by defining a class that implements Cortex's ONNX Predictor interface.
- name: <string>  # API name (required)
  endpoint: <string>  # the endpoint for the API (default: /<api_name>)
  predictor:
    type: onnx
    path: <string>  # path to a python file with an ONNXPredictor class definition, relative to the Cortex root (required)
    model: <string>  # S3 path to an exported model (e.g. s3://my-bucket/exported_model.onnx) (required)
    config: <string: value>  # dictionary passed to the constructor of a Predictor (optional)
    python_path: <string>  # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
    env: <string: string>  # dictionary of environment variables
  tracker:
    key: <string>  # the JSON key in the response to track (required if the response payload is a JSON object)
    model_type: <string>  # model type, must be "classification" or "regression" (required)
  compute:
    min_replicas: <int>  # minimum number of replicas (default: 1)
    max_replicas: <int>  # maximum number of replicas (default: 100)
    init_replicas: <int>  # initial number of replicas (default: <min_replicas>)
    target_cpu_utilization: <int>  # CPU utilization threshold (as a percentage) to trigger scaling (default: 80)
    cpu: <string | int | float>  # CPU request per replica (default: 200m)
    gpu: <int>  # GPU request per replica (default: 0)
    mem: <string>  # memory request per replica (default: Null)
    max_surge: <string | int>  # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
    max_unavailable: <string | int>  # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
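The max_surge and max_unavailable fields accept either an absolute count or a percentage of the desired replicas. The sketch below shows one way such a value could be resolved to a replica count. The helper resolve_replica_bound is hypothetical and not part of Cortex, and the rounding (up for surge, down for unavailable) follows the convention Kubernetes uses for Deployment rolling updates; whether Cortex applies exactly this rounding is an assumption.

```python
import math


def resolve_replica_bound(value, desired_replicas, round_up):
    """Resolve an absolute count (e.g. 5) or a percentage string (e.g. "10%")
    to a number of replicas. Hypothetical helper for illustration only."""
    if isinstance(value, str) and value.strip().endswith("%"):
        percent = float(value.strip()[:-1])
        scaled = percent * desired_replicas / 100
        # Assumed rounding: up for max_surge, down for max_unavailable.
        return math.ceil(scaled) if round_up else math.floor(scaled)
    return int(value)


desired = 10
max_surge = resolve_replica_bound("25%", desired, round_up=True)
max_unavailable = resolve_replica_bound("25%", desired, round_up=False)
print(max_surge, max_unavailable)  # 3 2
```

With 10 desired replicas and the default of 25%, this sketch allows up to 3 extra replicas during an update and up to 2 unavailable ones.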
See packaging ONNX models for information about exporting ONNX models.
- name: my-api
  predictor:
    type: onnx
    path: predictor.py
    model: s3://my-bucket/my-model.onnx
  compute:
    gpu: 1
You can log information about each request by adding a ?debug=true parameter to your requests. This will print:
The value after running the predict() function
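As a small illustration of the debug parameter, the sketch below adds debug=true to an API URL using only the Python standard library. The endpoint URL is a placeholder, and with_debug is a hypothetical helper rather than anything Cortex provides.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url):
    """Return url with debug=true set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any previous debug value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    query.append(("debug", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder endpoint; substitute the URL of your deployed API.
debug_url = with_debug("https://example.com/my-api")
print(debug_url)  # https://example.com/my-api?debug=true
```

The resulting URL can then be used with whichever HTTP client you normally use to send requests to the API.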
An ONNX Predictor is a Python class that describes how to serve your ONNX model to make predictions.
Cortex provides an onnx_client and a config object to initialize your implementation of the ONNX Predictor class. The onnx_client is an instance of ONNXClient that manages an ONNX Runtime session and helps make predictions using your model. Once your implementation of the ONNX Predictor class has been initialized, the replica is available to serve requests. Upon receiving a request, your implementation's predict() function is called with the JSON payload and is responsible for returning a prediction or batch of predictions. Your predict() function should call onnx_client.predict() to make an inference against your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
class ONNXPredictor:
    def __init__(self, onnx_client, config):
        """Called once before the API becomes available. Setup for model serving such as downloading/initializing vocabularies can be done here. Required.

        Args:
            onnx_client: ONNX client which can be used to make predictions.
            config: Dictionary passed from API configuration (if specified).
        """
        pass

    def predict(self, payload):
        """Called once per request. Runs preprocessing of the request payload, inference, and postprocessing of the inference output. Required.

        Args:
            payload: The parsed JSON request payload.

        Returns:
            Prediction or a batch of predictions.
        """
        pass
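import numpy as np

labels = ["setosa", "versicolor", "virginica"]


class ONNXPredictor:
    def __init__(self, onnx_client, config):
        self.client = onnx_client

    def predict(self, payload):
        # Preprocessing: arrange the request fields in the order the model expects.
        model_input = [
            payload["sepal_length"],
            payload["sepal_width"],
            payload["petal_length"],
            payload["petal_width"],
        ]

        prediction = self.client.predict(model_input)
        # Assumes the first model output holds the predicted class index
        # for the single input row.
        predicted_class_id = prediction[0][0]
        # Postprocessing: map the class index to a human-readable label.
        return labels[predicted_class_id]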
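The lifecycle described in this section (one call to the constructor, then one call to predict() per request) can be tried locally without Cortex. In the sketch below, StubONNXClient is a hypothetical stand-in for the onnx_client that Cortex supplies; it is not the real ONNXClient, and its "inference" is invented for illustration. The "features" payload field, the "scale" config key, and the "total" response key are likewise assumptions.

```python
class StubONNXClient:
    """Hypothetical stand-in for Cortex's onnx_client; not the real ONNXClient."""

    def __init__(self):
        self.calls = 0

    def predict(self, model_input):
        self.calls += 1
        # Fake "inference": sum the input features.
        return sum(model_input)


class ONNXPredictor:
    def __init__(self, onnx_client, config):
        # Runs once, before any request is served.
        self.client = onnx_client
        self.scale = config.get("scale", 1.0)

    def predict(self, payload):
        # Preprocessing: pull the features out of the parsed JSON payload.
        model_input = [float(x) * self.scale for x in payload["features"]]
        raw = self.client.predict(model_input)
        # Postprocessing: shape the raw output into a JSON-serializable response.
        return {"total": raw}


# Imitate a replica: construct the predictor once, then serve each request.
client = StubONNXClient()
predictor = ONNXPredictor(client, {"scale": 2.0})
requests = [{"features": [1, 2]}, {"features": [3, 4]}]
responses = [predictor.predict(payload) for payload in requests]
print(responses)  # [{'total': 6.0}, {'total': 14.0}]
```

Because the response here is a JSON object, a tracker configured for this API would need a key; in this sketch that key would be "total".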
The following packages have been pre-installed and can be used in your implementations:
Learn how to install additional packages here.