You can deploy models from any Python framework by defining a class that implements Cortex's Python Predictor interface. The class constructor is responsible for preparing the model for serving, downloading vocabulary files, etc. The predict() method is called on every request and is responsible for responding with a prediction.
In addition to supporting Python models via the Python Predictor interface, Cortex can serve the following exported model formats:
    - name: <string>  # API name (required)
      endpoint: <string>  # the endpoint for the API (default: /<api_name>)
      predictor:
        type: python
        path: <string>  # path to a python file with a PythonPredictor class definition, relative to the Cortex root (required)
        config: <string: value>  # dictionary passed to the constructor of a Predictor (optional)
        python_path: <string>  # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
        env: <string: string>  # dictionary of environment variables
      tracker:
        key: <string>  # the JSON key in the response to track (required if the response payload is a JSON object)
        model_type: <string>  # model type, must be "classification" or "regression" (required)
      compute:
        min_replicas: <int>  # minimum number of replicas (default: 1)
        max_replicas: <int>  # maximum number of replicas (default: 100)
        init_replicas: <int>  # initial number of replicas (default: <min_replicas>)
        target_cpu_utilization: <int>  # CPU utilization threshold (as a percentage) to trigger scaling (default: 80)
        cpu: <string | int | float>  # CPU request per replica (default: 200m)
        gpu: <int>  # GPU request per replica (default: 0)
        mem: <string>  # memory request per replica (default: Null)
        max_surge: <string | int>  # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
        max_unavailable: <string | int>  # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
    - name: my-api
      predictor:
        type: python
        path: predictor.py
      compute:
        gpu: 1
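The max_surge and max_unavailable fields accept either an absolute number or a percentage of the desired replicas. The sketch below shows one way such a value could be resolved into a replica count. The helper name resolve_replica_bound is hypothetical, and the rounding (surge rounds up, unavailable rounds down) follows the Kubernetes rolling-update convention, which is an assumption here since this page does not state how Cortex rounds.

```python
def resolve_replica_bound(value, desired_replicas, round_up):
    """Turn an absolute count (5) or a percentage string ("10%") into a replica count."""
    if isinstance(value, int):
        return value
    text = str(value).strip()
    if text.endswith("%"):
        # Integer arithmetic avoids floating-point surprises such as 0.1 * 30.
        numerator = int(text[:-1]) * desired_replicas
        # Assumption: Kubernetes-style rounding (surge up, unavailable down).
        return -(-numerator // 100) if round_up else numerator // 100
    return int(text)


desired = 10
surge = resolve_replica_bound("25%", desired, round_up=True)
unavailable = resolve_replica_bound("25%", desired, round_up=False)
print(f"at most {desired + surge} replicas scheduled, at least {desired - unavailable} available")
```

Under these assumptions, the default of 25% with 10 desired replicas allows up to 3 extra replicas during an update and up to 2 unavailable ones.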
You can log information about each request by adding a ?debug=true parameter to your requests. This will print:
The value after running the
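As an illustration of the debug parameter, the following sketch appends debug=true to an API URL while keeping any query parameters that are already present. The helper name with_debug and the localhost endpoint are hypothetical placeholders, not part of Cortex.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url):
    """Return url with debug=true in its query string, keeping other parameters."""
    parts = urlsplit(url)
    # Drop any existing debug parameter so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    query.append(("debug", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical endpoint; substitute the URL of your own API.
print(with_debug("http://localhost:8888/my-api"))
print(with_debug("http://localhost:8888/my-api?version=2"))
```

The resulting URL can be used with any HTTP client when sending the JSON payload.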
A Python Predictor is a Python class that describes how to initialize a model and use it to make a prediction.
The lifecycle of a replica starts with the initialization of the Python Predictor class defined in your implementation file. The constructor is responsible for downloading and initializing the model. It receives the config object, which is an arbitrary dictionary defined in the API configuration (e.g. cortex.yaml) that can be used to pass in the path to the exported model, vocabularies, etc. After successfully initializing an instance of the Python Predictor class, the replica is available to serve requests. Upon receiving a request, the replica calls the predict() function with the JSON payload. The predict() function is responsible for returning a prediction or a batch of predictions, and it can also preprocess the JSON payload and postprocess the predictions.
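The lifecycle described above can be simulated locally. This is a minimal sketch, not Cortex's actual serving code: run_replica is a hypothetical helper, the "scale" config key is invented for illustration, and the multiplication stands in for real model inference.

```python
import json


class PythonPredictor:
    def __init__(self, config):
        # Runs once per replica: prepare the "model" described by the config.
        self.scale = config["scale"]  # hypothetical config key

    def predict(self, payload):
        # Runs once per request: preprocess, infer, postprocess.
        value = float(payload["value"])          # preprocessing
        result = value * self.scale              # stand-in for model inference
        return {"prediction": round(result, 2)}  # postprocessing


def run_replica(predictor_cls, config, raw_requests):
    """Simulate a replica: construct the predictor once, then serve each request."""
    predictor = predictor_cls(config)
    responses = []
    for body in raw_requests:
        payload = json.loads(body)  # predict() receives the parsed JSON payload
        responses.append(json.dumps(predictor.predict(payload)))
    return responses


responses = run_replica(PythonPredictor, {"scale": 2.0}, ['{"value": 1.5}', '{"value": 4}'])
print(responses)
```

A harness like this can be handy for checking a predictor's constructor and predict() logic before deploying it.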
    # initialization code and variables can be declared here in global scope

    class PythonPredictor:
        def __init__(self, config):
            """Called once before the API becomes available. Setup for model serving such as downloading/initializing the model or downloading vocabulary can be done here. Required.

            Args:
                config: Dictionary passed from API configuration (if specified).
            """
            pass

        def predict(self, payload):
            """Called once per request. Runs preprocessing of the request payload, inference, and postprocessing of the inference output. Required.

            Args:
                payload: The parsed JSON request payload.

            Returns:
                Prediction or a batch of predictions.
            """
            pass
    import re

    import boto3
    import torch

    from my_model import IrisNet

    labels = ["setosa", "versicolor", "virginica"]

    class PythonPredictor:
        def __init__(self, config):
            # download the model from the S3 path given in the API configuration
            bucket, key = re.match("s3://(.+?)/(.+)", config["model"]).groups()
            s3 = boto3.client("s3")
            s3.download_file(bucket, key, "model.pth")

            # initialize the model from the downloaded file
            model = IrisNet()
            model.load_state_dict(torch.load("model.pth"))
            model.eval()
            self.model = model

        def predict(self, payload):
            # Convert the request to a tensor and pass it into the model
            input_tensor = torch.FloatTensor(
                [
                    [
                        payload["sepal_length"],
                        payload["sepal_width"],
                        payload["petal_length"],
                        payload["petal_width"],
                    ]
                ]
            )

            # Run the prediction
            output = self.model(input_tensor)

            # Translate the model output to the corresponding label string
            return labels[torch.argmax(output)]
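If predict() returns a JSON object rather than a bare string, the tracker's key field has to name the entry to track. The sketch below shows one possible postprocessing step that turns raw per-class scores into such an object, written in plain Python so it runs without PyTorch. The to_response helper and the "label" and "confidence" keys are hypothetical; with this shape, the tracker would presumably be configured with key: label and model_type: classification.

```python
import math

labels = ["setosa", "versicolor", "virginica"]


def to_response(scores):
    """Convert raw per-class scores into a JSON-serializable prediction."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    probabilities = [value / total for value in exps]
    best = max(range(len(scores)), key=lambda i: probabilities[i])
    # "label" and "confidence" are illustrative key names, not required by Cortex.
    return {"label": labels[best], "confidence": round(probabilities[best], 3)}


response = to_response([0.1, 2.3, 0.4])
print(response["label"])
```

In a predictor like the one above, the scores would come from the model output, for example by converting the output tensor to a Python list first.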
The following packages have been pre-installed and can be used in your implementations:
Learn how to install additional packages here.