Once your model is exported, you can implement one of Cortex's Predictor classes to deploy your model. A Predictor is a Python class that describes how to initialize your model and use it to make predictions.
Which Predictor you use depends on how your model is exported:
TensorFlow Predictor if your model is exported as a TensorFlow SavedModel
ONNX Predictor if your model is exported in the ONNX format
Python Predictor for all other cases
The response type of the predictor can vary depending on your requirements; see API responses below.
Cortex makes all files in the project directory (i.e. the directory containing cortex.yaml) available for use in your Predictor implementation. Python bytecode files (*.pyc, *.pyo, *.pyd), files or folders that start with ., and the API configuration file (e.g. cortex.yaml) are excluded.
The following files can also be added at the root of the project's directory:
.cortexignore file, which follows the same syntax and behavior as a .gitignore file.
.env file, which exports environment variables that can be used in the predictor. Each line of this file must follow the VARIABLE=value format.
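For instance (a minimal sketch; the variable names below are hypothetical), variables exported via .env can be read with os.environ in your Predictor:

# contents of a hypothetical .env file at the project root:
#   MODEL_VERSION=v2
#   LOG_LEVEL=info

import os

class PythonPredictor:
    def __init__(self, config):
        # environment variables exported via .env are available through os.environ
        self.model_version = os.environ["MODEL_VERSION"]
        self.log_level = os.environ.get("LOG_LEVEL", "info")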
For example, if your project directory looks like this:

./iris-classifier/
├── cortex.yaml
├── values.json
├── predictor.py
├── ...
└── requirements.txt
You can access values.json in your Predictor like this:
import json

class PythonPredictor:
    def __init__(self, config):
        with open("values.json", "r") as values_file:
            values = json.load(values_file)
        self.values = values
A Python Predictor implements the following interface:

# initialization code and variables can be declared here in global scope

class PythonPredictor:
    def __init__(self, config):
        """Called once before the API becomes available. Performs setup such as
        downloading/initializing the model or downloading a vocabulary.

        Args:
            config: Dictionary passed from API configuration (if specified). This may
                contain information on where to download the model and/or metadata.
        """
        pass

    def predict(self, payload, query_params, headers):
        """Called once per request. Preprocesses the request payload (if necessary),
        runs inference, and postprocesses the inference output (if necessary).

        Args:
            payload: The request payload (see below for the possible payload types) (optional).
            query_params: A dictionary of the query parameters used in the request (optional).
            headers: A dictionary of the headers sent in the request (optional).

        Returns:
            Prediction or a batch of predictions.
        """
        pass
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as where to download the model and initialization files from, as well as any configurable model parameters. You define config in your API configuration, and it is passed through to your Predictor's constructor.
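For example (a minimal sketch; the config keys shown here are hypothetical), a Predictor might read a model location and a tunable threshold from config:

class PythonPredictor:
    def __init__(self, config):
        # "model_path" and "threshold" are hypothetical keys defined under the
        # predictor's config field in your API configuration
        self.model_path = config["model_path"]
        self.threshold = config.get("threshold", 0.5)

    def predict(self, payload):
        # use self.model_path / self.threshold during inference
        pass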
The payload parameter is parsed according to the Content-Type header in the request. For Content-Type: application/json, payload will be the parsed JSON body. For Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded, payload will be starlette.datastructures.FormData (key-value pairs where the value is a string for form data, or a starlette.datastructures.UploadFile for file uploads; see Starlette's documentation). For all other Content-Type values, payload will be the raw bytes of the request body.
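As a rough sketch (not taken from the Cortex examples; the "file" form field name is hypothetical), predict() can branch on the type of payload it receives:

from starlette.datastructures import FormData, UploadFile

class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        if isinstance(payload, dict):
            # Content-Type: application/json -> parsed JSON body
            return {"received": payload}
        elif isinstance(payload, FormData):
            # form data: values are strings, file uploads are UploadFile objects
            value = payload["file"]
            contents = value.file.read() if isinstance(value, UploadFile) else value
            return {"size": len(contents)}
        else:
            # any other Content-Type -> raw bytes of the request body
            return {"size": len(payload)}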
Many of the examples use the Python Predictor, including all of the PyTorch examples.
Here is the Predictor for examples/pytorch/iris-classifier:
import re
import torch
import boto3
from model import IrisNet

labels = ["setosa", "versicolor", "virginica"]

class PythonPredictor:
    def __init__(self, config):
        # download the model
        bucket, key = re.match("s3://(.+?)/(.+)", config["model"]).groups()
        s3 = boto3.client("s3")
        s3.download_file(bucket, key, "model.pth")

        # initialize the model
        model = IrisNet()
        model.load_state_dict(torch.load("model.pth"))
        model.eval()

        self.model = model

    def predict(self, payload):
        # convert the request to a tensor and pass it into the model
        input_tensor = torch.FloatTensor(
            [
                [
                    payload["sepal_length"],
                    payload["sepal_width"],
                    payload["petal_length"],
                    payload["petal_width"],
                ]
            ]
        )

        # run the prediction
        output = self.model(input_tensor)

        # translate the model output to the corresponding label string
        return labels[torch.argmax(output[0])]
The following Python packages are pre-installed in Python Predictors and can be used in your implementations:
boto3==1.13.7
cloudpickle==1.4.1
Cython==0.29.17
dill==0.3.1.1
fastapi==0.54.1
joblib==0.14.1
Keras==2.3.1
msgpack==1.0.0
nltk==3.5
np-utils==0.5.12.1
numpy==1.18.4
opencv-python==4.2.0.34
pandas==1.0.3
Pillow==7.1.2
pyyaml==5.3.1
requests==2.23.0
scikit-image==0.17.1
scikit-learn==0.22.2.post1
scipy==1.4.1
six==1.14.0
statsmodels==0.11.1
sympy==1.5.1
tensorflow-hub==0.8.0
tensorflow==2.1.0
torch==1.5.0
torchvision==0.6.0
xgboost==1.0.2
For Inferentia-equipped APIs, the list is slightly different:
boto3==1.13.7
cloudpickle==1.3.0
Cython==0.29.17
dill==0.3.1.1
fastapi==0.54.1
joblib==0.14.1
msgpack==1.0.0
neuron-cc==1.0.9410.0+6008239556
nltk==3.4.5
np-utils==0.5.12.1
numpy==1.16.5
opencv-python==4.2.0.32
pandas==1.0.3
Pillow==6.2.2
pyyaml==5.3.1
requests==2.23.0
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.3.2
six==1.14.0
statsmodels==0.11.1
sympy==1.5.1
tensorflow-neuron==1.15.0.1.0.1333.0
torch-neuron==1.0.825.0
torchvision==0.4.2
The pre-installed system packages are listed in images/python-predictor-cpu/Dockerfile (for CPU), images/python-predictor-gpu/Dockerfile (for GPU), or images/python-predictor-inf/Dockerfile (for Inferentia).
If your application requires additional dependencies, you can install additional Python packages and system packages.
A TensorFlow Predictor implements the following interface:

class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        """Called once before the API becomes available. Performs setup such as
        downloading/initializing a vocabulary.

        Args:
            tensorflow_client: TensorFlow client which is used to make predictions.
                This should be saved for use in predict().
            config: Dictionary passed from API configuration (if specified).
        """
        self.client = tensorflow_client
        # additional initialization may be done here

    def predict(self, payload, query_params, headers):
        """Called once per request. Preprocesses the request payload (if necessary),
        runs inference (e.g. by calling self.client.predict(model_input)), and
        postprocesses the inference output (if necessary).

        Args:
            payload: The request payload (see below for the possible payload types) (optional).
            query_params: A dictionary of the query parameters used in the request (optional).
            headers: A dictionary of the headers sent in the request (optional).

        Returns:
            Prediction or a batch of predictions.
        """
        pass
Cortex provides a tensorflow_client to your Predictor's constructor. tensorflow_client is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call tensorflow_client.predict() to make an inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the tensorflow_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(payload, "iris-classifier")). See the multi model guide for more information.
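For instance (a minimal sketch, assuming a model named "iris-classifier" is listed in the models field and that a "model" query parameter is used to select it), the model can be chosen per request:

class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client

    def predict(self, payload, query_params, headers):
        # fall back to "iris-classifier" if no "model" query parameter is provided
        model_name = query_params.get("model", "iris-classifier")
        return self.client.predict(payload, model_name)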
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
The payload parameter is parsed according to the Content-Type header in the request. For Content-Type: application/json, payload will be the parsed JSON body. For Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded, payload will be starlette.datastructures.FormData (key-value pairs where the value is a string for form data, or a starlette.datastructures.UploadFile for file uploads; see Starlette's documentation). For all other Content-Type values, payload will be the raw bytes of the request body.
Most of the examples in examples/tensorflow use the TensorFlow Predictor.
Here is the Predictor for examples/tensorflow/iris-classifier:
labels = ["setosa", "versicolor", "virginica"]

class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client

    def predict(self, payload):
        prediction = self.client.predict(payload)
        predicted_class_id = int(prediction["class_ids"][0])
        return labels[predicted_class_id]
The following Python packages are pre-installed in TensorFlow Predictors and can be used in your implementations:
boto3==1.13.7
dill==0.3.1.1
fastapi==0.54.1
msgpack==1.0.0
numpy==1.18.4
opencv-python==4.2.0.34
pyyaml==5.3.1
requests==2.23.0
tensorflow-hub==0.8.0
tensorflow-serving-api==2.1.0
tensorflow==2.1.0
The pre-installed system packages are listed in images/tensorflow-predictor/Dockerfile.
If your application requires additional dependencies, you can install additional Python packages and system packages.
An ONNX Predictor implements the following interface:

class ONNXPredictor:
    def __init__(self, onnx_client, config):
        """Called once before the API becomes available. Performs setup such as
        downloading/initializing a vocabulary.

        Args:
            onnx_client: ONNX client which is used to make predictions.
                This should be saved for use in predict().
            config: Dictionary passed from API configuration (if specified).
        """
        self.client = onnx_client
        # additional initialization may be done here

    def predict(self, payload, query_params, headers):
        """Called once per request. Preprocesses the request payload (if necessary),
        runs inference (e.g. by calling self.client.predict(model_input)), and
        postprocesses the inference output (if necessary).

        Args:
            payload: The request payload (see below for the possible payload types) (optional).
            query_params: A dictionary of the query parameters used in the request (optional).
            headers: A dictionary of the headers sent in the request (optional).

        Returns:
            Prediction or a batch of predictions.
        """
        pass
Cortex provides an onnx_client to your Predictor's constructor. onnx_client is an instance of ONNXClient that manages an ONNX Runtime session to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call onnx_client.predict() to make an inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the onnx_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(model_input, "iris-classifier")). See the multi model guide for more information.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
The payload parameter is parsed according to the Content-Type header in the request. For Content-Type: application/json, payload will be the parsed JSON body. For Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded, payload will be starlette.datastructures.FormData (key-value pairs where the value is a string for form data, or a starlette.datastructures.UploadFile for file uploads; see Starlette's documentation). For all other Content-Type values, payload will be the raw bytes of the request body.
examples/onnx/iris-classifier uses the ONNX Predictor:
labels = ["setosa", "versicolor", "virginica"]

class ONNXPredictor:
    def __init__(self, onnx_client, config):
        self.client = onnx_client

    def predict(self, payload):
        model_input = [
            payload["sepal_length"],
            payload["sepal_width"],
            payload["petal_length"],
            payload["petal_width"],
        ]

        prediction = self.client.predict(model_input)
        predicted_class_id = prediction[0][0]
        return labels[predicted_class_id]
The following Python packages are pre-installed in ONNX Predictors and can be used in your implementations:
boto3==1.13.7
dill==0.3.1.1
fastapi==0.54.1
msgpack==1.0.0
numpy==1.18.4
onnxruntime==1.2.0
pyyaml==5.3.1
requests==2.23.0
The pre-installed system packages are listed in images/onnx-predictor-cpu/Dockerfile (for CPU) or images/onnx-predictor-gpu/Dockerfile (for GPU).
If your application requires additional dependencies, you can install additional Python packages and system packages.
The response of your predict() function may be:
A JSON-serializable object (lists, dictionaries, numbers, etc.)
A string object (e.g. "class 1")
A bytes object (e.g. bytes(4) or pickle.dumps(obj))
An instance of starlette.responses.Response
Here are some examples:
def predict(self, payload):
    # json-serializable object
    response = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    return response
def predict(self, payload):
    # string object
    response = "class 1"
    return response
def predict(self, payload):
    # bytes-like object
    array = np.random.randn(3, 3)
    response = pickle.dumps(array)
    return response
def predict(self, payload):
    # starlette.responses.Response
    data = "class 1"
    response = starlette.responses.Response(content=data, media_type="text/plain")
    return response