Once your model is exported, you can implement one of Cortex's Predictor classes to deploy your model. A Predictor is a Python class that describes how to initialize your model and use it to make predictions.
Which Predictor you use depends on how your model is exported:
TensorFlow Predictor if your model is exported as a TensorFlow SavedModel
ONNX Predictor if your model is exported in the ONNX format
Python Predictor for all other cases
The response type of the predictor can vary depending on your requirements; see API responses below.
Cortex makes all files in the project directory (i.e. the directory which contains cortex.yaml) available for use in your Predictor implementation. Python bytecode files (*.pyc, *.pyo, *.pyd), files or folders that start with ., and the api configuration file (e.g. cortex.yaml) are excluded.
The following files can also be added at the root of the project's directory:
.cortexignore file, which follows the same syntax and behavior as a .gitignore file.
.env file, which exports environment variables that can be used in the predictor. Each line of this file must follow the VARIABLE=value format.
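For example, a hypothetical .env file could look like this (the variable names below are placeholders, not variables Cortex requires):

```text
LOG_LEVEL=debug
MODEL_BUCKET=my-bucket
```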
For example, if your directory looks like this:
```text
./my-classifier/
├── cortex.yaml
├── values.json
├── predictor.py
├── ...
└── requirements.txt
```
You can access values.json in your Predictor like this:
```python
import json

class PythonPredictor:
    def __init__(self, config):
        with open("values.json", "r") as values_file:
            values = json.load(values_file)
        self.values = values
```
```python
# initialization code and variables can be declared here in global scope

class PythonPredictor:
    def __init__(self, config, python_client):
        """(Required) Called once before the API becomes available. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            config (required): Dictionary passed from API configuration (if
                specified). This may contain information on where to download
                the model and/or metadata.
            python_client (optional): Python client which is used to retrieve
                models for prediction. This should be saved for use in
                predict(). Required when `predictor.model_path` or
                `predictor.models` is specified in the api configuration.
        """
        self.client = python_client  # optional

    def predict(self, payload, query_params, headers):
        """(Required) Called once per request. Preprocesses the request payload
        (if necessary), runs inference, and postprocesses the inference output
        (if necessary).

        Args:
            payload (optional): The request payload (see below for the possible
                payload types).
            query_params (optional): A dictionary of the query parameters used
                in the request.
            headers (optional): A dictionary of the headers sent in the request.

        Returns:
            Prediction or a batch of predictions.
        """
        pass

    def post_predict(self, response, payload, query_params, headers):
        """(Optional) Called in the background after returning a response.
        Useful for tasks that the client doesn't need to wait on before
        receiving a response such as recording metrics or storing results.

        Note: post_predict() and predict() run in the same thread pool. The
        size of the thread pool can be increased by updating
        `threads_per_process` in the api configuration yaml.

        Args:
            response (optional): The response as returned by the predict method.
            payload (optional): The request payload (see below for the possible
                payload types).
            query_params (optional): A dictionary of the query parameters used
                in the request.
            headers (optional): A dictionary of the headers sent in the request.
        """
        pass

    def load_model(self, model_path):
        """(Optional) Called by Cortex to load a model when necessary.

        This method is required when the `predictor.model_path` or
        `predictor.models` field is specified in the api configuration.

        Warning: this method must not make any modification to the model's
        contents on disk.

        Args:
            model_path: The path to the model on disk.

        Returns:
            The loaded model from disk. The returned object is what
            self.client.get_model() will return.
        """
        pass
```
When explicit model paths are specified in the Python predictor's API configuration, Cortex provides a python_client to your Predictor's constructor. python_client is an instance of PythonClient that is used to load model(s) (it calls the load_model() method of your predictor, which must be defined when using explicit model paths). It should be saved as an instance variable in your Predictor, and your predict() function should call python_client.get_model() to load your model for inference. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the python_client.get_model() method expects an argument model_name which must hold the name of the model that you want to load (for example: self.client.get_model("text-generator")), as shown in the sketch below. There is also an optional second argument to specify the model version. See models and the multi model guide for more information.
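For example, a minimal sketch of a multi-model predict(), assuming a hypothetical API configuration whose models field defines a model named "text-generator" (how the loaded model is called depends on your framework):

```python
class PythonPredictor:
    def __init__(self, config, python_client):
        # save the client so predict() can retrieve models by name
        self.client = python_client

    def load_model(self, model_path):
        # framework-specific loading logic goes here (see load_model() above)
        ...

    def predict(self, payload):
        # "text-generator" is a hypothetical name defined in the `models` field
        model = self.client.get_model("text-generator")
        return model(payload["text"])  # calling convention depends on your framework
```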
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define config in your API configuration, and it is passed through to your Predictor's constructor.
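As an illustrative sketch, the constructor might read hypothetical keys from config (these keys are not defined by Cortex; you would define them yourself in your API configuration):

```python
import boto3

class PythonPredictor:
    def __init__(self, config):
        # "bucket" and "model_key" are hypothetical keys defined in the API configuration
        s3 = boto3.client("s3")
        s3.download_file(config["bucket"], config["model_key"], "/tmp/model.pkl")

        # configurable model parameters can also be passed through config
        self.temperature = config.get("temperature", 1.0)
```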
Your API can accept requests with different types of payloads, such as JSON-parseable data, bytes, or starlette.datastructures.FormData. Navigate to the API requests section to learn how headers can be used to change the type of payload that is passed into your predict method.
Your predictor method can return different types of objects such as JSON-parseable, string, and bytes objects. Navigate to the API responses section to learn about how to configure your predictor method to respond with different response codes and content-types.
Many of the examples use the Python Predictor, including all of the PyTorch examples.
Here is the Predictor for examples/pytorch/text-generator:
```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

class PythonPredictor:
    def __init__(self, config):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"using device: {self.device}")
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        self.model = GPT2LMHeadModel.from_pretrained("gpt2").to(self.device)

    def predict(self, payload):
        input_length = len(payload["text"].split())
        tokens = self.tokenizer.encode(payload["text"], return_tensors="pt").to(self.device)
        prediction = self.model.generate(tokens, max_length=input_length + 20, do_sample=True)
        return self.tokenizer.decode(prediction[0])
```
Here is the Predictor for examples/live-reloading/python/mpg-estimator:
```python
import mlflow.sklearn
import numpy as np

class PythonPredictor:
    def __init__(self, config, python_client):
        self.client = python_client

    def load_model(self, model_path):
        return mlflow.sklearn.load_model(model_path)

    def predict(self, payload, query_params):
        model_version = query_params.get("version")
        model = self.client.get_model(model_version=model_version)

        model_input = [
            payload["cylinders"],
            payload["displacement"],
            payload["horsepower"],
            payload["weight"],
            payload["acceleration"],
        ]

        result = model.predict([model_input]).item()
        return {"prediction": result, "model": {"version": model_version}}
```
The following Python packages are pre-installed in Python Predictors and can be used in your implementations:
```text
boto3==1.14.53
cloudpickle==1.6.0
Cython==0.29.21
dill==0.3.2
fastapi==0.61.1
joblib==0.16.0
Keras==2.4.3
msgpack==1.0.0
nltk==3.5
np-utils==0.5.12.1
numpy==1.19.1
opencv-python==4.4.0.42
pandas==1.1.1
Pillow==7.2.0
pyyaml==5.3.1
requests==2.24.0
scikit-image==0.17.2
scikit-learn==0.23.2
scipy==1.5.2
six==1.15.0
statsmodels==0.12.0
sympy==1.6.2
tensorflow-hub==0.9.0
tensorflow==2.3.0
torch==1.6.0
torchvision==0.7.0
xgboost==1.2.0
```
The list is slightly different for Inferentia-equipped APIs:
```text
boto3==1.13.7
cloudpickle==1.6.0
Cython==0.29.21
dill==0.3.1.1
fastapi==0.54.1
joblib==0.16.0
msgpack==1.0.0
neuron-cc==1.0.20600.0+0.b426b885f
nltk==3.5
np-utils==0.5.12.1
numpy==1.18.2
opencv-python==4.4.0.42
pandas==1.1.1
Pillow==7.2.0
pyyaml==5.3.1
requests==2.23.0
scikit-image==0.17.2
scikit-learn==0.23.2
scipy==1.3.2
six==1.15.0
statsmodels==0.12.0
sympy==1.6.2
tensorflow==1.15.4
tensorflow-neuron==1.15.3.1.0.2043.0
torch==1.5.1
torch-neuron==1.5.1.1.0.1721.0
torchvision==0.6.1
```
The pre-installed system packages are listed in images/python-predictor-cpu/Dockerfile (for CPU), images/python-predictor-gpu/Dockerfile (for GPU), or images/python-predictor-inf/Dockerfile (for Inferentia).
If your application requires additional dependencies, you can install additional Python packages and system packages.
```python
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        """(Required) Called once before the API becomes available. Performs
        setup such as downloading/initializing a vocabulary.

        Args:
            tensorflow_client (required): TensorFlow client which is used to
                make predictions. This should be saved for use in predict().
            config (required): Dictionary passed from API configuration (if
                specified).
        """
        self.client = tensorflow_client
        # Additional initialization may be done here

    def predict(self, payload, query_params, headers):
        """(Required) Called once per request. Preprocesses the request payload
        (if necessary), runs inference (e.g. by calling
        self.client.predict(model_input)), and postprocesses the inference
        output (if necessary).

        Args:
            payload (optional): The request payload (see below for the possible
                payload types).
            query_params (optional): A dictionary of the query parameters used
                in the request.
            headers (optional): A dictionary of the headers sent in the request.

        Returns:
            Prediction or a batch of predictions.
        """
        pass

    def post_predict(self, response, payload, query_params, headers):
        """(Optional) Called in the background after returning a response.
        Useful for tasks that the client doesn't need to wait on before
        receiving a response such as recording metrics or storing results.

        Note: post_predict() and predict() run in the same thread pool. The
        size of the thread pool can be increased by updating
        `threads_per_process` in the api configuration yaml.

        Args:
            response (optional): The response as returned by the predict method.
            payload (optional): The request payload (see below for the possible
                payload types).
            query_params (optional): A dictionary of the query parameters used
                in the request.
            headers (optional): A dictionary of the headers sent in the request.
        """
        pass
```
Cortex provides a tensorflow_client to your Predictor's constructor. tensorflow_client is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call tensorflow_client.predict() to make an inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the tensorflow_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(payload, "text-generator")). There is also an optional third argument to specify the model version. See models and the multi model guide for more information.
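As a minimal sketch, assuming a hypothetical API configuration whose models field defines a model named "text-generator":

```python
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client

    def predict(self, payload):
        # the second argument selects the model by name; "text-generator" is a hypothetical name
        return self.client.predict(payload, "text-generator")
```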
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
Your API can accept requests with different types of payloads, such as JSON-parseable data, bytes, or starlette.datastructures.FormData. Navigate to the API requests section to learn how headers can be used to change the type of payload that is passed into your predict method.
Your predictor method can return different types of objects such as JSON-parseable, string, and bytes objects. Navigate to the API responses section to learn about how to configure your predictor method to respond with different response codes and content-types.
Most of the examples in examples/tensorflow use the TensorFlow Predictor.
Here is the Predictor for examples/tensorflow/iris-classifier:
labels = ["setosa", "versicolor", "virginica"]class TensorFlowPredictor:def __init__(self, tensorflow_client, config):self.client = tensorflow_clientdef predict(self, payload):prediction = self.client.predict(payload)predicted_class_id = int(prediction["class_ids"][0])return labels[predicted_class_id]
The following Python packages are pre-installed in TensorFlow Predictors and can be used in your implementations:
```text
boto3==1.14.53
dill==0.3.2
fastapi==0.61.1
msgpack==1.0.0
numpy==1.19.1
opencv-python==4.4.0.42
pyyaml==5.3.1
requests==2.24.0
tensorflow-hub==0.9.0
tensorflow-serving-api==2.3.0
tensorflow==2.3.0
```
The pre-installed system packages are listed in images/tensorflow-predictor/Dockerfile.
If your application requires additional dependencies, you can install additional Python packages and system packages.
```python
class ONNXPredictor:
    def __init__(self, onnx_client, config):
        """(Required) Called once before the API becomes available. Performs
        setup such as downloading/initializing a vocabulary.

        Args:
            onnx_client (required): ONNX client which is used to make
                predictions. This should be saved for use in predict().
            config (required): Dictionary passed from API configuration (if
                specified).
        """
        self.client = onnx_client
        # Additional initialization may be done here

    def predict(self, payload, query_params, headers):
        """(Required) Called once per request. Preprocesses the request payload
        (if necessary), runs inference (e.g. by calling
        self.client.predict(model_input)), and postprocesses the inference
        output (if necessary).

        Args:
            payload (optional): The request payload (see below for the possible
                payload types).
            query_params (optional): A dictionary of the query parameters used
                in the request.
            headers (optional): A dictionary of the headers sent in the request.

        Returns:
            Prediction or a batch of predictions.
        """
        pass

    def post_predict(self, response, payload, query_params, headers):
        """(Optional) Called in the background after returning a response.
        Useful for tasks that the client doesn't need to wait on before
        receiving a response such as recording metrics or storing results.

        Note: post_predict() and predict() run in the same thread pool. The
        size of the thread pool can be increased by updating
        `threads_per_process` in the api configuration yaml.

        Args:
            response (optional): The response as returned by the predict method.
            payload (optional): The request payload (see below for the possible
                payload types).
            query_params (optional): A dictionary of the query parameters used
                in the request.
            headers (optional): A dictionary of the headers sent in the request.
        """
        pass
```
Cortex provides an onnx_client to your Predictor's constructor. onnx_client is an instance of ONNXClient that manages an ONNX Runtime session to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call onnx_client.predict() to make an inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the onnx_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(model_input, "text-generator")). There is also an optional third argument to specify the model version. See models and the multi model guide for more information.
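As a minimal sketch, assuming a hypothetical API configuration whose models field defines a model named "my-model" (the model name, the feature keys, and passing the optional version as the string "2" are all assumptions for illustration):

```python
class ONNXPredictor:
    def __init__(self, onnx_client, config):
        self.client = onnx_client

    def predict(self, payload):
        # "feature_a" and "feature_b" are hypothetical payload fields
        model_input = [payload["feature_a"], payload["feature_b"]]
        # select the "my-model" model; "2" is a hypothetical version
        return self.client.predict(model_input, "my-model", "2")
```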
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
Your API can accept requests with different types of payloads, such as JSON-parseable data, bytes, or starlette.datastructures.FormData. Navigate to the API requests section to learn how headers can be used to change the type of payload that is passed into your predict method.
Your predictor method can return different types of objects such as JSON-parseable, string, and bytes objects. Navigate to the API responses section to learn about how to configure your predictor method to respond with different response codes and content-types.
examples/onnx/iris-classifier uses the ONNX Predictor:
labels = ["setosa", "versicolor", "virginica"]class ONNXPredictor:def __init__(self, onnx_client, config):self.client = onnx_clientdef predict(self, payload):model_input = [payload["sepal_length"],payload["sepal_width"],payload["petal_length"],payload["petal_width"],]prediction = self.client.predict(model_input)predicted_class_id = prediction[0][0]return labels[predicted_class_id]
The following Python packages are pre-installed in ONNX Predictors and can be used in your implementations:
```text
boto3==1.14.53
dill==0.3.2
fastapi==0.61.1
msgpack==1.0.0
numpy==1.19.1
onnxruntime==1.4.0
pyyaml==5.3.1
requests==2.24.0
```
The pre-installed system packages are listed in images/onnx-predictor-cpu/Dockerfile (for CPU) or images/onnx-predictor-gpu/Dockerfile (for GPU).
If your application requires additional dependencies, you can install additional Python packages and system packages.
The type of the payload parameter in predict(self, payload) can vary based on the content type of the request. The payload parameter is parsed according to the Content-Type header in the request. Here are the parsing rules (see below for examples):
For Content-Type: application/json, payload will be the parsed JSON body.
For Content-Type: multipart/form-data / Content-Type: application/x-www-form-urlencoded, payload will be starlette.datastructures.FormData (key-value pairs where the values are strings for text data, or starlette.datastructures.UploadFile for file uploads; see Starlette's documentation).
For all other Content-Type values, payload will be the raw bytes of the request body.
Here are some examples:
Curl
```bash
$ curl https://***.amazonaws.com/my-api \
    -X POST -H "Content-Type: application/json" \
    -d '{"key": "value"}'
```
Or if you have a JSON file:
```bash
$ curl https://***.amazonaws.com/my-api \
    -X POST -H "Content-Type: application/json" \
    -d @file.json
```
Python
```python
import requests

url = "https://***.amazonaws.com/my-api"
requests.post(url, json={"key": "value"})
```
Or if you have a JSON string:
```python
import requests
import json

url = "https://***.amazonaws.com/my-api"
jsonStr = json.dumps({"key": "value"})
requests.post(url, data=jsonStr, headers={"Content-Type": "application/json"})
```
When sending a JSON payload, the payload parameter will be a Python object:
```python
class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        print(payload["key"])  # prints "value"
```
Curl
```bash
$ curl https://***.amazonaws.com/my-api \
    -X POST -H "Content-Type: application/octet-stream" \
    --data-binary @object.pkl
```
Python
```python
import requests
import pickle

url = "https://***.amazonaws.com/my-api"
pklBytes = pickle.dumps({"key": "value"})
requests.post(url, data=pklBytes, headers={"Content-Type": "application/octet-stream"})
```
Since the Content-Type: application/octet-stream header is used, the payload parameter will be a bytes object:
```python
import pickle

class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        obj = pickle.loads(payload)
        print(obj["key"])  # prints "value"
```
Here's an example if the binary data is an image:
```python
from PIL import Image
import io

class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload, headers):
        img = Image.open(io.BytesIO(payload))  # read the payload bytes as an image
        print(img.size)
```
Curl
```bash
$ curl https://***.amazonaws.com/my-api \
    -X POST \
    -F "text=@text.txt" \
    -F "object=@object.pkl" \
    -F "image=@image.png"
```
Python
```python
import requests
import pickle

url = "https://***.amazonaws.com/my-api"
files = {
    "text": open("text.txt", "rb"),
    "object": open("object.pkl", "rb"),
    "image": open("image.png", "rb"),
}
requests.post(url, files=files)
```
When sending files via form data, the payload parameter will be starlette.datastructures.FormData (key-value pairs where the values are starlette.datastructures.UploadFile, see Starlette's documentation). Either Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded can be used (typically Content-Type: multipart/form-data is used for files, and is the default in the examples above).
```python
from PIL import Image
import pickle

class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        text = payload["text"].file.read()
        print(text.decode("utf-8"))  # prints the contents of text.txt

        obj = pickle.load(payload["object"].file)
        print(obj["key"])  # prints "value" assuming `object.pkl` is a pickled dictionary {"key": "value"}

        img = Image.open(payload["image"].file)
        print(img.size)  # prints the dimensions of image.png
```
Curl
```bash
$ curl https://***.amazonaws.com/my-api \
    -X POST \
    -d "key=value"
```
Python
```python
import requests

url = "https://***.amazonaws.com/my-api"
requests.post(url, data={"key": "value"})
```
When sending text via form data, the payload parameter will be starlette.datastructures.FormData (key-value pairs where the values are strings, see Starlette's documentation). Either Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded can be used (typically Content-Type: application/x-www-form-urlencoded is used for text, and is the default in the examples above).
```python
class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        print(payload["key"])  # prints "value"
```
The response of your predict() function may be:
A JSON-serializable object (lists, dictionaries, numbers, etc.)
A string object (e.g. "class 1")
A bytes object (e.g. bytes(4) or pickle.dumps(obj))
An instance of starlette.responses.Response
Here are some examples:
```python
def predict(self, payload):
    # json-serializable object
    response = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    return response
```
```python
def predict(self, payload):
    # string object
    response = "class 1"
    return response
```
```python
def predict(self, payload):
    # bytes-like object
    array = np.random.randn(3, 3)
    response = pickle.dumps(array)
    return response
```
```python
def predict(self, payload):
    # starlette.responses.Response
    data = "class 1"
    response = starlette.responses.Response(content=data, media_type="text/plain")
    return response
```
It is possible to make requests from one API to another within a Cortex cluster. All running APIs are accessible from within the predictor at http://api-<api_name>:8888/predict, where <api_name> is the name of the API you are making a request to.
For example, if there is an API named text-generator running in the cluster, you could make a request to it from a different API like this:
```python
import requests

class PythonPredictor:
    def predict(self, payload):
        response = requests.post(
            "http://api-text-generator:8888/predict",
            json={"text": "machine learning is"},
        )
        # ...
```
Note that the autoscaling configuration (i.e. target_replica_concurrency) for the API that is making the request should be modified with the understanding that requests will still be considered "in-flight" with the first API as the request is being fulfilled in the second API (during which it will also be considered "in-flight" with the second API). See more details in the autoscaling docs.