You can deploy TensorFlow models as web services by defining a class that implements Cortex's TensorFlow Predictor interface.
```yaml
- name: <string>  # API name (required)
  endpoint: <string>  # the endpoint for the API (default: /<api_name>)
  predictor:
    type: tensorflow
    path: <string>  # path to a python file with a TensorFlowPredictor class definition, relative to the Cortex root (required)
    model: <string>  # S3 path to an exported model (e.g. s3://my-bucket/exported_model) (required)
    signature_key: <string>  # name of the signature def to use for prediction (required if your model has more than one signature def)
    config: <string: value>  # dictionary that can be used to configure custom values (optional)
    python_path: <string>  # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
    env: <string: string>  # dictionary of environment variables
  tracker:
    key: <string>  # the JSON key in the response to track (required if the response payload is a JSON object)
    model_type: <string>  # model type, must be "classification" or "regression" (required)
  compute:
    min_replicas: <int>  # minimum number of replicas (default: 1)
    max_replicas: <int>  # maximum number of replicas (default: 100)
    init_replicas: <int>  # initial number of replicas (default: <min_replicas>)
    target_cpu_utilization: <int>  # CPU utilization threshold (as a percentage) to trigger scaling (default: 80)
    cpu: <string | int | float>  # CPU request per replica (default: 200m)
    gpu: <int>  # GPU request per replica (default: 0)
    mem: <string>  # memory request per replica (default: Null)
    max_surge: <string | int>  # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
    max_unavailable: <string | int>  # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
```
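To make the defaults in the configuration reference concrete, here is a minimal Python sketch that fills them into a single API entry that has already been parsed from YAML into a dict. `apply_defaults` and `resolve_surge` are hypothetical helpers written for illustration, not part of Cortex. The rounding in `resolve_surge` assumes the percentages are passed through to a Kubernetes rolling update, where `maxSurge` is rounded up and `maxUnavailable` is rounded down.

```python
from typing import Any, Dict, Union

# Defaults for the compute section, copied from the configuration reference.
COMPUTE_DEFAULTS: Dict[str, Any] = {
    "min_replicas": 1,
    "max_replicas": 100,
    "target_cpu_utilization": 80,
    "cpu": "200m",
    "gpu": 0,
    "mem": None,
    "max_surge": "25%",
    "max_unavailable": "25%",
}

# Predictor keys the reference marks as required.
REQUIRED_PREDICTOR_KEYS = ("path", "model")


def apply_defaults(api: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one API entry with documented defaults filled in (hypothetical helper)."""
    if "name" not in api:
        raise ValueError("name is required")
    predictor = dict(api.get("predictor", {}))
    missing = [key for key in REQUIRED_PREDICTOR_KEYS if key not in predictor]
    if missing:
        raise ValueError(f"predictor is missing required keys: {missing}")

    # User-supplied compute values override the defaults.
    compute = {**COMPUTE_DEFAULTS, **api.get("compute", {})}
    # init_replicas defaults to whatever min_replicas resolved to.
    compute.setdefault("init_replicas", compute["min_replicas"])

    result = dict(api)
    result["endpoint"] = api.get("endpoint", "/" + api["name"])
    result["predictor"] = predictor
    result["compute"] = compute
    return result


def resolve_surge(value: Union[int, str], desired: int, round_up: bool) -> int:
    """Turn max_surge / max_unavailable into an absolute replica count.

    Assumes Kubernetes semantics: round up for max_surge (round_up=True),
    round down for max_unavailable (round_up=False).
    """
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * desired  # integer math avoids float rounding errors
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)
```

With 10 desired replicas, the default `25%` resolves to at most 3 extra replicas and at most 2 unavailable replicas during an update.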
See packaging TensorFlow models for how to export a TensorFlow model.
```yaml
- name: my-api
  predictor:
    type: tensorflow
    path: predictor.py
    model: s3://my-bucket/my-model
  compute:
    gpu: 1
```
You can log information about each request by adding a `?debug=true` parameter to your requests. This will print:
The value after running the
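As a minimal sketch of sending a request with debug logging enabled, the Python below adds `debug=true` to an API URL while preserving any existing query parameters, then POSTs a JSON payload using only the standard library. `with_debug` and `predict_with_debug` are hypothetical helpers, and the URL is a placeholder; use the endpoint Cortex reports for your API.

```python
import json
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url: str) -> str:
    """Return url with debug=true in its query string, keeping other parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["debug"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


def predict_with_debug(url: str, payload: dict) -> bytes:
    """POST a JSON payload to a deployed API with debug logging enabled."""
    request = urllib.request.Request(
        with_debug(url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```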
A TensorFlow Predictor is a Python class that describes how to serve your TensorFlow model to make predictions.
Cortex provides a `tensorflow_client` and a `config` object to initialize your implementation of the TensorFlow Predictor class. The `tensorflow_client` is an instance of `TensorFlowClient` that manages a connection to a TensorFlow Serving container via gRPC to make predictions using your model. Once your implementation of the TensorFlow Predictor class has been initialized, the replica is available to serve requests. Upon receiving a request, your implementation's `predict()` function is called with the JSON payload and is responsible for returning a prediction or batch of predictions. Your `predict()` function should call `tensorflow_client.predict()` to make an inference against your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your `predict()` function as well.
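The lifecycle described above (construct once, then call `predict()` per request) can be sketched locally with a stand-in client. `StubTensorFlowClient` and `serve` are hypothetical; the float-casting preprocessing and the `class_ids` unwrapping are assumptions about one particular model. In a real deployment, Cortex constructs the predictor and passes in the actual `TensorFlowClient`.

```python
from typing import Any, Dict, List


class StubTensorFlowClient:
    """Hypothetical stand-in for TensorFlowClient, for local experiments only."""

    def predict(self, model_input: Dict[str, Any]) -> Dict[str, Any]:
        # Pretend the model always predicts class 2.
        return {"class_ids": [2]}


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        # Runs once per replica, before any request is served.
        self.client = tensorflow_client
        self.config = config or {}

    def predict(self, payload):
        # Preprocessing (assumed): cast every field to float.
        model_input = {key: float(value) for key, value in payload.items()}
        prediction = self.client.predict(model_input)
        # Postprocessing (assumed): unwrap the single class id.
        return int(prediction["class_ids"][0])


def serve(requests: List[Dict[str, Any]]) -> List[Any]:
    """Mimic a replica: build the predictor once, then call predict() per request."""
    predictor = TensorFlowPredictor(StubTensorFlowClient(), config={})
    return [predictor.predict(payload) for payload in requests]
```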
```python
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        """Called once before the API becomes available. Setup for model serving such as downloading/initializing vocabularies can be done here. Required.

        Args:
            tensorflow_client: TensorFlow client which can be used to make predictions.
            config: Dictionary passed from API configuration (if specified).
        """
        pass

    def predict(self, payload):
        """Called once per request. Runs preprocessing of the request payload, inference, and postprocessing of the inference output. Required.

        Args:
            payload: The parsed JSON request payload.

        Returns:
            Prediction or a batch of predictions.
        """
```
```python
labels = ["setosa", "versicolor", "virginica"]


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client

    def predict(self, payload):
        prediction = self.client.predict(payload)
        predicted_class_id = int(prediction["class_ids"])
        return labels[predicted_class_id]
```
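A variant of the iris example, assuming a hypothetical `labels` key under `predictor.config`, reads the label names from `config` instead of a module-level list. It also unwraps `class_ids` whether the client returns a scalar or a nested list; the exact shape depends on your exported signature, so treat this as an assumption and check what your model actually returns. `StubClient` is a hypothetical stand-in for local testing only.

```python
from typing import Any, Dict, List, Optional

DEFAULT_LABELS = ["setosa", "versicolor", "virginica"]


def first_scalar(value: Any) -> Any:
    """Unwrap nested lists such as [[2]] or [2] down to the first scalar."""
    while isinstance(value, (list, tuple)):
        value = value[0]
    return value


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config: Optional[Dict[str, Any]]):
        self.client = tensorflow_client
        # "labels" is a hypothetical key; fall back to the iris labels if absent.
        self.labels: List[str] = (config or {}).get("labels", DEFAULT_LABELS)

    def predict(self, payload):
        prediction = self.client.predict(payload)
        predicted_class_id = int(first_scalar(prediction["class_ids"]))
        return self.labels[predicted_class_id]


class StubClient:
    """Hypothetical client that returns a fixed prediction dict."""

    def __init__(self, output: Dict[str, Any]):
        self.output = output

    def predict(self, payload):
        return self.output
```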
The following packages have been pre-installed and can be used in your implementations:
Learn how to install additional packages here.