Create HTTP APIs that respond to requests in real time.
mkdir text-generator && cd text-generator
touch handler.py requirements.txt text_generator.yaml
# handler.py

from transformers import pipeline

class Handler:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def handle_post(self, payload):
        return self.model(payload["text"])[0]
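Before deploying, you can exercise the handler directly in a local Python session. This is just a quick sanity check (not part of the deployment); the config argument is unused by this handler, so an empty dict is enough:

# local_check.py (hypothetical; requires the packages from requirements.txt)
from handler import Handler

handler = Handler(config={})  # config is not used by this handler
print(handler.handle_post({"text": "hello world"}))  # first generated sequence, e.g. {"generated_text": "..."}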
# requirements.txt

transformers
torch
# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  handler:
    type: python
    path: handler.py
  compute:
    gpu: 1
cortex deploy text_generator.yaml
cortex get text-generator --watch
cortex logs text-generator
curl http://***.elb.us-west-2.amazonaws.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
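The same request can be made from Python with the requests library. This is an illustrative client sketch; replace the placeholder endpoint with your API's URL from cortex get text-generator:

# client.py (illustrative sketch)
import requests

endpoint = "http://***.elb.us-west-2.amazonaws.com/text-generator"  # placeholder endpoint
response = requests.post(endpoint, json={"text": "hello world"})
response.raise_for_status()
print(response.json())  # e.g. {"generated_text": "hello world ..."}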
cortex delete text-generator
To make the above API use gRPC as its protocol, make the following changes (the rest of the steps are the same):
Create a handler.proto file in your project's directory:
// handler.proto

syntax = "proto3";
package text_generator;

service Handler {
    rpc Predict (Message) returns (Message);
}

message Message {
    string text = 1;
}
Set the handler.protobuf_path field in the API spec to point to the handler.proto file:
# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  handler:
    type: python
    path: handler.py
    protobuf_path: handler.proto
  compute:
    gpu: 1
Match the name of each RPC method from the protobuf definition (in this case Predict) with a method of the same name in the handler's implementation:
# handler.py

from transformers import pipeline

class Handler:
    def __init__(self, config, proto_module_pb2):
        self.model = pipeline(task="text-generation")
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        # payload is the deserialized Message; respond with a Message as well
        generated = self.model(payload.text)[0]["generated_text"]
        return self.proto_module_pb2.Message(text=generated)
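As with the HTTP handler, you can sanity-check this locally before deploying. The sketch below assumes you compile handler.proto yourself with grpcio-tools (pip install grpcio-tools) purely for local testing; it is not a deployment step:

# local_grpc_check.py (hypothetical; generate the stubs first with:
#   python -m grpc_tools.protoc -I . --python_out=. --grpc_python_out=. handler.proto)
import handler_pb2
from handler import Handler

handler = Handler(config={}, proto_module_pb2=handler_pb2)
response = handler.Predict(handler_pb2.Message(text="hello world"))
print(response.text)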
grpcurl -plaintext -proto handler.proto -d '{"text": "hello-world"}' ***.elb.us-west-2.amazonaws.com:80 text_generator.Handler/Predict
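If you prefer a Python client over grpcurl, you can call the API with grpcio using the handler_pb2 and handler_pb2_grpc modules generated above. This is an illustrative sketch; replace the placeholder address with your API's endpoint from cortex get text-generator:

# grpc_client.py (illustrative sketch; requires `pip install grpcio`)
import grpc

import handler_pb2
import handler_pb2_grpc

channel = grpc.insecure_channel("***.elb.us-west-2.amazonaws.com:80")  # placeholder endpoint
stub = handler_pb2_grpc.HandlerStub(channel)

response = stub.Predict(handler_pb2.Message(text="hello world"))
print(response.text)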