Tune processes_per_replica and threads_per_process in your Realtime API configuration. Each model behaves differently, so the best way to find a good value is to run a load test on a single replica (you can set min_replicas to 1 to avoid autoscaling). Here is additional information about these fields.

Before shifting traffic to your API, set min_replicas at (or slightly above) the number of replicas you expect will be necessary to handle the load at steady state. After traffic has been fully shifted to your API, min_replicas can be reduced to allow automatic downscaling.

A traffic splitter can be given a name such as my-api and route requests sent to my-api to any number of Realtime APIs (e.g. my-api_v1, my-api_v2, etc.). The percentage of traffic that the traffic splitter routes to each API can be updated on the fly.

Consider a low value for max_replica_concurrency, since if there are many requests in the queue, it will take a long time until newly received requests are processed. See the autoscaling docs for more details.

To disable API gateway for an individual API, set api_gateway: none in the networking config in your Realtime API configuration and/or Batch API configuration. Alternatively, you can disable API gateway for all APIs in your cluster by setting api_gateway: none in your cluster configuration file before creating your cluster.
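
The fields mentioned above could fit together roughly as in the following sketch of an API configuration file. It is illustrative, not a recommended setup. The predictor type, the path predictor.py, the max_replicas field, and every numeric value are placeholders chosen for the example. The nesting of each field (predictor, autoscaling, networking, apis with weight) is an assumption based on the usual Cortex configuration layout, so check it against the configuration reference for your Cortex version.

```yaml
# Sketch only: values are placeholders, and field placement is assumed.
- name: my-api_v1
  kind: RealtimeAPI
  predictor:
    type: python              # hypothetical predictor type
    path: predictor.py        # hypothetical path
    processes_per_replica: 2  # find a good value with a single-replica load test
    threads_per_process: 4    # find a good value with a single-replica load test
  autoscaling:
    min_replicas: 5           # at or slightly above the expected steady-state count
    max_replicas: 20          # placeholder
    max_replica_concurrency: 64  # kept low so queued requests do not wait long
  networking:
    api_gateway: none         # disables API gateway for this API

# Traffic splitter that routes requests sent to my-api across two Realtime APIs.
# my-api_v2 would be defined like my-api_v1 above.
- name: my-api
  kind: TrafficSplitter
  apis:
    - name: my-api_v1
      weight: 90              # percentage of traffic; can be updated later
    - name: my-api_v2
      weight: 10
```

Once traffic has been fully shifted, min_replicas in a file like this can be lowered and the API redeployed so that automatic downscaling can take effect.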