One of the central challenges of MLOps is the differing needs and preferences of machine learning engineers and data scientists.
In most production machine learning pipelines, both roles have responsibilities. Data scientists are typically more involved in training, while machine learning engineers tend toward the production side, but there is commonly a degree of fluidity and overlap that varies from team to team.
For an MLOps team, this means that any piece of infrastructure that touches both teams needs to be productive for both sets of users. And in many cases, this means that an MLOps platform needs to have multiple interfaces, optimized for the needs of different users.
We’ve embraced this in recent releases of Cortex, where there are now two different interfaces that can be used to define and deploy models, designed to fit the workflows and preferred ergonomics of both teams.
A declarative, CLI-driven interface for machine learning engineers
The majority of machine learning engineers we’ve worked with have had backgrounds more closely aligned with traditional software engineering than data science. They tend to know their way around popular infrastructure tools (Serverless, CloudFormation, Kubernetes, etc.).
Cortex’s initial CLI-driven interface, which has ergonomics similar to those of common deployment platforms, is familiar and comfortable to this group of users.
For example, prediction APIs (predictors) are just a Python script:
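A minimal sketch of such a predictor (the class and method names follow Cortex’s Python predictor convention; the model logic here is an illustrative stand-in):

```python
# predictor.py — illustrative sketch of a Cortex Python predictor.
# The threshold-based "model" is a placeholder for real model code.

class PythonPredictor:
    def __init__(self, config):
        # Runs once per replica at startup; `config` comes from the
        # deployment configuration. Load your model here.
        self.threshold = config.get("threshold", 0.5)

    def predict(self, payload):
        # Runs per request; `payload` is the parsed request body.
        score = float(payload["score"])
        return {"label": "positive" if score >= self.threshold else "negative"}
```

The key point is that there is no framework boilerplate: initialization happens once in `__init__`, and each request is handled by `predict`.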
Deployments are then configured via a declarative YAML manifest:
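A sketch of such a manifest (the field names here are illustrative; Cortex’s documentation defines the exact schema):

```yaml
# cortex.yaml — illustrative deployment manifest
- name: sentiment-classifier
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1
    mem: 2G
  autoscaling:
    min_replicas: 1
    max_replicas: 5
```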
The manifest exposes dozens of optional fields for configuring everything from autoscaling behavior to compute resources to prediction tracking.
The API is then deployed and monitored directly from the CLI:
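For example (the commands are representative of Cortex’s CLI; exact flags and output vary by version):

```bash
$ cortex deploy                     # package and deploy the APIs defined in cortex.yaml
$ cortex get sentiment-classifier   # check status, endpoint, and replica counts
$ cortex logs sentiment-classifier  # stream logs from the running API
```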
On deploy, Cortex packages and containerizes everything, then deploys the API to your Cortex cluster, where it automatically configures load balancing, request routing, autoscaling, and more.
When we initially built Cortex, we thought this interface was all we needed. We mostly had engineering backgrounds, and our initial users—people who felt the pain of GPU inference and autoscaling, for example—also skewed more towards engineering and DevOps roles.
However, as the Cortex community has grown, we have seen more and more teams where data scientists are involved in deployment, and for those data scientists, Cortex was not as helpful as it should have been.
A flexible Python client for deploying models
The typical data science toolchain is very different from an engineer’s. In particular, notebook-style environments are far and away the most common interface among data scientists, at least for writing code.
Asking data scientists to switch from this environment to deploy—let alone to configure a YAML manifest, write a requirements file, and deploy from a CLI—introduces a serious amount of friction.
To solve this problem, we’ve designed a new Python client for Cortex, one which offers identical functionality to the original CLI-driven interface, but using pure Python:
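A sketch of what this looks like from a notebook (the spec fields are illustrative, and running it requires the Cortex package and a live cluster, so treat it as a sketch rather than a verbatim API reference):

```python
import cortex

cx = cortex.client("aws")  # connect to your Cortex environment

api_spec = {
    "name": "sentiment-classifier",
    "kind": "RealtimeAPI",
    "predictor": {"type": "python", "path": "predictor.py"},
    "compute": {"cpu": 1},
}

cx.create_api(api_spec)  # deploys, just like `cortex deploy` from the CLI
```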
Instead of running "$ cortex deploy" in your terminal, you run "cx.create_api()" in your notebook. Instead of configuring a YAML manifest, you pass your specifications in as a dictionary. The interface is comfortable and familiar to people who primarily work out of notebooks, but we’ve also worked very hard to make sure it retains the core benefits of Cortex’s deployment interface, in particular:
- As configurable as you need. In both interfaces, Cortex has opinionated default values for almost every config field, but is fully customizable. You can configure deployments as granularly as you want—or rely on defaults.
- Reproducible by default. Cortex still packages and deploys models with unique identifiers, meaning if you need to debug later, you have complete visibility into each deployment and can associate it with predictions.
That second point is particularly important. One of the most common complaints about notebooks from engineers, and something we had to think a good deal about in building the Python client, is the issue of reproducibility. In fact, we wrote a whole article about why we didn’t support notebooks initially.
However, the Python client works identically to the CLI in that it packages up request serving code, configuration, and requirements into a single, uniquely identifiable deployment. This means that no matter what happens to your notebook, you can always debug a given deployment and see the request handling code directly, giving you a single, reproducible source of truth.
Beyond the nice ergonomics, we’re really excited to see how the Python client allows Cortex to better interface with higher level data science orchestration tools. For example, we recently collaborated with the team behind Netflix’s Metaflow to document how easy it is to define a reproducible end-to-end pipeline, from training to deployment, with Cortex and Metaflow.
Machine learning is a fundamental piece of modern software
Over the last 10 years, there has been plenty of debate over whether machine learning will actually change software in a fundamental way, or whether it is merely a nice-to-have optimization for the sort of companies that can benefit from in-depth business intelligence and 1% improvements.
The answer at this point is clear: machine learning is not a niche.
To build a modern content platform—think TikTok—you absolutely need a recommendation engine. To build any logistics or delivery-related application—think Uber—you need ETA prediction. Modern products are built with machine learning, and as this trend continues, machine learning will keep shifting from a niche technology sprinkled into your stack to a core technology on which your product is built.
A result of this transition is that people from more functions are going to be interacting with machine learning. Just like DevOps tooling and workflows need to work for frontend engineers, database architects, and platform engineers, MLOps platforms need to have productive interfaces for data scientists, engineers, and everyone else who plays a role in the pipeline.