Deploying and operating models in production
A lot has been written about the challenges of deploying models to production. Teams often run into issues that fall into these categories:
- Articulating requirements - what data will be available at inference time? What latency budget must the model meet?
- Designing an inference API - should the model run inside an existing backend service or as a standalone service? Which hardware should serve it? (A standalone-service sketch follows this list.)
- Packaging, testing, and deploying - how do these engineering practices apply to model-serving artifacts?
- Model ramp-up - what if the model starts failing right after it is deployed?
- Model monitoring and evaluation - how do we detect when prediction quality degrades over time? (A monitoring sketch appears at the end of this section.)
- Continuous training - how can the model automatically recover from degradation?
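To make the "standalone service" option concrete, here is a minimal sketch of an inference API served on its own. It assumes FastAPI and a scikit-learn-style model serialized to `model.pkl`; the model path, feature schema, and endpoint shape are illustrative assumptions, not a prescribed design.

```python
# A minimal sketch of a standalone inference service.
# Assumptions: FastAPI is used for serving, and a pre-trained
# scikit-learn-style classifier is available as "model.pkl".
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the serving artifact once at startup rather than per request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


class PredictRequest(BaseModel):
    # Hypothetical flat feature vector; real services validate a richer schema.
    features: list[float]


class PredictResponse(BaseModel):
    score: float


@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # predict_proba returns class probabilities; report the positive class.
    score = float(model.predict_proba([request.features])[0][1])
    return PredictResponse(score=score)
```

Running the model behind its own endpoint like this keeps hardware selection and scaling decisions independent of the backend service that calls it, at the cost of an extra network hop.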
The solutions to these challenges depend on factors such as scale, computational complexity, the model’s input type, and the operational environment.
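As one illustration of the monitoring challenge above, the sketch below compares recently observed prediction scores against a baseline sample captured at training time. The two-sample Kolmogorov-Smirnov test and the p-value threshold are illustrative choices, not a recommended standard.

```python
# A minimal sketch of offline score-drift monitoring.
# Assumptions: prediction scores are logged in production, and a baseline
# sample of scores from training/validation time is available.
import numpy as np
from scipy.stats import ks_2samp


def score_drift_detected(
    baseline_scores: np.ndarray,
    recent_scores: np.ndarray,
    p_value_threshold: float = 0.01,
) -> bool:
    """Flag drift when recent scores look unlikely to come from the baseline."""
    statistic, p_value = ks_2samp(baseline_scores, recent_scores)
    return p_value < p_value_threshold


# Example with synthetic data: a shifted score distribution triggers the alert.
baseline = np.random.default_rng(0).normal(0.30, 0.10, size=5_000)
recent = np.random.default_rng(1).normal(0.45, 0.10, size=5_000)
print(score_drift_detected(baseline, recent))  # True: the distributions differ
```

A check like this only detects distribution shift in the model's outputs; evaluating true quality degradation still requires delayed labels or proxy metrics, which is where continuous training pipelines pick up.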