Deployment & Monitoring
Getting a model into production is half the job. Keeping it working — as data shifts beneath it — is the other half, and the half teams most often skip.
How models are served
Two main shapes, picked by how predictions are consumed:
- Online / real-time — the model sits behind an API and answers per request. For interactive features. Needs low latency and scaling. (See AI Infrastructure.)
- Batch — the model scores a large dataset on a schedule, results written to a store. For “score every customer nightly.” Simpler; no latency pressure.
Choose online only when predictions are genuinely needed on demand — batch is cheaper and simpler when freshness allows it.
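The two shapes can be sketched side by side. This is a minimal illustration, not a serving framework: `toy_model`, `predict_online`, `score_batch`, and the `orders_last_30d` feature are hypothetical names invented for the example, and a real online path would sit behind an HTTP or RPC layer.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for a trained model: any callable from features to a score.
Model = Callable[[dict], float]


def toy_model(features: dict) -> float:
    """Illustrative scoring rule, not a real model."""
    return min(1.0, 0.1 * features.get("orders_last_30d", 0))


def predict_online(model: Model, request: dict) -> dict:
    """Online shape: one request in, one prediction out, on the caller's clock."""
    return {"customer_id": request["customer_id"], "score": model(request["features"])}


def score_batch(model: Model, rows: Iterable[dict]) -> dict:
    """Batch shape: score every row in one pass; the result is what a
    scheduled job would write to a store."""
    return {row["customer_id"]: model(row["features"]) for row in rows}


customers = [
    {"customer_id": "c1", "features": {"orders_last_30d": 3}},
    {"customer_id": "c2", "features": {"orders_last_30d": 12}},
]
nightly_scores = score_batch(toy_model, customers)   # "score every customer nightly"
single = predict_online(toy_model, customers[0])     # answer one interactive request
```

The model call is identical in both paths; what differs is who is waiting for the answer, which is why batch carries no latency pressure.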
The model registry
A model registry is version control for trained models: the source of truth for what exists and what’s live.
Each registered model carries its version, the metrics it earned, a link to the
training run and data, and a stage: staging, production, archived. The
registry makes one critical operation trivial: rollback. When a new model
misbehaves, you re-point production at the previous version immediately — no
retraining, no scramble.
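To make the rollback idea concrete, here is a small in-memory sketch. It is an assumption-laden toy, not the API of any real registry product: the `ModelRegistry` and `ModelVersion` classes, their method names, and the `run-001` style identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    metrics: dict
    training_run: str        # link or id of the training run and data
    stage: str = "staging"   # staging, production, or archived


@dataclass
class ModelRegistry:
    versions: dict = field(default_factory=dict)
    # Versions that have held production, oldest first; the last one is live.
    history: list = field(default_factory=list)

    def register(self, version: int, metrics: dict, training_run: str) -> None:
        self.versions[version] = ModelVersion(version, metrics, training_run)

    def promote(self, version: int) -> None:
        """Move a version to production and archive the one it replaces."""
        if self.history:
            self.versions[self.history[-1]].stage = "archived"
        self.versions[version].stage = "production"
        self.history.append(version)

    def rollback(self) -> int:
        """Re-point production at the previous version; no retraining involved."""
        if len(self.history) < 2:
            raise RuntimeError("no previous production version to roll back to")
        bad = self.history.pop()
        self.versions[bad].stage = "archived"
        previous = self.history[-1]
        self.versions[previous].stage = "production"
        return previous

    def production_version(self):
        return self.history[-1] if self.history else None


registry = ModelRegistry()
registry.register(1, {"auc": 0.91}, "run-001")
registry.register(2, {"auc": 0.93}, "run-002")
registry.promote(1)
registry.promote(2)
restored = registry.rollback()   # version 2 misbehaves, version 1 is live again
```

Rollback here is only a pointer change, which is the property the registry is meant to guarantee: the previous artifact still exists and can be served immediately.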
Deploying safely
Never flip 100% of traffic to a new model at once. Roll it out progressively:
| Strategy | How it works |
|---|---|
| Shadow | New model runs on real traffic; its output is logged, not served. Zero-risk validation. |
| Canary | New model serves a small slice (1–5%); watch metrics; widen or roll back. |
| Blue-green | Two environments; switch traffic over, switch back instantly on trouble. |
| A/B test | Two models split traffic to compare a business metric directly. |
Shadow then canary is a strong default: prove the model on live traffic without risk, then expose it gradually.
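A rough sketch of how shadow and canary routing might look in a request handler. The function names (`in_canary`, `serve`), the `mode` strings, and the hash-bucket scheme are assumptions chosen for illustration; production systems usually do this in a gateway or service mesh rather than in application code.

```python
import hashlib


def in_canary(request_id: str, canary_percent: float) -> bool:
    """Deterministically assign an id to the canary slice.

    Hashing the id keeps the same caller on the same model across requests.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000   # bucket in 0..9999
    return bucket < canary_percent * 100    # 5% -> buckets 0..499


def serve(request_id, features, current_model, new_model,
          mode, canary_percent=5.0, shadow_log=None):
    """Return the prediction that is actually served to the caller."""
    if mode == "shadow":
        served = current_model(features)
        if shadow_log is not None:
            # The new model's output is recorded for comparison, never returned.
            shadow_log.append({"request_id": request_id,
                               "served": served,
                               "shadow": new_model(features)})
        return served
    if mode == "canary" and in_canary(request_id, canary_percent):
        return new_model(features)
    return current_model(features)


# Hypothetical models that just report which version answered.
current = lambda features: "v1"
candidate = lambda features: "v2"

log = []
serve("req-1", {}, current, candidate, mode="shadow", shadow_log=log)
canary_hits = sum(in_canary(f"user-{i}", 5.0) for i in range(10_000))  # roughly 5%
```

Widening the canary is then a matter of raising `canary_percent`, and rolling back means setting it to zero.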
Monitoring: the part that gets skipped
A deployed model degrades silently: no errors, no alerts, just slowly worsening predictions. Monitoring is what makes that visible. Watch four layers:
- Operational — latency, throughput, error rate, cost. Standard service health.
- Data quality — are incoming features valid: schema, ranges, null rates, missing values? Bad inputs are the most common production failure.
- Drift — has the input distribution moved away from training data?
- Model performance — accuracy, precision, recall on live data, once ground truth is available.
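The data quality layer is the easiest to automate. The sketch below checks incoming records against per-feature expectations; the feature names (`age`, `basket_value`), the bounds, and the helper names are hypothetical, and in practice the expectations would be derived from the training data rather than written by hand.

```python
import math

# Hypothetical expectations for two features.
EXPECTATIONS = {
    "age": {"type": (int, float), "min": 0, "max": 120},
    "basket_value": {"type": (int, float), "min": 0.0, "max": 10_000.0},
}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one incoming record (empty if clean)."""
    problems = []
    for name, rule in EXPECTATIONS.items():
        value = record.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            problems.append(f"{name}: missing")
        elif isinstance(value, bool) or not isinstance(value, rule["type"]):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"{name}: wrong type {type(value).__name__}")
        elif not rule["min"] <= value <= rule["max"]:
            problems.append(f"{name}: out of range ({value})")
    return problems


def null_rate(records: list, name: str) -> float:
    """Fraction of records where the feature is absent or None."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(name) is None) / len(records)
```

Per-record checks catch schema and range violations at request time, while aggregate figures such as `null_rate` are better tracked over a window and alerted on when they move.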
Drift is the core reason models decay:
- Data drift — the input distribution changes. A new customer segment, a pricing change, seasonality. The model now sees inputs unlike its training set.
- Concept drift — the relationship between input and output changes. What predicted fraud last year no longer does, because fraud tactics evolved.
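Data drift on a single numeric feature can be quantified by comparing the live distribution with the training one. One common choice, used here as an assumption since the text does not prescribe a metric, is the Population Stability Index (PSI): bin both samples on the training range and sum $(a_i - e_i)\ln(a_i / e_i)$ over the bins. The function name, bin count, and synthetic data below are illustrative.

```python
import math
import random


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against `expected`.

    Bins are equal-width over the range of the expected (training) sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp outliers into edge bins
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


rng = random.Random(0)
training = [rng.gauss(50, 10) for _ in range(5000)]
same = [rng.gauss(50, 10) for _ in range(5000)]      # same distribution, new sample
shifted = [rng.gauss(60, 10) for _ in range(5000)]   # mean moved by one std dev

stable_score = psi(training, same)
drift_score = psi(training, shifted)
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though these cutoffs are conventions rather than guarantees. A check like this only sees inputs, so it can flag data drift but not concept drift, which needs ground-truth labels to detect.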
Retraining
When monitoring shows decay, the model is refreshed on newer data, re-entering the lifecycle loop. Triggers:
- Scheduled — retrain every week or month. Simple, predictable.
- Triggered — retrain when drift or a performance metric crosses a threshold. Efficient, but needs reliable monitoring to fire it.
Every retrained model goes through the same evaluation gate and staged rollout. “Retrain” never means “deploy blindly” — newer data does not guarantee a better model.
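The trigger logic and the evaluation gate can be expressed as two small functions. Everything specific here is an assumption for illustration: the function names, the thresholds (30 days, drift score 0.25, accuracy 0.85), and the use of accuracy as the gating metric.

```python
def should_retrain(days_since_training, drift_score, live_accuracy, *,
                   max_age_days=30, drift_threshold=0.25, min_accuracy=0.85):
    """Return the list of reasons to retrain; an empty list means leave the model alone."""
    reasons = []
    if days_since_training >= max_age_days:
        reasons.append("scheduled")
    if drift_score > drift_threshold:
        reasons.append("drift")
    # Live accuracy is None until ground truth has arrived.
    if live_accuracy is not None and live_accuracy < min_accuracy:
        reasons.append("performance")
    return reasons


def passes_gate(candidate_metrics, production_metrics,
                metric="accuracy", min_gain=0.0):
    """Evaluation gate: the retrained model must at least match production
    on the same held-out data before it enters staged rollout."""
    return candidate_metrics[metric] >= production_metrics[metric] + min_gain


reasons = should_retrain(days_since_training=3, drift_score=0.4, live_accuracy=None)
promote = passes_gate({"accuracy": 0.88}, {"accuracy": 0.90})   # newer data, worse model
```

Keeping the two decisions separate mirrors the text: a trigger only starts a retraining run, and the gate decides whether its output is allowed anywhere near production.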
Key takeaways
- Serve models online for on-demand predictions, in batch when freshness allows; batch is simpler.
- A model registry versions trained models and makes rollback instant.
- Deploy progressively: shadow, then canary, never a hard cutover.
- Monitor operations, data quality, drift, and performance, because models fail silently.
- Drift (in data or concept) is the main cause of decay; watch input distributions since true labels arrive late.
- Retrain on a schedule or a trigger, always through the same gate and rollout.