
Machine Learning in Production
Deploying machine learning models to production is vastly different from training them. Here's what you need to know to build robust ML systems.
The Production Challenge
Training a model is just the beginning. Production systems must handle:
- Scale: Serving thousands of predictions per second
- Reliability: Maintaining high availability
- Monitoring: Detecting model degradation
- Updates: Deploying new versions safely
Architecture Patterns
Model Serving
Choose the right serving strategy:
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = load_model("production-model-v1")  # load_model is a placeholder for your model-loading utility


class PredictionRequest(BaseModel):
    features: list[float]


@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict([request.features])
    return {"prediction": prediction[0]}
```
FastAPI model serving example
Batch vs Real-Time
Real-time serving:
- Low latency requirements
- Individual predictions
- User-facing applications
Batch processing:
- High throughput
- Scheduled jobs
- Analytical workloads
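To make the second mode concrete, here is a minimal batch-scoring sketch. It assumes a scikit-learn-style `predict()` and a hypothetical Parquet snapshot containing an `entity_id` column plus the feature columns:
```python
import pandas as pd

# Illustrative column names; real pipelines would read these from a feature definition.
FEATURE_COLUMNS = ["feature_1", "feature_2", "feature_3"]


def run_batch_scoring(model, input_path: str, output_path: str) -> None:
    """Score an entire dataset in one pass, typically from a scheduled job."""
    df = pd.read_parquet(input_path)                       # assumed: snapshot with entity_id + features
    df["prediction"] = model.predict(df[FEATURE_COLUMNS])  # assumes a scikit-learn-style predict()
    df[["entity_id", "prediction"]].to_parquet(output_path)
```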
Data Pipeline Management
Robust data pipelines are critical:
- Data Validation: Ensure input quality
- Feature Engineering: Transform raw data
- Feature Store: Centralize feature management
- Versioning: Track data and model versions
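As one sketch of the data-validation step, a schema with range constraints can reject bad rows before they reach feature engineering. The field names and bounds here are placeholders, not part of any real schema:
```python
from pydantic import BaseModel, Field, ValidationError


class RawEvent(BaseModel):
    # Placeholder fields and bounds; take these from your actual input schema.
    age: int = Field(ge=0, le=120)
    amount: float = Field(ge=0)


def validate_rows(rows: list[dict]) -> tuple[list[RawEvent], list[str]]:
    """Split incoming rows into validated events and error messages."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        try:
            valid.append(RawEvent(**row))
        except ValidationError as exc:
            errors.append(f"row {i}: {exc}")
    return valid, errors
```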
Feature Store Example
```python
from datetime import datetime, timezone


class FeatureStore:
    def __init__(self, storage):
        self.storage = storage  # any backend exposing query() and insert()

    def get_features(self, entity_id: str, features: list[str]):
        """Retrieve the latest feature values for an entity."""
        return self.storage.query(
            entity_id=entity_id,
            features=features,
            timestamp="latest",
        )

    def write_features(self, entity_id: str, features: dict):
        """Write computed features with an ingestion timestamp."""
        self.storage.insert(
            entity_id=entity_id,
            features=features,
            timestamp=datetime.now(timezone.utc),
        )
```
Simple feature store implementation
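Usage might look like the following; `storage_backend` stands in for whatever `query()`/`insert()` implementation you plug in and is not defined above:
```python
# Hypothetical usage of the FeatureStore sketch.
store = FeatureStore(storage_backend)
store.write_features("user_42", {"avg_order_value": 31.5, "orders_last_30d": 4})
features = store.get_features("user_42", ["avg_order_value", "orders_last_30d"])
```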
Monitoring and Observability
Track these critical metrics:
| Category | Metrics | Threshold |
|---|---|---|
| Model Performance | Latency, Throughput, Error Rate | < 100ms, > 1000 req/s, < 1% |
| Model Quality | Accuracy, Precision, Recall | > 90% |
| Data Drift | Input distribution changes | Alert if > 10% shift |
| Concept Drift | Target distribution changes | Alert if > 5% shift |
Key monitoring metrics for ML systems
```python
def monitor_predictions(y_true, y_pred, threshold: float = 0.90):
    """Compare live predictions against ground truth and alert on degradation."""
    accuracy = calculate_accuracy(y_true, y_pred)  # calculate_accuracy/alert/trigger_retraining are your own helpers
    if accuracy < threshold:
        alert("Model accuracy below threshold")
        trigger_retraining()
```
Automated monitoring and alerting
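Ground truth often arrives late or not at all, so data drift is usually checked on the inputs directly. One common approach is a two-sample Kolmogorov-Smirnov test from SciPy, sketched below; the significance cutoff is a placeholder and is a different criterion from the percentage-shift thresholds in the table above:
```python
from scipy.stats import ks_2samp


def detect_feature_drift(reference, live, alpha: float = 0.05) -> bool:
    """Return True if the live feature distribution differs significantly from the training reference."""
    result = ks_2samp(reference, live)
    return result.pvalue < alpha


# Hypothetical data sources: a stored training-time sample and a window of recent request values.
if detect_feature_drift(training_sample["amount"], recent_requests["amount"]):
    alert("Input distribution shift detected for feature 'amount'")
```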
Continuous Training
Keep models fresh with automated retraining:
- Detect performance degradation
- Collect new training data
- Retrain and validate model
- Deploy with A/B testing
- Monitor new version
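A sketch of how those steps might be wired together; every function here is a placeholder for your own training, evaluation, and deployment code:
```python
def continuous_training_cycle(current_model):
    """One pass of the retraining loop; all helpers are placeholders for your own pipeline."""
    if not performance_degraded(current_model):
        return current_model

    dataset = collect_recent_data()                # new labelled examples since the last training run
    candidate = train_model(dataset)
    if not passes_validation(candidate, dataset):  # offline checks before any traffic is routed
        alert("Candidate model failed validation; keeping current model")
        return current_model

    deploy_as_canary(candidate, traffic_fraction=0.05)  # A/B or canary rollout
    return candidate
```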
A model in production is never finished—it's continuously evolving.
Best Practices
Version Control
- Track model versions
- Store training configs
- Maintain artifact registry
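A minimal sketch of artifact tracking, assuming a local directory acts as the registry and the version is derived from a hash of the training config (real setups typically use a dedicated model registry):
```python
import hashlib
import json
from pathlib import Path

import joblib


def register_model(model, config: dict, registry_dir: str = "model-registry") -> str:
    """Save a model artifact keyed by a hash of its training config (illustrative layout)."""
    version = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:8]
    path = Path(registry_dir) / version
    path.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path / "model.joblib")
    (path / "config.json").write_text(json.dumps(config, indent=2))
    return version
```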
Testing
- Unit test preprocessing
- Integration test endpoints
- Load test production traffic
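For the unit-testing bullet, a pytest-style test of a preprocessing step might look like this; `normalize_features` is a hypothetical function standing in for your own transforms:
```python
import math


def normalize_features(values: list[float]) -> list[float]:
    """Hypothetical preprocessing step: scale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


def test_normalize_features_is_zero_mean():
    normalized = normalize_features([1.0, 2.0, 3.0, 4.0])
    assert abs(sum(normalized)) < 1e-9


def test_normalize_features_handles_constant_input():
    assert normalize_features([5.0, 5.0, 5.0]) == [0.0, 0.0, 0.0]
```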
Security
- Validate inputs
- Rate limit APIs
- Encrypt sensitive data
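As one sketch of the rate-limiting bullet, a naive fixed-window limiter with in-memory per-client counters is shown below; a real deployment would enforce limits at the gateway or with a shared store such as Redis:
```python
import time
from collections import defaultdict


class RateLimiter:
    """Naive fixed-window limiter; per-process only, shown purely for illustration."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests: dict[str, list[float]] = defaultdict(list)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        window_start = now - self.window_seconds
        recent = [t for t in self.requests[client_id] if t > window_start]
        self.requests[client_id] = recent
        if len(recent) >= self.max_requests:
            return False
        self.requests[client_id].append(now)
        return True


limiter = RateLimiter(max_requests=100, window_seconds=60)  # illustrative limits
```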
Deployment Strategies
Blue-Green Deployment
Run two identical production environments:
```yaml
services:
  model-blue:
    image: ml-model:v1
    replicas: 3
  model-green:
    image: ml-model:v2
    replicas: 3
routing:
  production: blue  # Switch to green after validation
```
Blue-green deployment configuration
Canary Releases
Gradually roll out new versions:
- Start with 5% of traffic
- Monitor metrics closely
- Increase to 25%, 50%, 100%
- Rollback if issues detected
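One way to picture the traffic split is a weighted router in the serving layer; most teams do this at the load balancer or service mesh instead, and `stable_model`/`canary_model` are placeholders:
```python
import random


def route_request(features, canary_fraction: float = 0.05):
    """Send a fraction of traffic to the canary model; the rest goes to the stable version."""
    if random.random() < canary_fraction:
        return canary_model.predict([features])  # placeholder for the new model version
    return stable_model.predict([features])      # placeholder for the current production model
```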
Production ML requires a different mindset than research. Focus on reliability, monitoring, and continuous improvement to build systems that deliver value over time.