Machine Learning in Production

Deploying machine learning models to production is vastly different from training them. Here's what you need to know to build robust ML systems.

The Production Challenge

Training a model is just the beginning. Production systems must handle:

  • Scale: Serving thousands of predictions per second
  • Reliability: Maintaining high availability
  • Monitoring: Detecting model degradation
  • Updates: Deploying new versions safely

Architecture Patterns

Model Serving

Choose the right serving strategy:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# load_model is a placeholder for your framework's loader
# (e.g. joblib.load, mlflow.pyfunc.load_model, torch.load).
model = load_model("production-model-v1")

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
async def predict(request: PredictionRequest):
    # Most model APIs expect a 2D batch, so wrap the single feature vector.
    prediction = model.predict([request.features])
    return {"prediction": prediction[0]}

FastAPI model serving example
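
Once the service is running, clients call it over HTTP. A minimal usage sketch with the requests library; the URL, port, and feature values are illustrative:

import requests

# Assumes the app above is served locally, e.g. `uvicorn main:app --port 8000`;
# adjust the URL for your deployment.
response = requests.post(
    "http://localhost:8000/predict",
    json={"features": [0.5, 1.2, 3.4]},
)
print(response.json())  # {"prediction": ...}

Calling the prediction endpoint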

Batch vs Real-Time

Real-time serving:

  • Low latency requirements
  • Individual predictions
  • User-facing applications

Batch processing:

  • High throughput
  • Scheduled jobs
  • Analytical workloads
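
In the batch case, predictions are computed on a schedule over a whole table of inputs rather than per request. A minimal sketch with pandas, assuming the model object above accepts a feature DataFrame; the file paths are placeholders:

import pandas as pd

def score_batch(model, input_path: str, output_path: str) -> None:
    """Score a full file of feature rows in one scheduled job."""
    df = pd.read_csv(input_path)           # all feature columns
    df["prediction"] = model.predict(df)   # vectorized, high-throughput scoring
    df.to_csv(output_path, index=False)    # downstream analytics read this file

# Typically triggered by a scheduler (cron, Airflow, etc.):
# score_batch(model, "daily_features.csv", "daily_predictions.csv")

Scheduled batch scoring sketch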

Data Pipeline Management

Robust data pipelines are critical:

  1. Data Validation: Ensure input quality (see the sketch after this list)
  2. Feature Engineering: Transform raw data
  3. Feature Store: Centralize feature management
  4. Versioning: Track data and model versions
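
As a concrete example of step 1, raw rows can be checked against an expected schema before they reach feature engineering. A minimal sketch; the field names and allowed ranges are hypothetical:

def validate_row(row: dict) -> list[str]:
    """Return a list of validation errors for one raw input row."""
    errors = []
    # Hypothetical schema: required fields and their allowed numeric ranges.
    schema = {"age": (0, 120), "income": (0, 1e7)}
    for field, (lo, hi) in schema.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not (lo <= row[field] <= hi):
            errors.append(f"{field} out of range: {row[field]}")
    return errors

Input validation sketch for the data pipeline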

Feature Store Example

from datetime import datetime, timezone

class FeatureStore:
    def __init__(self, storage):
        # storage is any backend exposing query() and insert(),
        # e.g. a database client or an in-memory table.
        self.storage = storage

    def get_features(self, entity_id: str, features: list[str]):
        """Retrieve the latest feature values for an entity."""
        return self.storage.query(
            entity_id=entity_id,
            features=features,
            timestamp="latest",
        )

    def write_features(self, entity_id: str, features: dict):
        """Write freshly computed features with a timestamp."""
        self.storage.insert(
            entity_id=entity_id,
            features=features,
            timestamp=datetime.now(timezone.utc),
        )

Simple feature store implementation

Monitoring and Observability

Track these critical metrics:

| Category          | Metrics                         | Threshold                    |
|-------------------|---------------------------------|------------------------------|
| Model Performance | Latency, Throughput, Error Rate | < 100ms, > 1000 req/s, < 1%  |
| Model Quality     | Accuracy, Precision, Recall     | > 90%                        |
| Data Drift        | Input distribution changes      | Alert if > 10% shift         |
| Concept Drift     | Target distribution changes     | Alert if > 5% shift          |

Key monitoring metrics for ML systems

def monitor_predictions(y_true, y_pred, threshold: float = 0.90):
    # Fraction of predictions matching the observed labels.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    if accuracy < threshold:
        alert("Model accuracy below threshold")  # placeholder: page on-call, Slack hook, etc.
        trigger_retraining()                     # placeholder: kick off the retraining pipeline

Automated monitoring and alerting
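
The data-drift row of the table can be checked statistically. A minimal sketch using a two-sample Kolmogorov-Smirnov test from scipy; the feature arrays and significance level are illustrative:

import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> bool:
    """Compare recent production values of a feature against the training-time reference."""
    statistic, p_value = ks_2samp(reference, current)
    # A small p-value means the two samples are unlikely to share a distribution.
    return p_value < alpha

# reference = training-time values of one feature, current = recent production values
# if detect_drift(reference, current): alert("Input drift detected")

Data drift check sketch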

Continuous Training

Keep models fresh with automated retraining:

  1. Detect performance degradation
  2. Collect new training data
  3. Retrain and validate model
  4. Deploy with A/B testing
  5. Monitor new version
A model in production is never finished—it's continuously evolving.
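
The loop above can be expressed as one orchestration function. Everything here (the helper names, the validation gate, the canary call) is a hypothetical sketch of the five steps, not a specific framework's API:

def continuous_training_cycle(monitor, data_store, registry):
    """One pass of the retraining loop described above (all helpers hypothetical)."""
    if not monitor.performance_degraded():           # 1. detect degradation
        return
    dataset = data_store.collect_recent_labels()     # 2. gather fresh training data
    candidate = train_model(dataset)                  # 3. retrain...
    if validate(candidate, dataset.holdout):          #    ...and validate offline
        version = registry.register(candidate)        # 4. publish the new version
        start_canary_rollout(version, traffic=0.05)   #    deploy behind an A/B or canary split
        monitor.watch(version)                         # 5. monitor the new version

Continuous training loop sketch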

Best Practices

Version Control

  • Track model versions
  • Store training configs
  • Maintain artifact registry

Testing

  • Unit test preprocessing (see the sketch after this list)
  • Integration test endpoints
  • Load test production traffic
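
Preprocessing, for instance, can be covered by ordinary pytest unit tests; normalize_features here is a hypothetical preprocessing function used only to show the pattern:

import pytest

def normalize_features(values: list[float]) -> list[float]:
    """Hypothetical preprocessing step: scale values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_features_bounds():
    result = normalize_features([2.0, 4.0, 6.0])
    assert min(result) == 0.0 and max(result) == 1.0

def test_normalize_features_rejects_empty_input():
    with pytest.raises(ValueError):
        normalize_features([])

Unit tests for a preprocessing step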

Security

  • Validate inputs
  • Rate limit APIs
  • Encrypt sensitive data

Deployment Strategies

Blue-Green Deployment

Run two identical production environments:

services:
  model-blue:
    image: ml-model:v1
    replicas: 3
  
  model-green:
    image: ml-model:v2
    replicas: 3

routing:
  production: blue  # Switch to green after validation

Blue-green deployment configuration

Canary Releases

Gradually roll out new versions:

  • Start with 5% of traffic
  • Monitor metrics closely
  • Increase to 25%, 50%, 100%
  • Rollback if issues detected
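
A canary split can be as simple as hashing a stable identifier into a bucket and sending a fixed fraction of traffic to the new model. A minimal sketch; the function names and percentages mirror the rollout steps above and are illustrative:

import hashlib

def route_to_canary(user_id: str, canary_fraction: float = 0.05) -> bool:
    """Deterministically assign a user to the canary based on a hash bucket."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_fraction * 100

# model = canary_model if route_to_canary(user_id) else stable_model
# Raise canary_fraction to 0.25, 0.5, 1.0 as metrics stay healthy,
# or set it back to 0.0 to roll back.

Canary traffic-splitting sketch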

Production ML requires a different mindset than research. Focus on reliability, monitoring, and continuous improvement to build systems that deliver value over time.