Production Deployment#

This guide covers best practices for deploying boosters models in production environments.

Model Serialization#

Save and load models for deployment:
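
Pickle (Standard Library)#

For most models, the built-in pickle module is sufficient; a minimal sketch (the model.pkl filename matches the FastAPI example later in this guide):

import pickle

# Save the trained model to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load it back for serving
with open("model.pkl", "rb") as f:
    model = pickle.load(f)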

Joblib (Large Models)#

For larger models, joblib offers built-in compression and handles NumPy arrays more efficiently than plain pickle:

import joblib

# Save with compression
joblib.dump(model, "model.joblib", compress=3)

# Load
model = joblib.load("model.joblib")

Inference Optimization#

Batch Processing#

Always batch predictions when possible:

# ❌ Slow
predictions = [model.predict(x.reshape(1, -1))[0] for x in X]

# ✅ Fast
predictions = model.predict(X)

Pre-allocate Output#

When processing many batches, collect per-batch results and concatenate once at the end rather than growing an array inside the loop; if the total output size is known up front, you can pre-allocate instead, as sketched after this example:

import numpy as np

# Process in batches
results = []
for batch in batches:
    predictions = model.predict(batch)
    results.append(predictions)

all_predictions = np.concatenate(results)
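
If the total number of output rows is known up front, pre-allocating the result buffer avoids the extra copy that np.concatenate makes. A sketch, where n_total (a hypothetical variable) holds the combined row count:

import numpy as np

# One output buffer, allocated once
all_predictions = np.empty(n_total, dtype=np.float64)

offset = 0
for batch in batches:
    n = len(batch)
    # Write each batch's predictions directly into its slice
    all_predictions[offset:offset + n] = model.predict(batch)
    offset += n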

Warm Starts#

The first prediction after loading a model pays one-time costs while its structures are brought into memory and CPU caches, so warm it up with a dummy prediction before serving traffic:

# First prediction may be slower (loading model into cache)
_ = model.predict(X_sample[:1])

# Subsequent predictions are faster
predictions = model.predict(X_production)

API Deployment#

FastAPI Example#

from fastapi import FastAPI
import numpy as np
import pickle

app = FastAPI()

# Load model once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.post("/predict")
async def predict(features: list[float]):
    X = np.array([features])
    prediction = model.predict(X)[0]
    return {"prediction": float(prediction)}

Model Versioning#

Track model versions for reproducibility:

import hashlib
import json

def model_signature(model, X_sample):
    """Create a signature for model verification."""
    predictions = model.predict(X_sample[:10])
    return hashlib.md5(predictions.tobytes()).hexdigest()

# Save with metadata
metadata = {
    "version": "1.0.0",
    "created": "2024-01-15",
    "signature": model_signature(model, X_train),
    "n_features": X_train.shape[1],
}

with open("model_metadata.json", "w") as f:
    json.dump(metadata, f)
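
At load time, the signature can be recomputed and checked against the stored metadata to confirm the deployed artifact behaves identically. A sketch, assuming the same reference rows used when the metadata was written (X_sample here) are available:

with open("model_metadata.json") as f:
    metadata = json.load(f)

# Recompute on the same reference rows and compare
if model_signature(model, X_sample) != metadata["signature"]:
    raise RuntimeError("Model signature mismatch: unexpected model version")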

Input Validation#

Validate inputs before prediction:

import numpy as np

def validate_input(X, expected_features):
    """Validate input array."""
    X = np.asarray(X)

    if X.ndim == 1:
        X = X.reshape(1, -1)

    if X.shape[1] != expected_features:
        raise ValueError(
            f"Expected {expected_features} features, got {X.shape[1]}"
        )

    if np.isnan(X).all(axis=0).any():
        raise ValueError("Some features are entirely NaN")

    return X

# Usage
X_validated = validate_input(user_input, expected_features=50)
prediction = model.predict(X_validated)

Monitoring#

Log predictions for monitoring:

import logging
import time

logger = logging.getLogger("model")

def predict_with_logging(model, X):
    start = time.perf_counter()
    predictions = model.predict(X)
    elapsed = time.perf_counter() - start

    logger.info(
        "Prediction completed",
        extra={
            "batch_size": len(X),
            "latency_ms": elapsed * 1000,
            "predictions_mean": float(predictions.mean()),
            "predictions_std": float(predictions.std()),
        }
    )

    return predictions
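
The fields passed via extra are attached to the LogRecord, but most formatters will not print them by default. A minimal sketch of a JSON formatter that surfaces them (field names match the predict_with_logging call above):

import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Surface the custom fields attached via `extra`, when present
        for key in ("batch_size", "latency_ms", "predictions_mean", "predictions_std"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)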

A/B Testing#

Compare model versions:

import random

class ModelRouter:
    def __init__(self, model_a, model_b, traffic_split=0.5):
        self.model_a = model_a
        self.model_b = model_b
        self.traffic_split = traffic_split

    def predict(self, X, experiment_id=None):
        if random.random() < self.traffic_split:
            model = self.model_a
            variant = "A"
        else:
            model = self.model_b
            variant = "B"

        predictions = model.predict(X)

        # Log for analysis
        logger.info(f"Variant {variant}", extra={"experiment_id": experiment_id})

        return predictions, variant
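
A hypothetical usage, sending 90% of traffic to the current model (model_v1 and model_v2 are placeholder names):

router = ModelRouter(model_v1, model_v2, traffic_split=0.9)
predictions, variant = router.predict(X_production, experiment_id="exp-001")

Note that random.random() assigns a variant per call; if users should consistently see the same variant, route on a hash of a stable identifier instead.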

Performance Checklist#

Before deploying:

☐ Model file size is acceptable for your infrastructure
☐ Inference latency meets requirements (test with production batch sizes)
☐ Memory usage is within limits
☐ Input validation is in place
☐ Logging and monitoring are configured
☐ Model versioning and rollback plan exists
☐ A/B testing framework is ready (if applicable)

See Also#