How do you deploy a machine learning model?


Deploying a machine learning (ML) model means making it available so real users or systems can send data to it and get predictions. Think of it as moving from "training on your laptop" to "running in production."

Here’s a clear, practical flow:

1. Train and Save the Model

You first build and train your model using tools like scikit-learn, TensorFlow, or PyTorch.

Then save it:

  • .pkl / .joblib (scikit-learn)
  • .h5 or SavedModel (TensorFlow)
  • .pt (PyTorch)
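As a minimal sketch of the save/load step, here is a pickle round trip. The `DummyModel` class is a stand-in — in practice you would save your fitted scikit-learn estimator, and `joblib` is usually preferred for models containing large NumPy arrays:

```python
import pickle

# Stand-in for a trained model: any object with a predict method.
# In practice this would be a fitted scikit-learn estimator.
class DummyModel:
    def predict(self, X):
        return [sum(row) for row in X]

model = DummyModel()

# Save the model to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g., at API startup), load it back
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
```

The loaded object behaves exactly like the original: `loaded.predict([[1, 2, 3]])` returns the same result the in-memory model would.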

2. Wrap the Model in an API

You expose the model using a web API so other apps can call it.

Common frameworks:

  • Flask (simple)
  • FastAPI (fast & production-ready)

Example flow:

  • Input → API → Model → Prediction → Response (JSON)

3. Containerize (Optional but Recommended)

Use Docker to package:

  • Code
  • Model
  • Dependencies

This ensures it runs the same everywhere.
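A Dockerfile for the setup above might look like the following sketch — the file names `requirements.txt`, `main.py`, and `model.pkl` are assumptions about your project layout:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the API code and the serialized model
COPY main.py model.pkl ./

# Serve the FastAPI app with uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying `requirements.txt` before the code means dependency installation is only re-run when dependencies change, not on every code edit.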

4. Choose Deployment Type

A. Cloud Deployment

Popular platforms:

  • Amazon Web Services (EC2, SageMaker)
  • Google Cloud Platform (Vertex AI)
  • Microsoft Azure (Azure ML)

B. Server-based Deployment

  • Host API on a VM (Linux server)
  • Use Nginx + Gunicorn for production
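For a FastAPI app, the usual pattern is Gunicorn managing Uvicorn worker processes, with Nginx reverse-proxying to it. A typical invocation (assuming the app object lives in `main.py`) looks like:

```
# 4 worker processes, listening only on localhost behind Nginx
gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app --bind 127.0.0.1:8000
```

Nginx then forwards external traffic to `127.0.0.1:8000` and handles TLS, timeouts, and static content.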

C. Serverless Deployment

  • AWS Lambda / Azure Functions
  • Good for low-traffic or event-based predictions

D. Edge / On-device

  • Convert model (e.g., TensorFlow Lite) for mobile or IoT

5. Handle Scaling & Performance

  • Use load balancers
  • Add caching (e.g., Redis)
  • Batch predictions if needed
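Caching is easy to sketch in-process with the standard library's `functools.lru_cache`; a shared cache like Redis does the same job across multiple API workers. Everything below (`run_model`, the tuple-based key) is illustrative, not a specific library API:

```python
from functools import lru_cache

call_count = 0  # track how often the "expensive" model actually runs

def run_model(features):
    """Stand-in for an expensive model prediction."""
    global call_count
    call_count += 1
    return sum(features)

# Cache predictions keyed by the (hashable) input features.
@lru_cache(maxsize=1024)
def cached_predict(features: tuple):
    return run_model(features)

cached_predict((1.0, 2.0))  # computed
cached_predict((1.0, 2.0))  # served from cache; model not called again
```

Note the input must be hashable (a tuple, not a list) to serve as a cache key.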

6. Monitor & Maintain

Track:

  • Model accuracy (drift)
  • Latency
  • Errors

Tools:

  • Logging systems
  • Monitoring dashboards
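A minimal sketch of latency and error tracking using only the standard `logging` and `time` modules — real deployments typically export these numbers to a metrics dashboard instead of an in-memory list:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ml-api")

latencies = []  # in production: export to a metrics system instead

def predict_with_monitoring(model_fn, features):
    """Wrap a prediction call with latency and error logging."""
    start = time.perf_counter()
    try:
        result = model_fn(features)
    except Exception:
        logger.exception("prediction failed for input %r", features)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    latencies.append(elapsed_ms)
    logger.info("prediction ok in %.2f ms", elapsed_ms)
    return result

result = predict_with_monitoring(sum, [1, 2, 3])
```

Accuracy drift, by contrast, needs ground-truth labels collected after deployment, so it is usually monitored offline rather than per request.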

7. CI/CD Pipeline (Advanced)

Automate:

  • Model retraining
  • Testing
  • Deployment
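As one hypothetical outline of such a pipeline, a GitHub Actions workflow could run tests and build the Docker image on every push — all file and job names here are placeholders:

```yaml
# .github/workflows/deploy.yml -- hypothetical pipeline outline
name: ml-deploy
on:
  push:
    branches: [main]
jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest          # run model/API tests
      - run: docker build -t ml-api .
      # push the image and roll out to your platform here
```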

Simple Real-World Architecture

User → Frontend → API (FastAPI/Flask)
                    ↓
                 ML Model
                    ↓
               Prediction

Quick Example (FastAPI)

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.pkl")  # load once at startup, not per request

class PredictRequest(BaseModel):
    input: list[float]  # one feature vector

@app.get("/")
def home():
    return {"message": "ML API running"}

@app.post("/predict")
def predict(data: PredictRequest):
    # scikit-learn expects a 2D array: a list of samples
    prediction = model.predict([data.input])
    return {"result": prediction.tolist()}

In Short

  • Deployment =
    Model + API + Hosting + Scaling + Monitoring
