How do you deploy a machine learning model?


Deploying a machine learning (ML) model means making it available so real users or systems can send data to it and get predictions. Think of it as moving from "training on your laptop" to "running in production."

Here’s a clear, practical flow:

1. Train and Save the Model

You first build and train your model using tools like scikit-learn, TensorFlow, or PyTorch.

Then save it:

  • .pkl / .joblib (scikit-learn)
  • .h5 or SavedModel (TensorFlow)
  • .pt (PyTorch)
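As a minimal sketch of the save/load step, here is a pickle round trip. The `DummyModel` class is a stand-in — in practice you would save your fitted scikit-learn estimator, and `joblib` is usually preferred for models containing large NumPy arrays:

```python
import pickle

# Stand-in for a trained model: any object with a predict method.
# In practice this would be a fitted scikit-learn estimator.
class DummyModel:
    def predict(self, X):
        return [sum(row) for row in X]

model = DummyModel()

# Save the model to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g., at API startup), load it back
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
```

The loaded object behaves exactly like the original: `loaded.predict([[1, 2, 3]])` returns the same result the in-memory model would.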

2. Wrap the Model in an API

You expose the model using a web API so other apps can call it.

Common frameworks:

  • Flask (simple)
  • FastAPI (fast & production-ready)

Example flow:

  • Input → API → Model → Prediction → Response (JSON)

3. Containerize (Optional but Recommended)

Use Docker to package:

  • Code
  • Model
  • Dependencies

This ensures it runs the same everywhere.
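A Dockerfile for the setup above might look like the following sketch — the file names `requirements.txt`, `main.py`, and `model.pkl` are assumptions about your project layout:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the API code and the serialized model
COPY main.py model.pkl ./

# Serve the FastAPI app with uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying `requirements.txt` before the code means dependency installation is only re-run when dependencies change, not on every code edit.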

4. Choose Deployment Type

A. Cloud Deployment

Popular platforms:

  • Amazon Web Services (EC2, SageMaker)
  • Google Cloud Platform (Vertex AI)
  • Microsoft Azure (Azure ML)

B. Server-based Deployment

  • Host API on a VM (Linux server)
  • Use Nginx + Gunicorn for production
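For a FastAPI app, the usual pattern is Gunicorn managing Uvicorn worker processes, with Nginx reverse-proxying to it. A typical invocation (assuming the app object lives in `main.py`) looks like:

```
# 4 worker processes, listening only on localhost behind Nginx
gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app --bind 127.0.0.1:8000
```

Nginx then forwards external traffic to `127.0.0.1:8000` and handles TLS, timeouts, and static content.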

C. Serverless Deployment

  • AWS Lambda / Azure Functions
  • Good for low-traffic or event-based predictions

D. Edge / On-device

  • Convert model (e.g., TensorFlow Lite) for mobile or IoT

5. Handle Scaling & Performance

  • Use load balancers
  • Add caching (e.g., Redis)
  • Batch predictions if needed
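Caching is easy to sketch in-process with the standard library's `functools.lru_cache`; a shared cache like Redis does the same job across multiple API workers. Everything below (`run_model`, the tuple-based key) is illustrative, not a specific library API:

```python
from functools import lru_cache

call_count = 0  # track how often the "expensive" model actually runs

def run_model(features):
    """Stand-in for an expensive model prediction."""
    global call_count
    call_count += 1
    return sum(features)

# Cache predictions keyed by the (hashable) input features.
@lru_cache(maxsize=1024)
def cached_predict(features: tuple):
    return run_model(features)

cached_predict((1.0, 2.0))  # computed
cached_predict((1.0, 2.0))  # served from cache; model not called again
```

Note the input must be hashable (a tuple, not a list) to serve as a cache key.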

6. Monitor & Maintain

Track:

  • Model accuracy (drift)
  • Latency
  • Errors

Tools:

  • Logging systems
  • Monitoring dashboards
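A minimal sketch of latency and error tracking using only the standard `logging` and `time` modules — real deployments typically export these numbers to a metrics dashboard instead of an in-memory list:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ml-api")

latencies = []  # in production: export to a metrics system instead

def predict_with_monitoring(model_fn, features):
    """Wrap a prediction call with latency and error logging."""
    start = time.perf_counter()
    try:
        result = model_fn(features)
    except Exception:
        logger.exception("prediction failed for input %r", features)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    latencies.append(elapsed_ms)
    logger.info("prediction ok in %.2f ms", elapsed_ms)
    return result

result = predict_with_monitoring(sum, [1, 2, 3])
```

Accuracy drift, by contrast, needs ground-truth labels collected after deployment, so it is usually monitored offline rather than per request.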

7. CI/CD Pipeline (Advanced)

Automate:

  • Model retraining
  • Testing
  • Deployment
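As one hypothetical outline of such a pipeline, a GitHub Actions workflow could run tests and build the Docker image on every push — all file and job names here are placeholders:

```yaml
# .github/workflows/deploy.yml -- hypothetical pipeline outline
name: ml-deploy
on:
  push:
    branches: [main]
jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest          # run model/API tests
      - run: docker build -t ml-api .
      # push the image and roll out to your platform here
```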

Simple Real-World Architecture

User → Frontend → API (FastAPI/Flask)
                    ↓
                 ML Model
                    ↓
               Prediction

Quick Example (FastAPI)

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.pkl")  # load once at startup, not per request

class PredictRequest(BaseModel):
    input: list[float]  # one feature vector

@app.get("/")
def home():
    return {"message": "ML API running"}

@app.post("/predict")
def predict(data: PredictRequest):
    # scikit-learn expects a 2D array: a list of samples
    prediction = model.predict([data.input])
    return {"result": prediction.tolist()}

In Short

  • Deployment =
    Model + API + Hosting + Scaling + Monitoring
