When you build a machine learning model, the next step is serving it so that other applications can use it.
Serving a model means making it available through an API (Application Programming Interface) so that any app, website, or service can send data and get predictions back.
The most common way to serve a model is through a REST API.
In this blog, we will learn:
- What model serving is
- What a REST API is
- How to serve an ML model using a REST API
- An example using C# (.NET)
- An example using Python (FastAPI)
1. What is Model Serving?
Model serving means making a trained ML model available for real-time prediction.
Example:
You trained a spam detection model:
Input → "Win money now!!!"
Output → Spam
Now you want your website to send text to the model and get the result back.
For that → we use an API.
2. What is REST API?
REST API is a web service that works using HTTP.
Common methods:
| Method | Use |
|---|---|
| GET | Read data |
| POST | Send data |
| PUT | Update |
| DELETE | Remove |
For ML models, we usually use POST.
Example request:

```
POST /predict

{
  "text": "Free lottery offer"
}
```

Response:

```
{
  "result": "Spam"
}
```
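The request/response contract above can be sketched server-side as a plain function. This is a minimal sketch with a toy keyword rule standing in for a real trained model; the field names follow the example above:

```python
import json

def handle_predict(request_body: str) -> str:
    """Handle a POST /predict body and return the JSON response body."""
    data = json.loads(request_body)   # parse the incoming JSON
    text = data["text"]               # extract the "text" field
    # Toy stand-in for a trained model: flag obvious spam keywords.
    is_spam = any(word in text.lower() for word in ("free", "lottery", "win"))
    return json.dumps({"result": "Spam" if is_spam else "Not Spam"})
```

A real API framework would do the JSON parsing and routing for you; the point is that serving reduces to "JSON in, prediction out."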
3. Architecture of Model Serving
```
Client (Web / App)
        |
     REST API
        |
     ML Model
        |
    Prediction
```
Steps:
- Train model
- Save model to file
- Load model in API
- Send request
- Return prediction
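The save/load steps above can be sketched end to end in Python. This is a minimal sketch assuming pickle for persistence and a toy keyword model as a stand-in for a real trained one; the file name model.pkl is an assumption:

```python
import pickle

class KeywordSpamModel:
    """Toy stand-in for a trained spam classifier."""
    def __init__(self, keywords):
        self.keywords = keywords

    def predict(self, text: str) -> bool:
        return any(word in text.lower() for word in self.keywords)

# Steps 1-2: "train" and save the model to a file.
model = KeywordSpamModel(["free", "win", "lottery"])
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Step 3: load the model once, at API startup.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

# Steps 4-5: the API calls this per request and returns the result.
print(loaded.predict("Win money now!!!"))  # prints True
```

In a real service, the save happens in your training pipeline and the load happens once when the API process starts.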
4. Example — Serving ML.NET Model using REST API (.NET)
This is best for ASP.NET MVC / .NET developers.
Step 1 — Train and Save the Model
Train your model with ML.NET and save it to a file, for example model.zip.
Step 2 — Create Web API Project
Create project:
ASP.NET Core Web API
Step 3 — Install ML.NET

```
Install-Package Microsoft.ML
```
Step 4 — Create Prediction Model Classes

```csharp
public class ModelInput
{
    public string Text { get; set; }
}

public class ModelOutput
{
    public bool Prediction { get; set; }
}
```
Step 5 — Load the Model

```csharp
using Microsoft.ML;

public class PredictionService
{
    private readonly PredictionEngine<ModelInput, ModelOutput> _engine;

    public PredictionService()
    {
        var mlContext = new MLContext();

        // Load the trained model once, at startup.
        ITransformer model =
            mlContext.Model.Load("model.zip", out var schema);

        _engine = mlContext.Model
            .CreatePredictionEngine<ModelInput, ModelOutput>(model);
    }

    public bool Predict(string text)
    {
        var input = new ModelInput { Text = text };
        var result = _engine.Predict(input);
        return result.Prediction;
    }
}
```
Step 6 — Create the API Controller

Register the service as a singleton so the model is loaded only once, not on every request, and let the framework inject it:

```csharp
// In Program.cs:
// builder.Services.AddSingleton<PredictionService>();

[ApiController]
[Route("api/predict")]
public class PredictController : ControllerBase
{
    private readonly PredictionService _service;

    public PredictController(PredictionService service)
    {
        _service = service;
    }

    [HttpPost]
    public IActionResult Predict([FromBody] ModelInput input)
    {
        var result = _service.Predict(input.Text);
        return Ok(new { prediction = result });
    }
}
```
Step 7 — Call the API

```
POST /api/predict
```

Body:

```
{
  "text": "Free money offer"
}
```

Response:

```
{
  "prediction": true
}
```
Your model is now live.
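Any client can now call it. As a sketch using only the Python standard library (the URL and port are assumptions for a local run):

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:5000/api/predict"  # assumption: local dev port

def build_request(text: str) -> urllib.request.Request:
    """Build the POST request the API expects: JSON body, JSON content type."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        PREDICT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_predict(text: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the API running: call_predict("Free money offer")
```

The same call works from any language; only the HTTP request shape matters.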
5. Example — Python FastAPI Model Serving
FastAPI is a very popular choice for serving ML models in Python.
Install:

```
pip install fastapi uvicorn joblib
```
Example:

```python
from fastapi import FastAPI
import joblib

app = FastAPI()

# Load the model once, at startup (not per request).
model = joblib.load("model.pkl")

@app.post("/predict")
def predict(data: dict):
    text = data["text"]
    result = model.predict([text])
    return {"prediction": str(result[0])}
```
Run:

```
uvicorn main:app --reload
```

Then open:

```
http://localhost:8000/docs
```

FastAPI generates interactive docs there, so you can test the API from the browser.
6. Where Model Serving is Used
- Chatbots
- Recommendation systems
- Spam detection
- Image recognition
- Article similarity
- Search ranking
- AI assistants
7. Best Practices
- Load the model once at startup, never per request
- Use async endpoints for I/O-bound work
- Add logging
- Validate input
- Use caching where it helps
- Use Docker for deployment
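"Validate input" can be as simple as rejecting bad payloads before they ever reach the model. A stdlib sketch (in practice you would typically use Pydantic models in FastAPI or model binding in ASP.NET Core; the length limit is an arbitrary assumption):

```python
def validate_payload(data) -> str:
    """Return the text to predict on, or raise ValueError for bad input."""
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    text = data.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")
    if len(text) > 10_000:  # assumed limit; tune for your model
        raise ValueError("'text' is too long")
    return text
```

Rejecting bad input early returns a clear 400-style error instead of a confusing model exception.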
8. Production Architecture
```
Client
   |
API Gateway
   |
REST API
   |
Model Service
   |
Model File
```
Large systems use:
- Kubernetes
- Docker
- Redis cache
- Load balancer
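The caching idea can be sketched in-process with functools.lru_cache, used here as a stand-in for Redis (in a multi-instance deployment you would want a shared cache instead):

```python
from functools import lru_cache

def slow_model_predict(text: str) -> str:
    """Stand-in for a real (and relatively expensive) model call."""
    return "Spam" if "free" in text.lower() else "Not Spam"

@lru_cache(maxsize=1024)
def cached_predict(text: str) -> str:
    """Repeated requests with the same text skip the model entirely."""
    return slow_model_predict(text)
```

Caching only helps when the same inputs repeat and the model is deterministic; otherwise it just burns memory.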
Conclusion
Serving a model through a REST API lets any application use your ML model.
It is the most important step after training.
Without serving → the model sits unused
With an API → the model becomes a product
Model → API → App → User