Rate limiting is one of those “boring-sounding but super important” API concepts. Here’s the clean explanation:
What is Rate Limiting in an API?
Rate limiting is a mechanism that restricts how many API requests a client can make within a specific time period.
In simple terms:
“You can call this API only X times in Y seconds/minutes/hours.”
Example:
- 100 requests per minute per user
- 1,000 requests per hour per IP
- 10 requests per second per API key
If the limit is exceeded, the API rejects further requests temporarily.
Why Rate Limiting Is Important
1. Prevents abuse & attacks
Protects APIs from:
- Brute-force attacks
- DDoS-style flooding
- Bot abuse
2. Ensures fair usage
Stops one user from consuming all server resources and slowing down others.
3. Improves performance & stability
Keeps CPU, memory, and database load under control.
4. Cost control
API providers often pay per request for the services they call behind the scenes (cloud, SMS, email, maps APIs).
Rate limiting caps that usage and prevents unexpected bills.
How Rate Limiting Works (Conceptually)
- Client sends a request
- API checks:
  - Who is the client? (IP, user ID, API key, token)
  - How many requests have they already made in the time window?
- If under limit → request allowed
- If over limit → request blocked
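
The steps above can be sketched as a small fixed-window counter. This is only one possible approach, not the way any particular API does it: the class name `FixedWindowLimiter`, the `allow` method, and the injectable `clock` parameter are all made up for illustration, and the state lives in a plain in-memory dict.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client key."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the behaviour is easy to test
        self._windows = {}      # client key -> (window_start, request_count)

    def allow(self, client_key):
        """Return True if the request is allowed, False if it should be blocked."""
        now = self.clock()
        window_start, count = self._windows.get(client_key, (now, 0))

        # Start a fresh window once the previous one has expired.
        if now - window_start >= self.window_seconds:
            window_start, count = now, 0

        allowed = count < self.limit
        if allowed:
            count += 1
        self._windows[client_key] = (window_start, count)
        return allowed
```

With `FixedWindowLimiter(limit=100, window_seconds=60)`, calling `allow("user-42")` returns `True` for the first 100 calls in a window and `False` after that, until the window rolls over. The client key can be whatever identifies the caller (IP, user ID, API key).
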
Blocked responses usually return:
HTTP 429 – Too Many Requests
Often with headers like:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
Retry-After: 60
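
On the client side, a well-behaved caller can read these headers and back off. Here is a rough sketch, assuming the response is available as a status code plus a dict with an exact `Retry-After` key; the function name `retry_delay_seconds` and its `default` and `now` parameters are hypothetical. `Retry-After` may be either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(status, headers, default=1.0, now=None):
    """Return seconds to wait before retrying, or None if the request was not rate limited."""
    if status != 429:
        return None

    value = headers.get("Retry-After")
    if value is None:
        return default          # server gave no hint: fall back to a default wait

    value = value.strip()
    if value.isdigit():
        return float(value)     # delay given directly in seconds

    try:
        when = parsedate_to_datetime(value)   # delay given as an HTTP date
    except (TypeError, ValueError):
        return default          # unparseable header: fall back to the default

    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically `time.sleep()` for the returned number of seconds and then retry the request.
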
Real-World Examples
- Login API: 5 attempts per minute (prevents brute force)
- OTP API: 3 requests per 10 minutes
- Public API: 1,000 requests per hour
- Admin API: Higher or unlimited limits
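
Limits like these are often kept in one configuration table rather than scattered through the code. A minimal sketch, where the endpoint names, the `Limit` tuple, and the choice to fall back to the public limit for unknown endpoints are all assumptions made for illustration:

```python
from typing import NamedTuple


class Limit(NamedTuple):
    max_requests: int
    window_seconds: int


# Hypothetical endpoint names; the numbers mirror the examples above.
ENDPOINT_LIMITS = {
    "login": Limit(5, 60),         # 5 attempts per minute
    "otp": Limit(3, 600),          # 3 requests per 10 minutes
    "public": Limit(1000, 3600),   # 1,000 requests per hour
    "admin": None,                 # None means "unlimited" in this sketch
}

DEFAULT_LIMIT = ENDPOINT_LIMITS["public"]


def limit_for(endpoint):
    """Look up the limit for an endpoint, falling back to the public limit."""
    return ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)
```
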
Rate Limiting vs Throttling (Quick Difference)
| Feature | Rate Limiting | Throttling |
|---|---|---|
| Purpose | Block excess requests | Slow down requests |
| Action | Reject (429) | Delay processing |
| Use case | Abuse prevention | Traffic smoothing |
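
The difference in the table can be shown with a toy example that permits one request per `interval` seconds. The `Pacer` class and its method names are hypothetical, and real systems usually use richer algorithms (token bucket, sliding window), so treat this as a sketch of the two behaviours only.

```python
class Pacer:
    """Permit one request per `interval` seconds, in two different styles."""

    def __init__(self, interval):
        self.interval = interval
        self.next_free = 0.0    # earliest time the next request may run

    def rate_limit(self, now):
        """Rate limiting: reject the request if it arrives too early."""
        if now < self.next_free:
            return 429
        self.next_free = now + self.interval
        return 200

    def throttle(self, now):
        """Throttling: accept the request, but return how long it must wait."""
        start = max(now, self.next_free)
        self.next_free = start + self.interval
        return start - now
```

With rate limiting, the early request gets a 429 and the client has to retry. With throttling, the caller would sleep for the returned delay and then process the request, so nothing is rejected but bursts get spread out.
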
Where Rate Limiting Is Implemented
- API Gateway (NGINX, Kong, AWS API Gateway)
- Application layer (middleware / filters)
- Reverse proxy
- CDN (Cloudflare, Akamai)
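
For the application-layer option, here is what a middleware could look like using plain WSGI from the Python standard library. It is a sketch under several assumptions: clients are identified by `REMOTE_ADDR` (which would be the proxy's address behind a reverse proxy), counters are kept in one process's memory, and the names `rate_limit_middleware` and `demo_app` are invented for this example.

```python
import math
import time


def rate_limit_middleware(app, limit, window_seconds, clock=time.monotonic):
    """Wrap a WSGI app so each client IP gets `limit` requests per window."""
    windows = {}  # client IP -> (window_start, request_count); in-memory only

    def wrapped(environ, start_response):
        # Assumption: REMOTE_ADDR identifies the client.
        key = environ.get("REMOTE_ADDR", "unknown")
        now = clock()
        window_start, count = windows.get(key, (now, 0))
        if now - window_start >= window_seconds:
            window_start, count = now, 0    # previous window expired

        if count >= limit:
            # Over the limit: answer 429 without calling the wrapped app.
            retry_after = max(1, math.ceil(window_seconds - (now - window_start)))
            start_response("429 Too Many Requests", [
                ("Content-Type", "text/plain"),
                ("X-RateLimit-Limit", str(limit)),
                ("X-RateLimit-Remaining", "0"),
                ("Retry-After", str(retry_after)),
            ])
            return [b"Too Many Requests\n"]

        windows[key] = (window_start, count + 1)
        return app(environ, start_response)

    return wrapped


def demo_app(environ, start_response):
    """Tiny WSGI app used only to demonstrate the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, rate_limit_middleware(demo_app, 100, 60)).serve_forever()` serves the wrapped app on port 8000. With several server processes or machines, the counters would need to live in shared storage instead of a local dict.
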