What Is API Rate Limiting?

Rate limiting is one of those “boring-sounding but super important” API concepts. Here’s a clean explanation.

What is Rate Limiting in an API?

Rate limiting is a mechanism that restricts how many API requests a client can make within a specific time period.

In simple terms:
“You can call this API only X times in Y seconds/minutes/hours.”

Example: an API might allow 100 requests per minute per API key.

If the limit is exceeded, the API rejects further requests temporarily.

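Why is Rate Limiting Used?

- Prevents abuse: protects APIs from bots, scripts, brute-force attempts, and request floods.
- Ensures fair usage: stops one user from consuming all server resources and slowing down others.
- Keeps the server stable: holds CPU, memory, and database load under control.
- Controls costs: API providers often pay per request for the services behind them (cloud, SMS, email, maps APIs), so rate limiting prevents unexpected bills.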

How Does Rate Limiting Work?

1. The client sends a request.
2. The API checks:
   - Who is the client? (IP address, user ID, API key, or token)
   - How many requests has it already made in the current time window?
3. If under the limit → the request is allowed.
4. If over the limit → the request is blocked.
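The steps above can be sketched as a simple "fixed window" counter: each client gets a counter that resets every window, and requests beyond the limit are rejected until the reset. This is only one possible strategy, and the class name `FixedWindowRateLimiter`, its parameters, and the in-memory storage are hypothetical choices for illustration. Because state lives in a Python dict, this sketch would only work for a single server process.

```python
import time
from dataclasses import dataclass


@dataclass
class _Window:
    start: float  # when the current window began
    count: int    # requests seen in this window


class FixedWindowRateLimiter:
    """Hypothetical sketch: allow at most `limit` requests per client per window.

    State is kept in process memory, so it is not shared across servers.
    """

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._windows: dict[str, _Window] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        window = self._windows.get(client_id)
        # No window yet, or the old one expired: start fresh and allow.
        if window is None or now - window.start >= self.window_seconds:
            self._windows[client_id] = _Window(start=now, count=1)
            return True
        # Under the limit: count this request and let it through.
        if window.count < self.limit:
            window.count += 1
            return True
        # Over the limit: block until the window resets.
        return False


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=100, window_seconds=60)
    results = [limiter.allow("api-key-123") for _ in range(101)]
    print(results.count(True), results.count(False))  # 100 1
```

Here `client_id` stands for whatever identity the API uses (IP, user ID, API key, or token). A fixed window is easy to reason about, but it lets a client send a burst at the end of one window and another at the start of the next.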

Blocked responses usually return:

HTTP 429 – Too Many Requests

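Often with headers like these (the X-RateLimit-* names are a common convention rather than a standard; Retry-After is a standard HTTP header telling the client how long to wait, here in seconds):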

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
Retry-After: 60
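Here is a minimal sketch of how a server might turn a client's quota state into the status code and headers shown above. The function name `rate_limit_response` and its parameters are hypothetical, and the 200 status simply stands in for whatever the real handler would return when the request is allowed.

```python
import math


def rate_limit_response(limit: int, remaining: int, seconds_until_reset: float):
    """Hypothetical helper: return (status_code, headers) for one request.

    `remaining` is how many requests the client had left *before* this one;
    0 (or less) means the request must be blocked.
    """
    headers = {"X-RateLimit-Limit": str(limit)}
    if remaining <= 0:
        headers["X-RateLimit-Remaining"] = "0"
        # Retry-After here is whole seconds; round up so clients never retry too early.
        headers["Retry-After"] = str(math.ceil(seconds_until_reset))
        return 429, headers
    # Allowed: this request uses up one unit of the client's quota.
    headers["X-RateLimit-Remaining"] = str(remaining - 1)
    return 200, headers


status, headers = rate_limit_response(limit=100, remaining=0, seconds_until_reset=59.2)
print(status, headers)
# 429 {'X-RateLimit-Limit': '100', 'X-RateLimit-Remaining': '0', 'Retry-After': '60'}
```

Rounding Retry-After up rather than down is a deliberate choice: a client that waits exactly the advertised time should land in the new window instead of being rejected again.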