API Rate Limiting
Feb 13, 2022
What is rate-limiting?
- Dropping incoming requests when they exceed the rate or capacity you can handle
Why implement rate-limiting?
- To avoid getting hammered by clients
- To avoid system outages
How do you know if congestion is building up?
Look at:
- Average response time and the p50/p90 response-time percentiles (a minimal sketch for reading these off follows this list)
- Age of messages in the queue
- Count of messages in the dead letter queue
- Request throughput, node memory, CPU, etc.
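As a rough illustration, p50/p90 can be read off a window of recent response times. The sketch below is minimal and illustrative (the class name, window size, and sort-on-read are all assumptions); in practice a metrics library such as Micrometer or the Prometheus client does this for you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch: track recent response times and read off p50/p90.
// Sorting raw samples on every read is fine for illustration only.
public class LatencyWindow {
    private final List<Long> samplesMs = new ArrayList<>();
    private final int maxSamples;

    public LatencyWindow(int maxSamples) {
        this.maxSamples = maxSamples;
    }

    public synchronized void record(long responseTimeMs) {
        if (samplesMs.size() == maxSamples) {
            samplesMs.remove(0); // drop the oldest sample: sliding window
        }
        samplesMs.add(responseTimeMs);
    }

    // p = 50 for p50, 90 for p90, and so on.
    public synchronized long percentile(double p) {
        if (samplesMs.isEmpty()) return 0;
        List<Long> sorted = new ArrayList<>(samplesMs);
        Collections.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.size()) - 1;
        return sorted.get(Math.max(index, 0));
    }
}
```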
What happens if I don't rate-limit incoming requests?
Due to misbehaving clients or a genuine traffic surge, you may see:
- Out-of-memory (OOM) exceptions
- Delayed Responses / Higher latencies
- Resource exhaustion
- Cascading failures, possibly causing a system-wide outage (all machines/nodes eventually die)
Algorithms for Rate limiting:
- Sliding window
- Timer wheel / hierarchical timer wheel
- Leaky bucket
- Token bucket (sketched after this list)
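Of these, the token bucket is the easiest to sketch: tokens accrue at a fixed refill rate up to a capacity that caps burst size, and each request either consumes a token or is rejected. The class and parameter names below are illustrative, not any specific library's API.

```java
// Minimal token bucket sketch: capacity caps bursts, refillPerSecond
// sets the sustained rate. Names and structure are illustrative.
public class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;          // start full: allow an initial burst
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0; // consume one token for this request
            return true;   // accept
        }
        return false;      // bucket empty: rate-limit this request
    }

    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
    }
}
```

A request handler would call tryAcquire() on each request and answer HTTP 429 (Too Many Requests) when it returns false.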
Safeguards to put in place before Rate limiting:
- Consider using gRPC, which is built on HTTP/2 and avoids HTTP-level head-of-line blocking
- Consider handling many requests asynchronously (multiplexing)
- Use an efficient binary serialization library such as Kryo to save bandwidth
- Watch the number of client connections (keep HTTP connections open for a certain window to avoid the overhead of creating/destroying connections)
- Push/pull hybrid model: a normal user's post is fanned out to followers (push), while a celebrity's post is pulled by users on read
- Graceful degradation: Stop/suspend functionality that is not critical for your service during high load
- See whether the circuit breaker, timeout, retry, and bulkhead patterns help (a minimal circuit-breaker sketch follows this list)
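To make the last point concrete, here is a minimal circuit-breaker sketch: after a run of consecutive failures it opens and fails fast, then allows a probe call once a cool-off window has passed. The thresholds and names are assumptions; libraries such as Resilience4j ship hardened versions of this pattern.

```java
// Minimal circuit-breaker sketch. After failureThreshold consecutive
// failures the breaker opens and rejects calls immediately; once
// openMillis have passed, a single probe call is allowed through.
public class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    public CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    public synchronized boolean allowRequest() {
        if (consecutiveFailures < failureThreshold) {
            return true; // closed: pass traffic through normally
        }
        // open: fail fast until the cool-off elapses, then allow a probe
        return System.currentTimeMillis() - openedAt >= openMillis;
    }

    public synchronized void recordSuccess() {
        consecutiveFailures = 0; // close the breaker again
    }

    public synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis(); // (re)open the breaker
        }
    }
}
```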
Summary:
- Use a distributed service to decide whether a request should be accepted
- Tricks to use: request batching/collapsing (combine multiple identical requests into one), client-side rate limiting with exponential back-off (sketched after this list), and a global/distributed cache layer to avoid repeated computation
- Use a rate-limiting algorithm such as the token bucket (sketched above); a hierarchical timer wheel can be used to schedule its time-based bookkeeping efficiently
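Client-side rate limiting with exponential back-off might look like the sketch below: each retry waits roughly twice as long as the previous one, with random jitter so retrying clients don't stampede in lockstep. The attempt count, base delay, and helper names are illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Minimal exponential back-off sketch: the delay doubles on each retry
// and jitter spreads simultaneous retries apart. Parameters are illustrative.
public class Backoff {
    public static <T> T callWithBackoff(Callable<T> call, int maxAttempts,
                                        long baseDelayMs) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the failure
                }
                long delay = baseDelayMs << attempt; // e.g. 100ms, 200ms, 400ms, ...
                Thread.sleep(delay + ThreadLocalRandom.current().nextLong(delay));
            }
        }
    }
}
```

A caller would wrap its request, e.g. Backoff.callWithBackoff(() -> httpGet(url), 5, 100), where httpGet is a hypothetical request function.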