API Rate Limiting

Amit Shrigondekar
2 min read · Feb 13, 2022

What is rate-limiting?

  • Dropping incoming requests when they exceed the capacity or rate your service can handle

Why implement rate-limiting?

  • To avoid getting hammered by clients
  • To avoid system outages

How to know if you have congestion going on?

Look at

  • Response times: average, p50, and p90
  • Age of messages in the queue
  • Count of messages in the dead-letter queue
  • Request throughput, node memory, CPU, etc.
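As a rough illustration of why percentiles belong next to the average, here is a minimal sketch that computes p50 and p90 with the nearest-rank method. The `percentile` helper and the sample latencies are made up for this example; a real service would normally read these numbers from its metrics system.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds; two requests were very slow.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 900, 14]

average = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
print(f"avg={average:.1f} ms  p50={p50} ms  p90={p90} ms")
```

In this sample the median stays low while the average and p90 are pulled up by the two slow requests, which is the kind of gap that hints at congestion for a subset of callers.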

What happens if I don't rate-limit the incoming requests?

Because of misbehaving clients or a genuine traffic surge, you may see:

  • Out-of-memory (OOM) exceptions
  • Delayed responses / higher latencies
  • Resource exhaustion
  • Cascading failures that can cause a system-wide outage (all machines/nodes eventually die)

Algorithms for Rate limiting:

  • Sliding window
  • Timer wheel / Hierarchical timer wheel
  • Leaky Bucket
  • Token Bucket
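To make one of these concrete, here is a minimal single-process sketch of a token bucket. The class name, the `rate` and `capacity` parameters, and the injectable `clock` are choices made for this example, not part of any particular library. Tokens refill continuously at `rate` per second up to `capacity`, and a request is accepted only if a token is available.

```python
import time

class TokenBucket:
    """Illustrative token bucket; not thread-safe and not distributed."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1):
        """Return True and consume `cost` tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Usage: allow bursts of up to 10 requests, 5 requests per second sustained.
bucket = TokenBucket(rate=5, capacity=10)
if not bucket.allow():
    print("reject, e.g. with HTTP 429 Too Many Requests")
```

The capacity controls how large a burst is tolerated, while the rate controls the sustained throughput. A leaky bucket differs mainly in that it smooths the output to a constant rate instead of permitting bursts.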

Safeguards to put in place before Rate limiting:

  • Consider using gRPC, which is built on HTTP/2 and avoids HTTP-level "head-of-line blocking" by multiplexing requests over a single connection
  • Consider handling requests asynchronously so that many can be in flight at once (multiplexing)
  • Use a compact binary serialization format such as Kryo to save bandwidth
  • Watch the number of client connections (keep HTTP connections open for a certain window to avoid the overhead of creating and destroying connections)
  • Pull/push hybrid model: fan out updates when a normal user posts (push), but let followers pull updates when a celebrity posts
  • Graceful degradation: stop or suspend functionality that is not critical for your service during high load
  • See whether the circuit breaker, timeout, retry, and bulkhead patterns are helpful
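Of the patterns in the last bullet, the circuit breaker is perhaps the least obvious, so here is a minimal sketch of one. The names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are invented for this example; production code would more likely rely on an existing resilience library. After a number of consecutive failures the breaker "opens" and fails fast, then lets a single trial call through once a cool-down has passed.

```python
import time

class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast like this keeps threads and connections from piling up behind a dependency that is already struggling, which is one way cascading failures start.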

Summary:

  • Use a distributed service to decide whether a request should be accepted
  • Other tricks: request batching/collapsing (combining multiple requests for the same thing into one), client-side rate limiting (exponential back-off), and a global/distributed cache layer to avoid repeated computation
  • Use a rate-limiting algorithm such as the token bucket, or a scheme built on a hierarchical timer wheel
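The client-side back-off mentioned above can be sketched roughly as follows. The function name `call_with_retry` and its parameters are hypothetical, and the random "full jitter" on each delay is an addition of this sketch, commonly used so that many clients do not retry in lockstep.

```python
import random
import time

def call_with_retry(fn, attempts=5, base=0.1, cap=10.0, sleep=time.sleep):
    """Call fn(), retrying on failure with exponential back-off and full jitter.

    The delay before retry n is drawn uniformly from [0, min(cap, base * 2**n)].
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Assumption: every exception is retryable. A real client would
            # retry only on throttling or transient errors (e.g. HTTP 429/503).
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The `cap` keeps the wait bounded, and the attempt limit ensures a persistently failing call eventually surfaces its error instead of retrying forever.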
