🪣 Token Bucket Rate Limiting

1. What is Token Bucket Rate Limiting?

  • Concept:
    The Token Bucket algorithm allows requests to be processed as long as there are tokens in the bucket. Tokens are added at a steady rate (the refill rate), up to a maximum capacity. Each request consumes a token. If the bucket is empty, requests are denied (or delayed). This allows for short bursts of traffic while enforcing a steady average rate over time.

  • Example:
    If you set a capacity of 10 tokens and a refill rate of 1 token per second, a client can make up to 10 requests instantly (if the bucket is full), and then 1 request per second thereafter as tokens are refilled.


2. Usage

Single Limiter Example

from fastapicap import TokenBucketRateLimiter
from fastapi import FastAPI, Depends

app = FastAPI()

# Allow bursts up to 10 requests, refilling at 1 token per second
limiter = TokenBucketRateLimiter(capacity=10, tokens_per_second=1)

@app.get("/token-bucket", dependencies=[Depends(limiter)])
async def token_bucket_limited():
    return {"message": "You are within the token bucket rate limit!"}

Multiple Limiters Example

# A short-burst limiter and a longer-term limiter; a request must pass both
limiter_burst = TokenBucketRateLimiter(capacity=5, tokens_per_second=1)
limiter_long = TokenBucketRateLimiter(capacity=30, tokens_per_minute=10)

@app.get("/multi-token-bucket", dependencies=[Depends(limiter_burst), Depends(limiter_long)])
async def multi_token_bucket_limited():
    return {"message": "You passed both token bucket rate limits!"}

3. Available Configuration Options

You can customize the Token Bucket limiter using the following parameters:

Parameter         | Type     | Description                                                                             | Default
capacity          | int      | Required. Maximum number of tokens the bucket can hold (burst size). Must be positive. | —
tokens_per_second | float    | Number of tokens added per second.                                                      | 0
tokens_per_minute | float    | Number of tokens added per minute.                                                      | 0
tokens_per_hour   | float    | Number of tokens added per hour.                                                        | 0
tokens_per_day    | float    | Number of tokens added per day.                                                         | 0
key_func          | Callable | Function to extract a unique key from the request.                                      | Client IP + path
on_limit          | Callable | Function called when the rate limit is exceeded.                                        | Raises HTTP 429
prefix            | str      | Redis key prefix for all limiter keys.                                                  | "cap"

Note:
- The total refill rate is the sum of all tokens_per_* arguments, converted to tokens per second.
- At least one refill rate must be positive, and capacity must be positive.

Example:

# 100 token burst, refilling at 10 tokens per minute, with a custom prefix
limiter = TokenBucketRateLimiter(capacity=100, tokens_per_minute=10, prefix="myapi")
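Since tokens_per_minute=10 works out to 10 / 60 ≈ 0.167 tokens per second, the limiter above refills roughly one token every 6 seconds. Multiple refill arguments can also be combined, as the note above describes; the sketch below mixes tokens_per_second and tokens_per_minute and works out the effective rate in the comments (the variable name combined_limiter is purely illustrative):

# tokens_per_second=1  -> 1.0 tokens/second
# tokens_per_minute=30 -> 30 / 60 = 0.5 tokens/second
# Effective refill rate: 1.0 + 0.5 = 1.5 tokens/second
combined_limiter = TokenBucketRateLimiter(
    capacity=20,
    tokens_per_second=1,
    tokens_per_minute=30,
    prefix="myapi",
)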

4. How Token Bucket Works (with Example)

Suppose you set a capacity of 10 tokens and a refill rate of 1 token per second.

  • The bucket starts full (10 tokens).
  • Each request consumes 1 token.
  • If 10 requests arrive instantly, all are allowed (bucket is now empty).
  • Further requests are denied until tokens are refilled (1 per second).
  • After 5 seconds, 5 tokens are available again, allowing 5 more requests.

Visualization:

Time     | Tokens (before request) | Request? | Allowed? | Reason
12:00:00 | 10                      | Yes      | Yes      | Bucket full
12:00:00 | 9                       | Yes      | Yes      |
...      | ...                     | ...      | ...      | ...
12:00:00 | 1                       | Yes      | Yes      | Last token used
12:00:00 | 0                       | Yes      | No       | No tokens left
12:00:01 | 1 (refilled)            | Yes      | Yes      | 1 token refilled
12:00:02 | 1 (refilled)            | Yes      | Yes      | 1 token refilled
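
The arithmetic above can be sketched as a plain in-memory token bucket. This is only an illustration of the algorithm, not the library's Redis-backed implementation; the class and method names below are hypothetical. It also shows one way a retry_after value can be derived from the time until the next token, as mentioned in the notes that follow.

import time


class SimpleTokenBucket:
    """Minimal in-memory token bucket (illustration only, not the library's Redis-backed limiter)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity          # maximum tokens (burst size)
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # the bucket starts full
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self) -> bool:
        """Consume one token if available; otherwise deny the request."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def retry_after(self) -> float:
        """Seconds until the next token is available (0 if a token is ready now)."""
        self._refill()
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.refill_rate


bucket = SimpleTokenBucket(capacity=10, refill_rate=1)

# A burst of 10 requests drains the full bucket; the 11th is denied.
results = [bucket.allow() for _ in range(11)]
print(results)               # [True] * 10 + [False]
print(bucket.retry_after())  # roughly 1.0 second until the next token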

5. Notes, Pros & Cons

Notes:

  • This strategy is excellent for APIs that need to allow short bursts but enforce a steady average rate.
  • The retry_after value is the time remaining until the next token becomes available.

Pros:

  • Allows bursts up to the bucket capacity.
  • Smooths out traffic over time.
  • Flexible and widely used in real-world APIs.

Cons:

  • Slightly more complex than Fixed Window.
  • If the refill rate or capacity is misconfigured, it can allow more requests than intended in short bursts.

Use Token Bucket when you want to allow bursts but enforce a steady average rate over time.