🪣 Token Bucket Rate Limiting
1. What is Token Bucket Rate Limiting?
-
Concept:
The Token Bucket algorithm allows requests to be processed as long as there are tokens in the bucket. Tokens are added at a steady rate (the refill rate), up to a maximum capacity. Each request consumes a token. If the bucket is empty, requests are denied (or delayed). This allows for short bursts of traffic while enforcing a steady average rate over time. -
Example:
If you set a capacity of 10 tokens and a refill rate of 1 token per second, a client can make up to 10 requests instantly (if the bucket is full), and then 1 request per second thereafter as tokens are refilled.
2. Usage
Single Limiter Example
from fastapicap import TokenBucketRateLimiter
from fastapi import Depends
# Allow bursts up to 10 requests, refilling at 1 token per second
limiter = TokenBucketRateLimiter(capacity=10, tokens_per_second=1)
@app.get("/token-bucket", dependencies=[Depends(limiter)])
async def token_bucket_limited():
return {"message": "You are within the token bucket rate limit!"}
Multiple Limiters Example
limiter_burst = TokenBucketRateLimiter(capacity=5, tokens_per_second=1)
limiter_long = TokenBucketRateLimiter(capacity=30, tokens_per_minute=10)
@app.get("/multi-token-bucket", dependencies=[Depends(limiter_burst), Depends(limiter_long)])
async def multi_token_bucket_limited():
return {"message": "You passed both token bucket rate limits!"}
3. Available Configuration Options
You can customize the Token Bucket limiter using the following parameters:
Parameter | Type | Description | Default |
---|---|---|---|
capacity |
int |
Required. Maximum number of tokens the bucket can hold (burst size). Must be positive. | — |
tokens_per_second |
float |
Number of tokens added per second. | 0 |
tokens_per_minute |
float |
Number of tokens added per minute. | 0 |
tokens_per_hour |
float |
Number of tokens added per hour. | 0 |
tokens_per_day |
float |
Number of tokens added per day. | 0 |
key_func |
Callable |
Function to extract a unique key from the request. | By default, uses client IP and path. |
on_limit |
Callable |
Function called when the rate limit is exceeded. | By default, raises HTTP 429. |
prefix |
str |
Redis key prefix for all limiter keys. | "cap" |
Note:
- The total refill rate is the sum of all tokens_per_*
arguments, converted to tokens per second.
- At least one refill rate must be positive, and capacity
must be positive.
Example:
# 100 token burst, refilling at 10 tokens per minute, with a custom prefix
limiter = TokenBucketRateLimiter(capacity=100, tokens_per_minute=10, prefix="myapi")
4. How Token Bucket Works (with Example)
Suppose you set a capacity of 10 tokens and a refill rate of 1 token per second.
- The bucket starts full (10 tokens).
- Each request consumes 1 token.
- If 10 requests arrive instantly, all are allowed (bucket is now empty).
- Further requests are denied until tokens are refilled (1 per second).
- After 5 seconds, 5 tokens are available again, allowing 5 more requests.
Visualization:
Time | Tokens in Bucket | Request? | Allowed? | Reason |
---|---|---|---|---|
12:00:00 | 10 | Yes | ✅ | Bucket full |
12:00:00 | 9 | Yes | ✅ | |
... | ... | ... | ... | ... |
12:00:00 | 1 | Yes | ✅ | |
12:00:00 | 0 | Yes | ✅ | Last token used |
12:00:01 | 0 | Yes | ❌ | No tokens left |
12:00:01 | 1 (refilled) | Yes | ✅ | 1 token refilled |
12:00:02 | 1 (refilled) | Yes | ✅ | 1 token refilled |
5. Notes, Pros & Cons
Notes:
- This strategy is excellent for APIs that need to allow short bursts but enforce a steady average rate.
- The
retry_after
value is based on how long until the next token is available.
Pros:
- Allows bursts up to the bucket capacity.
- Smooths out traffic over time.
- Flexible and widely used in real-world APIs.
Cons:
- Slightly more complex than Fixed Window.
- If refill rate or capacity is misconfigured, can allow more requests than intended in short bursts.
Use Token Bucket when you want to allow bursts but enforce a steady average rate over time.