Rate Limiting Vulnerabilities

Why Rate Limiting Matters

Imagine a login endpoint with no rate limiting. An attacker can try 10,000 passwords per minute, and given enough time, they will likely find the correct one. This is called a brute-force attack. Without rate limiting, every authentication endpoint becomes a slot machine where the attacker pulls the lever thousands of times per minute.

Rate limiting is the API's first line of defense against automated abuse. It prevents:

Brute-force attacks: Repeated login attempts
Credential stuffing: Trying leaked username/password pairs from other breaches
DDoS attacks: Overwhelming the server with requests
Web scraping: Automated data extraction
Inventory hoarding: Bots reserving limited stock

Why this matters for your career:

Missing rate limiting is one of the most common bug bounty findings
Rate limiting design is a standard interview topic for backend engineers
Understanding bypass techniques helps you build more robust systems
Rate limiting is a key component of any production API gateway

What Is Rate Limiting?

Rate limiting controls how many requests a client can make to an API within a specific time window. Think of it like a bouncer at a club who only lets in 50 people per hour — once the limit is reached, everyone else waits.
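To make the definition concrete, here is a minimal sketch of the bouncer analogy as a fixed-window counter. The class name `FixedWindowLimiter`, the `client-a` identifier, and the explicit `now` argument are illustrative assumptions, not part of any library.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative fixed-window counter: at most `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # (client, window index) -> number of requests admitted in that window
        self.counts = defaultdict(int)

    def allow(self, client: str, now: float) -> bool:
        bucket = (client, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False  # limit reached for this window
        self.counts[bucket] += 1
        return True


# 50 requests per hour, as in the bouncer analogy
limiter = FixedWindowLimiter(limit=50, window_seconds=3600)
admitted = sum(limiter.allow("client-a", now=10.0) for _ in range(60))
next_hour = limiter.allow("client-a", now=3600.0)  # a new window starts here
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read the clock itself and would also evict counters for old windows.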

Rate Limiting Strategies

| Algorithm | How It Works | Best For |
|:-----------|:-------------|:--------|
| Token Bucket | Tokens refill at a fixed rate; each request consumes one token. Bursts allowed up to bucket capacity. | APIs with varying traffic patterns |
| Sliding Window Log | Tracks timestamps of each request in a window. Rejects if count exceeds limit. | Precise enforcement needed |
| Sliding Window Counter | Combines current and previous window counts for smooth rate limiting. | Production APIs (Redis) |
| Fixed Window | Resets counter at each window boundary. Simple but allows bursts at boundaries. | Non-critical endpoints |
| Leaky Bucket | Processes requests at a constant rate. Queues excess requests. | Stable throughput required |
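The token bucket row can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than a production implementation: the class name `TokenBucket` and its parameters are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket: bursts up to `capacity`, refilled continuously."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend one token per request if one is available
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=3, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # three pass, the fourth is rejected
later = bucket.allow(now=1.0)                      # one token has refilled after 1 s
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained request rate; tuning the two separately is what makes this algorithm a good fit for uneven traffic.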

Client IP vs User-Based Rate Limiting

| Key | Pros | Cons |
|:-----|:------|:------|
| Client IP | Simple, no auth required | Shared IPs (NAT, office VPNs) punished; attackers rotate IPs |
| User ID (JWT sub) | Per-user fairness | Requires authentication |
| API Key | Per-application accountability | Attacker can rotate API keys |
| Hybrid (IP + User ID) | Best protection: blocks IP rotation + per-user limits | Most complex to implement |
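One way to picture the hybrid row is a limiter that keeps two independent counters per request and rejects when either is exhausted. The sketch below assumes a sliding-window log per key; `HybridLimiter`, the `ip:` and `user:` key prefixes, and the limits are hypothetical names and values chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class HybridLimiter:
    """Illustrative limiter enforcing a per-IP limit and a per-user limit together."""

    def __init__(self, ip_limit: int, user_limit: int, window_seconds: float) -> None:
        self.ip_limit = ip_limit
        self.user_limit = user_limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of admitted requests

    def _count(self, key: str, now: float) -> int:
        timestamps = self.hits[key]
        # Drop timestamps that have fallen out of the window
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps)

    def allow(self, ip: str, user_id: Optional[str] = None,
              now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        checks = [(f"ip:{ip}", self.ip_limit)]
        if user_id is not None:
            checks.append((f"user:{user_id}", self.user_limit))
        # Reject if either dimension has used up its quota
        if any(self._count(key, now) >= limit for key, limit in checks):
            return False
        for key, _ in checks:
            self.hits[key].append(now)
        return True


limiter = HybridLimiter(ip_limit=10, user_limit=3, window_seconds=60)
# An attacker rotates IPs while targeting one account: the user limit still trips
results = [limiter.allow(ip=f"10.0.0.{i}", user_id="alice", now=float(i)) for i in range(5)]
```

On an unauthenticated endpoint such as login, the `user_id` would be the account being attempted rather than a verified identity, which is exactly what makes IP rotation ineffective against it.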

How to Test Rate Limiting

Step 1: Basic Rate Limit Check

Send a burst of requests and observe when the API starts returning 429 (Too Many Requests):

import requests

BASE_URL = "https://api.target.com/login"

def test_rate_limit():
    """Send 50 rapid requests and count 429 responses."""
    headers = {"Content-Type": "application/json"}
    data = {"username": "test", "password": "wrong"}
    
    rate_limited_count = 0
    success_count = 0  # processed by the API (200 or 401), i.e. not throttled
    other_count = 0    # anything else (403, 5xx, ...) that needs a manual look
    
    for i in range(50):
        # A timeout keeps the test from hanging if the server stalls connections
        r = requests.post(BASE_URL, json=data, headers=headers, timeout=10)
        if r.status_code == 429:
            rate_limited_count += 1
        elif r.status_code in (200, 401):
            success_count += 1
        else:
            other_count += 1
        
        # Print every 10th request to track progress
        if i % 10 == 9:
            print(f"Request {i+1}/50: {success_count} success, {rate_limited_count} rate-limited")
    
    print(f"\nResult: {success_count} successful, {rate_limited_count} rate-limited, {other_count} other")
    
    if rate_limited_count == 0:
        print("[!] No rate limiting detected: high risk!")
    elif rate_limited_count < 40:
        # More than 10 of 50 attempts got through
        print("[!] Weak rate limiting: attacker can still brute force slowly")
    else:
        print("[+] Strong rate limiting in place")

if __name__ == "__main__":
    test_rate_limit()

Step 2: Bypass Attempts

Try these common bypass techniques:

# 1. IP rotation with different X-Forwarded-For headers
curl -X POST https://api.target.com/login -H "X-Forwarded-For: 1.1.1.1"
curl -X POST https://api.target.com/login -H "X-Forwarded-For: 2.2.2.2"
curl -X POST https://api.target.com/login -H "X-Forwarded-For: 3.3.3.3"

# 2. Slow drip — stay under the limit
# Instead of 100 req/min, try 1 req/sec for 100 seconds
for i in $(seq 1 100); do
  curl -s -o /dev/null -w "%{http_code}\n" https://api.target.com/search
  sleep 1
done

# 3. Use multiple endpoints (each may have separate limits)
curl https://api.target.com/login
curl https://api.target.com/api/auth
curl https://api.target.com/v2/authenticate

# 4. Check if limit resets by waiting
for i in $(seq 1 30); do
  curl -s -o /dev/null -w "%{http_code}\n" https://api.target.com/api/resource
  sleep 2
done

Step 3: Check Response Headers

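Well-configured APIs return rate limit info in headers. Conventions vary between services: in the IETF draft that introduced the RateLimit-* headers, RateLimit-Reset is the number of seconds until the quota resets, while the legacy X-RateLimit-Reset is often a Unix timestamp. Retry-After on a 429 response tells the client how many seconds to wait: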

RateLimit-Limit: 100
RateLimit-Remaining: 42
RateLimit-Reset: 3600
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1700000000
Retry-After: 57
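When testing, it helps to turn these headers into a single "seconds to wait" value. The helper below is a best-effort sketch: the function name `seconds_until_retry` is hypothetical, and it assumes `RateLimit-Reset` carries seconds remaining and `X-RateLimit-Reset` carries a Unix timestamp, which does not hold for every API.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str],
                        now: Optional[float] = None) -> Optional[float]:
    """Best-effort wait time from rate-limit headers; None if nothing usable."""
    now = time.time() if now is None else now
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Retry-After as delta-seconds (the HTTP-date form is not handled here)
    retry_after = lowered.get("retry-after", "")
    if retry_after.isdigit():
        return float(retry_after)

    # Assumption: RateLimit-Reset is the number of seconds remaining
    reset = lowered.get("ratelimit-reset", "")
    if reset.isdigit():
        return float(reset)

    # Assumption: X-RateLimit-Reset is a Unix timestamp (some APIs differ)
    legacy = lowered.get("x-ratelimit-reset", "")
    if legacy.isdigit():
        return max(0.0, float(legacy) - now)

    return None


wait = seconds_until_retry({"Retry-After": "57", "RateLimit-Reset": "3600"})
```

`Retry-After` is checked first because it is the most direct instruction to the client; the other two headers are only fallbacks.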

Implementing Rate Limiting in FastAPI

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import time
from collections import defaultdict

app = FastAPI()

# In-memory rate limiter (for demo only; use Redis in production)
class RateLimiter:
    def __init__(self):
        self.requests = defaultdict(list)
    
    def is_allowed(self, key: str, max_requests: int, window_seconds: int) -> bool:
        now = time.time()
        window_start = now - window_seconds
        
        # Clean old entries
        self.requests[key] = [t for t in self.requests[key] if t > window_start]
        
        if len(self.requests[key]) >= max_requests:
            return False
        
        self.requests[key].append(now)
        return True

rate_limiter = RateLimiter()

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Identify client by IP or API key (request.client can be None on some transports)
    client_ip = request.client.host if request.client else "unknown"
    # NOTE: this header is attacker-controlled. Validate the key against your key
    # store before trusting it; otherwise a new random key earns a fresh quota.
    api_key = request.headers.get("X-API-Key")
    
    # Use API key if available, otherwise fall back to IP
    client_id = api_key or client_ip
    
    # Different limits for different endpoints
    if "/login" in request.url.path:
        bucket, max_req, window = "login", 5, 60    # 5 requests per minute for login
    elif "/api/" in request.url.path:
        bucket, max_req, window = "api", 100, 60    # 100 requests per minute for API
    else:
        bucket, max_req, window = "general", 30, 60  # 30 requests per minute for general
    
    # Keep a separate history per endpoint group so API traffic
    # does not consume the login quota (and vice versa)
    key = f"{bucket}:{client_id}"
    
    if not rate_limiter.is_allowed(key, max_req, window):
        # Return the response directly: an HTTPException raised inside middleware
        # is not handled by FastAPI's exception handlers and surfaces as a 500.
        return JSONResponse(
            status_code=429,
            content={"detail": "Too many requests. Please try again later."},
            headers={"Retry-After": str(window)},
        )
    
    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(max_req)
    response.headers["X-RateLimit-Remaining"] = str(
        max(0, max_req - len(rate_limiter.requests.get(key, [])))
    )
    return response


@app.get("/api/resource")
async def get_resource():
    return {"message": "This endpoint is rate limited"}


@app.post("/login")
async def login():
    return {"message": "Login endpoint — aggressively rate limited"}

Production: Redis-Based Rate Limiting

import time
import uuid

import redis

redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)

def sliding_window_rate_limit(user_id: str, max_requests: int, window: int) -> tuple[bool, int]:
    """
    Sliding window rate limiter using Redis sorted sets.
    
    Args:
        user_id: Unique identifier for the client
        max_requests: Maximum requests allowed in the window
        window: Time window in seconds
    
    Returns:
        (allowed, retry_after): allowed is True if the request may proceed;
        retry_after is the number of seconds to wait when it may not (0 otherwise)
    """
    now = time.time()
    key = f"ratelimit:{user_id}"
    window_start = now - window
    
    # Remove entries outside the window
    redis_client.zremrangebyscore(key, 0, window_start)
    
    # Count requests in the window
    current_count = redis_client.zcard(key)
    
    if current_count >= max_requests:
        # Get the oldest entry's timestamp to calculate retry-after
        oldest = redis_client.zrange(key, 0, 0, withscores=True)
        if oldest:
            retry_after = max(1, int(window - (now - oldest[0][1])))
            return False, retry_after
        return False, window
    
    # Add current request. The member must be unique: two requests with the
    # same timestamp would otherwise overwrite each other and count once.
    redis_client.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    redis_client.expire(key, window)
    
    # NOTE: the count-then-add sequence is not atomic, so concurrent requests
    # can slip past the limit. Run these commands in a Lua script for strict enforcement.
    return True, 0

Rate Limiting Bypass Techniques

| Bypass Method | How It Works | Detection | Prevention |
|:--------------|:-------------|:----------|:-----------|
| IP rotation | Attacker uses proxies/VPN to rotate IPs | Monitor unique IPs per user in a short window | Use user-based (JWT sub) rate limiting, not just IP |
| Header spoofing | Attacker sets X-Forwarded-For to fake IP | Validate header chain; use the last trusted proxy IP | Configure reverse proxy (NGINX) to strip untrusted headers |
| Slow drip | Stay just under the rate limit over a long period | Monitor sustained low-rate patterns | Set cumulative daily/weekly limits in addition to per-minute limits |
| Multiple endpoints | Each endpoint may have a separate limit | Correlate request patterns across endpoints | Use a global rate limiter shared across all endpoints |
| Distributed attack | Botnet from many IPs simultaneously | Statistical anomaly detection on user behavior | Use CAPTCHA after suspicious patterns |
| Reset exploitation | Wait for window reset and immediately send burst | Track reset timing patterns | Use sliding window instead of fixed window |
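The header spoofing row depends on deciding which part of `X-Forwarded-For` can be trusted. A common approach, sketched here under the assumption that you know your own proxy ranges, is to ignore the header unless the direct peer is a trusted proxy, then walk the list from right to left and take the first address that is not one of your proxies. The function name `resolve_client_ip` and the example networks are hypothetical.

```python
import ipaddress
from typing import Iterable


def resolve_client_ip(peer_ip: str, x_forwarded_for: str,
                      trusted_proxies: Iterable[str]) -> str:
    """Pick the rate-limit IP: the rightmost X-Forwarded-For hop that is not our proxy."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # malformed values are never trusted
        return any(addr in net for net in trusted)

    # If the direct peer is not one of our proxies, the header is attacker-controlled
    if not is_trusted(peer_ip):
        return peer_ip

    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Entries appended by our own proxies sit on the right; skip past them
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer_ip


# The attacker prepends a fake 1.1.1.1; the proxy at 10.0.0.5 appended the real address
client = resolve_client_ip("10.0.0.5", "1.1.1.1, 198.51.100.7", ["10.0.0.0/8"])
```

Anything to the left of the returned address was supplied by the client and can be forged, which is why the leftmost entry should never be used as a rate-limit key.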

Common Pitfalls

| Mistake | Why It Is Dangerous | Fix |
|:--------|:-------------------|:----|
| Rate limiting by IP only | Office NAT, VPN, or cloud IP pools share one limit | Combine IP + user-based identification |
| Fixed window at natural boundaries (per minute) | All requests rush at :00 mark | Use sliding window or randomize window start |
| No rate limiting on read endpoints | Attackers can scrape all data via GET endpoints | Rate limit all endpoints, not just writes |
| Revealing limit reset time in clear text | Attacker synchronizes attacks with resets | Use sliding window (no fixed reset) |
| Rate limit too high | 1000 req/min is useless against automated tools | Start aggressive: 5 req/min for auth, 30 req/min for API |
| No rate limiting on error responses | Attacker brute-forces without triggering limits | Count all responses, including 4xx and 5xx |
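The fixed-window pitfall is easiest to see next to its fix. The sketch below implements a sliding window counter, which weights the previous window's count by how much of it still overlaps the last `window_seconds`. The class name `SlidingWindowCounter` and the limits are assumptions for illustration, and old counters are never evicted here, which a real implementation would need to handle.

```python
class SlidingWindowCounter:
    """Illustrative sliding window counter (weighted previous + current window)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window index) -> admitted requests

    def allow(self, key: str, now: float) -> bool:
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        previous = self.counts.get((key, index - 1), 0)
        current = self.counts.get((key, index), 0)
        # The previous window counts in proportion to its remaining overlap
        estimated = previous * (1.0 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self.counts[(key, index)] = current + 1
        return True


limiter = SlidingWindowCounter(limit=5, window_seconds=60)
before = [limiter.allow("ip:1", 59.0) for _ in range(5)]  # end of the first window
after = [limiter.allow("ip:1", 60.0) for _ in range(5)]   # start of the next window
quarter = [limiter.allow("ip:1", 75.0) for _ in range(3)]
```

With a plain fixed window, the five requests at second 60 would all pass, letting ten requests through in about one second. Here they are rejected because the previous window still counts in full, and only two of the three requests at second 75 get through, once a quarter of that window has aged out.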

Summary

Rate limiting is not optional — it is a fundamental security control that every production API must implement.

What rate limiting is: Controlling request volume per client within a time window
Why it matters: Prevents brute-force, credential stuffing, scraping, and DDoS
How to implement: Token bucket for bursts, sliding window for precision, Redis for distributed systems
How to test: Send bursts looking for 429 responses, try bypass techniques (IP rotation, slow drip, header spoofing)
How to bypass: IP rotation, header manipulation, slow sustained rates, distributed attacks
Best practices: Use Redis sliding window, combine IP + user rate limiting, set aggressive limits on auth endpoints

What Is Next: Full API Pentest Report

Now that you understand every major API vulnerability — from SQL injection to JWT attacks to rate limiting bypass — the final chapter ties everything together into a complete penetration testing workflow. You will learn how to chain vulnerabilities for maximum impact, write professional penetration test reports, and communicate findings to development teams and stakeholders.