Rate Limiting Vulnerabilities

Why Rate Limiting Matters

Imagine a login endpoint with no rate limiting. An attacker can try 10,000 passwords per minute, and given enough time they will likely find the correct one for any account with a weak or reused password. This is called a brute-force attack. Without rate limiting, every authentication endpoint becomes a slot machine where the attacker pulls the lever thousands of times per minute.

Rate limiting is the API's first line of defense against automated abuse. It prevents:

  • Brute-force attacks: Repeated login attempts
  • Credential stuffing: Trying leaked username/password pairs from other breaches
  • DDoS attacks: Overwhelming the server with requests
  • Web scraping: Automated data extraction
  • Inventory hoarding: Bots reserving limited stock

Why this matters for your career:

  • Missing rate limiting is one of the most common bug bounty findings
  • Rate limiting design is a standard interview topic for backend engineers
  • Understanding bypass techniques helps you build more robust systems
  • Rate limiting is a key component of any production API gateway

What Is Rate Limiting?

Rate limiting controls how many requests a client can make to an API within a specific time window. Think of it like a bouncer at a club who only lets in 50 people per hour — once the limit is reached, everyone else waits.

Rate Limiting Strategies

| Algorithm | How It Works | Best For |
|:----------|:-------------|:---------|
| Token Bucket | Tokens refill at a fixed rate; each request consumes one token. Bursts allowed up to bucket capacity. | APIs with varying traffic patterns |
| Sliding Window Log | Tracks timestamps of each request in a window. Rejects if count exceeds limit. | Precise enforcement needed |
| Sliding Window Counter | Combines current and previous window counts for smooth rate limiting. | Production APIs (Redis) |
| Fixed Window | Resets counter at each window boundary. Simple but allows bursts at boundaries. | Non-critical endpoints |
| Leaky Bucket | Processes requests at a constant rate. Queues excess requests. | Stable throughput required |
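
To make the first row of the table concrete, here is a minimal in-memory sketch of a token bucket. The class name `TokenBucket`, its parameters, and the injectable `clock` argument are illustrative assumptions rather than part of any library; the fake clock only exists so the demo runs deterministically.

```python
import time


class TokenBucket:
    """Token bucket: refill at a fixed rate, spend one token per request."""

    def __init__(self, capacity: int, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.clock = clock                # injectable so the demo is deterministic
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demo with a fake clock: capacity 3, one new token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_rate=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]   # the 4th request exceeds the burst size
fake_now[0] += 1.0                            # one second later, one token is back
after_wait = bucket.allow()
```

In this sketch the first three requests pass, the fourth is rejected, and a single request is accepted again after one second. A real deployment would keep one bucket per client key and store it somewhere shared, such as Redis.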

Client IP vs User-Based Rate Limiting

| Key | Pros | Cons |
|:----|:-----|:-----|
| Client IP | Simple, no auth required | Shared IPs (NAT, office VPNs) punished; attackers rotate IPs |
| User ID (JWT sub) | Per-user fairness | Requires authentication |
| API Key | Per-application accountability | Attacker can rotate API keys |
| Hybrid (IP + User ID) | Best protection: blocks IP rotation + per-user limits | Most complex to implement |
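
One way to picture the hybrid row is as a function that returns every counter a request must pass. The sketch below is an assumption about how such keys could be named; `rate_limit_keys`, the `scope` argument, and the `ratelimit:` prefix are hypothetical, not taken from a specific framework.

```python
from typing import Optional


def rate_limit_keys(client_ip: str, user_id: Optional[str], scope: str) -> list[str]:
    """Return every counter key a request should be checked against."""
    # The IP key always applies, even for anonymous traffic.
    keys = [f"ratelimit:{scope}:ip:{client_ip}"]
    # The user key applies once an identity is known (for example the JWT sub).
    if user_id is not None:
        keys.append(f"ratelimit:{scope}:user:{user_id}")
    return keys


anonymous = rate_limit_keys("203.0.113.7", None, "login")
logged_in = rate_limit_keys("203.0.113.7", "user-42", "login")
```

A request would be allowed only if all returned keys are under their limits, so rotating IPs does not reset the per-user counter and a shared office IP does not exhaust one user's allowance alone. On a login endpoint, the targeted account name can plausibly stand in for `user_id`, since the caller is not yet authenticated.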

How to Test Rate Limiting

Step 1: Basic Rate Limit Check

Send a burst of requests and observe when the API starts returning 429 (Too Many Requests):

import requests
import time

BASE_URL = "https://api.target.com/login"

def test_rate_limit():
    """Send 50 rapid requests and count 429 responses."""
    headers = {"Content-Type": "application/json"}
    data = {"username": "test", "password": "wrong"}
    
    rate_limited_count = 0
    success_count = 0
    
    for i in range(50):
        r = requests.post(BASE_URL, json=data, headers=headers)
        if r.status_code == 429:
            rate_limited_count += 1
        elif r.status_code in (200, 401):
            success_count += 1
        
        # Print every 10th request to track progress
        if i % 10 == 9:
            print(f"Request {i+1}/50: {success_count} success, {rate_limited_count} rate-limited")
    
    print(f"\nResult: {success_count} successful, {rate_limited_count} rate-limited")
    
    if rate_limited_count == 0:
        print("[!] No rate limiting detected — high risk!")
    elif rate_limited_count < 40:
        print("[!] Weak rate limiting — attacker can still brute force slowly")
    else:
        print("[+] Strong rate limiting in place")

test_rate_limit()

Step 2: Bypass Attempts

Try these common bypass techniques:

# 1. IP rotation with different X-Forwarded-For headers
curl -X POST https://api.target.com/login -H "X-Forwarded-For: 1.1.1.1"
curl -X POST https://api.target.com/login -H "X-Forwarded-For: 2.2.2.2"
curl -X POST https://api.target.com/login -H "X-Forwarded-For: 3.3.3.3"

# 2. Slow drip — stay under the limit
# Instead of 100 req/min, try 1 req/sec for 100 seconds
for i in $(seq 1 100); do
  curl -s -o /dev/null -w "%{http_code}\n" https://api.target.com/search
  sleep 1
done

# 3. Use multiple endpoints (each may have separate limits)
curl https://api.target.com/login
curl https://api.target.com/api/auth
curl https://api.target.com/v2/authenticate

# 4. Check if limit resets by waiting
for i in $(seq 1 30); do
  curl -s https://api.target.com/api/resource
  sleep 2
done

Step 3: Check Response Headers

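Well-configured APIs return rate limit info in headers. Two naming conventions are common, and an API usually sends only one of them: the unprefixed RateLimit-* names, where the reset value is a number of seconds, and the older X-RateLimit-* names, where the reset value is often a Unix timestamp. Retry-After accompanies a 429 response: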

RateLimit-Limit: 100
RateLimit-Remaining: 42
RateLimit-Reset: 3600
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1700000000
Retry-After: 57
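
When testing, it helps to turn these headers into a single "how long should I wait" number. The helper below is a sketch under stated assumptions: `seconds_until_retry` is a hypothetical name, RateLimit-Reset is treated as a delta in seconds, and X-RateLimit-Reset is treated as a Unix timestamp, which some APIs do not follow.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> Optional[float]:
    """Estimate how long to wait, or None if no usable header is present."""
    now = time.time() if now is None else now
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Retry-After in delta-seconds form (the HTTP-date form is not handled here).
    retry_after = lowered.get("retry-after", "")
    if retry_after.isdigit():
        return float(retry_after)

    # Assumption: RateLimit-Reset is a number of seconds from now.
    reset = lowered.get("ratelimit-reset", "")
    if reset.isdigit():
        return float(reset)

    # Assumption: X-RateLimit-Reset is a Unix timestamp (some APIs send a delta instead).
    legacy = lowered.get("x-ratelimit-reset", "")
    if legacy.isdigit():
        return max(0.0, float(legacy) - now)

    return None


wait_retry_after = seconds_until_retry({"Retry-After": "57"})
wait_legacy = seconds_until_retry({"X-RateLimit-Reset": "1700000000"}, now=1699999990.0)
wait_missing = seconds_until_retry({"Content-Type": "application/json"})
```

If the function returns None on a 429 response, the API is rate limiting without telling clients when to come back, which is worth noting in a test report.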

Implementing Rate Limiting in FastAPI

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import time
from collections import defaultdict

app = FastAPI()

# In-memory rate limiter (for demo only; use Redis in production)
class RateLimiter:
    def __init__(self):
        self.requests = defaultdict(list)
    
    def is_allowed(self, key: str, max_requests: int, window_seconds: int) -> bool:
        now = time.time()
        window_start = now - window_seconds
        
        # Clean old entries
        self.requests[key] = [t for t in self.requests[key] if t > window_start]
        
        if len(self.requests[key]) >= max_requests:
            return False
        
        self.requests[key].append(now)
        return True

rate_limiter = RateLimiter()

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Identify client by IP or API key
    client_ip = request.client.host if request.client else "unknown"
    api_key = request.headers.get("X-API-Key")
    
    # Use API key if available, otherwise fall back to IP.
    # Caution: validate the API key first in real code; an unvalidated header
    # lets an attacker get a fresh limit by sending a new random key each time.
    client_id = api_key or client_ip
    
    # Different limits for different endpoints
    if "/login" in request.url.path:
        scope, max_req, window = "login", 5, 60  # 5 requests per minute for login
    elif "/api/" in request.url.path:
        scope, max_req, window = "api", 100, 60  # 100 requests per minute for API
    else:
        scope, max_req, window = "general", 30, 60  # 30 requests per minute for general
    
    # Include the scope in the key so each limit has its own counter
    key = f"{scope}:{client_id}"
    
    if not rate_limiter.is_allowed(key, max_req, window):
        # Return the 429 directly: an HTTPException raised inside HTTP
        # middleware is not handled by FastAPI's exception handlers
        return JSONResponse(
            status_code=429,
            content={"detail": "Too many requests. Please try again later."},
            headers={"Retry-After": str(window)}
        )
    
    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(max_req)
    response.headers["X-RateLimit-Remaining"] = str(
        max_req - len(rate_limiter.requests.get(key, []))
    )
    return response


@app.get("/api/resource")
async def get_resource():
    return {"message": "This endpoint is rate limited"}


@app.post("/login")
async def login():
    return {"message": "Login endpoint — aggressively rate limited"}

Production: Redis-Based Rate Limiting

import redis
import time

redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)

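def sliding_window_rate_limit(user_id: str, max_requests: int, window: int) -> tuple[bool, int]:
    """
    Sliding window rate limiter using Redis sorted sets.
    
    Note: the steps below are separate Redis calls, so concurrent requests
    can race past the limit; a Lua script makes them atomic.
    
    Args:
        user_id: Unique identifier for the client
        max_requests: Maximum requests allowed in the window
        window: Time window in seconds
    
    Returns:
        (allowed, retry_after): allowed is True if the request may proceed;
        retry_after is the number of seconds to wait when rate limited, else 0
    """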
    now = time.time()
    key = f"ratelimit:{user_id}"
    window_start = now - window
    
    # Remove entries outside the window
    redis_client.zremrangebyscore(key, 0, window_start)
    
    # Count requests in the window
    current_count = redis_client.zcard(key)
    
    if current_count >= max_requests:
        # Get the oldest entry's timestamp to calculate retry-after
        oldest = redis_client.zrange(key, 0, 0, withscores=True)
        if oldest:
            retry_after = int(window - (now - oldest[0][1]))
            return False, retry_after
        return False, window
    
    # Add current request
    redis_client.zadd(key, {str(now): now})
    redis_client.expire(key, window)
    
    return True, 0
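
The sorted-set approach above stores one entry per request, which is precise but grows with traffic. The sliding window counter from the strategies table trades some accuracy for constant memory per client. The following is an in-memory sketch of that idea, not a Redis implementation; `SlidingWindowCounter` and its `clock` parameter are hypothetical names, and old counters are never evicted here.

```python
import time


class SlidingWindowCounter:
    """Approximate a sliding window with two fixed-window counters per key."""

    def __init__(self, limit: int, window: int, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock     # injectable so the demo is deterministic
        self.counts = {}       # (key, window index) -> request count; never evicted in this sketch

    def allow(self, key: str) -> bool:
        now = self.clock()
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        current = self.counts.get((key, index), 0)
        previous = self.counts.get((key, index - 1), 0)
        # Weight the previous window by how much of it still overlaps the sliding window.
        estimated = previous * (1 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self.counts[(key, index)] = current + 1
        return True


# Demo: limit of 4 requests per 60 seconds, driven by a fake clock.
fake_now = [30.0]
limiter = SlidingWindowCounter(limit=4, window=60, clock=lambda: fake_now[0])
first_window = [limiter.allow("user-42") for _ in range(5)]
fake_now[0] = 75.0   # 15 seconds into the next window: 75% of the old count still applies
next_window = [limiter.allow("user-42") for _ in range(2)]
```

In the demo, the previous window's four requests are still weighted at 75% fifteen seconds into the next window, so only one more request fits. A fixed window counter would have allowed a full new burst at that point.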

Rate Limiting Bypass Techniques

| Bypass Method | How It Works | Detection | Prevention |
|:--------------|:-------------|:----------|:-----------|
| IP rotation | Attacker uses proxies/VPN to rotate IPs | Monitor unique IPs per user in a short window | Use user-based (JWT sub) rate limiting, not just IP |
| Header spoofing | Attacker sets X-Forwarded-For to fake IP | Validate header chain: use the last trusted proxy IP | Configure reverse proxy (NGINX) to strip untrusted headers |
| Slow drip | Stay just under the rate limit over a long period | Monitor sustained low-rate patterns | Set cumulative daily/weekly limits in addition to per-minute limits |
| Multiple endpoints | Each endpoint may have a separate limit | Correlate request patterns across endpoints | Use a global rate limiter shared across all endpoints |
| Distributed attack | Botnet from many IPs simultaneously | Statistical anomaly detection on user behavior | Use CAPTCHA after suspicious patterns |
| Reset exploitation | Wait for window reset and immediately send burst | Track reset timing patterns | Use sliding window instead of fixed window |
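
The header spoofing row depends on deciding which part of X-Forwarded-For can be trusted. A common approach, sketched here under the assumption that each trusted proxy appends the address it received the request from, is to walk the header from right to left and stop at the first address that is not one of your own proxies. The function name and the example networks are hypothetical.

```python
import ipaddress


def client_ip_from_xff(peer_ip: str, xff_header: str, trusted_proxies: list[str]) -> str:
    """Walk X-Forwarded-For from right to left, skipping proxies we trust."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # malformed entries in the header are never trusted
        return any(addr in net for net in trusted)

    # If the direct peer is not one of our proxies, the header is attacker-controlled.
    if not is_trusted(peer_ip):
        return peer_ip

    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop  # first address that our own infrastructure did not add
    return peer_ip


proxies = ["10.0.0.0/8"]
direct_spoof = client_ip_from_xff("198.51.100.9", "1.1.1.1", proxies)
via_proxy = client_ip_from_xff("10.0.0.5", "1.1.1.1, 198.51.100.9", proxies)
```

In both calls the spoofed `1.1.1.1` is ignored and the rate limit key stays `198.51.100.9`, so rotating the header value as in the Step 2 curl examples would not reset the counter.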

Common Pitfalls

| Mistake | Why It Is Dangerous | Fix |
|:--------|:--------------------|:----|
| Rate limiting by IP only | Office NAT, VPN, or cloud IP pools share one limit | Combine IP + user-based identification |
| Fixed window at natural boundaries (per minute) | All requests rush at :00 mark | Use sliding window or randomize window start |
| No rate limiting on read endpoints | Attackers can scrape all data via GET endpoints | Rate limit all endpoints, not just writes |
| Revealing limit reset time in clear text | Attacker synchronizes attacks with resets | Use sliding window (no fixed reset) |
| Rate limit too high | 1000 req/min is useless against automated tools | Start aggressive: 5 req/min for auth, 30 req/min for API |
| No rate limiting on error responses | Attacker brute-forces without triggering limits | Count all responses, including 4xx and 5xx |
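
The fixed window pitfall is easiest to see with numbers. The simulation below is a sketch with made-up timestamps; both function names are hypothetical. It replays the same ten requests, five just before a minute boundary and five just after it, against a fixed window counter and a sliding window log with the same limit of 5 per 60 seconds.

```python
from collections import deque


def fixed_window_allowed(timestamps: list[float], limit: int, window: int) -> int:
    """Count accepted requests when the counter resets at each window boundary."""
    counts = {}
    accepted = 0
    for t in timestamps:
        bucket = int(t // window)          # which window this request falls in
        if counts.get(bucket, 0) < limit:
            counts[bucket] = counts.get(bucket, 0) + 1
            accepted += 1
    return accepted


def sliding_log_allowed(timestamps: list[float], limit: int, window: int) -> int:
    """Count accepted requests when every request looks back exactly `window` seconds."""
    log = deque()
    accepted = 0
    for t in timestamps:
        while log and log[0] <= t - window:
            log.popleft()                  # forget requests older than the window
        if len(log) < limit:
            log.append(t)
            accepted += 1
    return accepted


# 5 requests around 0:59 and 5 more around 1:01, against a limit of 5 per 60 seconds.
burst = [59.0 + i * 0.1 for i in range(5)] + [61.0 + i * 0.1 for i in range(5)]
fixed_accepted = fixed_window_allowed(burst, limit=5, window=60)
sliding_accepted = sliding_log_allowed(burst, limit=5, window=60)
```

The fixed window accepts all ten requests within about two and a half seconds, twice the intended limit, while the sliding log accepts only the first five.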

Summary

Rate limiting is not optional — it is a fundamental security control that every production API must implement.

  • What rate limiting is: Controlling request volume per client within a time window
  • Why it matters: Prevents brute-force, credential stuffing, scraping, and DDoS
  • How to implement: Token bucket for bursts, sliding window for precision, Redis for distributed systems
  • How to test: Send bursts looking for 429 responses, try bypass techniques (IP rotation, slow drip, header spoofing)
  • How to bypass: IP rotation, header manipulation, slow sustained rates, distributed attacks
  • Best practices: Use Redis sliding window, combine IP + user rate limiting, set aggressive limits on auth endpoints

What Is Next: Full API Pentest Report

Now that you understand every major API vulnerability — from SQL injection to JWT attacks to rate limiting bypass — the final chapter ties everything together into a complete penetration testing workflow. You will learn how to chain vulnerabilities for maximum impact, write professional penetration test reports, and communicate findings to development teams and stakeholders.
