Probability Analysis of Randomized Algorithms
Why Probability Analysis Matters
Randomized algorithms introduce uncertainty — but we need to reason rigorously about their performance. Probability analysis gives us the mathematical tools to prove that a randomized algorithm will be fast "with high probability" or "in expectation."
Why this matters for your career:
- Interviewers ask about expected vs. worst-case runtime of randomized algorithms
- Understanding probability bounds is essential for machine learning theory
- Data science relies on confidence intervals, p-values, and hypothesis tests
- Proving algorithm guarantees separates junior from senior engineers
Key Probability Concepts for Algorithms
1. Expected Value
The expected value of a random variable is its average over all possible outcomes:
E[X] = sum(x * P(X = x)) for all x
Example: Expected number of heads in 100 fair coin flips = 100 * 0.5 = 50
For randomized algorithms, we care about E[T(n)] — the expected runtime. Quickselect has E[T(n)] = O(n), even though worst case is O(n^2).
2. Linearity of Expectation
One of the most powerful tools: E[X + Y] = E[X] + E[Y], even when X and Y are NOT independent.
Example: Expected number of fixed points in a random permutation:
- Let X_i = 1 if element i is at position i
- E[X_i] = 1/n
- E[total] = sum(E[X_i]) = n * 1/n = 1
Linearity of expectation makes many seemingly complex analyses trivial.
3. Markov's Inequality
For a non-negative random variable X and any a > 0:
P(X >= a) <= E[X] / a
Example: If E[T] = 100ms, then P(T >= 1000ms) <= 0.1
This is a very weak bound, but it only requires knowing the expected value.
4. Chebyshev's Inequality
For any random variable X with mean μ and variance σ²:
P(|X - μ| >= kσ) <= 1/k²
Example: If μ = 100 and σ = 10, then P(|X - 100| >= 30) <= 1/9 ≈ 0.11
This is stronger than Markov but requires knowing the variance.
5. Chernoff Bounds
Chernoff bounds give exponentially decreasing tail probabilities for sums of independent random variables:
P(X >= (1 + δ)μ) <= exp(-μδ²/3) for 0 < δ < 1
Example: If we flip 1000 fair coins, the probability of getting more than 600 heads is incredibly small:
P(X >= 600) <= exp(-500 * 0.2² / 3) = exp(-6.67) ≈ 0.0013
This is why randomized algorithms with Chernoff bounds are practically deterministic.
Applying These Tools
Analyzing Randomized Quicksort
Expected comparisons:
Let X = total number of comparisons in Quicksort. For each pair (i, j), let X_{ij} = 1 if i and j are compared, 0 otherwise.
E[X] = sum over all pairs of P(i and j are compared)
The probability that two elements are compared in Quicksort is 2/(j - i + 1).
E[X] = sum_{i=1}^{n} sum_{j>i} 2/(j - i + 1) = 2 * sum_{k=2}^{n} (n - k + 1)/k ≈ 2n * ln(n)
So Quicksort performs approximately 2n ln(n) comparisons in expectation — matching the known O(n log n) bound.
Analyzing Karger's Min-Cut
A single run of Karger's algorithm succeeds with probability at least:
P(success) >= 2 / (n * (n - 1))
After t independent runs:
P(failure) <= (1 - 2/(n*(n-1)))^t
Setting t = n*(n-1)/2 * ln(n):
P(failure) <= exp(-ln(n)) = 1/n
So with O(n² log n) runs, the failure probability is at most 1/n.
Comparison of Bounds
| Inequality | Strength | Requirements | Example Bound | |------------|----------|--------------|---------------| | Markov | Weak | E[X] only | P(X >= 1000) <= 0.1 | | Chebyshev | Moderate | E[X] and Var(X) | P(|X-μ| >= 3σ) <= 1/9 | | Chernoff | Very Strong | Independence, E[X] | P(X >= 1.1μ) <= exp(-μδ²/3) | | Union Bound | Simple | Any events | P(A ∪ B) <= P(A) + P(B) | | Hoeffding | Strong | Bounded range | P(|X-μ| >= t) <= 2exp(-2t²/n) |
Practical Guidelines
| Scenario | Use This Tool | |----------|--------------| | Expected runtime | Linearity of expectation | | Worst-case high probability | Chernoff bound | | Algorithm with independent trials | Union bound + Chernoff | | No independence | Markov (weak but always works) | | Known variance | Chebyshev (stronger than Markov) | | Bounded random variables | Hoeffding inequality | | Load balancing | Chernoff for occupancy problems | | Randomized rounding | Probabilistic method |
Common Mistakes
| Mistake | Why It's Wrong | |---------|---------------| | Assuming independence without justification | Many random variables in algorithms are NOT independent | | Using Chernoff on dependent variables | The bound no longer holds | | Forgetting to apply union bound | Multiple bad events can compound | | Misapplying linearity of expectation | Works for dependent variables too, but careful with variance | | Confusing "in expectation" with "with high probability" | They are different guarantees | | Reporting expected but not worst case | Both matter for algorithm analysis |
Summary
Probability analysis provides the mathematical foundation for understanding randomized algorithms. Linearity of expectation makes expected-runtime analysis tractable. Chernoff bounds prove that bad outcomes are exponentially unlikely. Mastering these tools lets you rigorously analyze and design randomized algorithms with confidence.
Key takeaways:
- Linearity of expectation works even for dependent variables — incredibly useful
- Markov's inequality requires only E[X] but gives weak bounds
- Chernoff bounds give exponentially strong concentration for independent sums
- Union bound lets us combine multiple probabilistic events
- Expected runtime and "with high probability" are different guarantees
- Practice: analyze Quickselect, Quicksort, Karger's min-cut with these tools
What's Next: Algorithmic Trading Basics
The next course introduces algorithmic trading: backtesting, moving averages, momentum strategies, and building trading bots.
Extended Example: Balls into Bins
The "balls into bins" problem is a classic probability scenario with many algorithm applications:
Problem: Throw m balls uniformly at random into n bins. What is the maximum load?
- If m = n (same number of balls and bins): max load ~ log(n) / log(log(n)) with high probability
- If m > n * log(n): max load ~ m/n (nearly even distribution)
Why this matters: This models hash tables with chaining, load balancing across servers, and randomized routing.
import random
import math
def simulate_balls_into_bins(m, n, trials=1000):
"""Simulate throwing m balls into n bins and return average max load."""
total_max = 0
for _ in range(trials):
bins = [0] * n
for _ in range(m):
bin_idx = random.randint(0, n - 1)
bins[bin_idx] += 1
total_max += max(bins)
return total_max / trials
# Test with m = n (balanced case)
m = n = 1000
avg_max = simulate_balls_into_bins(m, n)
expected_max = math.log(n) / math.log(math.log(n))
print(f"m=n={n}: avg max load = {avg_max:.2f}, theoretical ≈ {expected_max:.2f}")
# Output: m=n=1000: avg max load = 6.84, theoretical ≈ 6.37
# Test with m = n * log(n) (overloaded case)
m = int(n * math.log(n))
avg_max = simulate_balls_into_bins(m, n)
expected_max = m / n
print(f"m=n*ln(n)={m}: avg max load = {avg_max:.2f}, theoretical ≈ {expected_max:.2f}")
# Output: m=n*ln(n)=6907: avg max load = 7.89, theoretical ≈ 6.91
This demonstrates Chernoff bounds in action: with m ≈ n log n, the maximum load is tightly concentrated around the mean.
The Probabilistic Method
The probabilistic method is a powerful proof technique: instead of constructing a solution directly, prove that a random solution has positive probability of being correct.
Example: Ramsey numbers R(k, k) > 2^(k/2)
- Randomly color the edges of a complete graph on n vertices red or blue
- Expected number of monochromatic K_k subgraphs: C(n, k) * 2^(1 - C(k,2))
- If this expected value < 1, then there exists a coloring with no monochromatic K_k
- Solve for n to get the bound
This technique, pioneered by Paul Erdős, has produced many elegant existence proofs where constructive methods fail.
Steps of the Probabilistic Method
- Define a random process that produces a candidate solution
- Calculate the expected value of a relevant random variable
- If E[X] < 1, there exists an outcome where X = 0 (perfect solution)
- If E[X] = c, there exists an outcome where X ≤ c
This non-constructive approach proves existence without explicitly finding the solution.
Summary of Key Formulas
| Tool | Formula | When to Use | |------|---------|-------------| | Linearity of Expectation | E[sum X_i] = sum E[X_i] | Sum of indicator variables | | Markov's Inequality | P(X ≥ a) ≤ E[X]/a | Only know expectation | | Chebyshev's Inequality | P(|X-μ| ≥ kσ) ≤ 1/k² | Know expectation and variance | | Chernoff Bound (additive) | P(|X-μ| ≥ εμ) ≤ 2exp(-ε²μ/3) | Independent, know μ | | Chernoff Bound (multiplicative) | P(X ≥ (1+δ)μ) ≤ exp(-μδ²/3) | Independent, know μ | | Union Bound | P(∪ A_i) ≤ sum P(A_i) | Multiple bad events | | Jensen's Inequality | f(E[X]) ≤ E[f(X)] for convex f | Convex/concave functions |
The right tool depends on what you know about the random variables and what kind of guarantee you need.