Quickselect — Find the K-th Smallest Element in Expected O(n) Time
Why Quickselect Matters
Finding the k-th smallest element in an unsorted array is a fundamental problem with many applications: finding medians, computing percentiles, filtering outliers, and database query optimization. Quickselect solves this efficiently without sorting the entire array.
Why this matters for your career:
- Quickselect is a classic interview topic (Google, Amazon, Meta)
- Understanding randomized algorithms is essential for system design
- The median-finding application is used in load balancing and data analysis
- Quickselect introduces the important concept of expected vs. worst-case runtime
What Is Quickselect?
Quickselect is a randomized selection algorithm developed by Tony Hoare (the same Hoare who created Quicksort). It finds the k-th smallest element in an unsorted array in expected O(n) time with O(n^2) worst-case time.
How Quickselect Works
- Pick a random pivot element from the array
- Partition the array so all elements smaller than the pivot come before it, and all larger elements come after
- If the pivot is at position k-1 (0-indexed), return it
- If k-1 is less than the pivot position, recurse on the left partition
- If k-1 is greater, recurse on the right partition
Unlike Quicksort which recurses on both sides, Quickselect only recurses on ONE side. This is what gives it linear expected time.
Implementation
import random
def quickselect(arr, k):
"""
Find the k-th smallest element in arr (1-indexed, so k=1 = smallest).
Expected time: O(n).
"""
if not arr:
return None
if k < 1 or k > len(arr):
raise ValueError("k out of bounds")
return _quickselect(arr, k - 1) # convert to 0-indexed
def _quickselect(arr, k_idx):
"""Internal recursive quickselect, 0-indexed k."""
if len(arr) == 1:
return arr[0]
# Randomly choose pivot
pivot = random.choice(arr)
# Partition
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
pivot_start = len(left)
pivot_end = len(left) + len(middle)
if k_idx < pivot_start:
return _quickselect(left, k_idx)
elif k_idx < pivot_end:
return pivot
else:
return _quickselect(right, k_idx - pivot_end)
Example Usage
arr = [7, 10, 4, 3, 20, 15]
# Find 3rd smallest element
result = quickselect(arr, 3)
print(f"3rd smallest: {result}") # Output: 7
# Find the median
k = (len(arr) + 1) // 2
median = quickselect(arr, k)
print(f"Median: {median}") # Output: 10
# Sorted array for verification: [3, 4, 7, 10, 15, 20]
print(f"Sorted: {sorted(arr)}")
Visualization
For arr = [7, 10, 4, 3, 20, 15] and k = 3:
- Choose random pivot, say 10
- Partition: left=[7, 4, 3], middle=[10], right=[20, 15]
- pivot_start=3, pivot_end=4, k_idx=2 (0-indexed)
- k_idx < pivot_start → recurse on left=[7, 4, 3]
- Choose pivot 4: left=[3], middle=[4], right=[7]
- pivot_start=1, pivot_end=2, k_idx=2 (still looking for index 2)
- k_idx >= pivot_end → recurse on right=[7]
- len(arr) == 1 → return 7
Time Complexity Analysis
Expected Case: O(n)
At each recursion, the algorithm works on a smaller array. On average, the pivot splits the array into two parts of roughly equal size. The recurrence is:
T(n) = T(n/2) + O(n)
By the Master Theorem, this gives O(n) expected time.
Worst Case: O(n^2)
If we are extremely unlucky and always pick the smallest or largest element as pivot, the recurrence becomes:
T(n) = T(n-1) + O(n)
This gives O(n^2). However, the probability of this happening with random pivot selection is astronomically low.
| Aspect | Expected | Worst Case | |--------|----------|------------| | Time | O(n) | O(n^2) | | Space | O(1) (iterative) | O(n) (recursive) | | Probability of worst case | 1/n^n (essentially zero) | — | | Pivot strategy | Random | Always min or max |
Comparison: Quickselect vs. Sorting
| Approach | Time | Space | Use Case | |----------|------|-------|----------| | Sort then index | O(n log n) | O(n) | Need sorted array anyway | | Quickselect | O(n) expected | O(1) | Only need one element | | Min-heap | O(n log k) | O(k) | Need k smallest elements | | Max-heap | O(n log (n-k)) | O(n-k) | Need k largest elements |
Quickselect is the best choice when you need exactly one element by rank.
Applications
| Application | How Quickselect Helps | |-------------|---------------------| | Median Finding | k = n/2 — used in statistics, load balancing | | Percentile Calculation | k = p*n/100 — used in performance analysis | | Outlier Detection | Find top 1% or bottom 1% without sorting | | Database Query Optimization | Find median for query cost estimation | | Image Processing | Median filter for noise reduction | | Streaming Data | Combined with reservoir sampling |
Iterative Version (Space-Efficient)
def quickselect_iterative(arr, k):
"""Iterative version uses O(1) extra space."""
if k < 1 or k > len(arr):
raise ValueError("k out of bounds")
k_idx = k - 1
left = 0
right = len(arr) - 1
while left <= right:
# Choose random pivot
pivot_idx = random.randint(left, right)
pivot = arr[pivot_idx]
# Three-way partition
less = [x for x in arr[left:right+1] if x < pivot]
equal = [x for x in arr[left:right+1] if x == pivot]
greater = [x for x in arr[left:right+1] if x > pivot]
# Reconstruct array section
arr[left:right+1] = less + equal + greater
pivot_start = left + len(less)
pivot_end = left + len(less) + len(equal)
if k_idx < pivot_start:
right = pivot_start - 1
elif k_idx < pivot_end:
return pivot
else:
left = pivot_end
k_idx -= pivot_end
return None
The iterative version avoids recursion overhead and stack overflow for large arrays.
Summary
Quickselect is an elegant randomized algorithm for finding the k-th smallest element in expected linear time. It partitions like Quicksort but only recurses on one side. The randomized pivot choice ensures the worst case is astronomically unlikely. Use Quickselect when you need one element by rank — it's much faster than sorting the entire array.
Key takeaways:
- Quickselect finds the k-th smallest element in expected O(n) time
- Random pivot selection makes worst-case O(n^2) extremely unlikely
- Only recurses on one side (unlike Quicksort which does both)
- Median finding is the most common application (k = n/2)
- Iterative version uses O(1) extra space
- Much faster than sorting when you only need one element
What's Next: Randomized Min-Cut
The next chapter presents Karger's randomized algorithm for finding the minimum cut in a graph — a beautiful application of random sampling to graph theory.