title: "Kruskal & Union-Find" description: "Implement Kruskal minimum spanning tree algorithm, and build Union-Find (Disjoint Set) from scratch." order: 1
Kruskal & Union-Find
Union-Find Implementation
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
px, py = self.find(x), self.find(y)
if px == py: return False
if self.rank[px] < self.rank[py]:
px, py = py, px
self.parent[py] = px
if self.rank[px] == self.rank[py]:
self.rank[px] += 1
return True
Kruskal MST
def kruskal(n, edges):
edges.sort(key=lambda x: x[2])
uf = UnionFind(n)
mst = []
total = 0
for u, v, w in edges:
if uf.union(u, v):
mst.append((u, v, w))
total += w
return mst, total
Time Complexity
With path compression + union by rank, Union-Find is nearly constant:
| Operation | Unoptimized | +Path Compression | +Rank | |-----------|:-----------:|:----------------:|:----:| | find() | O(n) | O(log n) | O(alpha(n)) | | union() | O(n) | O(log n) | O(alpha(n)) | | Kruskal total | O(m log m) | O(m log n) | O(m log n) |
alpha(n) inverse Ackermann function, <= 4 for all practical n
Kruskal vs Prim
| Aspect | Kruskal | Prim | |--------|:-------:|:----:| | Type | Edge-based | Node-based | | Best for | Sparse (m~n) | Dense (m~n^2) | | Data structure | Union-Find | Priority Queue | | Time | O(m log m) | O(m log n) | | Needs connectivity | No | Yes |
Real Applications of Union-Find
| Application | Problem | Union-Find Role | |-------------|---------|-----------------| | Social networks | A-B-C friends -> same group | Maintain connected components | | Image processing | Find connected same-color pixels | Merge adjacent pixels | | Maze generation | Open walls until start connects end | Detect when connected | | Circuit testing | Do two pins share a net? | Merge connected nodes | | Database merge | Two records same person? | Merge matching IDs |
Next Chapter: Prim Algorithm
Kruskal is edge-centered. Prim is node-centered - starting from a seed, adding the nearest node each step.
Union-Find (Disjoint Set Union)
Union-Find is a data structure that tracks elements partitioned into disjoint sets. It supports two operations:
- Find: Determine which set an element belongs to
- Union: Merge two sets into one
Implementation
class UnionFind:
"""Union-Find with path compression and union by rank."""
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
"""Find the root of x with path compression."""
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
"""Union the sets containing x and y."""
root_x, root_y = self.find(x), self.find(y)
if root_x == root_y:
return False # Already connected
# Union by rank: attach shorter tree under taller
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
return True
Kruskal's Algorithm
def kruskal_mst(n, edges):
"""
Kruskal's MST algorithm.
Args:
n: number of vertices
edges: list of (u, v, weight)
Returns:
(total_weight, mst_edges)
"""
uf = UnionFind(n)
edges.sort(key=lambda e: e[2]) # Sort by weight
total_weight = 0
mst_edges = []
for u, v, w in edges:
if uf.union(u, v):
total_weight += w
mst_edges.append((u, v, w))
if len(mst_edges) == n - 1:
break # MST complete
return total_weight, mst_edges
# Example
edges = [
(0, 1, 4), (0, 2, 3),
(1, 2, 1), (1, 3, 2),
(2, 3, 5),
]
n = 4
weight, mst = kruskal_mst(n, edges)
print(f"MST Weight: {weight}")
print(f"MST Edges: {mst}")
Time Complexity
| Operation | Complexity | |-----------|-----------| | Sort edges | O(E log E) | | Union operations | O(α(V)) each (inverse Ackermann) | | Find operations | O(α(V)) each | | Total | O(E log E + E α(V)) ≈ O(E log E) |
Applications of Union-Find
| Domain | Use Case | |--------|---------| | Network connectivity | Do two computers belong to the same network? | | Image processing | Find connected pixels (segmentation) | | Maze generation | Connect cells randomly until start reaches end | | Social networks | Find friend groups (connected components) | | Database merge | Determine if two records are related |
Comparison: Kruskal vs. Prim
| Aspect | Kruskal | Prim | |--------|--------|------| | Approach | Edge-based: sort all edges, add smallest | Vertex-based: grow from a seed | | Data Structure | Union-Find | Priority queue (heap) | | Best for | Sparse graphs (few edges) | Dense graphs (many edges) | | Time | O(E log E) | O(E log V) with heap | | Implementation | Medium (Union-Find) | Easy (heap + visited set) |
When Greedy Works
Both Kruskal and Prim are greedy algorithms. At each step, they make the locally optimal choice:
- Kruskal: pick the cheapest edge that doesn't create a cycle
- Prim: pick the cheapest edge connecting visited to unvisited
The reason greedy works for MST is the cut property: the lightest edge crossing any cut belongs to some MST.
Summary
Kruskal's algorithm finds the Minimum Spanning Tree by sorting edges by weight and adding the cheapest edge that doesn't create a cycle. Union-Find efficiently detects cycles. The greedy approach works because of the cut property of MSTs.
Key takeaways:
- Kruskal: sort edges, pick cheapest non-cycling edge
- Union-Find: find (path compression) + union (by rank)
- Time: O(E log E) dominated by sorting
- Kruskal is best for sparse graphs
- MST cut property: lightest edge crossing any cut is part of some MST
- Greedy works for MST because of optimal substructure
- Applications: network design, clustering, circuit design
What's Next: Huffman Coding
The next chapter covers Huffman coding — using character frequency to build optimal variable-length prefix codes for data compression.