The Disjoint Set ADT
Equivalence Relations
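A relation \(R\) defined on a set \(S\) is an equivalence relation if it has the following properties:
- Reflexive: \(a \mathrel{R} a\) for every \(a \in S\)
- Symmetric: \(a \mathrel{R} b\) if and only if \(b \mathrel{R} a\)
- Transitive: \(a \mathrel{R} b\) and \(b \mathrel{R} c\) imply \(a \mathrel{R} c\)
If \(x\) and \(y\) are equivalent, written \(x \sim y\), then \(x\) and \(y\) lie in the same equivalence class.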
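The Dynamic Equivalence Problem
Given an equivalence relation \(\sim\) that is built up one relation at a time, decide for any \(a\) and \(b\) whether \(a \sim b\).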
Example
Given \(S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}\) and 9 relations:
Algorithm:
- Elements: \(1, 2, 3, \ldots, N\)
- Sets: \(S_1, S_2, \ldots, S_k\)
Disjoint: \(S_i \cap S_j = \varnothing\) for \(i \neq j\)
Example 2
Basic Data Structure
Union
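\(S_i \cup S_j\): make \(S_j\) a subtree of \(S_i\), or vice versa; that is, set the parent of the root of \(S_j\) to the root of \(S_i\).
Convention: the tree on the right-hand side of \(\cup\) becomes a subtree of the tree on the left-hand side.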
Linked List¶
Array¶
S[element] = element's parent
Note: S[root] = 0
Three trees:

```mermaid
graph TD
    subgraph C
        A1(10) --- B1(6)
        A1 --- C1(7)
        A1 --- D(8)
    end
    subgraph B
        E(4) --- F(1)
        E --- G(9)
    end
    subgraph A
        H(2) --- I(3)
        H --- J(5)
    end
```
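After the union \(C \cup B\) (by the convention above, the root 4 of \(B\) becomes a child of the root 10 of \(C\)), the array is:

| Element | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Parent S[element] | 4 | 0 | 2 | 10 | 2 | 10 | 10 | 10 | 4 | 0 |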
Find
Analysis
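Union and Find usually come in pairs, so analyze them together as Union-Find pairs:
- Total cost: \(k\mathcal{O}(d)\), where \(d\) is the depth of the tree
- \(k\) relations, one Union-Find pair each
- Find: \(\mathcal{O}(d)\) per call (a Union on two roots is \(\mathcal{O}(1)\))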
Worst case
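Repeated unions can make the tree completely skewed (a chain), so \(d\) can reach \(N-1\) and each Find on the deepest element costs \(\Theta(N)\).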
Smart Union Algorithm
Goal: reduce the tree depth \(d\).
- Union-by-Size: always attach the smaller tree to the root of the larger one!
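Because children point to their parents, such a tree is hard to traverse, so a tree's size cannot be recomputed cheaply. Instead, keep the size in the root's entry (every entry starts at \(-1\)):
S[root] = -size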
Lemma
Let \(T\) be a tree created by union-by-size with \(N\) nodes. Then \(\text{height}(T) \le \lfloor \log_2 N \rfloor + 1\).
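The worst case arises when trees of equal size are always merged (for \(N\) a power of 2 this builds a binomial tree of height \(\log_2 N + 1\)); the best case is a flat tree whose root has \(N-1\) children.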
- \(N\) Unions and \(M\) Finds take \(\mathcal{O}(N + M \log_2 N)\) time in total.
- Union-by-Height: always attach the shallower tree to the root of the taller one!
Can the merged tree be made even shorter?
Path Compression
```c
SetType Find(ElementType X, DisjSet S)
{
    if (S[X] < 0)      /* a negative entry marks the root */
        return X;
    else               /* compress: make X point directly at the root */
        return S[X] = Find(S[X], S);
}
```
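Through the assignment `S[X] = Find(S[X], S)`, each Find makes every node on the path from X to the root point directly to the root.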
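```c
SetType Find(ElementType X, DisjSet S)
{
    ElementType root, trail, lead;

    /* First pass: walk up to the root (positive entries are parent links). */
    for (root = X; S[root] > 0; root = S[root])
        ;
    /* Second pass: re-point every node on the path directly at the root. */
    for (trail = X; trail != root; trail = lead) {
        lead = S[trail];
        S[trail] = root;
    }
    return root;
}
```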
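Path compression changes the heights of trees, so it is not fully compatible with Union-by-Height: the stored heights are not recomputed and become estimates (upper bounds) of the true heights.
Union-by-Size and Union-by-Height are collectively called Union-by-Rank.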
Worst Case for Union-by-Rank and Path Compression
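Ackermann Function
\[
A(i, j) = \begin{cases} 2^j & i = 1,\ j \ge 1 \\ A(i-1, 2) & i \ge 2,\ j = 1 \\ A(i-1, A(i, j-1)) & i \ge 2,\ j \ge 2 \end{cases}
\]
Example: \(A(2, 4) = 2^{2^{2^{2^2}}} = 2^{65536}\)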
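\(\alpha(M, N) = \min\{\, i \ge 1 \mid A(i, \lfloor M/N \rfloor) > \log_2 N \,\}\), an extremely slowly growing inverse of the Ackermann function (at most 4 for any practical input).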
Lemma (Tarjan)
Let \(T(M,N)\) be the maximum time required to process an intermixed sequence of \(M \geq N\) finds and \(N-1\) unions. Then
\[
k_1 M \alpha(M, N) \le T(M, N) \le k_2 M \alpha(M, N)
\]
for some positive constants \(k_1\) and \(k_2\).