And check set¶

There are a lot of questions about merge check, the official data is 30 (as of 2020-02-20), but there are some questions that are not officially labeled with join check, but it is very simple to use merge check . If you master the template for this type of question, then it will be very fast to brush this type of question, and the probability of making a mistake will be greatly reduced. This is the advantage of the template.

Here I have summarized a few and collected topics:

-547. Circle of Friends -721. Account Merge -990. Satisfaction of equations -1202. Swap elements in a string -1697. Check whether the path with edge length limitation exists

The first four questions of the above question are the connectivity of unweighted graphs, and the fifth question is the connectivity of weighted graphs. Everyone has to know both types. The keywords above are all Connectivity, and the codes are all templates. After reading the content here, it is recommended to practice with the above topics and check the learning results.

Overview¶

Union query set is a tree-type data structure, used to deal with some disjoint sets (Disjoint Sets) merge and query problems. There is a Union-find Algorithm that defines two operations for this data structure:

-Find: Determine which subset the element belongs to. It can be used to determine whether two elements belong to the same subset. -Union: Combine two sub-collections into the same collection.

Initially, each point is a connected domain, similar to the following figure:

Because of the support for these two operations, a disjoint set is often called Union-find Data Structure or Merge-find Set. In order to define these methods more precisely, it is necessary to define how to represent the collection. A commonly used strategy is to select a fixed element for each collection, called a representative, to represent the entire collection. Next, Find(x) returns the representative of the set to which x belongs, and Union uses the representatives of the two sets as parameters.

Image explanation¶

For example, there are two commanders. There are several commanders under the commander, and several division commanders under the commander. . .

How do we judge whether two division commanders belong to the same commander (connectivity)?

It's very simple. We follow the teacher, look up, and find the commander. If two division commanders find the same commander, they belong to the same commander. We use parent[x] = y to indicate that the parent of x is y. The root is found by searching for the parent and then comparing whether the roots are the same.

The above process involves two basic operations find and connnected. In addition to these two basic operations, there is also a union. That is, the two sets are merged into one.

In order to make the merged tree as balanced as possible, generally choose to mount the small tree on top of the big tree, and the following code templates will reflect this

There are two commanders in the picture:

We merge it into a Unicom domain, the easiest way is to directly point one of the commanders to the other:

The above is the visual explanation of the three core APIs find, connnected and union, let's take a look at the code implementation.

Core API¶

find¶

def find(self, x):
    while x != self.parent[x]:
        x = self.parent[x]
    return x

It can also be implemented using recursion.

def find(self, x):
    if x != self.parent[x]:
        self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    return x

(Here I have compressed the path)

For example, for the following picture:

Calling find(0) will gradually find 3, and in the process of finding 3, all nodes on the path will point to the root node.

In the extreme case, each path will be compressed. In this case, the time complexity of continuing to search is \(O(1)\).

connected¶

You can directly use the find method implemented above. If the ancestors of two nodes are the same, then they are connected.

def connected(self, p, q):
    return self.find(p) == self.find(q)

union¶

Hang one of the nodes to the ancestor of the other node, so that the ancestors of the two are the same. In other words, the two nodes are connected.

For a picture like the following:

In the figure, r:1 means the rank is 1, and r is the abbreviation of rank. The rank here actually corresponds to the size above.

If we merge 0 and 7 once. That is, union(0, 7), the following process will occur.

Code:

def union(self, p, q):
    if self.connected(p, q): return
    self.parent[self.find(p)] = self.find(q)

Unauthorized and check collection¶

In the process of doing the questions in the ordinary time, the more often encountered are the unauthorised combined check. The realization process is also simpler than that of combined search and collection with power.

Code template with path compression¶

class UF:
    def __init__(self, M):
        self.parent = {}
        self.size = {}
        self.cnt = 0
        # Initialize parent, size and cnt
        for i in range(M):
            self.parent[i] = i
            self.cnt += 1
            self.size[i] = 1

    def find(self, x):
        if x != self.parent[x]:
            self.parent[x] = self.find(self.parent[x])
            return self.parent[x]
        return x
    def union(self, p, q):
        if self.connected(p, q): return
        # Hang the small tree on the big tree to make the tree as balanced as possible
        leader_p = self.find(p)
        leader_q = self.find(q)
        if self.size[leader_p] <self.size[leader_q]:
            self.parent[leader_p] = leader_q
            self.size[leader_q] += self.size[leader_p]
        else:
            self.parent[leader_q] = leader_p
            self.size[leader_p] += self.size[leader_q]
        self.cnt -= 1
    def connected(self, p, q):
        return self.find(p) == self.find(q)

Take the right and check the collection¶

In fact, the union search set is a graph structure, and we use a hash table to simulate the relationship of this graph. The above mentioned are actually directed and unpowered graphs, so only use parent to represent the node relationship. And what if you are using a directed weighted graph? In fact, in addition to maintaining the node pointing relationship such as parent, we also need to maintain the weight of the node. A simple idea is to use another hash table weight to store the weight relationship of the node. For example, weight[a] = 1 means that the weight from a to its parent node is 1.

If it is a weighted union search set, the path compression and merging process of the query process will be slightly different, because we not only care about the change of node pointing, but also how to update the weight. such as:

ab
^ ^
| |
| |
xy

As shown above, the parent node of x is a, and the parent node of y is b. Now I need to merge x and y.

ab
^ ^
| |
| |
x -> y

Suppose the weight from x to a is w(xa), the weight from y to b is w(yb), and the weight from x to y is w(xy). After merging, it will become as shown in the figure:

a -> b
^ ^
| |
| |
xy

So why should the weights from a to b be updated? We know that w(xa) + w(ab) = w(xy) + w(yb), which means that the weight of a to b is w(ab) = w(xy) + w(yb)-w(xa).

Of course, the above relational expressions are addition, subtraction, modulus or multiplication, and division, etc. are completely determined by the topic. I am just giving an example here. In any case, this kind of operation must satisfy conductivity.

Code template with path compression¶

Here is an example of adding weighted and searched collections to describe how the code should be written.

class UF:
    def __init__(self, M):
        # Initialize parent, weight
        self.parent = {}
        self.weight = {}
        for i in range(M):
            self.parent[i] = i
            self.weight[i] = 0

   def find(self, x):
        if self.parent[x] != x:
            ancestor, w = self.find(self.parent[x])
            self.parent[x] = ancestor
            self.weight[x] += w
        return self.parent[x], self.weight[x]
    def union(self, p, q, dist):
        if self.connected(p, q): return
        leader_p, w_p = self.find(p)
        leader_q, w_q = self.find(q)
        self.parent[leader_p] = leader_q
        self.weight[leader_p] = dist + w_q-w_p
    def connected(self, p, q):
        return self.find(p)[0] == self.find(q)[0]

Typical topic:

-399. Division Evaluation

Application¶

-Check whether the graph has a ring

Idea: You only need to merge the edges and determine whether they have been connected before the merger. If they have been connected before the merger, it means that there is a ring.

Code:

uf = UF()
for a, b in edges:
    if uf.connected(a, b): return False
    uf.union(a, b)
return True

Topic recommendation:

-684. Redundant Connection -Forest Detection

-Classic algorithm of minimum spanning tree Kruskal

to sum up¶

If the topic has a connection and equivalence relationship, then you can consider the combined search set. In addition, pay attention to path compression when using the combined search set, otherwise the complexity will gradually increase as the height of the tree increases.

It is more complicated to implement weighted merge search, mainly because the path compression and merging are different, but we only need to pay attention to the node relationship and draw the following diagram:

a -> b
^ ^
| |
| |
xy

It is not difficult to see how to update the pull.

The topic template provided in this article is Western France. I use it a lot. Not only does it greatly reduce the error probability, but also the speed is much faster. The whole person is more confident. ^_