Heap (Part 1)

Hi everyone, my name is lucifer. What I bring you today is the topic "Heap". First, the outline of this article: this is a mind map I drew with mindmap, and I will keep improving it and gradually cover the other topics as well.

You can also open the source file with the vscode plugin blink-mind; it contains some notes you can click to view. The source file can be obtained by replying "mind map" on my public account "Like Plus", and the mind map will continue to gain more content over time. vscode plugin address: https://marketplace.visualstudio.com/items?itemName=awehook.vscode-blink-mind

This series includes the following topics:

- Having done almost all the linked list questions on LeetCode, I found these things...
- Having done almost all the tree questions on LeetCode, I found these things...
- Having done almost all the heap questions on LeetCode, I found these things... (this article)

A Bit of Small Talk

The Heap tag on LeetCode contains 42 questions in total. To prepare for this topic, I worked through almost all of the heap questions on LeetCode.

As you can see, except for the 3 locked ones, I did them all. By grinding through these questions, I found some interesting things, which I will share with you today.

Note that this article does not distinguish between heap and priority queue; the heaps and priority queues mentioned here can be regarded as the same thing. If you are interested in the academic differences between the two, you can consult the relevant literature.

Unless otherwise stated, the heap in this article refers to the min-heap.

How Difficult Are the Heap Questions?

Heap is indeed a difficult topic. Judging by the official difficulty labels, among the 42 heap questions, nearly 50% are marked hard. There is no harm without comparison: for the tree tag, the share of hard questions is less than 10%.

Looking at acceptance rates, more than half of the heap questions have an average acceptance rate below 50%. By comparison, fewer than one-third of tree questions have an acceptance rate below 50%.

But don't feel too much pressure. Lucifer brings you a formula: one center, two implementations, three techniques, four major applications. We will talk not only about implementations and principles, but also about the background, patterns, and templates of these problems.

The templates mentioned in this article can be downloaded from my leetcode-cheatsheet.

Heap usage scenario analysis

The heap is a data structure, and data structures serve algorithms. What kind of algorithms does this particular data structure serve? What are its applicable scenarios? This is the first question everyone learning the heap needs to answer. Under what circumstances do we use a heap? How does a heap work? How do we implement one? Don't worry; this article will reveal the answers one by one.

Before the main text, a learning suggestion: don't worry about how the heap is implemented at first. Start by understanding what problems the heap solves. Once you understand the background and the problems it addresses, just call a ready-made heap API to solve them. After that sinks in, go look at the heap's principles and implementation. This is how I learned the heap myself, so I share the experience here.

To illustrate the usage scenarios of the heap, I made up a fictional scenario.

The following example is very important; I will come back to it again and again later.

A Registration System

Problem Description

Suppose you are the technical lead of a queuing registration system. The system needs to issue a queue number to each person who comes to line up (enqueue), and to call numbers (dequeue) on a first come, first served basis.

In addition, we distinguish several types of customers: ordinary customers, VIP customers, and supreme VIP customers.

- If different types of customers use different windows, how should I design and implement my system? (Everyone receives a different service; for example, VIP customers see expert doctors while ordinary customers see ordinary doctors.)

- If all types of customers share one window, how should I design and implement my system? (Everyone receives the same service, but with different priorities. For example, all else being equal (say they registered at the same time), a VIP customer has higher priority than an ordinary customer.)

How can I design my system to meet these needs while keeping it extensible?

Preliminary solution

If different types of customers use different windows, we can design three queues to hold the three kinds of people waiting in line. This design meets the requirements and is simple enough.

If we only have one window, then all customers must share one queue. Customers of the same type still follow the first come, first served principle above, but different customer types may effectively jump the queue.

For simplicity, I introduce the concept of virtual time. Specifically:

- The virtual time of an ordinary customer is their real arrival time.

- The virtual time of a VIP customer is their actual arrival time minus one hour. For example, if a VIP customer arrives at 14:00, we treat it as an arrival at 13:00.

- The virtual time of a supreme VIP customer is their actual arrival time minus two hours. For example, if a supreme VIP customer arrives at 14:00, we treat it as an arrival at 12:00.

In this way, we only need to serve customers first come, first served according to the "virtual arrival time" above.

So we can keep the three queues from before, except that the queues now store virtual times instead of real times. Each time we call a number, we compare the virtual times at the fronts of the queues, and the smallest virtual time is served first.

It is not difficult to see that the times inside each queue are in order.

The virtual time here is really the priority weight of a priority queue: the smaller the virtual time, the higher the priority.
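
To make the virtual-time rule concrete, here is a minimal Python sketch. The helper names to_minutes and virtual_time are my own, and for brevity it puts everyone into a single priority queue rather than three queues; the point is only that the smallest virtual time is served first.

import heapq

def to_minutes(hhmm):
    h, m = hhmm.split(':')
    return int(h) * 60 + int(m)

def virtual_time(arrival, kind):
    # ordinary: real time; vip: minus 60 minutes; supreme: minus 120 minutes
    discount = {'ordinary': 0, 'vip': 60, 'supreme': 120}
    return to_minutes(arrival) - discount[kind]

pq = []  # a single priority queue standing in for the single-window case
heapq.heappush(pq, (virtual_time('14:00', 'vip'), 'vip customer'))
heapq.heappush(pq, (virtual_time('13:30', 'ordinary'), 'ordinary customer'))
heapq.heappush(pq, (virtual_time('14:00', 'supreme'), 'supreme vip customer'))

print(heapq.heappop(pq))  # (720, 'supreme vip customer'): 14:00 minus two hours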

What If Someone Can Jump the Queue?

This algorithm fulfills our needs well, and its complexity is decent. But the story is not over; this time we received new product requirements:

- If a patient transfers from another clinic to ours, their earlier queuing information carries over. For example, if they registered at another hospital at 12:00, after transferring they still count as having registered at 12:00.

- If a called number gets no response within three minutes, it becomes invalid. However, if the patient comes back later, their virtual time becomes the current time minus one hour, and they queue again. For example, suppose their number was called at 13:00 with no response; if they come back at 13:30, they are treated as having queued at 12:30 and re-enter the queue.

Either way, we now have customers "jumping the queue". What can we do about it? A simple approach is to insert them at the correct position and re-adjust the positions of everyone behind them.

The following figure shows the insertion of an ordinary customer who starts queuing at 1:30.

(Find the insert position)

(Insert it)

If the queue is implemented with an array, the time complexity of the above queue jumping is \(O(N)\), where \(N\) is the length of the queue being jumped into. If the queue is very long, the number of adjustments grows significantly.

However, notice that, in essence, we are maintaining an ordered list. The advantage of maintaining an ordered list with an array is random access, but this requirement clearly doesn't need that feature. With a linked list, the insertion itself is theoretically \(O(1)\), but then how do we locate the insertion position? The naive approach is a linear scan, which degrades the whole operation back to \(O(N)\). Is there an approach with better time complexity? The answer is the protagonist of this article, the priority queue.

As mentioned above, the bottleneck of the linked list implementation is that the search still costs \(O(N)\). Can we optimize this process? In fact, this is exactly the linked list (skip list) implementation of the priority queue. Because the list is ordered, we can use a skip list to speed up the search, bringing the time complexity down to \(O(\log N)\).

There are many similar problems in the algorithm world. Take database indexing: if you add an index to an ordered column, you can't afford to shift all the data every time a row is inserted (the array implementation above). Instead, a balanced tree can be used, so that each insertion costs at most \(O(\log N)\). The other common implementation of the priority queue, the binary heap, follows the same idea, and its time complexity is also \(O(\log N)\).

This article only explains the common binary heap implementation in detail; skip lists and red-black trees are not covered here. A detailed introduction to the binary heap implementation of the priority queue follows later. For now, you only need to understand what problem the priority queue solves.

Using a Heap to Solve the Problem

The two core APIs of the heap are push and pop.

Don't think about how they are implemented yet; for now, imagine the heap as a black box that provides two APIs:

- push: push a piece of data in; we don't care how it is organized internally. This corresponds to queuing up (and queue jumping) in the scenario above.

- pop: pop a piece of data out, which must be the smallest; again we don't care how this is achieved internally. This corresponds to calling a number in the scenario above.

The example here is actually a min-heap. If the popped data must instead be the largest, the corresponding implementation is a max-heap.

With these two APIs, the requirements above can be met.

import heapq

normal_pq = []  # the queue of ordinary customers

# 12:00 an ordinary customer arrives (push)
heapq.heappush(normal_pq, '12:00')
# 12:30 an ordinary customer arrives (push)
heapq.heappush(normal_pq, '12:30')
# 13:00 an ordinary customer arrives (push)
heapq.heappush(normal_pq, '13:00')
# Queue jumping (push). The time complexity can reach O(logN). Never mind how for now; we'll just use it and cover the implementation details later.
heapq.heappush(normal_pq, '12:20')
# Call a number (pop). The customer who came at 12:00 is called first. Note that popping also costs O(logN), which may be the price we pay.
heapq.heappop(normal_pq)

Summary

In the scenario above, plain arrays and linked lists can meet the needs, but other data structures handle the "queue jumping" cases better.

Specifically:

- If you maintain an ordered array, taking the extreme value is easy, but queue jumping (insertion) is troublesome.

- If you maintain an ordered linked list, taking the extreme value is also easy. However, to make the search fast instead of a linear scan, you need indexes; this corresponds to the skip list implementation of the priority queue.

- If you maintain a tree whose root holds the extreme value, taking the extreme value is \(O(1)\), but each adjustment costs \(O(\log N)\). This corresponds to the binary heap implementation of the priority queue.

In brief, the heap dynamically finds extreme values for you. Use it whenever you need to dynamically find the maximum or minimum value. The concrete implementation and complexity analysis come later; for now, just remember the usage scenarios, how the heap solves them, and the heap APIs.

Queue VS Priority Queue

The example above walked you through what a priority queue is. Before discussing the concrete implementation, I think it is worth answering a question many people care about: is a priority queue a queue?

Many people think queues and priority queues are completely different things, just like Java and JavaScript; many articles say so.

I don't think so. In fact, an ordinary queue can be regarded as a special priority queue, which differs from the common claim online that priority queues have nothing to do with queues. In my view, a queue is just a priority queue that uses time as the priority: the earlier the arrival time, the higher the priority, and the higher the priority, the earlier the dequeue.
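
A minimal sketch of this claim, using Python's heapq: if the priority is an ever-increasing arrival counter, the priority queue behaves exactly like an ordinary FIFO queue. The class name FifoViaHeap is made up for illustration.

import heapq
from itertools import count

class FifoViaHeap:
    def __init__(self):
        self._heap = []
        self._ticket = count()  # monotonically increasing "arrival time"

    def enqueue(self, item):
        heapq.heappush(self._heap, (next(self._ticket), item))

    def dequeue(self):
        return heapq.heappop(self._heap)[1]

q = FifoViaHeap()
for x in 'abc':
    q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())  # a b c: first in, first out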

When writing BFS, people usually use a queue to manage the order in which nodes are visited. Can a priority queue be used instead? Of course! Here is an example:

Example - 513. Find Bottom Left Tree Value

Problem Description

Given a binary tree, find the leftmost value in the last row of the tree.

Example 1:

Input:

    2
   / \
  1 3

Output:
1


Example 2:

Input:

        1
       / \
      2   3
     /   / \
    4   5   6
       /
      7

Output:
7


Note: you may assume the tree (i.e., the given root node) is not NULL.

Ideas

We can use BFS to do a level-order traversal, traversing each level from right to left; this way, the last node visited is the node in the bottom-left corner of the tree.

The conventional approach is to use a double-ended queue (used simply as a queue). Thanks to the queue's first-in, first-out property, level-order traversal is easy to achieve.

Code

If you don't understand the code yet, don't worry. It will be much easier to come back to after you finish the whole article. The same goes for the other code below; I won't repeat this note.

Python Code:

class Solution:
    def findBottomLeftValue(self, root: TreeNode) -> int:
        if root is None:
            return None
        queue = collections.deque([root])
        ans = None
        while queue:
            size = len(queue)
            for _ in range(size):
                ans = node = queue.popleft()
                if node.right:
                    queue.append(node.right)
                if node.left:
                    queue.append(node.left)
        return ans.val

In fact, we can also use a priority queue; the idea and the code are almost identical to the above.

class Solution:
    def findBottomLeftValue(self, root: TreeNode) -> int:
        if root is None:
            return None
        queue = []
        # The heap stores triples (a, b, c): a is the level, b is the node number
        # (numbered as in a complete binary tree, counting empty nodes too), and c is the node itself.
        # The right child is numbered 2*i+1 and the left child 2*i+2, so within a level right nodes pop first.
        heapq.heappush(queue, (1, 1, root))
        ans = None
        while queue:
            size = len(queue)
            for _ in range(size):
                level, i, node = heapq.heappop(queue)
                ans = node
                if node.right:
                    heapq.heappush(queue, (level + 1, 2 * i + 1, node.right))
                if node.left:
                    heapq.heappush(queue, (level + 1, 2 * i + 2, node.left))
        return ans.val

Summary

**Everywhere a queue is used, a priority queue can be used instead; but not necessarily the other way around.**

Since the priority queue is so powerful, why not always use it? Why do you never see others reach for a heap where a queue will do? The core reason is that the time complexity gets worse.

For example, in the example above, a plain queue completes each enqueue and dequeue in \(O(1)\) time. With a heap, enqueue and dequeue both cost \(O(\log N)\), where N is the current size of the queue. So using a heap where it isn't needed greatly increases the time complexity of the algorithm, which is of course inappropriate. Bluntly put, it is doing extra work for nothing.

But does that mean BFS is never implemented with a priority queue? Of course not! For example, in the shortest path problem on a weighted graph, doing BFS with a plain queue is not enough; you need a priority queue, because different paths have different weights. Isn't that exactly what the priority queue was designed for? The classic example of BFS with a priority queue is Dijkstra's algorithm.
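
For reference, here is a condensed Dijkstra sketch with heapq (the variable names are mine). The only difference from an ordinary BFS loop is that the node with the smallest known distance is expanded first, rather than the earliest-enqueued one:

import heapq

def dijkstra(graph, start):
    # graph: {node: [(neighbor, weight), ...]}
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # a stale entry; a shorter path to u was already found
        for v, w in graph[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}
print(dijkstra(g, 'A'))  # {'A': 0, 'B': 1, 'C': 3}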

This again confirms my claim that a queue is just a special priority queue. In the special case where the priority is determined by arrival order (whoever comes first has higher priority), we don't need to maintain a heap at all, and we get better time complexity.

One Center

Heap problems have only one central idea: dynamic extreme values. Both words, dynamic and extreme, are indispensable.

Finding an extreme value is easy to understand: it is just the maximum or the minimum. "Dynamic" is less obvious. For example, if you are asked for the k-th smallest number in an array, is that dynamic? It really depends on how you look at it; in our framing, this situation is dynamic.

Why is the above example dynamic?

Think of it this way: a heap can only give you the extreme value. It can produce the minimum, for example, but it cannot directly produce the k-th smallest value.

So we first take the smallest value and dequeue it (corresponding to calling a number in the earlier example). Then we take the smallest value again, which is now the second smallest overall. If the k-th smallest is required, we simply repeat this k times.

During this process, the data changes dynamically, which corresponds to the changing size of the heap.
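
A minimal Python sketch of this "k pops" idea: heapify builds the heap once, and each pop shrinks it, which is exactly the dynamic part.

import heapq

def kth_smallest(nums, k):
    h = list(nums)
    heapq.heapify(h)
    for _ in range(k - 1):
        heapq.heappop(h)  # discard the 1st .. (k-1)-th smallest
    return heapq.heappop(h)

print(kth_smallest([5, 3, 8, 1, 9, 2], 3))  # 3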

Next, we use a few examples to illustrate.

Example 1 - 1046. Last Stone Weight

Problem Description

There is a pile of stones; the weight of each stone is a positive integer.

In each round, choose the two heaviest stones and smash them against each other. Suppose the stones have weights x and y, with x <= y. The possible results are:

If x == y, both stones are completely destroyed;
If x != y, the stone of weight x is completely destroyed, and the stone of weight y now has weight y-x.
In the end, at most one stone remains. Return the weight of this stone. If no stones remain, return 0.



Example:

Input: [2,7,4,1,8,1]
Output: 1
Explanation:
First select 7 and 8 to get 1, so the array is converted to [2,4,1,1,1],
Then select 2 and 4 to get 2, so the array is converted to [2,1,1,1],
Followed by 2 and 1, we get 1, so the array is converted to [1,1,1],
Finally, 1 and 1 are selected, and 0 is obtained. The final array is converted to [1], which is the weight of the last remaining stone.


Constraints:

1 <= stones.length <= 30
1 <= stones[i] <= 1000

Ideas

The problem is relatively simple: just simulate it directly. Note that each time the two heaviest stones are picked and smashed, the weight of the heaviest stone changes, which affects the next pick. Simply put, the heaviest stone changes dynamically during the simulation.

This kind of dynamic extreme value scenario is a perfect fit for a heap.

Of course, given the data ranges 1 <= stones.length <= 30 and 1 <= stones[i] <= 1000, a counting approach should also work; a sketch follows below.
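
As an aside, a counting-based sketch might look like the following (my own take, not the main solution): bucket the weights, then repeatedly scan from heavy to light.

def last_stone_weight_counting(stones):
    cnt = [0] * 1001  # stones[i] <= 1000
    for s in stones:
        cnt[s] += 1

    def pop_heaviest():
        for w in range(1000, 0, -1):
            if cnt[w]:
                cnt[w] -= 1
                return w
        return 0  # no stones left

    while True:
        y = pop_heaviest()
        x = pop_heaviest()
        if x == 0:           # zero or one stone remains
            return y
        if y != x:
            cnt[y - x] += 1  # the smash leaves a stone of weight y - x

print(last_stone_weight_counting([2, 7, 4, 1, 8, 1]))  # 1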

Code

Java Code:

import java.util.PriorityQueue;

public class Solution {

    public int lastStoneWeight(int[] stones) {
        int n = stones.length;
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(n, (a, b) -> b-a);
        for (int stone: stones) {
            maxHeap.add(stone);
        }

        while (maxHeap.size() >= 2) {
            Integer head1 = maxHeap.poll();
            Integer head2 = maxHeap.poll();
            if (head1.equals(head2)) {
                continue;
            }
            maxHeap.offer(head1-head2);
        }

        if (maxHeap.isEmpty()) {
            return 0;
        }
        return maxHeap.poll();
    }
}
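
For comparison, the same simulation in Python. heapq only provides a min-heap, so a common trick is to store negated values to simulate a max-heap:

import heapq

def last_stone_weight(stones):
    h = [-s for s in stones]   # negate so that the heaviest stone pops first
    heapq.heapify(h)
    while len(h) > 1:
        y = -heapq.heappop(h)  # heaviest
        x = -heapq.heappop(h)  # second heaviest
        if x != y:
            heapq.heappush(h, -(y - x))
    return -h[0] if h else 0

print(last_stone_weight([2, 7, 4, 1, 8, 1]))  # 1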

Example 2 - 313. Super Ugly Number

Problem Description

Write a program to find the nth super ugly number.

A super ugly number is a positive integer all of whose prime factors appear in primes, a given list of prime numbers of length k.

Example:

Input: n = 12, primes = [2,7,13,19]
Output: 32
Explanation: Given the prime list primes = [2,7,13,19] of length 4, the first 12 super ugly numbers are: [1,2,4,7,8,13,14,16,19,26,28,32].
Notes:

1 is a super ugly number for any given primes.
The numbers in primes are sorted in ascending order.
0 < k ≤ 100, 0 < n ≤ 10^6, 0 < primes[i] < 1000.
The nth super ugly number is guaranteed to fit in a 32-bit signed integer.

Ideas

This problem doesn't seem related to dynamically finding extreme values, but it actually is. Let's analyze it.

Could we generate many ugly numbers up front, say the first N ugly numbers in ascending order, and then directly take the Nth one?

Take this problem as an example. The constraint is 0 < n ≤ 10^6, so could we pre-generate a super ugly number array of size \(10^6\), and then obtain the nth super ugly number in \(O(1)\) time?

The first problem is the waste of time and space: we don't actually need all the super ugly numbers every time, so the preprocessing time and space are both very poor.

The second problem is: how would we even generate these \(10^6\) super ugly numbers?

From the definition of ugly numbers, we know that a super ugly number can always be written in the following form.

if primes = [a,b,c,....]
then ugly = a^x1 * b^x2 * c^x3 ...
where x1, x2, x3 ... are non-negative integers.

Let's simplify the problem first and consider the example given in the problem: [2,7,13,19].

We can handle it with four pointers. Just look at the code:

public class Solution {
    public int solve(int n) {
        int[] ans = new int[n + 5];
        ans[0] = 1;
        int p1 = 0, p2 = 0, p3 = 0, p4 = 0;
        for (int i = 1; i < n; i++) {
            ans[i] = Math.min(ans[p1] * 2, Math.min(ans[p2] * 7, Math.min(ans[p3] * 13, ans[p4] * 19)));
            // advance every pointer whose product equals the minimum; this also deduplicates
            if (ans[i] == ans[p1] * 2) p1++;
            if (ans[i] == ans[p2] * 7) p2++;
            if (ans[i] == ans[p3] * 13) p3++;
            if (ans[i] == ans[p4] * 19) p4++;
        }
        return ans[n - 1];
    }
}

I call this technique multi-way merging (I can't think of a better name). In the "three techniques" part later, I will also use the heap to optimize this method.

Since the pointers here are dynamic, the number of pointers actually equals the length of the primes array given in the problem. We could therefore also use memoized recursion, with the recursion body and the recursion stack each maintaining an iteration variable; a generalized sketch follows below. The problem can also be viewed as a state machine, so dynamic programming is an intuitive fit. Here, however, I will introduce a heap-based solution that is simpler and more intuitive than dynamic programming.
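
A sketch of that generalization in Python (my own variable names): pointers[j] tracks how far primes[j] has been multiplied into the answer array, and every pointer that produced the minimum advances, which also removes duplicates.

def nth_super_ugly_number(n, primes):
    ans = [1] * n
    pointers = [0] * len(primes)
    for i in range(1, n):
        candidates = [ans[pointers[j]] * p for j, p in enumerate(primes)]
        ans[i] = min(candidates)
        for j, c in enumerate(candidates):
            if c == ans[i]:
                pointers[j] += 1  # advance every producer of the minimum
    return ans[n - 1]

print(nth_super_ugly_number(12, [2, 7, 13, 19]))  # 32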

Regarding state machines, I have an article, Can state machines also be used to grind LeetCode?, that you can refer to.

In fact, we can dynamically maintain the current smallest super ugly number: find the first one and remove it, then find the next smallest super ugly number (that is, the globally second smallest one). After n rounds, we have the nth smallest super ugly number. This kind of dynamic maintenance of extreme values is exactly where the heap comes in.

Doesn't this feel similar to the stone problem above?

Take the example given in the problem, [2,7,13,19]:

  1. Push [2,7,13,19] into the heap one by one.
  2. Pop the top of the heap, which is 2. This gives one super ugly number.
  3. Push the products of 2 and [2,7,13,19], i.e. [4,14,26,38], into the heap one by one.
  4. Repeat until the nth super ugly number has been obtained.

The correctness of this is beyond doubt: each time we take the smallest element in the heap, and each time we also remove it from the heap, so taking n times naturally yields the nth smallest super ugly number.

The heap solution is not very difficult; the only thing to watch out for is deduplication. For example, 2 * 13 = 26, and 13 * 2 is also 26; we must not count 26 twice. The fixes are simple:

- Either use a hash table to record all the numbers that have been popped, and skip any number that was popped before.

- Or record the last number popped. Since numbers are popped in non-decreasing order, duplicates are adjacent, so you only need to compare the current popped number with the previous one.

It should be obvious which method to prefer, right?

Code

Java Code:

class Solution {
    public int nthSuperUglyNumber(int n, int[] primes) {
        PriorityQueue<Long> queue = new PriorityQueue<>();
        int count = 0;
        long ans = 1;
        queue.add(ans);
        while (count < n) {
            ans = queue.poll();
            // deduplicate: skip values equal to the one just popped
            while (!queue.isEmpty() && ans == queue.peek()) {
                queue.poll();
            }
            count++;
            for (int i = 0; i < primes.length; i++) {
                queue.offer(ans * primes[i]);
            }
        }
        return (int) ans;
    }
}

Initializing ans to 1 is equivalent to using a virtual head; it merely simplifies the operation.

Summary

The heap has only one center: dynamic extreme values.

An extreme value is just the maximum or the minimum, which is easy to spot. If you want the maximum, use a max-heap; if you want the minimum, use a min-heap.

In fact, without the word "dynamic", a heap is often unnecessary; for example, you could just traverse once to find the maximum. The dynamic part is not always easy to see, and that is where the difficulty of these problems lies. You have to analyze the problem first and recognize that it is dynamically asking for extreme values; then the heap optimization should come to mind. There are many examples like this, and I will explain more in the following sections.

Two implementations

Several implementations of the heap were briefly mentioned above. Here are the two common ones: one based on a linked list, the skip list, and one based on an array, the binary heap.

With the skip list implementation, if the algorithm is not carefully crafted the performance can be unstable, and memory usage grows significantly with large amounts of data. Therefore we only describe the binary heap implementation in detail; for the skip list, we only describe the basic principles and skip the details such as code, since they are relatively niche.

Skip List

The skip list is also a data structure, so it, too, serves some algorithm.

Although skip lists don't appear often in interviews, they are used widely in industry; LeetCode has only one problem about skip lists. Still, the design ideas behind the skip list are worth studying and pondering, as they embody many algorithm and data structure techniques worth learning, such as trading space for time, and questions of efficiency.

As mentioned above, handling queue jumping is the first issue a heap design should consider. How does the skip list implementation of the heap solve this problem?

We know that to find a value in a linked list without extra space, you must scan node by node in order; the time complexity is \(O(N)\), where N is the length of the list.

(Singly linked list)

When the list is long, this cost is hard to accept. A common optimization is to build a hash table and put all nodes into it, trading space for time. That approach has a time complexity of \(O(1)\), but a space complexity of \(O(N)\).

(Singly linked list + hash table)

To avoid problems caused by duplicate nodes in the list, we would need to serialize the nodes before building the hash table, which pushes the space usage even higher. Although this is only a constant-factor increase, the overhead is not small. More importantly, the hash table cannot solve the extreme value problem; it is only suitable for looking up content by key.

To solve the above problems, skip lists came into being.

As shown in the figure below, we extract every other element of the linked list and add a first-level index. Each first-level index node points to the original list; that is, from the first-level index node 7 we can reach the original list's 7 via its down pointer. Now, how do we find 10?

Note that this scheme requires the linked list to be sorted.

(Create a first-level index)

We can:

- Search in the current level: find 7, and notice that the next index node, 18, is greater than 10, which means the 10 we want lies between the two.

- Go back down to the original list via the down pointer, and find 10 through the original list's next pointers.

This tiny example shows no performance improvement yet. But if the number of elements keeps growing, we keep adding index levels (second-level, third-level, ... indexes), so that the linked list approaches binary search and achieves much better efficiency. Correspondingly, we pay the price of extra space.

(Increase the number of index levels)

Having understood the above, you can visualize a skip list as the saves of a video game.

Suppose a game has 10 levels. If I want to play some point in level 5, I can start directly from level 5, which is faster than starting from level 1. We can even keep several saves within each level; then, to play some point in level 5, I don't even need to start at the beginning of level 5, but can pick the save closest to where I want to play. That is just like the skip list's secondary index.

The time and space complexity of skip lists are not easy to analyze exactly. Since time complexity = index height × the average number of elements traversed per level, the height is about \(\log n\), and the number of elements traversed at each level is constant, the time complexity is \(O(\log n)\), the same as binary search.

The space complexity is on the order of the number of index nodes. Taking one index node for every two nodes as an example, the index sizes are roughly n/2 + n/4 + n/8 + ... + 8 + 4 + 2, so the space complexity is \(O(n)\). Of course, if you create an index node for every three nodes, space is saved, but the asymptotic complexity remains the same.

With the above understood, it is not difficult to implement a heap with a skip list:

- To push into the heap, insert the element into the linked list according to the indexes, and update the indexes (optional).

- To pop from the heap, delete the head (or tail) and update the indexes (optional).

If you want to check whether your implementation is correct, you can try LeetCode's 1206. Design Skiplist.

Next, let's look at the more common implementation: the binary heap.

Binary Heap

For the binary heap implementation, we only explain the two core operations: heappop (popping from the heap) and heappush (pushing into the heap). The other operations won't be covered; I believe that once you know these two core operations, the rest should not be difficult.

After implementing it, usage looks roughly like this:

h = min_heap()
h.build_heap([5, 6, 2, 3])

h.heappush(1)
h.heappop() # 1
h.heappop() # 2
h.heappush(1)
h.heappop() # 1
h.heappop() # 3

Basic Principle

Essentially, a binary heap is a special complete binary tree. Its specialness is reflected in just one point: the weight of a parent node is not greater than the weight of its children (min-heap).

(A min-heap)

Everyone needs to remember that sentence; everything follows from it.

Since a parent's weight is not greater than its children's (min-heap), it follows naturally that the root of the tree is the minimum value. This provides the "extreme value" part of the heap.

What about the "dynamic" part? How does the binary heap handle that?

Popping from the Heap

If I pop the root node of the tree, doesn't the root become vacant? The second smallest should take its place. How do we move it there? Everything is still that same sentence: the weight of a parent node is not greater than the weight of its children (min-heap).

If we just deleted the root, one heap would become two heaps, and the problem would get more complicated.

(As shown above, popping the root directly produces two new heaps)

A common trick is to swap the root node with the last node. However, the new root may then violate the rule that a parent's weight is not greater than its children's (min-heap).

As shown in the figure below, after we swap the root's 2 with the number at the tail, the heap property no longer holds.

At this point we only need to sink the new root node to its correct position, where "correct" again means the parent's weight is not greater than the children's (min-heap). As long as this rule is violated, we keep sinking, until it is satisfied.

When the root node sinks, there are actually two directions to choose from: sink toward the left child, or toward the right child? For a min-heap, the answer is to sink toward the smaller child; otherwise the correct answer is missed. Taking the heap above as an example, if we sank toward the right child 4, the correct new top 3 could not be obtained. So we must sink toward the left child.

After sinking to the position shown in the figure, the parent-not-greater-than-children rule (min-heap) is still not satisfied, so we continue the same operation.

Some students may wonder: the heap satisfied the heap property before the root was popped; after the pop and the sink operation described above, does it still?

The answer is yes. This is not hard to see. When the last leaf node is promoted to the root, it is unclear where it will end up, but after the above operation:

- The nodes on the sinking path all satisfy the heap property.

- The nodes not on the sinking path keep the same relative relations as before, so they also satisfy the heap property.

Therefore, after the root is popped and the sink operation above is performed, the heap property is necessarily preserved.

As for time complexity, it can be shown that the cost of sinking is proportional to the height of the tree, i.e., \(O(h)\), where h is the height of the tree. Since a binary heap is a complete binary tree, its height is about \(\log N\), where N is the number of nodes, so popping costs \(O(\log N)\).

Pushing into the Heap

Pushing is similar to popping. We can insert the new node directly at the end of the tree. As above, such an operation may break the heap property.

One reason for inserting at the tail is its low cost: since we simulate the tree with an array, appending an element at the end of the array costs \(O(1)\).

This time the node that may violate the heap property is the tail node we just inserted, so a sink operation won't help; we need a float operation instead.

A leaf node can only float up (the root can only sink; every other node may either sink or float).

The idea is essentially the same as before: while the heap property is violated, we swap the node with its parent (float it up), and repeat until the property holds.

(After the first float, the heap property still doesn't hold; keep floating)

(The heap property holds; the floating process is complete)

After these operations, we again have a valid heap. The proof is similar to the above and is not repeated here.

Note that since floating only needs to compare the current node with its parent, it is simpler than sinking: it skips the step of deciding which of the left and right children is smaller.

Implementation

Using an array to represent a complete binary tree is very convenient, because:

- If a node's index in the array is i, then its left child's index is \(2 \times i\) and its right child's index is \(2 \times i + 1\).

- If a node's index in the array is i, then its parent's index is i // 2 (floor division).

Of course, this requires the array to store data starting from index 1. If it doesn't, the formulas above can be adjusted slightly to the same effect. But starting from 1 is an industry habit, and it is better to stay consistent with the industry. Another advantage of starting from 1 is that slot 0 is free to store information such as the heap size; some university textbooks do this, so just be aware of it.
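
A quick sanity check of these index relations in Python, with storage starting at index 1 (slot 0 left unused):

h = [None, 5, 6, 2, 3]  # slot 0 unused; tree: 5 has children 6 and 2, and 6 has child 3
i = 2                   # the node with value 6
print(h[2 * i])         # 3, its left child
print(h[i // 2])        # 5, its parent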

The figure shows a complete binary tree and the array representation of that tree.

(Pay attention to the correspondence of the array indexes)

Visually, we can draw the following correspondence diagram:

Seen this way, isn't it almost the same as the tree above, and easier to understand?

The floating and sinking processes were described above, and we just covered the parent-child index relations, so the code practically writes itself. Let's look at the core implementation of floating and sinking.

Pseudocode:

// x is the index of the element to float up, starting from the bottom of the tree
private void shift_up(int x) {
  while (x > 1 && h[x] < h[x / 2]) {
    // swap exchanges the values at the two array positions
    swap(h[x], h[x / 2]);
    x /= 2;
  }
}
// x is the index of the element to sink, starting from the top of the tree
private void shift_down(int x) {
  while (x * 2 <= n) {
    // minChild returns the index of the smaller of x's children
    mc = minChild(x);
    if (h[x] <= h[mc]) break;
    swap(h[x], h[mc]);
    x = mc;
  }
}

Below, Java is used as the example language. Binary heap implementations in other languages can be obtained from my cheatsheet plugin leetcode-cheatsheet; to get the plugin, just reply "plugin" on the public account "Like Plus".

import java.util.Arrays;
import java.util.Comparator;

/**
 * Build a heap on top of a complete binary tree
 * Storage starts at index 1
 * The children of node i are i << 1 and (i << 1) + 1
 * The core methods are
 * shiftDown: sink by swapping
 * shiftUp: float up by swapping
 * <p>
 * buildHeap: heapify an existing array
 */

public class Heap {

    int size = 0;
    int queue[];

    public Heap(int initialCapacity) {
        if (initialCapacity < 1)
            throw new IllegalArgumentException();
        this.queue = new int[initialCapacity];
    }

    public Heap(int[] arr) {
        size = arr.length;
        queue = new int[arr.length + 1];
        int i = 1;
        for (int val: arr) {
            queue[i++] = val;
        }
    }

    public void shiftDown(int i) {

        int temp = queue[i];

        while ((i << 1) <= size) {
            int child = i << 1;
            // child != size checks whether the current node has a right child
            if (child != size && queue[child + 1] < queue[child]) {
                child++;
            }
            if (temp > queue[child]) {
                queue[i] = queue[child];
                i = child;
            } else {
                break;
            }
        }
        queue[i] = temp;
    }


    public void shiftUp(int i) {
        int temp = queue[i];
        while ((i >> 1) > 0) {
            if (temp < queue[i >> 1]) {
                queue[i] = queue[i >> 1];
                i >>= 1;
            } else {
                break;
            }
        }
        queue[i] = temp;
    }

    public int peek() {

        int res = queue[1];
        return res;
    }

    public int pop() {

        int res = queue[1];

        queue[1] = queue[size--];
        shiftDown(1);
        return res;
    }

    public void push(int val) {
        if (size == queue.length - 1) {
            // grow the array; note the parentheses: (size << 1) + 1, since '+' binds tighter than '<<'
            queue = Arrays.copyOf(queue, (size << 1) + 1);
        }
        queue[++size] = val;
        shiftUp(size);
    }

    public void buildHeap() {
        // storage starts at index 1, so sink the internal nodes from size >> 1 down to the root
        for (int i = size >> 1; i >= 1; i--) {
            shiftDown(i);
        }
    }

    public static void main(String[] args) {

        int arr[] = new int[]{2,7,4,1,8,1};
        Heap heap = new Heap(arr);
        heap.buildHeap();
        System.out.println(heap.peek());
        heap.push(5);
        while (heap.size > 0) {
            int num = heap.pop();
            System.out.print(num + " ");
        }
    }
}

Summary

There are many implementations of the heap, for example the skip list based on linked lists, the binary heap based on arrays, and implementations based on red-black trees. Here we described the binary heap implementation in detail; it is not only simple to implement but also performs well in many situations, so it is recommended that you focus on it.

For the binary heap implementation, the core point is to always maintain the heap property. Which property? The weight of a parent node is not greater than the weight of its children (min-heap). To keep it, we use the float and sink operations when pushing and popping, performing the appropriate element swaps. Specifically, floating swaps a node with its larger parent, and sinking swaps it with the smaller of its two children, provided it has a child smaller than itself.

We have not analyzed heapify (building a heap from an array) in detail. But if you understood the heap operations above, it is actually easy: heapify can be seen as pushing elements into the heap one by one, except that operations scattered over time are batched into a single operation; see the sketch below.
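
For completeness, here is a minimal heapify sketch in Python. It mirrors the buildHeap method above: instead of n pushes, it sinks every internal node, from the last internal node up to the root (the standard approach); the array is 1-indexed with slot 0 unused.

def build_heap(h, n):
    def shift_down(i):
        while 2 * i <= n:
            child = 2 * i
            if child < n and h[child + 1] < h[child]:
                child += 1               # pick the smaller child
            if h[i] <= h[child]:
                break
            h[i], h[child] = h[child], h[i]
            i = child
    for i in range(n // 2, 0, -1):       # last internal node down to the root
        shift_down(i)

h = [None, 2, 7, 4, 1, 8, 1]
build_heap(h, 6)
print(h[1])  # 1: the minimum is now at the top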

Preview

This article is expected to be published in two parts. This is the first part; the second part is even meatier, covering the three techniques and the four major applications:

- Three techniques

  1. Multi-way merge
  2. Fixed-size heap
  3. Hindsight

- Four major applications

  1. topK
  2. Weighted shortest distance
  3. Factorization
  4. Heap sort

These two topics are specifically aimed at teaching you how to solve problems. Once you master them, most heap problems on LeetCode will be no trouble (of course, I only mean the parts of those problems that involve the heap).

If you have any thoughts on this, please leave me a message; I will check and answer them one by one when I have time. For more algorithm patterns, visit my LeetCode solution repository: https://github.com/azl397985856/leetcode. It currently has 37K stars. You can also follow my public account "Like Plus" to tackle the hard bones of algorithms together.
If you have any thoughts on this, please leave me a message. I will check the answers one by one when I have time. More algorithm routines can visit my LeetCode problem solution warehouse: https://github.com/azl397985856/leetcode. Currently it has 37K stars. You can also pay attention to my public account "Like Plus" to take you to the hard bones of algorithm.