Heap (below)¶

Last time I did a small survey for everyone on my public account "cast the programming language you want to solve the problem~". The following are the results of the survey:

Voting Results

As for the others, most of them are Go language.

! What did the other people write?

Since Java and Python account for more than 60%, I will try to write in both Java and Python this time. Thanks to @CaptainZ for the Java code. At the same time, in order to not make the article smelly and long, I put all the code (Java and Python) of the Java article on the official website of Leetcode.com, the website address: https://leetcode-solution.cn/solution- code

If you don't use the Internet scientifically, it may be slow to open.

Body¶

Hi everyone, my name is lucifer. What I bring to you today is the topic of "Heap". First up and down the outline of this article, this is a brain map I drew with mindmap, and then I will continue to improve, and gradually improve other topics.

You can also use vscode blink-mind to open the source file for viewing. There are some notes in it that you can click to open and view. The source files can be obtained by replying to the mind map on my public account "Like Plus", and the mind map will continue to update more content in the future. vscode plug-in address: https://marketplace.visualstudio.com/items?itemName=awehook.vscode-blink-mind

This series includes the following topics:

-Almost finished reading all the linked list questions, I found these things. . . -Almost finished all the tree questions, I found these things. . . -Almost finished all the puzzles of Likou, I found these things. . . (First bullet)

This is the second part. Students who have not read the previous part strongly recommend to read the first part [I have almost finished all the piles of questions, and I found these things. . . (First bullet))(https://lucifer.ren/blog/2020/12/26/heap/)

This is the second part, and the following content is more dry, which are Three Tips and Four Applications. These two topics are specifically designed to teach you how to solve problems. Once you have mastered it, most of the heap topics in Leikou are not a problem (of course, I only refer to the part of the topic that involves the heap).

Warning: The topics in this chapter are basically hard difficulty. This is because many of the pile topics are not easy to mark. This is also introduced in the previous section.

A little explanation¶

Before serving the main course, let’s give everyone an appetizer.

Here are two concepts to introduce to you, they are Tuple and Simulated Big Top Reactor. The reason for these explanations is to prevent everyone from not understanding them later.

Tuple¶

Using the heap can not only store a single value, for example, [1,2,3,4] of 1, 2, 3, 4 are all single values. In addition to single values, compound values can also be stored, such as objects or tuples.

Here we introduce a way to store tuples, this technique will be widely used later, please be sure to master it. For example [(1,2,3), (4,5,6), (2,1,3),(4,2,8)].

h = [(1,2,3), (4,5,6), (2,1,3),(4,2,8)]
heapq.heappify(h) # Heaping (small top heap)

heapq.heappop() # Pop (1,2,3)
heapq.heappop() # pop(2,1,3)
heapq.heappop() # Pop (4,2,8)
heapq.heappop() # Pop (4,5,6)

The graph to represent the heap structure is as follows:

Small top heap using tuples

Briefly explain the execution result of the above code.

Using the tuple method, the first value of the tuple is used as the key to compare by default. If the first one is the same, continue to compare the second one. For example, the above (4,5,6) and (4,2,8), because the first value is the same, so continue to compare the latter, and because 5 is greater than 2, so (4,2,8) out of the heap first .

Using this technique has two effects:

Bring some additional information. For example, I want to find the kth decimal in a two-dimensional matrix, of course using the value as the key. But the processing needs to use its row and column information, so it is appropriate to use tuples, such as (val, row, col).
I want to sort by two keys, one primary key and one secondary key. There are two typical usages here,

2.1 One is that both are in the same order, such as both in order or both in reverse order.

2.2 The other is sorting in two different orders, that is, one is the reverse order and the other is the order.

Due to space reasons, the details will not be discussed here. You can pay attention to it in the process of doing the questions. I will open a separate article to explain it when I have the opportunity.

If the programming language you are using does not have a heap or the implementation of the heap does not support tuples, you can also make it support through simple transformations, mainly by customizing the comparison logic.

Simulate big top pile¶

Because Python does not have a large top heap. Therefore, I used a small top heap for simulation implementation. That is to say, all the original numbers are reversed. For example, if the original number is 5, then -5 is added to the pile. After this treatment, the small top pile can be used as a large top pile. But it should be noted that when you pop out, remember to reverse it and restore it back.

Code example:

h = []
A = [1,2,3,4,5]
for a in A:
    heapq.heappush(h, -a)
-1 * heapq.heappop(h) # 5
-1 * heapq.heappop(h) # 4
-1 * heapq.heappop(h) # 3
-1 * heapq.heappop(h) # 2
-1 * heapq.heappop(h) # 1

It is as follows to show with a picture:

Small top pile simulation large top pile

That's it for the foreshadowing, and then move on to the topic.

Three tips¶

Tip One-Fixed Heap¶

This technique refers to the fixed heap size k unchanged, and the code can be implemented by ** each pop out and push in one . Since the initial heap may be 0, we just need to push into the heap one by one to reach the heap size of k, so strictly speaking, it should be **maintain the size of the heap not to be greater than k.

A typical application of fixed heap is to find the k-th smallest number. In fact, the easiest way to find the k-th smallest number is to build a small top pile, put all the numbers into the pile first, and then out of the pile one by one, for a total of k times. The last time that the heap is released is the k-th smallest number.

However, we can also create a large top heap (not the small top heap above) instead of adding all of them to the heap first, and maintain the size of the heap at k. If the heap size is greater than k after the new number is added to the heap, you need to compare the number at the top of the heap with the new number, and remove the larger number. This can ensure that the number in the heap is the smallest k in the whole number, and the largest of the smallest k (that is, the top of the heap) is not the k-th smallest one? This is the reason for choosing to build a large top pile instead of a small top pile.

Fixed large top pile to find the 5th smallest number

A simple one-sentence summary is that fixing a large top heap of size k can quickly find the k-th smallest number, while fixing a small top heap of size k can quickly find the k-th largest number. For example, Leetcode 2020-02-24 Weekly Contest Question 3 5663. Find the Kth largest XOR coordinate value can be achieved with the fixed small top pile technique (this question lets you find the k-th largest number).

In this way, your feelings may not be strong. Next, I will give you two examples to help you deepen your impression.

295. The median of the data stream¶

Title description¶

The median is the number in the middle of the ordered list. If the length of the list is even, the median is the average of the middle two numbers.

E.g,

The median of [2,3,4] is 3

The median of [2,3] is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num)-Add an integer from the data stream to the data structure.
double findMedian()-Returns the median of all current elements.
Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2
Advanced:

If all integers in the data stream are in the range of 0 to 100, how would you optimize your algorithm?
If 99% of the integers in the data stream are in the range of 0 to 100, how would you optimize your algorithm?

Ideas¶

This question can actually be seen as a special case of finding the k-th smallest number.

-If the length of the list is odd, then k is (n + 1) / 2, and the median is the kth number. For example, n is 5, k is (5 + 1)/ 2 = 3. -If the length of the list is even, then k is (n + 1) / 2 and (n + 1) / 2 + 1, and the median is the average of these two numbers. For example, n is 6, k is (6 + 1) / 2 = 3 and (6 + 1) / 2 + 1 = 4.

Therefore, we can maintain two fixed heaps, the size of the fixed heaps is $(n + 1) \div 2$ and $n-(n + 1)\div2$, that is, the size of the two heaps max The difference is 1, more specifically, $ 0 <= (n + 1) \div 2-(n-(n + 1) \div 2) <= 1$.

Based on the above-mentioned knowledge, we can:

-Create a large top pile and store the smallest number of $(n + 1) \div 2$, so that the top number of the pile is the smallest number of $(n + 1) \div 2$, which is the odd number case median. -Create a small top pile and store the largest number of n-$(n + 1) \div 2$, so that the top number of the pile is the n-th largest number-$(n + 1) \div 2$, combined For the large top pile above, the median of the even number can be found.

With such a knowledge, all that remains is how to maintain the size of the two heaps.

-If the number of large top piles is less than that of small top piles, the smallest of the small top piles is transferred to the large top pile. And because the small top pile maintains the maximum number of k, and the large top pile maintains the minimum number of k, the top of the small top pile must be greater than or equal to the top of the large top pile, and the tops of these two piles are ** The median of ** at this time. -If the number of large top piles is 2 more than the number of small top piles, then the largest of the large top piles is transferred to the small top pile for the same reason as above.

At this point, you may have understood why two heaps were created separately, and a large top heap and a small top heap were needed. The reason for this is as described above.

The application of fixed heap is more than that, let's continue to look at a question.

Code¶

class MedianFinder:
    def __init__(self):
        self.min_heap = []
        self.max_heap = []
    def addNum(self, num: int) -> None:
        if not self.max_heap or num <-self.max_heap[0]:
            heapq.heappush(self.max_heap, -num)
        else:
            heapq.heappush(self.min_heap, num)
        if len(self.max_heap)> len(self.min_heap) + 1:
            heappush(self.min_heap, -heappop(self.max_heap))
        elif len(self.min_heap)> len(self.max_heap):
            heappush(self.max_heap, -heappop(self.min_heap))
    def findMedian(self) -> float:
        if len(self.min_heap) == len(self.max_heap): return (self.min_heap[0]-self.max_heap[0]) / 2
        return -self.max_heap[0]

(Code 1.3.1)

857. The lowest cost of hiring K workers¶

Title description¶

There are N workers. The work quality of the i-th worker is quality[i], and its minimum expected wage is wage[i].

Now we want to hire K workers to form a salary group. When hiring a group of K workers, we must pay them wages in accordance with the following rules:

Each worker in the wage group shall be paid according to the ratio of the quality of his work to that of other workers in the same group.
Every worker in the wage group should receive at least their minimum expected wage.
Return how much money is required to form a salary group that meets the above conditions.



Example 1:

Input: quality = [10,20,5], wage = [70,50,30], K = 2
Output: 105.00000
Explanation: We pay 70 to worker 0 and 35 to worker 2.
Example 2:

Input: quality = [3,1,10,10,1], wage = [4,8,2,2,7], K = 3
Output: 30.66667
Explanation: We pay 4 to worker 0 and 13.33333 to worker 2 and 3.


prompt:

1 <= K <= N <= 10000, where N = quality.length = wage.length
1 <= quality[i] <= 10000
1 <= wage[i] <= 10000
Answers that are within 10^-5 of the correct answer will be considered correct.

Ideas¶

The problem requires us to select k people and pay the wages in proportion to the quality of their work and the quality of other workers in the same group, and each worker in the wage group should receive at least their minimum expected wage.

In other words, the quality of work and wage ratio of k people in the same group can be a fixed value in order to make the wages paid the least. Please understand this sentence first, the rest of the content is based on this premise.

We might as well set an index work efficiency, whose value is equal to q / w. As mentioned earlier, the q / w of these k individuals must be the same to ensure the minimum wage, and this q / w must be the lowest (short board) of these k individuals, otherwise some people must not get the minimum expected wage.

So we can write the following code:

class Solution:
    def mincostToHireWorkers(self, quality: List[int], wage: List[int], K: int) -> float:
        eff = [(q / w, q, w) for a, b in zip(quality, wage)]
        eff.sort(key=lambda a: -a[0])
        ans = float('inf')
        for i in range(K-1, len(eff)):
            h = []
            k = K-1
            rate, _, total = eff[i]
            # Find out k people whose work efficiency is higher than it, and the salary of these k people is as low as possible.
            # Since the work efficiency has been sorted in reverse order, the previous ones are higher than it, and then the k lowest wages can be obtained by using the heap.
            for j in range(i):
                heapq.heappush(h, eff[j][1] / rate)
            while k> 0:
                total += heapq.heappop(h)
                k -= 1
            ans = min(ans, total)
        return ans

(Code 1.3.2)

This approach pushes a lot of numbers each time and pops it k times. It does not make good use of the dynamic characteristics of the heap, but only uses its extreme value characteristics.

A better approach is to use the fixed heap technique.

This question can be considered from another angle. In fact, this question is not to let us choose k people, compare their work efficiency with the lowest among them, and calculate the total wages based on this lowest work efficiency, and find the lowest total wages? Therefore, this question can fix a large top pile of size k, and through certain operations, ensure that the top of the pile is the k-th smallest (the operation is similar to the previous question).

In addition, triples (q / w, q, w) are used in the previous solution, which is actually unnecessary. Since two of them are known, the other can be deduced, so it is enough to store two, and since we need to according to the work efficiency compared to the key of the heap, we can choose any q or w. Here I If q is selected, the (q/2, q) two-tuple is stored.

Specifically: the total salary of k individuals with rate as the minimum work efficiency ratio = $\displaystyle \sum_{n=1}^{k}{q}_{n}/rate$, where rate is the current q / w is also the minimum value of q / w for k individuals.

Code¶

class Solution:
    def mincostToHireWorkers(self, quality: List[int], wage: List[int], K: int) -> float:
        effs = [(q / w, q) for q, w in zip(quality, wage)]
        effs.sort(key=lambda a: -a[0])
        ans = float('inf')
        h = []
        total = 0
        for rate, q in effs:
            heapq.heappush(h, -q)
            total += q
            if len(h)> K:
                total += heapq.heappop(h)
            if len(h) == K:
                ans = min(ans, total / rate)
        return ans

(Code 1.3.3)

Technique two-multiple merge¶

This technique has already been mentioned when I talked about super ugly numbers, but I didn't give a name to this type of topic.

In fact, this technique, called multi-pointer optimization, may be more appropriate, but the name is too simple and easy to confuse with double pointers, so I gave ta a unique name-multi-way merge.

-Multi-path is embodied in: There are multiple candidate routes. In code, we can use multiple pointers to represent. -Merging is reflected in: the result may be the longest or shortest among multiple candidate routes, or it may be the kth, etc. Therefore, we need to compare the results of multiple routes, and discard or select one or more routes based on the topic description.

This description is more abstract. Next, I will use a few examples to deepen everyone's understanding.

Here I have carefully prepared four hard questions for everyone. After mastering this routine, you can happily AC these four questions.

1439. The k-th smallest array sum in an ordered matrix¶

Title description¶

Given an m * n matrix mat, and an integer k, each row in the matrix is arranged in non-decreasing order.

You can select 1 element from each row to form an array. Returns the k-th smallest array sum among all possible arrays.



Example 1:

Input: mat = [[1,3,11],[2,4,6]], k = 5
Output: 7
Explanation: To select an element from each row, the first k and smallest arrays are:
[1,2], [1,4], [3,2], [3,4], [1,6]. The sum of the fifth one is 7.
Example 2:

Input: mat = [[1,3,11],[2,4,6]], k = 9
Output: 17
Example 3:

Input: mat = [[1,10,10],[1,4,5],[2,3,6]], k = 7
Output: 9
Explanation: To select an element from each row, the first k and smallest arrays are:
[1,1,2], [1,1,3], [1,4,2], [1,4,3], [1,1,6], [1,5,2], [1 ,5,3]. The sum of the seventh is 9.
Example 4:

Input: mat = [[1,1,10],[2,2,9]], k = 7
Output: 12


prompt:

m == mat.length
n == mat.length[i]
1 <= m, n <= 40
1 <= k <= min(200, n ^ m)
1 <= mat[i][j] <= 5000
mat[i] is a non-decreasing array

Ideas¶

In fact, this question is to give you m one-dimensional arrays of the same length. Let us select a number from these m arrays, that is, select m numbers in total, and find the sum of these m numbers to be **all selections The probability ** neutralizes the k-th smallest.

A simple idea is to use multiple pointers to solve. For this problem, m pointers are used to point to m one-dimensional arrays respectively. The position of the pointers indicates the number of the one-dimensional arrays currently selected.

Take the example of mat = [[1,3,11],[2,4,6]], k = 5 in the title.

-First initialize the two pointers p1, p2, which point to the beginning of the two one-dimensional arrays respectively, and the code means that they are all initialized to 0. -At this time, the sum of the numbers pointed to by the two pointers is 1 + 2 = 3, which is the first smallest sum. -Next, we move one of the pointers. At this point we can move p1 or p2. -Then the second smallest value must be the smaller of the two cases of movement p1 and movement p2. And here moving p1 and p2 will actually get 5, which means that the sum of the second and third smallest is 5.

At this point, it has branched, and there are two situations (note the position in bold, the bold indicates the position of the pointer):

The sum of [1,3,11],[2,4,6] is 5
[1,3,11],[2,4,6] sum is 5

Next, these two situations should go hand in hand and go on together.

For case 1, there are two more cases for the next move.

The sum of [1,3,11],[2,4,6] is 13
The sum of [1,3,11],[2,4,6] is 7

For case 2, there are also two cases for the next move.

The sum of [1,3,11],[2,4,6] is 7
[1,3,11],[2,4,6] sum is 7

By comparing these four situations, we come to the conclusion: The 4^th, 5^th, and 6^th small numbers are all 7. But the 7^th smallest number is not necessarily 13. The reason is similar to the above, maybe the seventh smallest is hidden in the new situation after the previous 7 split, which is actually the case. So we need to continue to execute the above logic.

Further, we can extend the above ideas to general situations.

The problem mentioned above is actually the k-th smallest sum, and the smallest one is easy to know, that is, the sum of the first items of all one-dimensional arrays. We also found that according to the smallest one, we can deduce the second smallest. The way of derivation is to move one of the pointers. This splits a total of n cases, where n is the length of the one-dimensional array, and the second smallest is Among the n cases in this split, the selection method is the n cases and the smallest, and the latter case is similar. It is not difficult to see that the extremum has also changed after each split, so this is an obvious signal for seeking dynamic extremum, and using a heap is a good choice.

How to write the code?

As mentioned above, we must first initialize m pointers and assign them to 0. Corresponding pseudo code:

# Initialize the heap
h = []
# sum(vec[0] for vec in mat) is the sum of the first items of m one-dimensional arrays
# [0] * m is to initialize an array with length m and all filled with 0.
# We assemble the above two information into Yuanzu cur for easy use
cur = (sum(vec[0] for vec in mat), [0] * m)
# Put it in the heap
heapq.heappush(h, cur)

Next, we move a pointer each time to form a new branch. Every time the smallest one is popped from the heap, the k-th smallest is popped k times. Fake code:

for 1 to K:
    # acc The current sum, pointers is the pointer situation.
    acc, pointers = heapq.heappop(h)
    # Every time a pointer in the pointer array is moved roughly. Fork every time a pointer is moved, the total possible movement is n, where n is the length of a one-dimensional array.
    for i, pointer in enumerate(pointers):
        # If pointer == len(mat[0])-1 is the end of the description and cannot be moved
        if pointer != len(mat[0])-1:
            # The meaning of the following two sentences is to modify the pointer of pointers[i] to pointers[i] + 1
            new_pointers = pointers.copy()
            new_pointers[i] += 1
            # Reload the updated acc and pointer array into the heap
            heapq.heappush(h, (acc + mat[i][pointer + 1]-mat[i][pointer], new_pointers))

This is the core code of the multiplexing problem, please remember it.

The code looks like a lot, but in fact it is only seven lines without comments.

There is a problem with the pseudo code above. For example, if there are two one-dimensional arrays, the pointers are all initialized to 0. The pointer of the first one-dimensional array is moved for the first time, and the pointer of the second array is moved for the second time. At this time, the pointer array is [1, 1], that is, all the pointers point to the element with the subscript 1. And if the pointer of the second one-dimensional array is moved for the first time, and the pointer of the first array is moved for the second time, the pointer array is still [1, 1]. This is actually a situation, if it is not controlled, it will be calculated twice and cause an error.

A possible solution is to use hashset to record all pointers, so as to avoid the problem of the same pointer being calculated multiple times. In order to do this, we need to make some fine-tuning of the use of pointer arrays, that is, use tuples instead of arrays. The reason is that arrays cannot be hashed directly. Please refer to the code area for specific content.

Multiple Merge questions, ideas and codes are relatively similar. In order to have a better understanding of the following topic, please be sure to get this one done. We will not analyze it in such detail later.

Code¶

class Solution:
    def kthSmallest(self, mat, k: int) -> int:
        h = []
        cur = (sum(vec[0] for vec in mat), tuple([0] * len(mat)))
        heapq.heappush(h, cur)
        seen = set(cur)

        for _ in range(k):
            acc, pointers = heapq.heappop(h)
            for i, pointer in enumerate(pointers):
                if pointer != len(mat[0])-1:
                    t = list(pointers)
                    t[i] = pointer + 1
                    tt = tuple(t)
                    if tt not in seen:
                        seen.add(tt)
                        heapq.heappush(h, (acc + mat[i][pointer + 1]-mat[i][pointer], tt))
        return acc

(Code 1.3.4)

719. Find the k-th smallest distance pair¶

Title description¶

Given an integer array, return the k-th smallest distance between all pairs. The distance of a pair (A, B) is defined as the absolute difference between A and B.

Example 1:

enter:
nums = [1,3,1]
k = 1
Output: 0
Explanation:
All pairs are as follows:
(1,3) -> 2
(1,1) -> 0
(3,1) -> 2
Therefore, the number pair of the first minimum distance is (1,1), and the distance between them is 0.
prompt:

2 <= len(nums) <= 10000.
0 <= nums[i] <1000000.
1 <= k <= len(nums) * (len(nums)-1) / 2.

Ideas¶

It is not difficult to see that all pairs may have a total of $C_n^2$, which is $n\times(n-1)\div2$.

Therefore, we can use two loops to find all the number pairs, sort them in ascending order, and then take the kth number.

In fact, we can use the fixed heap technique to maintain a large top heap with a size of k, so that the element at the top of the heap is the k-th smallest. This has been discussed in the previous fixed heap and will not be repeated.

class Solution:
    def smallestDistancePair(self, nums: List[int], k: int) -> int:
        h = []
        for i in range(len(nums)):
            for j in range(i + 1, len(nums)):
                a, b = nums[i], nums[j]
                # Maintain the heap size not to exceed k
                if len(h) == k and -abs(a-b)> h[0]:
                    heapq.heappop(h)
                if len(h) <k:
                    heapq.heappush(h, -abs(a-b))

        return -h[0]

(Code 1.3.5)

However, this optimization is of little significance, because the bottleneck of the algorithm lies in the enumeration of the $N^2$ part, and we should try to optimize this.

If we sort the number pairs, then the smallest number pair distance must be in nums[i]-nums[i-1], where i is an integer from 1 to n, and which one depends on who is smaller. Then you can use the idea of multi-way merging above to solve it.

If the difference between nums[i]-nums[i-1] is the smallest, then the second smallest must be the remaining n-1 cases and the new case of nums[i]-nums[i-1] split. Regarding how to split, similar to the above, we only need to move the pointer of i to i + 1. The length of the pointer array here is fixed at 2, instead of m in the question above. Here I named the two pointers fr and to respectively, representing from and to respectively.

Code¶

class Solution(object):
    def smallestDistancePair(self, nums, k):
        nums.sort()
        # n candidate answers
        h = [(nums[i+1]-nums[i], i, i+1) for i in range(len(nums)-1)]
        heapq.heapify(h)

        for _ in range(k):
            diff, fr, to = heapq.heappop(h)
            if to + 1 <len(nums):
                heapq.heappush((nums[to + 1]-nums[fr], fr, to + 1))

        return diff

(Code 1.3.6)

Since the time complexity is related to k, and k may reach the order of $N^2$ at most, this method will actually time out. But this proves the correctness of this idea. If the topic is slightly changed, it may be used.

This problem can be solved by dichotomy. Because of the deviation from the heap theme, let's briefly talk about it here.

It is easier to think of the k-th smallest number that is the heap and dichotomy. The reason for the dichotomy is to find the k-th smallest number. The essence is to find the number with k-1 that is no greater than itself. And this problem often meets monotonicity, so it can be solved by dichotomy.

For this problem, the largest number pair difference is the maximum-minimum value of the array, which may be recorded as max_diff. We can ask questions like this:

-How many of the logarithmic differences are less than max_diff? -How many have the number pair difference less than max_diff-1? -How many have the number pair difference less than max_diff-2? -How many have the number pair difference less than max_diff-3? -How many have the number pair difference less than max_diff-4? -. . .

And we know that the answers to questions are not strictly decreasing, so the use of dichotomy should be thought of. We keep asking questions until we ask if ** is less than x, there are k-1 **. However, there are problems with such questions. There are two reasons:

There may be more than one k-1 number less than x
We cannot be sure that k-1 numbers less than x must exist. For example, the number pair differences are [1,1,1,1,2], let you find the third largest number, then there are no two numbers less than x.

Our thinking can be adjusted to find that ** is less than or equal to x** there are k, then we can use the leftmost template of the dichotomy to solve it. For the leftmost template, please refer to my binary search topic

Code:

class Solution:
    def smallestDistancePair(self, A: List[int], K: int) -> int:
        A.sort()
        l, r = 0, A[-1]-A[0]

        def count_ngt(mid):
            slow = 0
            ans = 0
            for fast in range(len(A)):
                while A[fast]-A[slow]> mid:
                    slow += 1
                ans += fast-slow
            return ans

        while l <= r:
            mid = (l + r) // 2
            if count_ngt(mid) >= K:
                r = mid-1
            else:
                l = mid + 1
        return l

(Code 1.3.7)

632. Minimum interval¶

Title description¶

You have k non-decreasing lists of integers. Find a minimum interval such that each of the k lists contains at least one number.

We define that if ba <dc or a <c when ba == dc, the interval [a, b] is smaller than [c, d].



Example 1:

Input: nums = [[4,10,15,24,26], [0,9,12,20], [5,18,22,30]]
Output: [20,24]
Explanation:
List 1: [4, 10, 15, 24, 26], 24 is in the interval [20,24].
Listing 2: [0, 9, 12, 20], 20 is in the interval [20,24].
Listing 3: [5, 18, 22, 30], 22 is in the interval [20,24].
Example 2:

Input: nums = [[1,2,3],[1,2,3],[1,2,3]]
Output: [1,1]
Example 3:

Input: nums = [[10,10],[11,11]]
Output: [10,11]
Example 4:

Input: nums = [[10],[11]]
Output: [10,11]
Example 5:

Input: nums = [[1],[2],[3],[4],[5],[6],[7]]
Output: [1,7]


prompt:

nums.length == k
1 <= k <= 3500
1 <= nums[i].length <= 50
-105 <= nums[i][j] <= 105
nums[i] arranged in non-decreasing order

Ideas¶

The essence of this question is to take out a number from each of the m one-dimensional arrays and re-form a new array A so that the difference (diff) between the maximum value and the minimum value in the new array A is the smallest**.

This question is a bit similar to the above question, but slightly different. This question is a matrix, and the above question is a one-dimensional array. But we can see the two-dimensional matrix as a one-dimensional array, so we can follow the above ideas.

The smallest diff of the above idea must be generated between adjacent elements after sorting. In this question, we cannot directly sort a two-dimensional array, and even if it is sorted, it is not easy to determine the principle of sorting.

In fact, we can continue to use the ideas of the previous two questions. Specifically, it is to use the small top heap to obtain the minimum value in the heap, and then use a variable to record the maximum value in the heap, so that the diff is known, and a new diff is generated every time the pointer is updated. Just keep repeating this process and maintaining the global minimum diff.

The premise of the establishment of this algorithm is that the k lists are arranged in ascending order. Here, the principle of ascending order of the array needs to be the same as the above topic. After ordering, a pointer can be maintained for each list, and then the above ideas can be used to solve the problem.

Take nums = [[1,2,3],[1,2,3],[1,2,3]] in the title as an example:

-[1,2,3] -[1,2,3] -[1,2,3]

We first select the minimum value of all rows, which is [1,1,1]. At this time, the diff is 0, the global maximum value is 1, and the minimum value is also 1. Next, continue to look for spare tires to see if there is a better spare tire for us to choose from.

The next spare tire may arise in case 1:

-[1,2,3] -[1,2,3] -[1,2,3] moved the pointer of this row, moving it from 0 to 1 by one unit.

Or case 2:

-[1,2,3] -[1,2,3] moved the pointer of this row, moving it from 0 to 1 by one unit. -[1,2,3]

. . .

These kinds of situations continue to split more situations, this is the same as the above topic, so I won't repeat it.

Code¶

class Solution:
    def smallestRange(self, martrix: List[List[int]]) -> List[int]:
        l, r = -10**9, 10**9
        # Put the smallest row of each row in the heap, and record the row number and column number where it is located, a total of n
        h = [(row[0], i, 0) for i, row in enumerate(martrix)]
        heapq.heapify(h)
        # Maintain maximum
        max_v = max(row[0] for row in martrix)

        while True:
            min_v, row, col = heapq.heappop(h)
            # max_v-min_v is the current maximum and minimum difference, r-l is the global maximum and minimum difference. Because if the current one is smaller, we update the global result
            if max_v-min_v <r-l:
                l, r = min_v, max_v
            if col == len(martrix[row])-1: return [l, r]
            # Update the pointer, continue to move one place back
            heapq.heappush(h, (martrix[row][col + 1], row, col + 1))
            max_v = max(max_v, martrix[row][col + 1])

(Code 1.3.8)

1675. The minimum offset of the array¶

Title description¶

Give you an array of n positive integers nums.

You can perform any number of two types of operations on any element of the array:

If the element is even, divide by 2
For example, if the array is [1,2,3,4], then you can do this for the last element to make it [1,2,3,2]
If the element is odd, multiply by 2
For example, if the array is [1,2,3,4], then you can do this for the first element to make it [2,2,3,4]
The offset of the array is the maximum difference between any two elements in the array.

Returns the smallest offset the array can have after performing certain operations.

Example 1:

Input: nums = [1,2,3,4]
Output: 1
Explanation: You can convert the array into [1,2,3,2], and then into [2,2,3,2], the offset is 3-2 = 1
Example 2:

Input: nums = [4,1,5,20,3]
Output: 3
Explanation: After two operations, you can convert the array to [4,2,5,5,3], the offset is 5-2 = 3
Example 3:

Input: nums = [2,10,8]
Output: 3

prompt:

n == nums.length
2 <= n <= 105
1 <= nums[i] <= 109

Ideas¶

The title says that you can perform any number of operations on each item in the array, but in fact the operations are limited.

-We can only double the odd number once, because after double it becomes an even number. -We can divide an even number by 2 several times until it is equal to an odd number. It is not difficult to see that this is also a limited number of operations.

Take [1,2,3,4] in the title. We can:

-Change 1 to 2 (it can also be unchanged) -Change 2 to 1 (it can also be unchanged) -Change 3 to 6 (it can also be unchanged) -Change 4 to 2 or 1 (it can also be unchanged)

It is shown as follows with a graph:

One-dimensional array to two-dimensional array

This is not equivalent to: select a number from each row in a two-dimensional array such as [[1,2], [1,2], [3,6], [1,2,4]], and Minimize the difference? Isn't this exactly the same as the topic above?

Here I directly encapsulated the above problem solution into an api call, depending on the code.

Code¶

class Solution:
    def smallestRange(self, martrix: List[List[int]]) -> List[int]:
        l, r = -10**9, 10**9
        # Put the smallest row of each row in the heap, and record the row number and column number where it is located, a total of n
        h = [(row[0], i, 0) for i, row in enumerate(martrix)]
        heapq.heapify(h)
        # Maintain maximum
        max_v = max(row[0] for row in martrix)

        while True:
            min_v, row, col = heapq.heappop(h)
            # max_v-min_v is the current maximum and minimum difference, r-l is the global maximum and minimum difference. Because if the current one is smaller, we update the global result
            if max_v-min_v <r-l:
                l, r = min_v, max_v
            if col == len(martrix[row])-1: return [l, r]
            # Update the pointer, continue to move one place back
            heapq.heappush(h, (martrix[row][col + 1], row, col + 1))
            max_v = max(max_v, martrix[row][col + 1])
    def minimumDeviation(self, nums: List[int]) -> int:
        matrix = [[] for _ in range(len(nums))]
        for i, num in enumerate(nums):
            if num & 1 == 1:
                matrix[i] += [num, num * 2]
            else:
                temp = []
                while num and num & 1 == 0:
                    temp += [num]
                    num //= 2
                temp += [num]
                matrix[i] += temp[::-1]
        a, b = self.smallestRange(matrix)
        return b-a

(Code 1.3.9)

Tip Three-Little Zhuge After the Event¶

This technique refers to: when traversing from left to right, we don't know what the right is, we need to wait until you get to the right to know.

If you want to know what is on the right, a simple way is to traverse twice. The first traversal will record the data, and the second traversal will use the data recorded during the last traversal. This is the method we use the most. But sometimes, after traversing to the specified element, we can backtrack forward, so that we can traverse and store while using one traversal. Specifically, it is to collect all the data from left to right, and when it is necessary to use it, pick one from it. If we want to take the maximum or minimum value and the extreme value will change, we can use heap acceleration. Intuitively, I used the time machine to go back to the previous, achieving the purpose of hindsight.

Say you certainly don't understand what you mean. It doesn't matter, let's talk about it through a few examples. After you have read these examples, look back at this sentence.

871. Minimum number of refueling¶

Title description¶

The car departs from the starting point to the destination, which is located target miles east of the starting position.

There are gas stations along the way, and each station[i] represents a gas station, which is located at station[i][0] miles east of the departure location and has station[i][1] liters of gasoline.

Suppose the capacity of the car’s fuel tank is unlimited, and there is initially startFuel liters of fuel. It uses 1 liter of gasoline for every mile it travels.

When the car reaches the gas station, it may stop to refuel and transfer all gasoline from the gas station to the car.

In order to reach the destination, what is the minimum number of refueling times a car needs? If the destination cannot be reached, -1 is returned.

Note: If the remaining fuel is 0 when the car arrives at the gas station, it can still be refueled there. If the remaining fuel is 0 when the car reaches the destination, it is still considered to have reached the destination.



Example 1:

Input: target = 1, startFuel = 1, stations = []
Output: 0
Explanation: We can reach the destination without refueling.
Example 2:

Input: target = 100, startFuel = 1, stations = [[10,100]]
Output: -1
Explanation: We could not reach the destination, or even the first gas station.
Example 3:

Input: target = 100, startFuel = 10, stations = [[10,60],[20,30],[30,30],[60,40]]
Output: 2
Explanation:
We had 10 liters of fuel when we set off.
We drove to a gas station 10 miles from the starting point, consuming 10 liters of fuel. Add gasoline from 0 liters to 60 liters.
Then, we drove from the gas station at 10 miles to the gas station at 60 miles (consuming 50 liters of fuel),
And add gasoline from 10 liters to 50 liters. Then we drove to our destination.
We stopped at 1 two gas stations along the way, so we returned to 2.


prompt:

1 <= target, startFuel, stations[i][1] <= 10^9
0 <= stations.length <= 500
0 <stations[0][0] <stations[1][0] <... <stations[stations.length-1][0] <target

Ideas¶

In order to be able to obtain the minimum number of refueling, we definitely hope that we can not refuel without refueling. When do I have to refuel? The answer should be If you don't refuel, you won't be able to reach the next destination.

The pseudo code description is:

cur = startFuel # There is startFuel at the beginning, liters of gasoline
last = 0 # Last position
for i, fuel in stations:
    cur -= i-last # The fuel consumption of two statons is the distance of two stations, that is, i-last
    if cur <0:
        # We must cheer up ahead, otherwise we can't get here
        # But at which station in front do you refuel?
        # Intuition tells us that we should be greedy to choose the station i that can add the most gasoline. If the gasoline with i is still cur <0, continue to add the next largest station j until there is no more gasoline to add or cur> 0

The above mentioned that you have to choose the station with the most gasoline i. If the gas is still not enough, continue to choose the station with the second most gasoline. This kind of dynamic extreme value scenario is very suitable for using heap.

Specifically:

-Every time you pass a station, add its fuel to the pile. -Drive as far forward as possible and continue to drive as long as the oil is not less than 0. -If the amount of oil is less than 0, take the largest amount from the pile and add it to the fuel tank. If the amount of oil is still less than 0, continue to repeat the maximum amount of oil in the pile. -If the oil level is greater than 0 after filling the oil, continue to open and repeat the above steps. Otherwise, it returns -1, indicating that the destination cannot be reached.

So how does this algorithm embody the afterthought of the little Zhuge? You can substitute yourself into the problem to simulate. Imagine yourself as you are driving. Your goal is the requirement in the title: Minimum number of refueling. When you drive to a station, you don’t know if your fuel is enough to support the next station, and even if you can’t support the next station, it might be better to refuel at the last station. So in reality you have no way of knowing whether I should refuel or not at the current station** because there is too little information.

What would I do? If I was driving, I could only refuel every time, so that I could not reach the destination, then I would definitely not be able to reach the destination. But if we can reach the destination in this way, I can say If we refuel at that station, we can reach the destination with the least number of refuelings if we choose not to refuel at this station. Why didn't you say it earlier? Isn't this Zhuge Liang after the fact?

This hindsight is reflected in We waited until we ran out of gas before thinking that we should refuel at a previous station.

Therefore, what Zhuge Liang essentially solved after the fact is that the optimal solution cannot be obtained based on the current information, and we must backtrack after grasping all the information. For this question, we can first traverse one side of the station, and then record the amount of fuel in each station into an array. Every time we "foresee" that we cannot reach the next station, we take the largest value from this array. of. . . . Based on this, we can consider the process of using heap optimization to take extreme values instead of using arrays.

Code¶

class Solution:
    def minRefuelStops(self, target: int, startFuel: int, stations: List[List[int]]) -> int:
        stations += [(target, 0)]
        cur = startFuel
        ans = 0

        h = []
        last = 0
        for i, fuel in stations:
            cur -= i-last
            while cur <0 and h:
                cur -= heapq.heappop(h)
                ans += 1
            if cur <0:
                return -1
            heappush(h, -fuel)

            last = i
        return ans

(Code 1.3.10)

1488. Avoid flooding¶

Title description¶

There are countless lakes in your country, all of which are empty at first. When it rains in the nth lake, if the nth lake is empty, then it will be filled with water, otherwise the lake will be flooded. Your goal is to avoid flooding in any lake.

Gives you an integer array rains, where:

rains[i]> 0 means that the rains[i]th lake will rain on the i-th day.
rains[i] == 0 means that no lake will rain on the i day. You can choose a lake and drain the water.
Please return an array ans that satisfies:

ans.length == rains.length
If rains[i]> 0, then ans[i] == -1.
If rains[i] == 0, ans[i] is the lake you choose to drain on the i day.
If there are multiple feasible solutions, please return to any of them. If there is no way to stop the flood, please return an empty array.

Please note that if you choose to drain a lake full of water, it will become an empty lake. But if you choose to drain an empty lake, nothing will happen (see Example 4 for details).



Example 1:

Input: rains = [1,2,3,4]
Output: [-1,-1,-1,-1]
Explanation: After the first day, the lake filled with water included [1]
After the second day, the lakes filled with water included [1,2]
After the third day, the lakes filled with water include [1,2,3]
After the fourth day, the lakes filled with water include [1,2,3,4]
There is no day you can drain any lake, and no lake will be flooded.
Example 2:

Input: rains = [1,2,0,0,2,1]
Output: [-1,-1,2,1,-1,-1]
Explanation: After the first day, the lake filled with water included [1]
After the second day, the lakes filled with water included [1,2]
After the third day, we drained the lake 2. So the remaining lakes filled with water include [1]
After the fourth day, we drained the lake 1. So there is no lake full of water for the time being.
After the fifth day, the lakes filled with water included [2].
After the sixth day, the lakes filled with water included [1,2].
It can be seen that there will be no floods under this plan. At the same time, [-1,-1,1,2,-1,-1] is another feasible solution without flooding.
Example 3:

Input: rains = [1,2,0,1,2]
Output: []
Explanation: After the next day, the lakes filled with water included [1,2]. We can drain a lake on the third day.
But after the third day, lakes 1 and 2 will rain again, so no matter which lake we drain on the third day, another lake will flood.
Example 4:

Input: rains = [69,0,0,0,69]
Output: [-1,69,1,1,-1]
Explanation: Any shape like [-1,69,x,y,-1], [-1,x,69,y,-1] or [-1,x,y,69,-1] is possible Solution, where 1 <= x,y <= 10^9
Example 5:

Input: rains = [10,20,20]
Output: []
Explanation: As Lake 20 will rain for 2 consecutive days, there is no way to stop the flooding.


prompt:

1 <= rains.length <= 10^5
0 <= rains[i] <= 10^9

Ideas¶

If the above question is described as far-fetched as After the event Zhuge Liang, then the latter two questions can be said to be very suitable.

The title states that we can drain a lake when it is not raining. If there are multiple lakes that are full of rain, which lake should we drain? Obviously it should be draining the lake that was about to be flooded recently. But in reality, it is impossible for us to know which lake will rain in the future, even if there is a weather forecast, so it is not 100% reliable.

But the code is ok. We can traverse the rain array first to know which lake rained on the first few days. With this information, we will be able to dedicate ourselves afterwards.

"The weather today is very good. I opened my eyes. Lake 2 will be flooded tomorrow. We will drain it today, otherwise it will be flooded.".

As with the above problem, we can also simulate the daily changes instead of traversing the rain array first. Even if it is sunny, we do not drain any lakes. Then in the process of simulation record the conditions of sunny days, when the flood occurs, we then consider which lake should be drained before which sunny day. Therefore, this hindsight is reflected in the fact that we waited until the flood was over before thinking about what measures should be taken some day before.

algorithm:

-Traverse the rain to simulate daily changes -If rain is currently 0, it means it is sunny, and we will not drain any lakes. But we record the current day into the sunny array. -If rain is greater than 0, it means that a lake is raining. Let's see if this raining lake has flooded. In fact, it is to see if there is water before it rains. This prompts us to use a data structure lakes to record the situation of each lake. We can use 0 to indicate that there is no water, and 1 to indicate that there is water. In this way, when lake i rains and lakes[i] = 1, flooding will occur. -If the current lake is flooded, then go to the sunny array to find a sunny day to drain it, so that it will not be flooded, and then just keep lakes[i] = 1.

This question did not use the heap, I did it on purpose. The reason for doing this is to let everyone understand that the technique of after the fact that Zhuge Liang is not peculiar to piles, in fact, this is an ordinary algorithmic idea, just like traversing from back to front. It’s just that, many times, we need to dynamically take the maximum and minimum values in the scenes of our afterwards Zhuge Liang. At this time, we should consider using the heap. This is actually back to the one center at the beginning of the article, so everyone These techniques must be used flexibly and not mechanically.

The next question is an out-and-out after the fact Zhuge Liang + heap optimization.

Code¶

class Solution:
    def avoidFlood(self, rains: List[int]) -> List[int]:
        ans = [1] * len(rains)
        lakes = collections.defaultdict(int)
        sunny = []

        for i, rain in enumerate(rains):
            if rain> 0:
                ans[i] = -1
                if lakes[rain-1] == 1:
                    if 0 == len(sunny):
                        return []
                    ans[sunny.pop()] = rain
                lakes[rain-1] = 1
            else:
                sunny.append(i)
        return ans

(Code 1.3.11)

1642. The farthest building that can be reached¶

Title description¶

Gives you an integer array heights, which represents the height of the building. There are also some bricks and ladders.

You start your journey from building 0 and continue to move to the buildings behind, during which you may use bricks or ladders.

When moving from building i to building i+1 (the subscript starts from 0):

If the height of the current building is greater than or equal to the height of the next building, no ladders or bricks are needed
If the height of the current building is less than the height of the next building, you can use a ladder or (h[i+1]-h[i]) bricks
If the given ladder and bricks are used in the best way, return the index of the farthest building you can reach (the index starts at 0).

Example 1:


Input: heights = [4,2,7,6,9,14,12], bricks = 5, ladders = 1
Output: 4
Explanation: Starting from building 0, you can complete the journey according to this plan:
-Do not use bricks or ladders to reach building 1, because 4 >= 2
-Use 5 bricks to reach building 2. You must use bricks or ladders because 2 <7
-Do not use bricks or ladders to reach building 3 because 7 >= 6
-Use the only ladder to reach the building 4. You must use bricks or ladders because 6 <9
Can't get over building 4 because there are no more bricks or ladders.
Example 2:

Input: heights = [4,12,2,7,3,18,20,3,19], bricks = 10, ladders = 2
Output: 7
Example 3:

Input: heights = [14,3,19,3], bricks = 17, ladders = 0
Output: 3


prompt:

1 <= heights.length <= 105
1 <= heights[i] <= 106
0 <= bricks <= 109
0 <= ladders <= heights.length

Ideas¶

We can see that the ladder is an infinite brick, but it can only be used once. Of course, we hope that a good ladder can be used on the blade. As above, if it is in real life, we cannot know when to use ladders and when to use bricks.

It doesn't matter, we continue to use the post-event Zhuge Liang method, one traversal can be completed. Similar to the previous idea, that is, I don’t use ladders brainlessly, and when the ladders are not enough, we will start to make things better afterwards. If only bricks are used in the front. When should bricks be used? Obviously, when the ladder was used, the height difference was smaller than the current height difference.

To put it bluntly, I used a ladder to climb a 5-meter wall. Now there is a 10-meter wall. I don’t have a ladder anymore and can only use 10 bricks. If you used 5 bricks before, wouldn't you be able to use a ladder now, and save 5 bricks?

This prompts us to save the height difference of the building straddled by the ladder in the front, and when the ladder in the back is used up, we "exchange" the ladder used in the front to bricks and continue to use it. In the above example, we can exchange 10 bricks first, and then use up 5 bricks, which is equivalent to adding 5 bricks.

If the ladder has been used many times before, which time shall we "redeem" first? Obviously, it is priority to exchange the height difference, so that the most bricks can be exchanged. This reminds you to choose the largest height difference from the previously stored height difference and remove it after "Exchange". What data structure is appropriate for this dynamic extreme value scenario? Of course it's a pile.

Code¶

class Solution:
    def furthestBuilding(self, heights: List[int], bricks: int, ladders: int) -> int:
        h = []
        for i in range(1, len(heights)):
            diff = heights[i]-heights[i-1]
            if diff <= 0:
                continue
            if bricks <diff and ladders> 0:
                ladders -= 1
                if h and -h[0]> diff:
                    bricks -= heapq.heappop(h)
                else:
                    continue
            bricks -= diff
            if bricks <0:
                return i-1
            heapq.heappush(h, -diff)
        return len(heights)-1

(Code 1.3.12)

Four major applications¶

Next is the last part of this article "Four Applications", the purpose is to help you consolidate the previous knowledge through these few examples.

1. topK¶

Solving topK is a very important function of the heap. This has actually been introduced to you in the previous fixed heap part.

Here is a direct quote from the previous words:

"Actually, the simplest way to find the k-th smallest number is to build a small top heap, put all the numbers into the heap first, and then leave the heap one by one, for a total of k times. The last time the number is released is the k-th smallest number. However, instead of entering all the heaps first, we can create a large top heap (note that it is not the small top heap above) and maintain the heap size at k. If the heap size is greater than k after the new number is added to the heap, you need Compare the number at the top of the heap with the new number, and remove the larger one. This will ensure that the number in the heap is the smallest k in the whole number, and the largest of the smallest k (ie the top of the heap) Isn’t it the k-th smallest one? This is the reason why you chose to build a large top pile instead of a small top pile."

In fact, in addition to the k-th smallest number, we can also collect all the numbers in the middle, which can find the smallest k number. The only difference from the k-th smallest number above is that you need to collect all the numbers from popp.

It should be noted that sometimes the weight is not the size of the original array value itself, but can also be the distance, frequency, etc.

2. Weighted shortest distance¶

Regarding this point, I actually mentioned it in the previous part, but it was just a brief introduction at the time. The original sentence is "But is BFS really not implemented with priority queues? Of course not! For example, the shortest path problem with weighted graphs. If you use queues for BFS, you need priority queues, because there are ** between paths. The difference in weights**, isn’t this the original intention of the priority queue. The typical implementation of BFS using priority queues is the dijkstra algorithm.”

The DIJKSTRA algorithm mainly solves the shortest distance between any two points in the graph.

The basic idea of the algorithm is greedy. It traverses all neighbors every time and finds the one with the smallest distance. It is essentially a breadth-first traversal. Here we use the data structure of the heap to make it possible to find the point with the smallest cost in the time of $logN$, where N is the size of the heap.

Code template:

def dijkstra(graph, start, end):
    # The data in the heap are all binary ancestors of (cost, i), which means "the distance from start to i is cost".
    heap = [(0, start)]
    visited = set()
    while heap:
        (cost, u) = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == end:
            return cost
        for v, c in graph[u]:
            if v in visited:
                continue
            next = cost + c
            heapq.heappush(heap, (next, v))
    return -1

(Code 1.4.1)

It can be seen that the code template and BFS are basically similar. If you set the heap key to steps, you can also simulate BFS. This has already been discussed before, so I won't repeat it here.

For example, a picture is like this:

E - 1 --> B - 1 --> C - 1 --> D - 1 --> F
 \ /\
  \ ||
    -------- 2 ---------> G ------- 1 ------

We use the adjacency matrix to construct:

G = {
    "B": [["C", 1]],
    "C": [["D", 1]],
    "D": [["F", 1]],
    "E": [["B", 1], ["G", 2]],
    "F": [],
    "G": [["F", 1]],
}

shortDistance = dijkstra(G, "E", "C")
print(shortDistance) # E - 3 --> F - 3 --> C == 6

After knowing this algorithm template, you can go to AC 743. Network delay time.

Complete code:

class Solution:
    def dijkstra(self, graph, start, end):
        heap = [(0, start)]
        visited = set()
        while heap:
            (cost, u) = heapq.heappop(heap)
            if u in visited:
                continue
            visited.add(u)
            if u == end:
                return cost
            for v, c in graph[u]:
                if v in visited:
                    continue
                next = cost + c
                heapq.heappush(heap, (next, v))
        return -1
    def networkDelayTime(self, times: List[List[int]], N: int, K: int) -> int:
        graph = collections.defaultdict(list)
        for fr, to, w in times:
            graph[fr-1].append((to-1, w))
        ans = -1
        for to in range(N):
            # Call the encapsulated dijkstra method
            dist = self.dijkstra(graph, K-1, to)
            if dist == -1: return -1
            ans = max(ans, dist)
        return ans

(Code 1.4.2)

Have you learned it?

The above algorithm is not the optimal solution. I just want to embody the idea of encapsulating dijkstra as an api call. A better approach is to traverse and record all the distance information at once instead of repeating the calculation every time. The time complexity will be greatly reduced. This is of great significance when calculating the distance from a point to all points in the graph. In order to achieve this goal, what adjustments will our algorithm have?

Tip: You can use a dist hash table to record the shortest distance from the starting point to each point. If you figure it out, you can use 882 questions to verify it~

In fact, only a small adjustment is needed. Since the adjustment is small, it is better to look at the code directly.

Code:

class Solution:
    def dijkstra(self, graph, start, end):
        heap = [(0, start)] # cost from start node,end node
        dist = {}
        while heap:
            (cost, u) = heapq.heappop(heap)
            if u in dist:
                continue
            dist[u] = cost
            for v, c in graph[u]:
                if v in dist:
                    continue
                next = cost + c
                heapq.heappush(heap, (next, v))
        return dist
    def networkDelayTime(self, times: List[List[int]], N: int, K: int) -> int:
        graph = collections.defaultdict(list)
        for fr, to, w in times:
            graph[fr-1].append((to-1, w))
        ans = -1
        dist = self.dijkstra(graph, K-1, to)
        return -1 if len(dist) != N else max(dist.values())

(Code 1.4.3)

It can be seen that we just replaced visitd with dist, and the others remain unchanged. In addition, dist is actually only the visited with the key, and it also plays the role of visited here.

If you need to calculate the shortest path from a node to all other nodes, you can use a dist (a hashmap) to record the shortest path information from the starting point to all points, instead of using visited (a hashset).

There are many similar questions, I will give you one more 787. The cheapest flight within K stop. Title description:

There are n cities connected by m flights. Each flight starts from city u and arrives at v at price w.

Now given all the cities and flights, as well as the departure city src and destination dst, your task is to find the cheapest price from src to dst through k stops at most. If there is no such route, -1 is output.



Example 1:

enter:
n = 3, edges = [[0,1,100],[1,2,100],[0,2,500]]
src = 0, dst = 2, k = 1
Output: 200
Explanation:
The city flight map is as follows

The cheapest price within 1 stop from city 0 to city 2 is 200, as shown in red in the figure.
Example 2:

enter:
n = 3, edges = [[0,1,100],[1,2,100],[0,2,500]]
src = 0, dst = 2, k = 0
Output: 500
Explanation:
The city flight map is as follows

The cheapest price within stop 0 from city 0 to city 2 is 500, as shown in blue in the figure.


prompt:

n range is [1, 100], city label is from 0 to n-1
The range of the number of flights is [0, n * (n-1) / 2]
Format of each flight (src, dst, price)
The price range of each flight is [1, 10000]
k range is [0, n-1]
The flight is not repeated, and there is no self-loop

There is no essential difference between this question and the above. I still encapsulate it as an API to use, depending on the code.

The only special point of this question is that if the number of transfers is greater than k, it is considered unreachable. This is actually very easy. We only need to use tuples in the heap to carry one more steps. This step is the distance in the weightless BFS. If the pop comes out with steps greater than K, it is considered illegal and we can skip and continue processing.

class Solution:
    # Transform it, add parameter K, and carry one more steps in the stack
    def dijkstra(self, graph, start, end, K):
        heap = [(0, start, 0)]
        visited = set()
        while heap:
            (cost, u, steps) = heapq.heappop(heap)
            if u in visited:
                continue
            visited.add((u, steps))
            if steps> K: continue
            if u == end:
                return cost
            for v, c in graph[u]:
                if (v, steps) in visited:
                    continue
                next = cost + c
                heapq.heappush(heap, (next, v, steps + 1))
        return -1
    def findCheapestPrice(self, n: int, flights: List[List[int]], src: int, dst: int, K: int) -> int:
        graph = collections.defaultdict(list)
        for fr, to, price in flights:
            graph[fr].append((to, price))
         # Call the encapsulated dijkstra method
        return self.dijkstra(graph, src, dst, K + 1)

(Code 1.4.4)

3. Factoring¶

Apply it with the above two, which I also mentioned in the previous section "313. Super Ugly Numbers".

Recall the definition of an ugly number: An ugly number is a positive integer whose prime factor only contains 2, 3, and 5. ** Therefore, the essence of an ugly number is that after a number is **factorized, only the integers of 2, 3, 5 are left, and no other factors are carried.

There are many problems about ugly numbers, and most of them can be solved from the perspective of heap. It's just that sometimes the number of factors is limited, and it is easy to solve without using a heap. For example: 264. Ugly Number II You can use three pointers to record. This technique is I have talked about it before, so I won't repeat it.

Some questions are not ugly numbers, but they explicitly mention information like factor, and ask you to find the k-th largest xx. At this time, priority should be given to using the heap to solve it. If there are some other information in the question, such as ordered, the dichotomy can also be considered. Which method to use depends on specific analysis of specific issues, but before that, everyone must be familiar enough with both methods.

4. Heap Sorting¶

The previous three applications are more or less mentioned above. The heap sorting has not been mentioned before.

There are almost no problems directly examining heap sort. However, the interview may be investigated. In addition, learning heap sort is of great significance to your understanding of important algorithmic thinking such as divide and conquer. Personally, algorithms such as heap sorting, binary tree construction, and line segment tree construction all have great similarities. If you master one, you can comprehend the others.

In fact, after the previous heap learning, we can encapsulate a heap sort, the method is very simple.

Here I put a simple sample code that uses the heap api to implement heap sorting:

h = [9,5,2,7]
heapq.heapify(h)
ans = []

while h:
    ans.append(heapq.heappop(h))
print(ans) # 2,5,7,9

Understand the example, it is not difficult to encapsulate it into a general heap sort.

def heap_sort(h):
    heapq.heapify(h)
    ans = []
    while h:
        ans.append(heapq.heappop(h))
    return ans

This method is simple enough. If you understand the principle of the previous heap, it is not difficult for you to sort a heap by hand. But this method has a drawback. It is not an in-situ algorithm, which means that you have to use extra space to accept the result, and the space complexity is $O(N)$. But in fact, after calling the heap sort method, the original array memory can be released, so in theory, the space is not wasted, but when we calculate the space complexity, we take the time that uses the most memory, so we use the original The local algorithm is undoubtedly better. If you really feel uncomfortable with this implementation, you can also use in-situ modification. This is not difficult, just a little modification of the implementation of the previous heap, due to space limitations, not much to talk about here.

to sum up¶

Pile and queue are inextricably linked. For many topics, I think about using the heap to complete. Then I found that every time it enters the heap is +1, instead of skipping updates, for example, the next one is +2, +3, etc., so it is better to use the queue to complete the performance. For example, 649. Dota2 Senate and 1654. Minimum number of jumps to get home etc.

There is only one center of the heap, which is dynamic extreme value.

The extreme value is nothing more than the maximum value or the minimum value, which is not difficult to see. If you want the maximum value, we can use the large top heap, and if we want the minimum value, we can use the smallest heap. In fact, if there is no dynamic word, there is no need to use the heap in many cases. For example, you can traverse directly to find the largest one. The dynamic is not easy to see, this is the difficulty of the problem. This requires you to analyze the problem first, and analyze that this problem is actually seeking extreme values dynamically. Then the use of heap for optimization should be thought of.

There are many implementations of the heap. For example, jump tables based on linked lists, binary heaps based on arrays, and implementations based on red-black trees. Here we introduced two main implementations and described the implementation of the binary heap in detail. Not only is its implementation simple, but it performs well in many situations. It is recommended that you focus on the implementation of the binary heap.

For the realization of the binary heap, the core point is only one thing, that is, to always maintain the nature of the heap. What is the specific nature? That is ** the weight of the parent node is not greater than the weight of the son (small top pile) **. In order to achieve this goal, we need to use floating and sinking operations when entering and exiting the heap, and properly complete the element exchange. Specifically, the floating process exchanges with the larger parent node, and the sinking process exchanges with the smaller of the two child nodes. Of course, the premise is that it has child nodes and the child nodes are smaller than it.

We have not done a detailed analysis on heaping. But if you understand the heap operation in this article, it's actually very easy. Therefore, heapization itself is a process of continuously entering the heap, but it turns a discrete operation in time into a one-time operation.

In addition, I have introduced three stacks of problem-solving skills to you, namely:

-The fixed heap can not only solve the kth problem, but also effectively use the calculated results to avoid double calculations. -Multi-path merge is essentially a violent solution, and there is no essential difference from violent recursion. If you turn it into recursion, it is also a recursion that cannot be memorized. So it is more like a backtracking algorithm. -Little Zhuge afterwards. Some information that we have no way to obtain at present can be stored in a data structure so that it can be checked later when the "Eastern Window Incident" occurs. This kind of data solution can be many, the common ones are hash tables and heaps. You can also think of this technique as "regret after the fact". Some people are more acceptable to this term, but no matter what the name is, it refers to this meaning.

Finally, I introduced four applications to you. In addition to heap sorting, these four applications have been more or less discussed in the previous section. They are:

-topK -Weighted shortest path -Factorization -Heap sort

These four applications actually revolve around a center of the heap ** dynamically taking extreme values , these four applications are just flexible use of this feature. So when you do the questions, you only need to remember **dynamically seek extreme values. If you can analyze that this question is related to the dynamic extreme value, then please be sure to consider heap. Next, we have to think about the complexity in our minds, and compare the scope of the question data to roughly estimate whether it is feasible.

If you have any thoughts on this, please leave me a message. I will check the answers one by one when I have time. More algorithm routines can visit my LeetCode problem solution warehouse: https://github.com/azl397985856/leetcode. Currently, it has 39K stars. You can also pay attention to my public account "Like Plus" to take you to the hard bones of algorithm.