Basic data structure (overview)¶
This article is not a standalone data-structure tutorial; rather, it uses realistic scenarios to help you understand and review data structures and algorithms. If your data-structure fundamentals are weak, it is recommended that you work through some basic tutorials first and then come back.
The positioning of this article is front-end focused: by studying how data structures appear in real front-end scenarios, you can deepen your understanding of them.
Linear structure¶
Data structure can be logically divided into linear structure and non-linear structure. Linear structures include arrays, stacks, linked lists, etc., and nonlinear structures include trees, graphs, and so on.
In fact, we can call the tree a semi-linear structure.
It should be noted that "linear" and "non-linear" do not describe whether the storage structure is linear; the two are unrelated, and this is purely a logical division. For example, we can use an array to store a binary tree.
Generally speaking, elements of a linear data structure have predecessors and successors, as in arrays and linked lists.
In fact, a unary tree (a tree in which every node has at most one child) is exactly a linked list.
Array¶
Array is the simplest data structure, and it is used in many places; for example, a list of data is naturally stored in one. In fact, many of the data structures that follow have the shadow of arrays in them.
The stacks and queues discussed later can be regarded as restricted arrays. How are they restricted? We will discuss that later.
Let's talk about a few interesting examples to deepen everyone's understanding of the data structure of arrays.
React Hooks¶
The essence of Hooks is an array, pseudo code:
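That pseudo code can be sketched as follows. This is a minimal, framework-free illustration of the array-plus-cursor idea, written in Python for consistency with the other code templates in this article; names such as `use_state` and `render` are illustrative, not React's actual implementation:

```python
hooks = []   # one flat array of hook "cells" per component instance
cursor = 0   # index of the hook currently executing in this render


def use_state(initial):
    """Illustrative stand-in for React's useState."""
    global cursor
    if cursor == len(hooks):     # first render: allocate a new cell
        hooks.append(initial)
    idx = cursor                 # freeze the index for the setter closure

    def set_state(value):
        hooks[idx] = value       # later renders read the updated cell

    value = hooks[cursor]
    cursor += 1                  # advance to the next hook cell
    return value, set_state


def render(component):
    global cursor
    cursor = 0                   # rewind the cursor before every render
    return component()
```

Because the only thing linking a call site to its cell is the cursor position, the hooks must be called in the same order on every render.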
So why use arrays for hooks? We can explain it from another angle. What if we don't use arrays?
function Form() {
  // 1. Use the name state variable
  const [name, setName] = useState('Mary');
  // 2. Use an effect for persisting the form
  useEffect(function persistForm() {
    localStorage.setItem('formData', name);
  });
  // 3. Use the surname state variable
  const [surname, setSurname] = useState('Poppins');
  // 4. Use an effect for updating the title
  useEffect(function updateTitle() {
    document.title = name + ' ' + surname;
  });
  // ...
}
Based on the array approach, the hooks of Form are [hook1, hook2, hook3, hook4].
From this we get the correspondence: hook1 is the [name, setName] pair, hook2 is the persistForm effect, and so on.
If we did not use an array but, say, an object, the hooks of Form would be something like { key1: hook1, key2: hook2, key3: hook3, key4: hook4 }.
Then the question becomes: how do we choose key1, key2, key3, and key4? That is the problem. For more on the nature of React hooks, see React hooks: not magic, just arrays.
However, using an array also brings a problem: how to guarantee the correspondence between a component's hooks and the states stored inside them. React hands this work to developers: you must ensure that the order of hooks is strictly consistent across renders. See the Rules of Hooks section of the React official website for details.
Queue¶
A queue is a restricted sequence: you can only operate on its head and tail, adding elements at the tail (rear) and removing elements at the head (front).
As one of the most common data structures, the queue also has a very wide range of applications, such as message queues.
The name "queue" can be understood by analogy with queuing in real life (the kind where no one jumps the queue).
In computer science, a queue is a special type of abstract data type or collection in which the entities in the collection are stored in order.
There are two basic queue operations:
- Enqueue: add an entity to the rear of the queue.
- Dequeue: remove an entity from the front of the queue.
Schematic of FIFO (first in, first out) of elements in the queue:
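This FIFO behavior can also be sketched in code. A minimal illustration using Python's collections.deque, which gives O(1) removal at the front (a plain list would make dequeue O(n)):

```python
from collections import deque

queue = deque()          # an empty queue

queue.append("a")        # enqueue: add at the rear
queue.append("b")
queue.append("c")

first = queue.popleft()  # dequeue: remove from the front
# FIFO: "a" went in first, so "a" comes out first
```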
When optimizing front-end performance, we often mention the HTTP/1.1 head-of-line blocking problem; specifically, HTTP/2 solves the head-of-line blocking present in HTTP/1.1. But why HTTP/1.1 has this problem, and how HTTP/2 solves it, is not clear to many people.
In fact, Head of Line Blocking is a proper term that appears not only in HTTP but also in other places, such as switches. The root cause of the problem is the use of a queue data structure.
The protocol stipulates that, on the same TCP connection, all HTTP/1.0 requests are put into a queue, and the next request can be sent only after the response to the previous request is received. This is where blocking occurs, and it occurs mainly on the client side.
It is like waiting at traffic lights: even if the light in the lane next to you turns green, your lane is still red, so you still have to wait.
HTTP/1.0 vs HTTP/1.1:
In HTTP/1.0, a TCP connection needs to be established for each request, and the connection is closed immediately after the request completes.
In HTTP/1.1, every connection is persistent by default. On the same TCP connection, multiple HTTP/1.1 requests may be sent in a row; that is, the next request can be sent without waiting for the previous response to arrive. This solves the client-side head-of-line blocking of HTTP/1.0, and it is the Pipeline concept in HTTP/1.1.
However, HTTP/1.1 stipulates that server-side responses must be queued in the order in which the requests were received; that is, the response to the first-received request must also be sent first. The problem this causes: if the first-received request takes a long time to process, its response is generated slowly and blocks the sending of responses that are already generated, which again produces head-of-line blocking. It can be seen that the head-of-line blocking of HTTP/1.1 occurs on the server side.
If it is represented by a diagram, the process is probably:
HTTP/2 vs HTTP/1.1:
To solve the server-side head-of-line blocking in HTTP/1.1, HTTP/2 adopts techniques such as binary framing and multiplexing.
A frame is the smallest unit of HTTP/2 data communication. Packets in HTTP/1.1 are in text format, while packets in HTTP/2 are in binary format, that is, binary frames.
Framing divides request and response data into smaller pieces, and the binary protocol can be parsed efficiently. In HTTP/2, all communication under the same domain name is completed over a single connection, which can carry any number of bidirectional data streams. Each data stream is sent as a message, and a message consists of one or more frames. Frames can be sent out of order and are reassembled according to the stream identifier in the frame header.
Multiplexing replaces the original sequencing and blocking mechanism. In HTTP/1.1, multiple simultaneous requests require multiple TCP connections, and a single domain name is limited to 6-8 TCP connections (the limit is imposed by browsers and may differ between them). In HTTP/2, all communication under the same domain name is completed over a single TCP connection, and requests and responses can proceed in parallel on it without interfering with each other.
On this website you can intuitively feel the performance difference between HTTP/1.1 and HTTP/2.
Stack¶
The stack is also a restricted sequence: it can only be operated on at the top. Whether pushing or popping, the operation happens at the top of the stack.
In computer science, a stack is an abstract data type used to represent a collection of elements and has two main operations:
- push: add an element to the top (end) of the stack.
- pop: remove the element at the top (end) of the stack.
The two operations above can be summarized as LIFO (last in, first out).
In addition, there is usually a peek operation for accessing the element currently at the top of the stack (it only returns the element without removing it).
The name "stack" can be compared to a stack of objects (a stack of books, a stack of plates, etc.).
Illustration of stack push and pop operations:
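These operations can be sketched in a few lines. In Python (as in JavaScript) a plain list/array already serves as a stack; a minimal illustration:

```python
stack = []

stack.append(1)     # push: add to the top
stack.append(2)
stack.append(3)

top = stack[-1]     # peek: look at the top without removing it
last = stack.pop()  # pop: remove the top element (LIFO)
```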
The stack has applications in many places. For example, the browser that we are all familiar with contains many stacks, and the browser's execution stack is, in data-structure terms, a basic stack. This also explains why a recursive solution is essentially equivalent to a loop-plus-stack solution.
For example, the following JS code:
function bar() {
  const a = 1;
  const b = 2;
  console.log(a, b);
}
function foo() {
  const a = 1;
  bar();
}
foo();
When it is actually executed, it looks like this internally:
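The push/pop order of the execution stack for this snippet can be roughly simulated as follows. This is a simplified sketch that ignores `this`, scope, and the other parts of the execution context:

```python
events = []       # record of push/pop events on a simulated call stack
call_stack = []

def push(frame):
    call_stack.append(frame)
    events.append(("push", frame))

def pop():
    frame = call_stack.pop()
    events.append(("pop", frame))

push("global")  # the global execution context is pushed first
push("foo")     # foo() is called
push("bar")     # foo calls bar
pop()           # bar returns: its frame is popped
pop()           # foo returns
pop()           # the script finishes; the global context is popped
```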
The diagram I drew omits the other parts of the execution context (this, scope, etc.). That part is the key to closures; here I am explaining not closures but the stack.
There is a common claim in the community that "the scope in an execution context refers to the variables declared by the parents in the execution stack." This is completely wrong: JS uses lexical scoping, so scope refers to the parent at the time the function is defined and has nothing to do with execution.
Common applications of stacks include base conversion, bracket matching, stack permutations, infix expressions (rarely used), and postfix expressions (reverse Polish notation).
Legal stack permutations are also a classic topic; in fact, they are in one-to-one correspondence with legal bracket-matching sequences. That is, the number of stack permutations of n elements equals the number of legal bracket sequences with n pairs of brackets (both are counted by the Catalan numbers). Those who are interested can look up the details.
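Of these applications, bracket matching is the easiest to sketch: push every opening bracket, and for each closing bracket check that the top of the stack is the matching opener. A minimal sketch (this is essentially LeetCode 20. Valid Parentheses):

```python
def is_valid(s: str) -> bool:
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)             # opening bracket: push
        elif not stack or stack.pop() != pairs[ch]:
            return False                 # closer with no matching opener
    return not stack                     # every opener must be closed
```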
Linked List¶
The linked list is one of the most basic data structures, and mastering its structure and common operations is the foundation of the foundation.
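As a refresher, here is a minimal sketch of a singly linked list node and a pointer-following traversal (names are illustrative; this is the structure most linked-list problems assume):

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next   # pointer to the next node, or None at the tail

def to_list(head):
    """Walk the list from head to tail, following next pointers."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

# build 1 -> 2 -> 3
head = ListNode(1, ListNode(2, ListNode(3)))
```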
React Fiber¶
Many people say that fiber is based on a linked list, but why should it be based on a linked list? Many people may not have the answer, so I think it helps to look at the two (fiber and linked lists) together.
The purpose of fiber is to solve the problem that React's rendering, once started, cannot be interrupted and must run to completion.
Picture from Lin Clark shared at ReactConf 2017
The problem before fiber was introduced has been pointed out above: React would prevent high-priority code (such as user input handling) from executing. Therefore, the team planned to build their own virtual execution stack to solve this problem, and the underlying implementation of this virtual execution stack is a linked list.
The basic principle of fiber is to divide the coordination (reconciliation) process into small pieces, execute one piece at a time, save the result, and then judge whether there is time to continue with the next piece (React implements its own function similar to requestIdleCallback). If there is time, it continues; otherwise it yields, letting the browser's main thread rest for a while and execute other higher-priority code.
When the coordination process is complete (all the small blocks have been computed), it enters the commit phase and performs the real side-effect operations, such as updating the DOM. This process cannot be canceled, because this part has side effects.
The key to the problem is to divide the coordination process into pieces, and finally they can be merged together, a bit like Map/Reduce.
React had to re-implement the tree-traversal algorithm, moving from a synchronous recursive model relying on the built-in call stack to an asynchronous model based on linked lists and pointers.
Andrew put it this way: if you rely only on the [built-in] call stack, it will keep working until the stack is empty. Wouldn't it be great if we could interrupt the call stack at will and manipulate stack frames manually? That is the purpose of React Fiber: Fiber is a re-implementation of the stack, specialized for React components. You can think of a single fiber as a virtual stack frame.
The react fiber looks like this:
let fiber = {
  tag: HOST_COMPONENT,
  type: 'div',
  return: parentFiber,
  children: childFiber,
  sibling: siblingFiber,
  alternate: currentFiber,
  stateNode: document.createElement('div'),
  props: { children: [], className: 'foo' },
  partialState: null,
  effectTag: PLACEMENT,
  effects: [],
};
From this we can see that a fiber is essentially an object. It uses the return, children, and sibling properties to build the fiber tree that represents the component tree; since return, children, and sibling each point to another fiber, a fiber looks like a linked list.
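This linked-list shape is exactly what makes the work interruptible: traversal needs no call stack, only a pointer to the current fiber, so it can stop after any unit of work and resume later. A simplified sketch (attribute names child/sibling/parent are illustrative stand-ins for React's child/sibling/return):

```python
class Fiber:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # React calls this field `return`
        self.child = None         # first child
        self.sibling = None       # next sibling

def next_fiber(fiber):
    """Return the next unit of work, or None when the whole tree is done."""
    if fiber.child:               # go down first
        return fiber.child
    while fiber:
        if fiber.sibling:         # then across
            return fiber.sibling
        fiber = fiber.parent      # then climb back up
    return None

# A with children B and C; B has child D
a = Fiber("A")
b = Fiber("B", a); c = Fiber("C", a); d = Fiber("D", b)
a.child, b.sibling, b.child = b, c, d

order = []
node = a
while node:
    order.append(node.name)       # "perform one unit of work"
    node = next_fiber(node)       # ...and we could pause right here
```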
Careful readers may have noticed that alternate is also a fiber. What is it for? Its principle is a bit like git: it can be used to implement operations similar to git revert and git commit. This part is quite interesting; I will explain it in my "Developing Git from Zero" series.
Friends who want to know more can read this article; if you can access it, read the English original. This article is also an excellent early piece on the fiber architecture.
I am currently also writing the fiber-architecture part of the "Developing React from Zero" tutorial series. If you are interested in the concrete implementation, stay tuned.
Non-linear structure¶
Given linear structures, why do we also need non-linear structures? The answer is to balance static and dynamic operations efficiently. You can compare the complexities of the various operations on each data structure to get an intuitive feel for this.
Tree¶
Trees are used very widely. Everything from file systems to the Internet and organizational charts can be expressed as a tree structure; the DOM tree familiar to us front-end developers is also a tree, and HTML, as a DSL describing it, is the concrete manifestation of that tree structure. If you have been exposed to ASTs, an AST is also a kind of tree, and XML is a tree structure too. Trees have far more applications than most people think.
A tree is actually a special kind of graph: an acyclic connected graph, a maximal acyclic graph, and a minimal connected graph.
From another perspective, a tree is a recursive data structure. Moreover, the different representations of trees, such as the less commonly used "eldest son + brother" (first child / next sibling) method, are very useful for understanding trees; it is no exaggeration to say that they give you a deeper grasp of the nature of trees.
The basic tree algorithms include pre-order, in-order, and post-order traversal, plus level-order traversal. Some students are rather vague about the visit order of the first three; in fact, so was I at the beginning. Later I learned a trick: the words pre, in, and post refer to the position of the root node, and the remaining positions are arranged left before right.
For example, pre-order traversal is root-left-right, in-order is left-root-right, and post-order is left-right-root. Isn't that simple?
As just mentioned, a tree is a recursive data structure, so tree traversal is very simple to write recursively; conveniently, tree algorithms basically all build on traversal.
But recursion has always had performance problems on computers, so mastering the less intuitive "imperative, iterative" traversal is useful in some cases. If you traverse iteratively, you can use the stack mentioned above, which keeps the amount of code small.
If you use a stack to simplify the computation, note that since the stack is LIFO, you must pay attention to the order in which the left and right subtrees are pushed.
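Both styles can be sketched for pre-order traversal (a minimal illustration; note that in the iterative version the right child is pushed first, so the left child sits on top of the LIFO stack and is processed first):

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def preorder_recursive(root):
    if not root:
        return []
    # root first, then the left subtree, then the right subtree
    return [root.val] + preorder_recursive(root.left) + preorder_recursive(root.right)

def preorder_iterative(root):
    res, stack = [], [root] if root else []
    while stack:
        node = stack.pop()
        res.append(node.val)          # visit the root first
        if node.right:
            stack.append(node.right)  # pushed first, popped last
        if node.left:
            stack.append(node.left)   # pushed last, popped first
    return res
```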
Important properties of the tree:
- If a tree has n vertices, then it has n-1 edges; that is, the number of vertices and the number of edges of a tree are of the same order.
- There is a unique path from any node to the root, and the length of that path is the depth of the node.
Trees used in practice may be more complicated. For example, collision detection in games may use a quadtree or an octree, and there is also the k-dimensional kd-tree, among others.
Binary Tree¶
A binary tree is a tree in which no node has degree greater than two. It is a special subset of trees. What is interesting is that this restricted tree structure can represent and implement all trees; the principle behind this is the "eldest son + brother" representation. In Professor Deng's words: a binary tree is a special case of the multi-way tree, but once rooted and ordered, its descriptive power is enough to cover the latter.
In fact, when you represent a tree with the "eldest son + brother" method, you can rotate it 45 degrees to see the binary tree.
A typical binary tree:
The node labeled 7 has two child nodes, labeled 2 and 6, and one parent node, labeled 2, which is the root node at the top and has no parent.
As with general trees, we usually traverse binary trees, and there are many variants of traversal.
Below I list some related algorithms for binary tree traversal:
- 94.binary-tree-inorder-traversal
- 102.binary-tree-level-order-traversal
- 103.binary-tree-zigzag-level-order-traversal
- 144.binary-tree-preorder-traversal
- 145.binary-tree-postorder-traversal
- 199.binary-tree-right-side-view
Related concepts:
- True (proper) binary tree: the degree of every node is even, that is, only 0 or 2.
In addition, I also specially set up the Binary Tree Traversal chapter, where you can check the specific details and algorithms.
Heap¶
The heap is actually a priority queue. Many languages have corresponding built-in data structures. Unfortunately, JavaScript does not have such a native data structure. But this will not affect our understanding and application.
The characteristics of the heap:
-In a min heap, if P is a parent node of C, then the key (or value) of P should be less than or equal to the corresponding value of C. Because of this, the top element of the heap must be the smallest, and we will use this feature to find the smallest value or the k-th smallest value.
-In a max heap, the key (or value) of P is greater than or equal to the corresponding value of C.
It should be noted that the heap is not the only kind of priority queue; there are more complicated ones. Generally speaking, though, we treat the two as equivalent.
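Although JavaScript has no built-in heap, Python's heapq module (which the graph templates later in this article rely on) illustrates the min-heap behavior described above:

```python
import heapq

nums = [5, 3, 8, 1]
heapq.heapify(nums)              # O(n): rearrange the list into a min heap

top = nums[0]                    # the heap top is always the minimum
heapq.heappush(nums, 0)          # O(log n) insert
smallest = heapq.heappop(nums)   # O(log n): remove and return the minimum

# finding the k-th smallest value: pop k times (here k = 2)
k_th = [heapq.heappop(nums) for _ in range(2)][-1]
```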
Related algorithms:
- 295.find-median-from-data-stream
Binary search tree¶
The binary sort tree is also known as the binary search tree or binary lookup tree.
A binary search tree is a binary tree with the following properties:
- If the left subtree is not empty, the values of all nodes in the left subtree are less than the value of its root node;
- If the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of its root node;
- The left and right subtrees are themselves binary sort trees;
- No two nodes have the same key.
For a binary search tree, the usual operations include insert, search, delete, find the parent node, find the maximum value, and find the minimum value.
The binary search tree is called a search tree because it is very well suited to searching. For example, in the binary search tree below, we want to find the node whose value is less than and closest to 58; the search process is shown in the figure:
In addition, the binary search tree has a useful property: the result of an in-order traversal is a sorted array. Sometimes we can take advantage of this property.
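A minimal sketch of insertion and in-order traversal, illustrating that the in-order result comes out sorted (names are illustrative):

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)    # smaller keys go left
    else:
        root.right = insert(root.right, val)  # larger keys go right
    return root

def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)

root = None
for v in [58, 22, 89, 13, 37]:
    root = insert(root, v)
```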
Related topics:
-98.validate-binary-search-tree
Balanced binary tree¶
The balanced tree is a data structure in computer science, an improvement on the binary search tree. The query complexity of a general binary search tree depends on the distance from the target node to the tree root (i.e., its depth), so when node depths are large on average, the amortized query complexity rises. Balanced trees were created to achieve more efficient queries.
Here, balance means that the depth of all leaves tends to balance, and more broadly means that the amortized complexity of all possible searches on the tree is low.
This data structure is used internally by some database engines. Its goal is to reduce query operations to O(log n) (the depth of the tree); you can simply think of it as the tree implementing binary search at the data-structure level.
Basic operation:
- Rotate
- Insert
- Delete
- Find predecessor
- Find successor
AVL¶
It is the first self-balancing binary search tree invented. In an AVL tree, the maximum height difference between two subtrees corresponding to any node is 1, so it is also called a height-balanced tree. The time complexity of lookup, insertion, and deletion is O(logn) in average and worst cases. The operations of adding and deleting elements may require one or more tree rotations to rebalance the tree. The AVL tree is named after its inventors GM Adelson-Velsky and Evgenii Landis, who disclosed this data structure in their 1962 paper An algorithm for the organization of information. The balance factor of a node is the height of its left subtree minus the height of its right subtree (sometimes the opposite). Nodes with a balance factor of 1, 0, or -1 are considered to be balanced. A node with a balance factor of -2 or 2 is considered unbalanced and the tree needs to be rebalanced. The balance factor can be stored directly in each node or calculated from the height of the subtree that may be stored in the node.
Red Black Tree¶
It was invented by Rudolf Bayer in 1972 under the name "symmetric binary B-tree"; its modern name comes from a 1978 paper by Leo J. Guibas and Robert Sedgewick. The structure of the red-black tree is complex, but its operations have good worst-case running times and it is efficient in practice: search, insertion, and deletion all complete in O(log n) time, where n is the number of elements in the tree.
Dictionary tree (prefix tree)¶
Also known as the Trie tree, it is a tree structure. Its typical application is counting, sorting, and storing large numbers of strings (though not only strings), so it is often used by search-engine systems for word-frequency statistics. Its advantages: it uses the common prefixes of strings to reduce query time, minimizes unnecessary string comparisons, and achieves higher query efficiency than a hash tree.
It has 3 basic properties:
- The root node contains no character, and every node other than the root contains exactly one character;
- Concatenating the characters on the path from the root to a node yields the string corresponding to that node;
- The children of each node all contain different characters.
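These properties translate almost directly into code. A minimal sketch (essentially LeetCode 208. Implement Trie; method names are illustrative):

```python
class Trie:
    def __init__(self):
        self.children = {}   # the root holds no character itself
        self.is_word = False # marks that a complete word ends at this node

    def insert(self, word):
        node = self
        for ch in word:      # one character per node along the path
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def search(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._find(prefix) is not None

    def _find(self, s):
        node = self
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node
```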
immutable and dictionary tree¶
The underlying mechanism of immutableJS is share + tree (structural sharing on a tree), which is in fact consistent with the dictionary-tree idea.
Related algorithms:
- 208.implement-trie-prefix-tree
- 211.add-and-search-word-data-structure-design
- 212.word-search-ii
Graph¶
The data structures mentioned above can all be regarded as special cases of graphs. As mentioned earlier, binary trees can fully implement other tree structures; in fact, directed graphs can likewise implement undirected graphs and mixed graphs, so directed graphs have always been the focus of study.
Graph theory is a branch of mathematics that takes graphs as its object of study. A graph in graph theory consists of a number of given points and lines connecting pairs of points. Such a graph is usually used to describe a certain relationship between things: a point represents a thing, and a line connecting two points indicates that the corresponding two things have that relationship.
Basic concepts¶
- Undirected graph & directed graph
- Weighted graph & unweighted graph
- In-degree & out-degree
- Path & cycle
- Connected graph & strongly connected graph
In an undirected graph, if any two vertices i and j have paths connected, the undirected graph is called a connected graph.
In a directed graph, if any two vertices i and j have paths connected, the directed graph is called a strongly connected graph.
- Spanning tree
A spanning tree of a connected graph is a connected subgraph that contains all n vertices of the graph but only the n-1 edges sufficient to form a tree. A spanning tree with n vertices has exactly n-1 edges; if one more edge is added, it must form a cycle. Among all spanning trees of a connected network, the one with the smallest total cost of all edges is called the minimum spanning tree, where total cost refers to the sum of the weights of all edges.
Graph creation¶
Graph problems generally do not hand you a ready-made graph structure. When you recognize that a problem is a graph problem, the first step in solving it is usually to build the graph. Here I briefly introduce two common ways of building graphs.
Adjacency matrix (common)¶
Use an n * n matrix graph to describe the graph; it is a two-dimensional matrix in which graph[i][j] describes the relationship between vertices.
Generally speaking, I use graph[i][j] = 1 to indicate that there is an edge between vertex i and vertex j, with direction from i to j, and graph[i][j] = 0 to indicate that there is no edge between them. For weighted graphs, we can store other numbers representing the weights.
The space complexity of this storage method is O(n^2), where n is the number of vertices. If it is a sparse graph (the number of edges in the graph is much smaller than the number of vertices), then it will be a waste of space. And if the graph is undirected, there will always be at least 50% wasted space. The following figure also intuitively reflects this point.
The main advantages of the adjacency matrix are:
- Intuitive and simple.
- Determining whether two vertices are connected, and getting or updating in-degrees and out-degrees, all take O(1) time.
Because it is relatively simple to use, basically all of my solutions that need to build a graph use this method.
For example, LeetCode 743. Network Delay Time. Problem description:
There are N network nodes, labeled 1 to N.
Given a list times, it represents the transit time of the signal passing through the directed edge. times[i] = (u, v, w), where u is the source node, v is the target node, and w is the time for a signal to pass from the source node to the target node.
Now, we send a signal from some node K. How long will it take for all nodes to receive the signal? If all nodes cannot receive the signal, return -1.
Example:
Input: times = [[2,1,1],[2,3,1],[3,4,1]], N = 4, K = 2
Output: 2
note:
The range of N is between [1, 100].
The range of K is between [1, N].
The length of times is between [1, 6000].
All edges times[i] = (u, v, w) have 1 <= u, v <= N and 0 <= w <= 100.
This is a typical graph problem. For this problem, how do we use the adjacency matrix to build a graph?
A typical mapping code:
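A sketch of what such mapping code could look like for this problem (an assumption on my part: node labels 1 to N are shifted to 0-based indices, and float('inf') marks the absence of an edge):

```python
def build_graph(times, N):
    # graph[i][j] = travel time of the directed edge i -> j, inf if absent
    graph = [[float("inf")] * N for _ in range(N)]
    for u, v, w in times:
        graph[u - 1][v - 1] = w   # shift 1..N labels to 0..N-1 indices
    return graph

graph = build_graph([[2, 1, 1], [2, 3, 1], [3, 4, 1]], 4)
```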
This constructs an adjacency matrix, and then we can traverse the graph based on it.
Adjacency List¶
For each point, a linked list is stored to point to all points directly connected to the point. For a weighted graph, the value of the element in the linked list corresponds to the weight.
For example, for an undirected, unweighted graph:
It can be seen that in the undirected graph, the adjacency matrix is symmetric about the diagonal, and the adjacency list always has two symmetrical edges.
And for a directed, unweighted graph:
Graph traversal¶
Once the graph is created, the next step is to traverse it. No matter what algorithm you use, you must traverse the graph, and there are generally the following two methods (other exotic traversal orders have little practical significance and need not be learned). Whichever traversal you use, if the graph has cycles you must record visited nodes to prevent infinite loops. Of course, you may not need an actual set to record visits; for example, you can mark nodes in place using values outside the data's range, bringing the space complexity down to \(O(1)\).
Here a directed graph is used as the example; undirected graphs are similar and will not be repeated.
Depth First Search (DFS)¶
Depth-first traversal of a graph starts from some vertex v and keeps visiting a neighbor, then that neighbor's neighbor, going as deep as possible until the traversal completes.
As shown in the figure above, if we use DFS starting from node A, one possible visiting sequence is A -> C -> B -> D -> F -> G -> E; of course it may also be A -> D -> C -> B -> F -> G -> E, etc., depending on your code, but all of them are depth first.
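A typical recursive DFS template over an adjacency list can be sketched as follows (the visited set is what guards against cycles, as noted above; the small graph here is illustrative, not the one in the figure):

```python
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]                      # visit the current node
    for neighbor in graph.get(start, []):
        if neighbor not in visited:      # skip already-visited nodes
            order += dfs(graph, neighbor, visited)
    return order

graph = {"A": ["C", "D"], "C": ["B"], "B": ["D"], "D": []}
```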
Breadth First Search (BFS)¶
Breadth-first search can be vividly described as "a shallow taste of everything". It needs a queue to maintain the order of traversed vertices so that their adjacent vertices can be visited in dequeue order.
As shown in the figure above, if we use BFS starting from node A, one possible visiting sequence is A -> B -> C -> F -> E -> G -> D; of course it may also be A -> B -> F -> E -> C -> G -> D, etc., depending on your code, but all of them are breadth first.
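A typical BFS template can be sketched as follows (the queue from the Queue section earlier is what produces the level-by-level order; the small graph here is illustrative):

```python
from collections import deque

def bfs(graph, start):
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()           # dequeue from the front
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:  # mark on enqueue to avoid duplicates
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```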
It should be noted that DFS and BFS are algorithmic ideas, not concrete algorithms. They therefore adapt very broadly and are not limited to any particular data structure: they apply to the graphs described in this article, to the trees described above, and in fact to any non-linear data structure.
Common Algorithms¶
Algorithms for graph problems lend themselves well to templates. The main types of questions are:
- dijkstra
- floyd_warshall
- Minimum spanning tree (Kruskal & Prim)
- A* pathfinding
- Bipartite graphs (coloring method)
- Topological sort
Below are templates for the common algorithms, based on adjacency-matrix or adjacency-list representations.
The shortest distance, the shortest path¶
dijkstra algorithm¶
The dijkstra algorithm mainly solves for the shortest distance from a starting point to every other point in the graph.
The basic idea of the algorithm is greedy: each time, it scans all neighbors and picks the one with the smallest distance; essentially it is a breadth-first traversal. Here we use the heap data structure so that the point with the smallest cost can be found in \(O(\log N)\) time.
Code template:
import heapq

def dijkstra(graph, start, end):
    # Each heap entry is a (cost, i) tuple, meaning "the distance from start to i is cost".
    heap = [(0, start)]
    visited = set()
    while heap:
        (cost, u) = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == end:
            return cost
        for v, c in graph[u]:
            if v in visited:
                continue
            next = cost + c
            heapq.heappush(heap, (next, v))
    return -1
For example, suppose a graph looks like this:
We construct it with an adjacency list:
G = {
    "B": [["C", 1]],
    "C": [["D", 1]],
    "D": [["F", 1]],
    "E": [["B", 1], ["G", 2]],
    "F": [],
    "G": [["F", 1]],
}

shortDistance = dijkstra(G, "E", "C")
print(shortDistance) # 2 (E -1-> B -1-> C)
Once you are familiar with this algorithm template, you can go AC LeetCode 743. Network Delay Time.
Complete code:
import heapq
import collections
from typing import List

class Solution:
    def dijkstra(self, graph, start, end):
        heap = [(0, start)]
        visited = set()
        while heap:
            (cost, u) = heapq.heappop(heap)
            if u in visited:
                continue
            visited.add(u)
            if u == end:
                return cost
            for v, c in graph[u]:
                if v in visited:
                    continue
                next = cost + c
                heapq.heappush(heap, (next, v))
        return -1

    def networkDelayTime(self, times: List[List[int]], N: int, K: int) -> int:
        graph = collections.defaultdict(list)
        for fr, to, w in times:
            graph[fr - 1].append((to - 1, w))
        ans = -1
        for to in range(N):
            dist = self.dijkstra(graph, K - 1, to)
            if dist == -1: return -1
            ans = max(ans, dist)
        return ans
Have you learned it?
Floyd-Warshall algorithm¶
Floyd-Warshall is also an algorithm for computing distances between points, but because it saves its intermediate results and avoids repeated calculation, it is particularly suitable for finding **the distance between every pair of points in the graph**, as in LeetCode 1462. Course Schedule IV. Besides that advantage, another very important point is that Floyd-Warshall uses dynamic programming rather than a greedy strategy, so it can handle negative edge weights.
The basic idea of Floyd-Warshall is dynamic programming. The time complexity of the algorithm is \(O(N^3)\) and the space complexity is \(O(N^2)\), where N is the number of vertices.
The algorithm is not hard to understand. Simply put: **the shortest path from i to j is the minimum, over every intermediate vertex k, of (shortest path from i to k) + (shortest path from k to j)**, compared against the direct edge from i to j.
The correctness is easy to see: a path from i to j either goes directly or passes through some other vertex k of the graph. The direct edge is the base case of the algorithm, and taking the minimum over all intermediate vertices naturally yields the shortest distance from i to j.
Code template:
# graph is the adjacency matrix, v is the number of vertices
def floyd_warshall(graph, v):
    dist = [[float("inf") for _ in range(v)] for _ in range(v)]
    for i in range(v):
        for j in range(v):
            dist[i][j] = graph[i][j]

    # check vertex k against all other vertices (i, j)
    for k in range(v):
        # looping through rows of the dist array
        for i in range(v):
            # looping through columns of the dist array
            for j in range(v):
                if (
                    dist[i][k] != float("inf")
                    and dist[k][j] != float("inf")
                    and dist[i][k] + dist[k][j] < dist[i][j]
                ):
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
Now let's see how to apply this template to LeetCode's 1462. Course Schedule IV. Problem description:
You need to take a total of n courses, numbered from 0 to n-1.
Some courses have direct prerequisites. For example, if you want to take course 0 you must first take course 1, which is given as the pair [1, 0].
You are given the total number of courses n, a list of direct prerequisite pairs prerequisites, and a list of query pairs queries.
For each query pair queries[i], determine whether queries[i][0] is a prerequisite of queries[i][1].
Return a list of booleans, where each element corresponds to the result of one query pair in queries.
Note: if course a is a prerequisite of course b, and course b is a prerequisite of course c, then course a is also a prerequisite of course c.
Example 1:
Input: n = 2, prerequisites = [[1,0]], queries = [[0,1],[1,0]]
Output: [false,true]
Explanation: Course 0 is not a prerequisite for Course 1, but Course 1 is a prerequisite for Course 0.
Example 2:
Input: n = 2, prerequisites = [], queries = [[1,0],[0,1]]
Output: [false,false]
Explanation: There are no prerequisite courses, so each course is independent.
Example 3:
Input: n = 3, prerequisites = [[1,2],[1,0],[2,0]], queries = [[1,0],[1,2]]
Output: [true,true]
Example 4:
Input: n = 3, prerequisites = [[1,0],[2,0]], queries = [[0,1],[2,0]]
Output: [false,true]
Example 5:
Input: n = 5, prerequisites = [[0,1],[1,2],[2,3],[3,4]], queries = [[0,4],[4,0],[1,3],[3,0]]
Output: [true,false,true,false]
Constraints:
2 <= n <= 100
0 <= prerequisites.length <= (n * (n - 1) / 2)
0 <= prerequisites[i][0], prerequisites[i][1] < n
prerequisites[i][0] != prerequisites[i][1]
The prerequisites graph has no cycles.
The prerequisites graph has no repeated edges.
1 <= queries.length <= 10^4
queries[i][0] != queries[i][1]
This problem can also be solved with Floyd-Warshall. Think of it this way: if j is reachable from i, then i is a prerequisite of j. With up to 10^4 queries, running the Dijkstra approach per query would certainly time out, so Floyd-Warshall is the wise choice.
Here I just drop in the template with one small change: since we only care about reachability, the distances become booleans. Complete code:
from typing import List


class Solution:
    def floyd_warshall(self, dist, v):
        for k in range(v):
            for i in range(v):
                for j in range(v):
                    dist[i][j] = dist[i][j] or (dist[i][k] and dist[k][j])
        return dist

    def checkIfPrerequisite(self, n: int, prerequisites: List[List[int]], queries: List[List[int]]) -> List[bool]:
        graph = [[False] * n for _ in range(n)]
        ans = []
        for to, fr in prerequisites:
            graph[fr][to] = True
        dist = self.floyd_warshall(graph, n)
        for to, fr in queries:
            ans.append(bool(dist[fr][to]))
        return ans
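As a quick sanity check against Example 1, here is the same idea rewritten as a self-contained plain function (the name `check_if_prerequisite` and the direct `reach[a][b]` encoding are mine, not the article's):

```python
from typing import List


def check_if_prerequisite(n: int, prerequisites: List[List[int]], queries: List[List[int]]) -> List[bool]:
    # reach[i][j] is True if course i is a (possibly transitive) prerequisite of course j
    reach = [[False] * n for _ in range(n)]
    for a, b in prerequisites:
        reach[a][b] = True

    # Boolean Floyd-Warshall: i reaches j directly, or through some intermediate k
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

    return [reach[a][b] for a, b in queries]


print(check_if_prerequisite(2, [[1, 0]], [[0, 1], [1, 0]]))  # [False, True]
```

The output matches Example 1: course 0 is not a prerequisite of course 1, but course 1 is a prerequisite of course 0.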
A* Pathfinding Algorithm¶
A* pathfinding finds the shortest distance or shortest path between two points on a two-dimensional grid. It is a heuristic algorithm commonly used for NPC movement in games. Such problems usually contain obstacles, and LeetCode problems often add further restrictions on top of the obstacles, making them harder.
Problems of this kind are generally rated Hard on LeetCode. The algorithm is not difficult to understand, but writing it without bugs is not so easy.
In this algorithm, we start from the starting point, check the four adjacent squares, and keep expanding until we find the target. There is more than one variation of A* pathfinding; you can look into them if you are interested.
The formula is expressed as: f(n)=g(n)+h(n).
among them:
-f(n) is the estimated cost from the initial state to the target state through state n,
-g(n) is the actual cost from the initial state to state n in the state space,
-h(n) is the estimated cost of the best path from state n to the target state.
-If g(n) is 0, i.e. we only compute the heuristic h(n) from each vertex n to the target and ignore the distance from the start, the algorithm degenerates into a best-first search with a greedy strategy. It is the fastest, but may not find the optimal solution.
-If h(n) never exceeds the actual distance from vertex n to the target, the optimal solution is guaranteed. The smaller h(n), the more nodes have to be expanded and the lower the efficiency. Common heuristics are the Euclidean distance, the Manhattan distance, and the Chebyshev distance.
-If h(n) is 0, i.e. we only compute the shortest distance g(n) from the start to each vertex n and use no heuristic at all, the algorithm degenerates into the single-source shortest path problem, i.e. Dijkstra's algorithm, which expands the most vertices.
An important concept here is the heuristic function. Usually we use the Manhattan distance, that is, h(n) = D * (abs(n.x - goal.x) + abs(n.y - goal.y)).
A complete code template:
grid = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],  # 0's are free cells whereas 1's are obstacles
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
"""
heuristic = [[9, 8, 7, 6, 5, 4],
             [8, 7, 6, 5, 4, 3],
             [7, 6, 5, 4, 3, 2],
             [6, 5, 4, 3, 2, 1],
             [5, 4, 3, 2, 1, 0]]"""

init = [0, 0]
goal = [len(grid) - 1, len(grid[0]) - 1]  # all coordinates are given as [row, col]
cost = 1

# the cost map which pushes the path closer to the goal
heuristic = [[0 for row in range(len(grid[0]))] for col in range(len(grid))]
for i in range(len(grid)):
    for j in range(len(grid[0])):
        heuristic[i][j] = abs(i - goal[0]) + abs(j - goal[1])
        if grid[i][j] == 1:
            heuristic[i][j] = 99  # extra penalty for obstacles in the heuristic map

# the actions we can take: go up, go left, go down, go right
delta = [[-1, 0], [0, -1], [1, 0], [0, 1]]


# function to search the path
def search(grid, init, goal, cost, heuristic):
    closed = [
        [0 for col in range(len(grid[0]))] for row in range(len(grid))
    ]  # the reference grid
    closed[init[0]][init[1]] = 1
    action = [
        [0 for col in range(len(grid[0]))] for row in range(len(grid))
    ]  # the action grid

    x = init[0]
    y = init[1]
    g = 0
    f = g + heuristic[init[0]][init[1]]
    cell = [[f, g, x, y]]

    found = False  # flag that is set when the search is complete
    resign = False  # flag set if we cannot expand any further

    while not found and not resign:
        if len(cell) == 0:
            return "FAIL"
        else:  # choose the least costly cell so as to move closer to the goal
            cell.sort()
            cell.reverse()
            next_cell = cell.pop()
            x = next_cell[2]
            y = next_cell[3]
            g = next_cell[1]

            if x == goal[0] and y == goal[1]:
                found = True
            else:
                for i in range(len(delta)):  # try out the different valid actions
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    if 0 <= x2 < len(grid) and 0 <= y2 < len(grid[0]):
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0:
                            g2 = g + cost
                            f2 = g2 + heuristic[x2][y2]
                            cell.append([f2, g2, x2, y2])
                            closed[x2][y2] = 1
                            action[x2][y2] = i

    invpath = []
    x = goal[0]
    y = goal[1]
    invpath.append([x, y])  # we reconstruct the reverse path from here
    while x != init[0] or y != init[1]:
        x2 = x - delta[action[x][y]][0]
        y2 = y - delta[action[x][y]][1]
        x = x2
        y = y2
        invpath.append([x, y])

    path = []
    for i in range(len(invpath)):
        path.append(invpath[len(invpath) - 1 - i])

    print("ACTION MAP")
    for i in range(len(action)):
        print(action[i])
    return path


a = search(grid, init, goal, cost, heuristic)
for i in range(len(a)):
    print(a[i])
A typical problem: LeetCode 1263. Minimum Moves to Move a Box to Their Target Location (Sokoban).
Topological sort¶
In computer science, a topological sort of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. A topological sort is possible if and only if the graph has no directed cycles, that is, it is a directed acyclic graph (DAG).
A typical problem gives you a bunch of courses with prerequisite relationships between them and asks for a feasible order of study in which every prerequisite course comes first. Any directed acyclic graph has at least one topological sort, and there are known algorithms that construct a topological sort of any DAG in linear time.
Kahn Algorithm¶
To put it simply: let L be the result list. First find the nodes with zero in-degree and put them into L; these nodes have no parents. Then remove the edges leaving those nodes from the graph, and look again for nodes whose in-degree has become zero. The parents of these newly found nodes are already in L, so they can be appended to L as well. Repeat until no node with zero in-degree remains. If L now contains as many elements as the graph has nodes, the sort is complete; if not, the original graph contains a cycle and no topological sort exists.
def topologicalSort(graph):
    """
    Kahn's algorithm finds a topological ordering of a directed acyclic graph
    using BFS.
    """
    indegree = [0] * len(graph)
    queue = []
    topo = []
    cnt = 0

    for key, values in graph.items():
        for i in values:
            indegree[i] += 1
    for i in range(len(indegree)):
        if indegree[i] == 0:
            queue.append(i)

    while queue:
        vertex = queue.pop(0)
        cnt += 1
        topo.append(vertex)
        for x in graph[vertex]:
            indegree[x] -= 1
            if indegree[x] == 0:
                queue.append(x)

    if cnt != len(graph):
        print("Cycle exists")
    else:
        print(topo)


# Adjacency list of the graph
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4, 5], 4: [], 5: []}
topologicalSort(graph)
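The printing version above is handy for demos, but a variant that returns the ordering (and None when a cycle blocks it) is easier to reuse in solutions. A sketch of that variant, using `collections.deque` so that dequeuing is O(1) instead of `list.pop(0)`'s O(n) (the name `topo_order` is mine, not the article's):

```python
from collections import deque


def topo_order(graph):
    # graph: dict mapping node -> list of successor nodes, nodes numbered 0..n-1
    indegree = [0] * len(graph)
    for succs in graph.values():
        for v in succs:
            indegree[v] += 1

    # start from every node that has no prerequisites
    queue = deque(u for u in range(len(graph)) if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # fewer nodes than expected means a cycle blocked the rest
    return order if len(order) == len(graph) else None


print(topo_order({0: [1, 2], 1: [3], 2: [3], 3: [4, 5], 4: [], 5: []}))  # [0, 1, 2, 3, 4, 5]
print(topo_order({0: [1], 1: [0]}))  # None: the two nodes form a cycle
```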
Minimum Spanning Tree¶
I won't walk through the Kruskal and Prim algorithms in detail for now; here is a template for each.
Kruskal¶
from typing import List, Tuple


def kruskal(num_nodes: int, num_edges: int, edges: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """
    >>> kruskal(4, 3, [(0, 1, 3), (1, 2, 5), (2, 3, 1)])
    [(2, 3, 1), (0, 1, 3), (1, 2, 5)]

    >>> kruskal(4, 5, [(0, 1, 3), (1, 2, 5), (2, 3, 1), (0, 2, 1), (0, 3, 2)])
    [(2, 3, 1), (0, 2, 1), (0, 1, 3)]

    >>> kruskal(4, 6, [(0, 1, 3), (1, 2, 5), (2, 3, 1), (0, 2, 1), (0, 3, 2),
    ...                (2, 1, 1)])
    [(2, 3, 1), (0, 2, 1), (2, 1, 1)]
    """
    edges = sorted(edges, key=lambda edge: edge[2])
    parent = list(range(num_nodes))

    def find_parent(i):
        # path-compressing find of the union-find structure
        if i != parent[i]:
            parent[i] = find_parent(parent[i])
        return parent[i]

    minimum_spanning_tree_cost = 0
    minimum_spanning_tree = []

    for edge in edges:
        parent_a = find_parent(edge[0])
        parent_b = find_parent(edge[1])
        if parent_a != parent_b:
            minimum_spanning_tree_cost += edge[2]
            minimum_spanning_tree.append(edge)
            parent[parent_a] = parent_b

    return minimum_spanning_tree


if __name__ == "__main__":  # pragma: no cover
    num_nodes, num_edges = list(map(int, input().strip().split()))
    edges = []
    for _ in range(num_edges):
        node1, node2, cost = [int(x) for x in input().strip().split()]
        edges.append((node1, node2, cost))
    kruskal(num_nodes, num_edges, edges)
Prim¶
import sys
from collections import defaultdict


def PrimsAlgorithm(l):  # noqa: E741
    nodePosition = []

    def get_position(vertex):
        return nodePosition[vertex]

    def set_position(vertex, pos):
        nodePosition[vertex] = pos

    def top_to_bottom(heap, start, size, positions):
        if start > size // 2 - 1:
            return
        else:
            if 2 * start + 2 >= size:
                m = 2 * start + 1
            else:
                if heap[2 * start + 1] < heap[2 * start + 2]:
                    m = 2 * start + 1
                else:
                    m = 2 * start + 2
            if heap[m] < heap[start]:
                temp, temp1 = heap[m], positions[m]
                heap[m], positions[m] = heap[start], positions[start]
                heap[start], positions[start] = temp, temp1

                temp = get_position(positions[m])
                set_position(positions[m], get_position(positions[start]))
                set_position(positions[start], temp)

                top_to_bottom(heap, m, size, positions)

    # Update function if the value of any node in the min-heap decreases
    def bottom_to_top(val, index, heap, position):
        temp = position[index]

        while index != 0:
            if index % 2 == 0:
                parent = int((index - 2) / 2)
            else:
                parent = int((index - 1) / 2)

            if val < heap[parent]:
                heap[index] = heap[parent]
                position[index] = position[parent]
                set_position(position[parent], index)
            else:
                heap[index] = val
                position[index] = temp
                set_position(temp, index)
                break
            index = parent
        else:
            heap[0] = val
            position[0] = temp
            set_position(temp, 0)

    def heapify(heap, positions):
        start = len(heap) // 2 - 1
        for i in range(start, -1, -1):
            top_to_bottom(heap, i, len(heap), positions)

    def deleteMinimum(heap, positions):
        temp = positions[0]
        heap[0] = sys.maxsize
        top_to_bottom(heap, 0, len(heap), positions)
        return temp

    visited = [0 for i in range(len(l))]
    Nbr_TV = [-1 for i in range(len(l))]  # Neighboring tree vertex of the selected vertex
    # Minimum distance of an explored vertex to a neighboring vertex of the
    # partial tree formed so far
    Distance_TV = []  # Heap of distances of vertices from their neighboring vertex
    Positions = []

    for x in range(len(l)):
        p = sys.maxsize
        Distance_TV.append(p)
        Positions.append(x)
        nodePosition.append(x)

    TreeEdges = []
    visited[0] = 1
    Distance_TV[0] = sys.maxsize
    for x in l[0]:
        Nbr_TV[x[0]] = 0
        Distance_TV[x[0]] = x[1]
    heapify(Distance_TV, Positions)

    for i in range(1, len(l)):
        vertex = deleteMinimum(Distance_TV, Positions)
        if visited[vertex] == 0:
            TreeEdges.append((Nbr_TV[vertex], vertex))
            visited[vertex] = 1
            for v in l[vertex]:
                if visited[v[0]] == 0 and v[1] < Distance_TV[get_position(v[0])]:
                    Distance_TV[get_position(v[0])] = v[1]
                    bottom_to_top(v[1], get_position(v[0]), Distance_TV, Positions)
                    Nbr_TV[v[0]] = vertex
    return TreeEdges


if __name__ == "__main__":  # pragma: no cover
    # <--------- Prims Algorithm --------->
    n = int(input("Enter number of vertices: ").strip())
    e = int(input("Enter number of edges: ").strip())
    adjlist = defaultdict(list)
    for x in range(e):
        l = [int(x) for x in input().strip().split()]  # noqa: E741
        adjlist[l[0]].append([l[1], l[2]])
        adjlist[l[1]].append([l[0], l[2]])
    print(PrimsAlgorithm(adjlist))
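The template above is long mostly because it hand-rolls an indexed min-heap. If you are allowed to use `heapq`, the same idea fits in a much shorter sketch: keep a heap of edges crossing the cut between the tree and the rest of the graph, and repeatedly take the cheapest one. The function `prim_mst` below is my own compact variant, not the article's template, and it assumes a connected graph:

```python
import heapq
from collections import defaultdict


def prim_mst(adj, start=0):
    # adj: dict node -> list of (neighbor, weight); graph assumed connected
    visited = {start}
    # heap of candidate edges (weight, u, v) crossing the current cut
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)

    edges = []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale edge: both endpoints already in the tree
        visited.add(v)
        edges.append((u, v, w))
        for nxt, nw in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (nw, v, nxt))
    return edges


adj = defaultdict(list)
for u, v, w in [(0, 1, 1), (1, 2, 2), (0, 2, 3)]:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(prim_mst(adj))  # [(0, 1, 1), (1, 2, 2)]
```

Lazily discarding stale heap entries (the `if v in visited: continue` line) trades a bit of heap size for not needing a decrease-key operation, which is exactly what the hand-rolled heap above provides.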
Bipartite graph¶
I covered bipartite graphs when discussing the following two problems. After reading that material, try both; they are essentially the same problem.
-0886. Possible Bipartition -0785. Is Graph Bipartite?
The recommended order is: 886 first, then 785.
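The "coloring method" mentioned in the overview can be sketched as a BFS that alternately two-colors the graph: if two adjacent nodes ever receive the same color, the graph is not bipartite. This is a generic sketch of the technique (function name mine), not the solution to either of the problems above:

```python
from collections import deque


def is_bipartite(graph):
    # graph: adjacency list, graph[u] = iterable of neighbors, nodes 0..n-1
    color = {}  # node -> 0 or 1
    for start in range(len(graph)):
        if start in color:
            continue  # already colored via an earlier component
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # give v the opposite color of u
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # two adjacent nodes share a color
    return True


print(is_bipartite([[1, 3], [0, 2], [1, 3], [0, 2]]))  # True: a 4-cycle
print(is_bipartite([[1, 2], [0, 2], [0, 1]]))          # False: a triangle
```

The outer loop over all start nodes matters because the graph may be disconnected, which is exactly the trap in 785.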
Summary¶
Understanding the common concepts of graphs only gets us started; next comes practice. For a typical graph problem, the first step is to build the graph, and the second step is to traverse the graph built in the first step to find a feasible solution.
Graph problems are relatively hard, especially at the level of code writing. But as far as interview questions go, there are not many distinct types of graph problems, and many of them can be solved with templates. So I recommend practicing the templates and typing them out yourself until you are sure you can reproduce them on your own.