Prefix Tee (Trie)¶

Introduction¶

The dictionary tree is also called the prefix tree and Trie. It itself is a tree structure, that is, a polytree. Friends who have learned trees should be very easy to understand. Its core operations are insertion and search. Delete is rarely used, so this handout does not include delete operations.

Up to now (2020-02-04) Prefix Tree (Dictionary Tree) There are a total of 17 questions in LeetCode. Among them, 2 are simple, 8 are medium, and 7 are difficult.

Features of the prefix tree¶

Simply put, the prefix tree is a tree. The prefix tree generally records a series of words on the tree. If these words do not have a common prefix, it is no different from directly using an array. If there is a public prefix, the public prefix will only be stored once. It is conceivable that if there are many common prefixes for a series of words, it will effectively reduce space consumption.

The meaning of the prefix tree is actually space for time, which is the same as the original intention of hash tables and dynamic programming.

The principle is also very simple, as I said before, the public prefix will only be stored once, so if I want to find a word or a prefix in a bunch of words, I don’t need to traverse completely, but traverse The prefix tree is fine. Essentially, the time reduced by using prefix trees and not using prefix trees is the number of common prefixes. In other words, a bunch of words do not have a common prefix, and there is no point in using a prefix tree.

Knowing the characteristics of the prefix tree, we then implement a prefix tree ourselves. For implementation, please refer to 0208.implement-trie-prefix-tree

Application scenarios and analysis¶

As mentioned above, the core idea of the prefix tree is to use space for time, and use the common prefix of the string to reduce the time cost of the query.

For example, give you a string query and ask you whether this string appears in the string collection, so that we can build the string collection, and after it is built, we can match whether the query appears, then Some friends will definitely ask, isn’t the hashmap that I talked about better?

Let’s think about it when searching on Baidu, type "one language", and the search bar will show recommended texts such as "one language tells you" and "one language becomes a truth (four tones of chen)". This is called fuzzy matching. It is to give a fuzzy query, hoping to give a list of related recommendations. Obviously, hashmap is not easy to achieve fuzzy matching, and Trie can implement fuzzy search based on prefix.

Note that the fuzzy search here is only based on the prefix. For example, in the above example, searching for "Dao Po" will not match "Dao Po" but only "Dao Po xx"

Therefore, my understanding here is: the above precise search is only a special case of fuzzy search. Obviously, the fuzzy search hashmap cannot be done. If there are too many conflicts in the hashmap in the precise search problem, the efficiency may not be higher than that of Trie. Interested Friends can do a test to see which is faster.

For example, give you a long sentence and a bunch of sensitive words, and find out all the positions where all the sensitive words appear in the long sentence (think about it, sometimes we breathe fragrant mouth, but the result is sent out as *** *,got it)

Tips: In fact, the AC automata uses the nature of trie to match sensitive words, and the performance is very good. So much so that many editors use the AC automata algorithm.

There are other scenes, but I won’t discuss them here. If you’re interested, you can google it.

basic concepts¶

A prefix tree looks like this:

As shown in the figure, each node stores a character, and then adds a control information to indicate whether it is the end of a word. The actual use process may be slightly different, but the change is not big.

Next, let's look at the concept in Trie-node.

-The root node has no practical meaning -Each node represents a character -The data structure in each node can be customized, such as isWord (whether it is a word), count (the number of times the prefix appears), etc. What is needed for actual analysis of the actual problem.

API¶

To implement the prefix tree yourself, you must first know what its APIs are and what are the specific functions.

The main APIs of the prefix tree are as follows:

-insert(word): insert a word -search(word): Find whether a word exists -startWith(word): Find if there is a word prefixed with word

Among them, startWith is the core usage of the prefix tree, and the name prefix tree comes from here. You can start with 208 questions, get familiar with the prefix tree, and then try other questions.

Trie insertion¶

-Suppose a few words such as [she,he,her,good,god] are given to construct a Trie as shown below:

-That is to say, a word composed of characters from the root node to a pink node has appeared in the word list. Of course, we can also add a count attribute to each node of the tree, which represents the root node to the The number of occurrences of the string prefix formed by the node

It can be seen that the structure of the tree is very simple. When inserting a new word, it starts from the root node and inserts one character at a time. If there is a corresponding character node, the corresponding attribute is updated, and if there is no one, one is created!

Trie's query¶

The query is even simpler. Given a Trie and a word, the process is similar to that of inserting. Find one character by character.

-If there is no corresponding node for a character in the middle → Trie does not contain the word -If the string is traversed, there are corresponding nodes, but the node corresponding to the last character is not pink, and it is not a word → Trie does not contain the word

Trie template¶

Understand the usage scenarios of Trie and basic API, then finally it is implemented with code.

Here I provide the code in Python and Java.

C# Solution¶

using System;
using System.Collections.Generic;
namespace Algorithms.Medium
{
  public class TrieNode
  {
    public Dictionary<char, TrieNode> children;
    public bool endOfWord;
    public TrieNode()
    {
      children = new Dictionary<char, TrieNode>();
      endOfWord = false;
    }

  }
  public class Trie
  {
    public TrieNode root { get; private set; }
    public Trie()
    {
      root = new TrieNode();
    }

    public void Insert(string v)
    {
      var current = root;

      for (var i = 0; i < v.Length; i++)
      {
        if (!current.children.ContainsKey(v[i]))
        {
          current.children.Add(v[i], new TrieNode());
        }
        current = current.children[v[i]];
      }

      current.endOfWord = true;
    }

    public virtual bool Search(string v, bool isStartsWith = false)
    {
      var current = root;
      for (var i = 0; i < v.Length; i++)
      {
        if (!current.children.ContainsKey(v[i])) return false;

        current = current.children[v[i]];
      }

      return isStartsWith ? true : current.endOfWord;
    }

    public bool StartsWith(string v)
    {
      return Search(v, true);
    }
  }
}

Complexity Analysis

-The time complexity of insertion and query is naturally \(O(len(key))\), and key is the string to be inserted (searched).

-The worst space complexity of creating a tree is \(O(m^{n})\), where m is the number of characters in the character set, and n is the length of the string.

Topic recommendation¶

The following are the solutions to the six topics of this topic. The content will be updated continuously. Thank you for your attention~

to sum up¶

The core idea of the prefix tree is to use space for time, and use the common prefix of the string to reduce the time cost of the query. Therefore, if there are many common prefixes in the topic, you can consider using prefix trees to optimize.

The basic operation of the prefix tree is insertion and query. The query can be a complete query or a prefix query. The prefix-based query is the soul of the prefix tree and the source of its name.

Finally, we provide you with the prefix tree template in two languages. If you need to use it, you can directly encapsulate it into a standard API call.

The topic based on the prefix tree usually does not change much, and it can be solved by using a template. How to know how to use prefix tree optimization is a difficult point, but you only need to keep in mind that the complexity of the algorithm is that the bottleneck of the algorithm is in the string search, and the string has many common prefixes, you can use the prefix tree optimization** .