Trie (Prefix Tree)

Date: 2020-06-13 23:35:06


Implement a Trie with the insert, search, and startsWith methods.

Example:

Trie trie = new Trie(); 
trie.insert("apple"); 
trie.search("apple"); // returns true 
trie.search("app"); // returns false 
trie.startsWith("app"); // returns true 
trie.insert("app"); 
trie.search("app"); // returns true

Summary
This article is aimed at intermediate readers. It introduces the Trie (prefix tree) data structure and its most common operations.

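1. Applications of the prefix tree

A Trie (pronounced "try"), or prefix tree, is a tree data structure used to retrieve keys from a dataset of strings. This efficient data structure has many applications, for example: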

Autocomplete
Spell checker
IP routing (Longest prefix matching)

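Several other data structures, such as balanced trees and hash tables, can also look up a word in a dataset of strings.
So why do we need a Trie? Although a hash table finds a key in O(1) time on average, it is inefficient for the following operations: finding all keys with a common prefix, and enumerating a dataset of strings in lexicographic order.
Another reason a Trie can outperform a hash table is that, as the hash table grows, hash collisions become more frequent, and the search time can degrade to O(n) in the worst case, where n is the number of inserted keys.
When many keys share the same prefix, a Trie can use less space than a hash table. In that case, a Trie lookup takes only O(m) time, where m is the key length. Searching for a key in a balanced tree takes O(m log n) time.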
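To make the two operations mentioned above concrete, here is a minimal sketch of listing all keys with a common prefix in lexicographic order. It previews the node layout described in the following sections. The class name PrefixLister and the method keysWithPrefix are hypothetical names chosen for this illustration, and the sketch assumes keys contain only the lowercase letters 'a' to 'z'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the article's Trie API.
class PrefixLister {
    private static final int R = 26; // assumption: lowercase 'a'..'z' only

    private static final class Node {
        final Node[] links = new Node[R];
        boolean isEnd;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (node.links[idx] == null) {
                node.links[idx] = new Node();
            }
            node = node.links[idx];
        }
        node.isEnd = true;
    }

    // Returns every stored key that starts with prefix, in lexicographic order.
    List<String> keysWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.links[prefix.charAt(i) - 'a'];
            if (node == null) {
                return result; // no stored key has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first walk; visiting links from 'a' to 'z' yields sorted output.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEnd) {
            out.add(path.toString());
        }
        for (int i = 0; i < R; i++) {
            if (node.links[i] != null) {
                path.append((char) ('a' + i));
                collect(node.links[i], path, out);
                path.deleteCharAt(path.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        PrefixLister trie = new PrefixLister();
        for (String w : new String[] {"apple", "app", "apply", "banana", "ape"}) {
            trie.insert(w);
        }
        System.out.println(trie.keysWithPrefix("app")); // [app, apple, apply]
        System.out.println(trie.keysWithPrefix(""));    // [ape, app, apple, apply, banana]
    }
}
```

Passing an empty prefix enumerates the whole dataset in sorted order. A hash table would have to scan and sort all of its keys to produce the same result.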

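2. Trie node structure

A Trie is a rooted tree. Each node has the following fields:

At most R links to its children, where each link corresponds to one character of the dataset's alphabet. This article assumes R is 26, the number of lowercase Latin letters.
A boolean field that specifies whether the node corresponds to the end of a key or is merely a prefix of a key.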

class TrieNode {
    // R links to node children
    private TrieNode[] links;
    private final int R = 26;
    private boolean isEnd;
    public TrieNode() {
        links = new TrieNode[R];
    }
    public boolean containsKey(char ch) {
        return links[ch - 'a'] != null;
    }
    public TrieNode get(char ch) {
        return links[ch - 'a'];
    }
    public void put(char ch, TrieNode node) {
        links[ch - 'a'] = node;
    }
    public void setEnd() {
        isEnd = true;
    }
    public boolean isEnd() {
        return isEnd;
    }
}
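The node above relies on the assumption that R is 26, so any character outside 'a' to 'z' would index past the array. If that assumption does not hold, one possible variant keeps the links in a map instead of a fixed array. This is only a sketch: MapTrieNode is a hypothetical name and this variant does not come from the original article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of TrieNode for alphabets that are not limited to 'a'..'z'.
class MapTrieNode {
    // One entry per existing child link, keyed by the character on that link.
    private final Map<Character, MapTrieNode> links = new HashMap<>();
    private boolean isEnd;

    public boolean containsKey(char ch) {
        return links.containsKey(ch);
    }

    public MapTrieNode get(char ch) {
        return links.get(ch); // null when no link exists for ch
    }

    public void put(char ch, MapTrieNode node) {
        links.put(ch, node);
    }

    public void setEnd() {
        isEnd = true;
    }

    public boolean isEnd() {
        return isEnd;
    }
}
```

The method names match TrieNode, so the insert and search logic would need only a type change. The trade-off is that a HashMap does not keep its keys in sorted order; a java.util.TreeMap could be substituted if lexicographic traversal matters.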

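3. Inserting a key into a Trie

We insert a key by searching down the Trie. Starting from the root, we look for the link that corresponds to the first character of the key. There are two cases:

A link exists. We follow it down to the next level of child nodes, and the algorithm continues with the next character of the key.
A link does not exist. We create a new node and attach it to the current node at the link position that matches the current character. We repeat this step until we reach the last character of the key, then mark the current node as an end node, and the algorithm is finished.

As the figure in the original article shows, the character information is logically stored on the links rather than in the nodes: a node does not record its own character, which is implied by the position of the link that leads to it.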

class Trie {
    private TrieNode root;
    public Trie() {
        root = new TrieNode();
    }
    // Inserts a word into the trie.
    public void insert(String word) {
        TrieNode node = root;
        for (int i = 0; i < word.length(); i++) {
            char currentChar = word.charAt(i);
            // Create the child node if no link exists for this character.
            if (!node.containsKey(currentChar)) {
                node.put(currentChar, new TrieNode());
            }
            node = node.get(currentChar);
        }
        // The node reached by the last character marks the end of the key.
        node.setEnd();
    }
}

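Complexity analysis
Time complexity: O(m), where m is the key length. In each iteration the algorithm either examines or creates one node, until it reaches the end of the key. This takes only m operations.
Space complexity: O(m). In the worst case, the newly inserted key shares no prefix with the keys already in the Trie, so we must add m new nodes, which takes O(m) space.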
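The space bound can be observed directly by counting how many nodes each insertion allocates. The sketch below is for illustration only: CountingTrie is a hypothetical class, its insert returns the number of newly created nodes (unlike the void insert above), and keys are assumed to use only 'a' to 'z'.

```java
// Hypothetical illustration class: a trie that counts the nodes it allocates.
class CountingTrie {
    private static final int R = 26; // assumption: lowercase 'a'..'z' only

    private static final class Node {
        final Node[] links = new Node[R];
        boolean isEnd;
    }

    private final Node root = new Node();
    private int nodeCount = 0; // number of nodes created so far, excluding the root

    // Inserts word and returns how many new nodes this insertion created.
    int insert(String word) {
        int created = 0;
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (node.links[idx] == null) {
                node.links[idx] = new Node();
                created++;
            }
            node = node.links[idx];
        }
        node.isEnd = true;
        nodeCount += created;
        return created;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        CountingTrie trie = new CountingTrie();
        System.out.println(trie.insert("apple")); // 5: no shared prefix, m new nodes
        System.out.println(trie.insert("app"));   // 0: the whole path already exists
        System.out.println(trie.insert("apply")); // 1: only 'y' is new
        System.out.println(trie.insert("bat"));   // 3: no shared prefix again
        System.out.println(trie.nodeCount());     // 9
    }
}
```

Only keys that share no prefix with existing keys pay the full m nodes; the more prefixes overlap, the fewer nodes an insertion adds.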

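4. Searching for a key in a Trie

Each key is represented in the Trie as a path from the root to an internal node or a leaf. We start at the root with the first character of the key and check whether the current node has a link corresponding to that character. There are two cases:

A link exists. We move to the node the link points to and continue with the next character of the key.
A link does not exist, or no key characters remain. If no key characters remain and the current node is marked as isEnd, we return true. Otherwise there are two possible cases, and in both we return false:
* Some key characters remain, but the key's path cannot be followed any further in the Trie, so the key is missing.
* No key characters remain, but the current node is not marked as isEnd. The search key is then only a prefix of another key in the Trie.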


class Trie {
    // ...
    // Searches for a prefix or a whole key in the trie and
    // returns the node where the search ends (null if the path breaks).
    private TrieNode searchPrefix(String word) {
        TrieNode node = root;
        for (int i = 0; i < word.length(); i++) {
            char curLetter = word.charAt(i);
            if (node.containsKey(curLetter)) {
                node = node.get(curLetter);
            } else {
                return null;
            }
        }
        return node;
    }
    // Returns true if the word is in the trie.
    public boolean search(String word) {
        TrieNode node = searchPrefix(word);
        return node != null && node.isEnd();
    }
}

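Complexity analysis
Time complexity: O(m). In each step the algorithm looks up the next character of the key. In the worst case it performs m operations.
Space complexity: O(1)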
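The three possible results of a search (key found, key is only a prefix of another key, path breaks off) can be told apart explicitly. The following sketch does this with an enum; SearchOutcomeDemo, Outcome, and classify are hypothetical names introduced for this illustration, and keys are again assumed to use only 'a' to 'z'.

```java
// Hypothetical illustration class; the Outcome enum is not part of the article's API.
class SearchOutcomeDemo {
    enum Outcome { FOUND, PREFIX_ONLY, NOT_FOUND }

    private static final class Node {
        final Node[] links = new Node[26]; // assumption: lowercase 'a'..'z' only
        boolean isEnd;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (node.links[idx] == null) {
                node.links[idx] = new Node();
            }
            node = node.links[idx];
        }
        node.isEnd = true;
    }

    // Walks the key's path and reports which of the three cases applies.
    Outcome classify(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.links[key.charAt(i) - 'a'];
            if (node == null) {
                return Outcome.NOT_FOUND; // characters remain but the path ends
            }
        }
        // No characters remain: isEnd decides between a key and a mere prefix.
        return node.isEnd ? Outcome.FOUND : Outcome.PREFIX_ONLY;
    }

    public static void main(String[] args) {
        SearchOutcomeDemo trie = new SearchOutcomeDemo();
        trie.insert("apple");
        System.out.println(trie.classify("apple"));  // FOUND
        System.out.println(trie.classify("app"));    // PREFIX_ONLY
        System.out.println(trie.classify("apq"));    // NOT_FOUND
        System.out.println(trie.classify("apples")); // NOT_FOUND
    }
}
```

In these terms, search returns true only for FOUND, while a prefix query returns true for both FOUND and PREFIX_ONLY.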

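5. Searching for a key prefix in a Trie

This method is very similar to the one we used to search for a key in a Trie. We traverse the Trie from the root until no characters remain in the key prefix, or until the path cannot be continued with the current character.
The only difference from the key search algorithm above is that when we reach the end of the key prefix, we always return true. We do not need to consider the isEnd mark of the current Trie node, because we are searching for a prefix of a key, not for an entire key.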

class Trie {
    // ...
    // Returns true if there is any word in the trie
    // that starts with the given prefix.
    public boolean startsWith(String prefix) {
        TrieNode node = searchPrefix(prefix);
        return node != null;
    }
}

Complexity analysis
Time complexity: O(m)
Space complexity: O(1)
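For readers who want to run the example from the top of the article, the fragments above can be assembled into a single file along the following lines. This is a condensed sketch, not the article's exact code: the wrapper class TrieDemo and its main method are added for illustration, the node's accessor methods are replaced by direct field access to keep it short, and keys are assumed to use only 'a' to 'z'.

```java
// Condensed assembly of the article's fragments; TrieDemo and main are added for illustration.
class TrieDemo {
    static class TrieNode {
        private final TrieNode[] links = new TrieNode[26]; // assumption: 'a'..'z' only
        private boolean isEnd;
    }

    static class Trie {
        private final TrieNode root = new TrieNode();

        // Inserts a word, creating nodes only where no link exists yet.
        public void insert(String word) {
            TrieNode node = root;
            for (int i = 0; i < word.length(); i++) {
                int idx = word.charAt(i) - 'a';
                if (node.links[idx] == null) {
                    node.links[idx] = new TrieNode();
                }
                node = node.links[idx];
            }
            node.isEnd = true;
        }

        // Returns the node where the path for word ends, or null if the path breaks.
        private TrieNode searchPrefix(String word) {
            TrieNode node = root;
            for (int i = 0; i < word.length() && node != null; i++) {
                node = node.links[word.charAt(i) - 'a'];
            }
            return node;
        }

        public boolean search(String word) {
            TrieNode node = searchPrefix(word);
            return node != null && node.isEnd;
        }

        public boolean startsWith(String prefix) {
            return searchPrefix(prefix) != null;
        }
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("apple");
        System.out.println(trie.search("apple"));   // true
        System.out.println(trie.search("app"));     // false
        System.out.println(trie.startsWith("app")); // true
        trie.insert("app");
        System.out.println(trie.search("app"));     // true
    }
}
```

The printed values match the comments in the example at the start of the article.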

Reference: https://leetcode.com/articles/implement-trie-prefix-tree/


Original article: https://www.cnblogs.com/billxxx/p/13121972.html
