Trie (prefix tree) is a common, space-efficient way of storing a set of valid strings, e.g. all valid English words. Although a hash table would offer a better time complexity, a trie is typically preferred when the number of words is large, because trie stores all common prefixes only once, e.g. achtung
and achilles
share the ach
in a trie, but not in a hash table.
The top level type Trie
contains the root level TrieNode
, corresponding to the null path, and the size
field, which keeps track of the number of words the trie.
pub struct Trie {
root: TrieNode,
size: usize,
}
Each TrieNode
contains the boolean eow
(end of word) field, indicating if the current path is a valid word, and a mapping of each valid next character to its own TrieNode
pub(super) struct TrieNode {
// Is the char mapping to this value end of a valid word?
pub(super) eow: bool,
// Possible continuations.
pub(super) child_map: HashMap<char, TrieNode>
}
impl TrieNode {
pub(super) fn new() -> TrieNode {
TrieNode {eow: false, child_map: HashMap::new()}
}
}
impl Debug for TrieNode {
fn fmt(&self, f: &mut Formatter<'_>) -> Result {
write!(f, "TrieNodeMapValue (eow = {}, child_map = {:?})", self.eow, self.child_map)
}
}
To search, for example, for the word Hephaestos
, we’ll need to traverse the tree one character per node, and check if the TrieNode
pointed to by the last s
has its eow
(end of word) is set to true.
The critical thing to note, is that although TrieNode
is a recursive structure, we did not have to box it, like we had to with the stack. That is because HashMap
is doing the indirection for us: the size of its header is constant regardless of how many entries it contains, because all its entries are allocated on the heap.
Likewise, the destruction is cleanly handled by the default Drop
implementation, which recursively calls the destructors of all the map elements. This is fine, because the depth of the recursion is limited by the longest word, which in a typically sufficiently smaller than the max call stack size.
Note as well, that I’ve implemented Debug
for all the types to be able to print them. Rust can derive Debug
implimentation (if we ask so with #[derive(Debug)])
) for some custom types, but there are limitations, in particular it doesn’t seem to do it for recursive structures.
The final code for Trie
:
pub struct Trie {
root: TrieNode,
size: usize,
}
impl Debug for Trie {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Trie (root: {:?}, size: {})", self.root, self.size)
}
}
impl Trie {
pub fn new() -> Self {
Trie{root: TrieNode::new(), size: 0}
}
pub fn insert(&mut self, token: &str) {
let mut curr_map_value = &mut self.root;
for char in token.chars() {
let next_map_value = curr_map_value.child_map.entry(char).or_insert(TrieNode::new());
curr_map_value = next_map_value;
}
curr_map_value.eow = true;
self.size += 1;
}
pub fn size(&self) -> usize {
self.size
}
pub fn contains(&mut self, token: &str) -> bool {
let mut curr_map_value = &self.root;
for char in token.chars() {
match curr_map_value.child_map.get(&char) {
None => return false,
Some(next_map_value) => {
curr_map_value = next_map_value;
}
}
}
curr_map_value.eow
}
}
Comments are closed.