Rust Dust: 4. Trie

Source

Trie (prefix tree) is a common, space-efficient way of storing a set of valid strings, e.g. all valid English words. Although a hash table would offer a better time complexity, a trie is typically preferred when the number of words is large, because trie stores all common prefixes only once, e.g. achtung and achilles share the ach in a trie, but not in a hash table.

The top level type Trie contains the root level TrieNode, corresponding to the null path, and the size field, which keeps track of the number of words the trie.

pub struct Trie {
    root: TrieNode,
    size: usize,
}

pub struct Trie {
    root: TrieNode,
    size: usize,
}

Each TrieNode contains the boolean eow (end of word) field, indicating if the current path is a valid word, and a mapping of each valid next character to its own TrieNode

pub(super) struct TrieNode {
    // Is the char mapping to this value end of a valid word?
    pub(super) eow: bool,
    // Possible continuations.
    pub(super) child_map: HashMap<char, TrieNode>
}

impl TrieNode {
    pub(super) fn new() -> TrieNode {
        TrieNode {eow: false, child_map: HashMap::new()}
    }
}

impl Debug for TrieNode {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result {
        write!(f, "TrieNodeMapValue (eow = {}, child_map = {:?})", self.eow, self.child_map)
    }
}

pub(super) struct TrieNode {
    // Is the char mapping to this value end of a valid word?
    pub(super) eow: bool,
    // Possible continuations.
    pub(super) child_map: HashMap<char, TrieNode>
}

impl TrieNode {
    pub(super) fn new() -> TrieNode {
        TrieNode {eow: false, child_map: HashMap::new()}
    }
}

impl Debug for TrieNode {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result {
        write!(f, "TrieNodeMapValue (eow = {}, child_map = {:?})", self.eow, self.child_map)
    }
}

To search, for example, for the word Hephaestos, we’ll need to traverse the tree one character per node, and check if the TrieNode pointed to by the last s has its eow (end of word) is set to true.

The critical thing to note, is that although TrieNode is a recursive structure, we did not have to box it, like we had to with the stack. That is because HashMap is doing the indirection for us: the size of its header is constant regardless of how many entries it contains, because all its entries are allocated on the heap.

Likewise, the destruction is cleanly handled by the default Drop implementation, which recursively calls the destructors of all the map elements. This is fine, because the depth of the recursion is limited by the longest word, which in a typically sufficiently smaller than the max call stack size.

Note as well, that I’ve implemented Debug for all the types to be able to print them. Rust can derive Debug implimentation (if we ask so with #[derive(Debug)])) for some custom types, but there are limitations, in particular it doesn’t seem to do it for recursive structures.

The final code for Trie:

pub struct Trie {
    root: TrieNode,
    size: usize,
}

impl Debug for Trie {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Trie (root: {:?}, size: {})", self.root, self.size)
    }
}
impl Trie {

    pub fn new() -> Self {
        Trie{root: TrieNode::new(), size: 0}
    }

    pub fn insert(&mut self, token: &str) {
        let mut curr_map_value = &mut self.root;
        for char in token.chars() {
            let next_map_value = curr_map_value.child_map.entry(char).or_insert(TrieNode::new());
            curr_map_value = next_map_value;
        }
        curr_map_value.eow = true;
        self.size += 1;
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn contains(&mut self, token: &str) -> bool {
        let mut curr_map_value = &self.root;
        for char in token.chars() {
            match curr_map_value.child_map.get(&char) {
                None => return false,
                Some(next_map_value) => {
                    curr_map_value = next_map_value;
                }
            }
        }
        curr_map_value.eow
    }
}

pub struct Trie {
    root: TrieNode,
    size: usize,
}

impl Debug for Trie {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Trie (root: {:?}, size: {})", self.root, self.size)
    }
}
impl Trie {

    pub fn new() -> Self {
        Trie{root: TrieNode::new(), size: 0}
    }

    pub fn insert(&mut self, token: &str) {
        let mut curr_map_value = &mut self.root;
        for char in token.chars() {
            let next_map_value = curr_map_value.child_map.entry(char).or_insert(TrieNode::new());
            curr_map_value = next_map_value;
        }
        curr_map_value.eow = true;
        self.size += 1;
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn contains(&mut self, token: &str) -> bool {
        let mut curr_map_value = &self.root;
        for char in token.chars() {
            match curr_map_value.child_map.get(&char) {
                None => return false,
                Some(next_map_value) => {
                    curr_map_value = next_map_value;
                }
            }
        }
        curr_map_value.eow
    }
}

Rust Dust

Variant Experiment Server

bookworms

aws-wordpress

Rust Dust: 4. Trie