Rust Dust: 1. Stack

0. Introduction

If, like me, you are coming to Rust from a language that runs on a virtual machine, like Java or Erlang, your biggest challenge is Rust’s memory safety facility. It requires you to be cognizant of which allocations happen on the stack and which on the heap — the concepts that are completely abstracted out for you by VMs. If, on the other hand, you’re coming from C/++, your discomfort comes from the fact that the language does not give you a mechanism to directly allocate on the heap, but still requires you to understand the implicit memory organization of your datastructures.

The code I develop in the following sections stops well short of the impolementation of these data structures found in the standard library. Those use unsafe direct pointer manipulations in the name of efficiency and may seem in contravention of the language’s principal ethos of memory safety. For now, I am leaving both the unsafe Rust and philosophy out of this exercise.

Note that all the modules that fail compilation are intentionally commented out of main.rs to have them ignored by the compiler. If you like to reproduce a compilation error cited in the following sections, uncomment the corresponding module registration in main.rs.

We start with a stack (a LIFO singly-linked list) because its memory organization is acyclic: each heap-allocated object has a single referer. (In doubly-linked lists each element is pointed to by the previous and the next elements, which presents an additional challenge.)

1. Naïve Stack (The Java Approach)

In the naïve stack, one can see what happens if we ditch the concepts of stack and heap and hope that the language will handle memory management details for us. In the listing below, the struct Stack holds the pointer to the head node, if any, and the current stack size. The struct StackNode holds the value of the element of type E and the pointer to the next node, if any.

struct Stack<E> {
    head: Option<StackNode<E>>,
    size: usize,
}

struct StackNode<E> {
    next: Option<StackNode<E>>,
    elem: E,
}

struct Stack<E> {
    head: Option<StackNode<E>>,
    size: usize,
}

struct StackNode<E> {
    next: Option<StackNode<E>>,
    elem: E,
}

Something like this would work fine in Java but Rust gives a compilation error:

error[E0072]: recursive type `naive_stack::StackNode` has infinite size
 --> src/naive_stack.rs:6:1
  |
6 | struct StackNode<E> {
  | ^^^^^^^^^^^^^^^^^^^
7 |     next: Option<StackNode<E>>,
  |                  ------------ recursive without indirection
  |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
  |
7 |     next: Option<Box<StackNode<E>>>,
  |                  ++++            +

error[E0072]: recursive type `naive_stack::StackNode` has infinite size
 --> src/naive_stack.rs:6:1
  |
6 | struct StackNode<E> {
  | ^^^^^^^^^^^^^^^^^^^
7 |     next: Option<StackNode<E>>,
  |                  ------------ recursive without indirection
  |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
  |
7 |     next: Option<Box<StackNode<E>>>,
  |                  ++++            +

Recursive structures do not have a size known at compile time because the amount of memory needed to allocate one depends on the depth of recursion. This works in Java because the JVM allocates everything on the heap, deferring the size computation until runtime. Rust does not have runtime, so the compiler must know how much memory each type requires.

Next, we consider compiler’s (helpful) suggestion “insert some indirection (e.g., a Box, Rc, or &) to break the cycle.“

2. Naïve Stack (The C Approach)

C too requires that all structures’ sizes be known at compile time. There, the problem is resolved by (as suggested by the Rust compiler) indirection, replacing the inline inner structure with a pointer to it:

template <typename T>
struct Stack {
    StackNode* head;
    int size;
};

template <typename T>
struct StackNode {
    T elem;
    StackNode* next;
};

template <typename T>
struct Stack {
    StackNode* head;
    int size;
};

template <typename T>
struct StackNode {
    T elem;
    StackNode* next;
};

In C, the programmer is responsible for explicitly allocating each instance of StackNode and storing its pointer in the parent node. Pointers have a known size (that of the machine word), but direct pointer manipulation is exactly the brittleness that Rust intentionally does not allow. If we try something like this:

struct Stack<E> {
    head: Option<&StackNode<E>>,
    size: usize,
}

struct StackNode<E> {
    next: Option<&StackNode<E>>,
    elem: E,
}

struct Stack<E> {
    head: Option<&StackNode<E>>,
    size: usize,
}

struct StackNode<E> {
    next: Option<&StackNode<E>>,
    elem: E,
}

we get a compilation error:

error[E0106]: missing lifetime specifier
 --> src/naive.rs:2:18
  |
2 |     head: Option<&StackNode<E>>,
  |                  ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
1 ~ struct Stack<'a, E> {
2 ~     head: Option<&'a StackNode<E>>,
  |

error[E0106]: missing lifetime specifier
 --> src/naive.rs:7:18
  |
7 |     next: Option<&StackNode<E>>,
  |                  ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
6 ~ struct StackNode<'a, E> {
7 ~     next: Option<&'a StackNode<E>>,
  |

error[E0106]: missing lifetime specifier
 --> src/naive.rs:2:18
  |
2 |     head: Option<&StackNode<E>>,
  |                  ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
1 ~ struct Stack<'a, E> {
2 ~     head: Option<&'a StackNode<E>>,
  |

error[E0106]: missing lifetime specifier
 --> src/naive.rs:7:18
  |
7 |     next: Option<&StackNode<E>>,
  |                  ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
6 ~ struct StackNode<'a, E> {
7 ~     next: Option<&'a StackNode<E>>,
  |

The error demonstrates the fundamental difference between a C pointer and a Rust reference: a Rust reference represents a borrowed value owned by some other value, and the Rust compiler needs a guarantee that the reference will not outlive the referent. In contrast, the C compiler simply trusts the programmer to do the right thing by making sure that the references will not be dereferenced after the referent is freed.

Curiously, from the standpoint of automated proof of correctness, a C program by design cannot be safe; the freeing of the referent and of the reference cannot be accomplished atomically. This leaves the program for a finite period of time with either a dangling pointer or leaked memory. Rust breaks this impasse by pushing the check to compile time. This removes the possibility of a dangling pointer — by far the predominant source of bugs in C, — but not that of memory leaks. As we will see in the next post, memory leaks are a distinct possibility in Rust.

3. Working Stack (The Rust Approach)

The way to fix this is to use Box, the simplest smart pointer providing owned heap allocation. Box consists of two parts: a chunk of heap memory whose size may not be known at compile time and a fixed sized stack object that contains the pointer to that heap memory. Whereas C‘s malloc gives you a raw pointer, Box hides that pointer inside the stack-allocated header. The following is a basic implementation of stack in Rust.

struct Stack<E> {
    head: Option<Box<StackNode<E>>>,
    size: usize,
}

struct StackNode<E> {
    next: Option<Box<StackNode<E>>>,
    elem: E,
}

impl<E> Stack<E> {
    fn new() -> Self {
        Stack{head: None, size: 0}
    }
    fn push(&mut self, elem: E) {
        let new_node = Box::new(StackNode{elem, next: self.head.take()});
        self.head = Some(new_node);
        self.size += 1;
    }

    fn pop(&mut self) -> Option<E> {
        match self.head.take() {
            None => None,
            Some(old_head) => {
                self.head = old_head.next;
                self.size -= 1;
                Some(old_head.elem)
            }
        }
    }
}

struct Stack<E> {
    head: Option<Box<StackNode<E>>>,
    size: usize,
}

struct StackNode<E> {
    next: Option<Box<StackNode<E>>>,
    elem: E,
}

impl<E> Stack<E> {
    fn new() -> Self {
        Stack{head: None, size: 0}
    }
    fn push(&mut self, elem: E) {
        let new_node = Box::new(StackNode{elem, next: self.head.take()});
        self.head = Some(new_node);
        self.size += 1;
    }

    fn pop(&mut self) -> Option<E> {
        match self.head.take() {
            None => None,
            Some(old_head) => {
                self.head = old_head.next;
                self.size -= 1;
                Some(old_head.elem)
            }
        }
    }
}

Note, that Box, like String, does not implement Copy in order to prevent multiple ownership of the same heap data. Allowing so would cause double free errors when Box is dropped.

4. Cleanup

Let’s now add a new test case

#[test]
fn drop_test() {
    let mut stack: Stack<i32> = Stack::new();
    for i in 0..100000 {            
        stack.push(i);
    }
}

#[test]
fn drop_test() {
    let mut stack: Stack<i32> = Stack::new();
    for i in 0..100000 {            
        stack.push(i);
    }
}

On my system this crashes with

thread 'stack::tests::drop_test' has overflowed its stack
fatal runtime error: stack overflow

thread 'stack::tests::drop_test' has overflowed its stack
fatal runtime error: stack overflow

(If you’re not seeing the error, increase the size of the stack.) The error happens when stack goes out of scope at the end of drop_test() function. The default Drop implementation for our Stack, as generated by the Rust compiler, is recursive: i-th node’s drop() must call i+1st node’s drop() before cleaning up itself. We must provide a custom Drop implementation that replaces this recursion with iteration. (For a deeper look at why in this case rustc cannot generate an iterative default implementation, see this discussion).

The first thing that comes to mind is to simply call pop() in a loop:

impl <E> Drop for Stack<E> {
    fn drop(&mut self) {
        while let Some(node) = self.pop() {}
    }
}

impl <E> Drop for Stack<E> {
    fn drop(&mut self) {
        while let Some(node) = self.pop() {}
    }
}

This works fine in the sense that drop_test() now passes, but this implementation can still be improved. The pop() method repeatedly disposes of the head node, but it also moves the (potentially large) element value out of Box and returns it, re-wrapped in Option. This extra memory copy is unavoidable if the caller wants the element value, but our drop() method does not. A better solution is to keep taking the `Box` from `StackNode.next` and letting it go out of scope on its own without unpacking:

fn drop(&mut self) {
    let mut curr_head = self.head.take();
    while let Some(boxed_node) = curr_head {
        curr_head = boxed_node.next;
    }
}

fn drop(&mut self) {
    let mut curr_head = self.head.take();
    while let Some(boxed_node) = curr_head {
        curr_head = boxed_node.next;
    }
}

5. Iterating

Let’s now turn our stack into an iterable collection, so that it can be iterated over as in the next test case we’re going to add:

#[test]
fn iter_test() {
    let mut stack: Stack<String> = Stack::new();
    for i in 0..100 {
        stack.push(i.to_string());
    }
    for i in stack {  // Error. Stack is not an interator and does not implement `IntoIterator`.
        println!("{}", i);
    }
}

#[test]
fn iter_test() {
    let mut stack: Stack<String> = Stack::new();
    for i in 0..100 {
        stack.push(i.to_string());
    }
    for i in stack {  // Error. Stack is not an interator and does not implement `IntoIterator`.
        println!("{}", i);
    }
}

For now, this fails to compile because rustc does not know how to iterate over our Stack. The error message offers an actionable suggestion:

help: the trait `Iterator` is not implemented for `Stack<String>`
note: required for `Stack<String>` to implement `IntoIterator`

The for expression loops over elements of an iterator. The iterator can be provided explicitly, when the for expression already implements Itarator (and thus has the next() method) or implicitly, when the for expression implements std::iter::IntoIterator. As the error suggests, we only need to implement Iterator for our Stack so that the blanket implementation of IntoIterator, provided by the standard library for any Iterator, becomes available to the compiler.

Implementing Iterator for our Stack is trivial — next() simply calls pop().

impl <E> Iterator for Stack<E> {
    type Item = E;
    fn next(&mut self) -> Option<Self::Item> {
        self.pop()
    }
}

impl <E> Iterator for Stack<E> {
    type Item = E;
    fn next(&mut self) -> Option<Self::Item> {
        self.pop()
    }
}

Clearly, this iterator moves the values out of the collection, leaving it empty once the iterator is exhausted. This is not always what we want. Standard collections implement two more methods which return iterators over them, iter() and iter_mut() which borrow, without consuming, the elements of the collection in an immutable or mutable fashion respectively. The implementations of these methods is an advanced topic that I am leaving out, but if you’re interested, here’s an excellent write-up.

6. Deficiencies

Our implementation of the stack data structure is for educational purposes only. You should not use it in production for the following reasons:

Not thread safe. We will address this in a subsequent post.
Lacks many useful methods available on std::collections::Vec.

Rust Dust

Variant Experiment Server

bookworms

aws-wordpress