Sunday, 5 March 2017

Building pointer-free trees with constant processing space

I've been working on a project called TeardownTree for the last few months. An interesting problem came up yesterday: given a sorted array of items, is it possible to build a pointer-free Binary Search Tree (PFT) out of it in a loop, using constant extra space?


Introduction


Some background to explain how this problem came up. But first a brief description of the data structure in question. Most BSTs are implemented using nodes and pointers. In my project, however, I am using a pointer-free version, which is represented in memory as follows:
  • The root is at the index $0$.
  • The index of the left child given a parent's index $p$ is $2p+1$, and the index of the right child is $2p+2$.
This representation is famously used in the Binary Heap, which is one of the reasons that data structure is as popular as it is. The pointer-free representation has both advantages and disadvantages over the pointer-based versions. One obvious advantage is that no memory is wasted on pointers - a nearly complete PFT occupies the same space as a sorted array. By "nearly complete" I mean that every level of the tree is full, perhaps except for the last one, which is filled from left to right.

Tree representation in memory


Initially, I used the obvious recursive version that built the tree based on the following root selection algorithm (in Rust):

/// Finds the point to partition n keys for a nearly complete binary tree
/// http://stackoverflow.com/a/26896494/3646645
fn build_select_root(n: usize) -> usize {
    // the highest power of two <= n
    let x = if n.is_power_of_two() { n }
            else { n.next_power_of_two() / 2 };

    if x/2 <= (n-x) + 1 {
        x - 1
    } else {
        n - x/2
    }
}

While it always felt "meh", I couldn't justify spending effort on a better solution, since the tree was intended to be constructed once and then reused many times.

But recently I decided to implement the $insert$ algorithm, and that's where things got more interesting. To implement a useful $insert$, the tree needs to balance itself, otherwise we will quickly run out of space (imagine numbers from 0 to 100 being inserted into an empty tree in the increasing order). In most pointer-based trees, balancing is done using tree rotations - a method that strongly depends on pointer manipulation and is therefore not applicable to the PFT.

But there is another way: partial rebuilding! Scapegoat Tree is one example of data structures that use this idea. In a nutshell: when, after inserting a new node $u$, we notice that the tree has become too high, we look for ancestor $v$ of $u$ s.t. $|T_v| \ll 2^{h(T_v)}-1$, where $T_v$ is the subtree rooted in $v$ and $h(T_v)$ is its height. We then balance $T_v$ by rebuilding it into a nearly complete tree. (It turned out upon closer examination that I cannot use Scapegoat Tree in the PFT, but I can use another algorithm along these lines. I will leave the details for another post.) So it goes like this:
  1. Traverse the subtree to be rebuilt in-order, moving every node into an auxiliary sorted array.
  2. Move nodes from the array back into the tree in a way that produces a nearly complete subtree as a result.

So, coming back to the problem at hand (and ignoring for the moment the distinction between full and partial rebuilding): is there a faster way to build a nearly complete tree from a sorted array than via recursive root selection?


Solution

It's all about the sequence


How do we approach the problem? A Scientist would start by stating the problem formally, along the lines of:

Find function $f$ satisfying: given $i \in X=[0, 1, \dots, n-1]$, find $j$, such that $i$ is the $j$'th item in a BFS traversal of a complete BST encoding $X$.
... and solving it from there. But I will leave that as an exercise to the reader 😁 (please do comment if you manage to formally prove the solution) and approach it as an Engineer. Let's fill a complete tree with numbers from $0$ to $n-1$ for some small $n$, such as $n=7$, and write down the expected result of $f(i)$ for every index:

In order to obtain the $f(i)$ table above, we manually traverse the tree in-order, writing down the current storage indices as we go. In other words, we go over the $\text{Sorted array}$ table and, for each entry $i$, find $i$ in the $\text{Tree storage}$ table and write down its index $k$ into the $f(i)$ table.

Now that we have this table of expected results, what can we do with it? We can look it up in The Online Encyclopedia of Integer Sequences! (If you actually try to do that, you will find that you need to switch to the 1-based notation to obtain a useful result; the reason for that will become clear shortly.) We end up with an infinite sequence that looks like this:
    \[0, 0, \underline{1}, 0, \underline{2, 1, 3}, 0, \underline{4, 2, 5, 1, 6, 3, 7}, 0, \underline{8, 4, 9, 2, 10, 5, 11, 1, 12, 6, 13, 3, 14, 7, 15}, 0, \dots \]

This sequence encodes $f(i)$ for every complete BST, with individual tree encodings (underlined) separated by zeros. All behind a simple formula:
    \[a(2n) = n,\] \[a(2n+1) = a(n)\]
Isn't it magical how an entire universe of trees fits in these two short equations? But we still need to figure out how to transform them into code. First, notice that for  $|T|=n$, the relevant span in the sequence starts at the offset $n+1$. Therefore, we can implement $f(n,i)$ as follows:
    \[f(n,i) = a(n+1+i) - 1\]

And this is how the generating equations translate into code:

fn a025480(k: usize) -> usize {
    let shift = (!k).trailing_zeros();
    k >> (shift+1)
}

Explanation: when $k$ is odd, $a025480$ shifts it right by one bit and checks again, etc -- until $k$ is even. So, in effect, it looks for the least-significant $0$ in the binary representation of $k$. Rust's std has a function to determine the trailing zeros, but not the trailing ones, which is why we invert the problem to finding the number of zeros in a negated $k$.



Nearly complete


Next piece of the puzzle: how can we expand this solution to nearly complete trees? But wait, why do we even need to do that? Does it matter that the tree is incomplete? Let's define $c(n)$ to compute the size of the smallest complete tree that can hold $n$ items: $c(n) = 2^h-1$, where $h=\lceil log_2(n+1)\rceil$. We also define for convenience $f_n(i) = f(c(n),i)$ and $f_D(i) = f(c(|D|), i)$.

We use the same $build$ algorithm as for complete trees: for the given index $i$ into a sorted array $D$ and array $T$ representing a BST to be built, we copy $D[i]$ into $T[f_D(i)]$. Does it work? Indeed, it does. Sort of.

The result is a valid BST, but it's not nearly complete. Note the hole at index $2$ in the $\text{Tree storage}$. The solution is as follows. Instead of going over the sorted array $D$ and directly copying each $D[i]$ into $T[f_D(i)]$, we only copy nodes when $f_D(i)$ is a valid index inside our nearly complete BST. We exploit the fact that all empty leaves (I'm abusing the terminology here a little - by "leaves" I mean all nodes at the bottom level of the complete tree, whether present or not) at the bottom level of a nearly complete tree are located at the very end of the storage and therefore have indices greater than those of any valid node.

In other words, every time we encounter a leaf with index $j$ satisfying $f_D(j) \geq |D|$, we proceed to the next in-order node without copying $D[j]$. To that end, we maintain a skipped nodes counter $s$. If $f_D(i+s) < |D|$, we copy the node and increment $i$, otherwise we don't copy, but increment $s$. (You can find the complete code down below.)


Partial rebuilding


One last problem remains: how can we expand the $build$ algorithm to support partial rebuilding, i.e. how can we make it work on a subtree $S=T_v$ rooted at index $r \neq 0$? The first impulse to shift the results of $f_S(i)$ by $r$ is obviously wrong -- indeed, for $r=3$, $left(r) = 7 \neq 4=left(0)+3$. But the problem is actually easily solved as soon as we write down the indices of the two nodes that are connected with identically shaped paths to their roots $r_a$ and $r_b$. Assume that the path starts with a $left$ branch: \[index(node_a) = left(r_a)=2r_a + 1\] \[index(node_b) = left(r_b)=2r_b + 1\] \[shift_{a,b}=2(r_b-r_a)\]

Clearly, if the path starts with a $right$ branch, the result is the same. It follows that, irrespective of the actual shape of the path, after $k$ steps the difference is
    \[shift_{a,b}=2^k (r_b-r_a)\] Since we have $r_a=0$, \[shift_{a,b}=2^k r_b\]

This means that, given an index $i$ into a sorted array $D$, the corresponding index in the tree is $t=f_D(i) + 2^d r$, where $d$ is the depth of the node relative to $r$, i.e. $t=f_D(i) + (c(f_D(i))+1) r$.


fn build<T>(sorted: &[T], root: usize, tree: &mut [T]) {
    let len = sorted.len();
    if len  == 0 {
        return;
    }

    let height = depth_of(len-1) + 1;
    let complete_count = (1 << height) - 1;

    let mut skipped = 0;
    let mut src_offs = 0;
    while src_offs < len {
        let inorder = src_offs+skipped;
        let local_idx = a025480(complete_count+1+inorder) - 1;

        if local_idx < len {
            let local_depth = depth_of(local_idx);
            let global_idx = local_idx + (root << local_depth);
            tree[global_idx] = sorted[src_offs];
            src_offs += 1;
        } else {
            skipped += 1;
        }
    }
}

(The above is not valid Rust because of safety checks in the compiler, but I removed the relevant code for brevity.)


Conclusion


A quick benchmark shows the new $build$ function to be 2.5-3 times faster than the recursive one.

Someone, somewhere probably thought this stuff up before. But if they did, my Google-fu was not enough to find it.