Monday, November 17, 2008

one-pass tree pruning

A friend of mine wondered the other day if one-pass tree pruning was possible. Common sense dictates that a two-pass algorithm is required to first mark the subtrees due for deletion and then a second walk to do the actual pruning. I have since revisited this thought and managed to get a one-pass proof of concept:


boolean pruneTree(Iterator parentIterator, TreeNode root, Set allowedNodeValues) {
    boolean pruneNode = true;

    Iterator nodeIterator = root.getChildren().iterator();
    while (nodeIterator.hasNext()) {
        TreeNode child = nodeIterator.next();
        pruneNode = pruneTree(nodeIterator, child, allowedNodeValues) && pruneNode;
        }

    pruneNode = pruneNode && !allowedNodeValues.contains(root.getValue());
    if (pruneNode && parentIterator != null) {
        parentIterator.remove();
    }

    return pruneNode;
}

This will remove parts of the tree whose node values are of no interest to us, otherwise retaining the same structural properties as those of the original tree:

    a                                               a
   / \                                             / \
  b   c                                           b   c
 / \  | \     pruneTree(it, a, {a,b,c,d}) -->    /    | 
d  e  b  e                                      d     b     
   |    / \                                                 
   c   d   f                                           

To make sure this was doing what was meant to do, a pretty print method was necessary. I used indentation and nested parentheses (with slight adjustments from Knuth's original):

void printTreeIndented(TreeNode root, int currentDepth) {
    System.out.println();
    for (int i = 0; i < currentDepth - 1; i++) {
        System.out.print("\t");
    }
    System.out.print(root.getValue());

    currentDepth++;
    for (TreeNode node : root.getChildren()) {
        printTreeIndented(node, currentDepth);
    }
}

void printTreeParenthesized(TreeNode root) {
    System.out.print(root);
    List children = root.getChildren();
    if (children.size() == 0) {
        return;
    }

    System.out.print('(');
    printTreeParenthesized(children.get(0));
    for (int i = 1; i < children.size(); i++) {
        System.out.print(", ");
        printTreeParenthesized(children.get(i));
    }
        
    System.out.print(')');
}

The first one will print:

a
  b
    d
    e
      c
  c
    b
    e
      d
      f

while the second will print a(b(d, e(c)), c(b, e(d, f))). Unfortunately after some runs I realised that the tree pruning algorithm's performance is dismal; it prunes by removing nodes bottom up, one by one. After all, it has no other way of knowing what to prune in one pass. Compare this with a fast mark pass and a second prune pass whereby whole sub-trees can be pruned. What's worse, as it's written, it runs out of heap space for large trees with more than a handful of levels. I guess the question is, does anybody know of any fast one-pass tree pruning algorithm?