A friend of mine wondered the other day if one-pass tree pruning was possible. Common sense dictates that a two-pass algorithm is required to first mark the subtrees due for deletion and then a second walk to do the actual pruning. I have since revisited this thought and managed to get a one-pass proof of concept:
boolean pruneTree(Iterator parentIterator, TreeNode root, Set allowedNodeValues) {
boolean pruneNode = true;
Iterator nodeIterator = root.getChildren().iterator();
while (nodeIterator.hasNext()) {
TreeNode child = nodeIterator.next();
pruneNode = pruneTree(nodeIterator, child, allowedNodeValues) && pruneNode;
}
pruneNode = pruneNode && !allowedNodeValues.contains(root.getValue());
if (pruneNode && parentIterator != null) {
parentIterator.remove();
}
return pruneNode;
}
This will remove parts of the tree whose node values are of no interest to us, otherwise retaining the same structural properties as those of the original tree:
a a
/ \ / \
b c b c
/ \ | \ pruneTree(it, a, {a,b,c,d}) --> / |
d e b e d b
| / \
c d f
To make sure this was doing what was meant to do, a pretty print method was necessary. I used indentation and nested parentheses (with slight adjustments from Knuth's original):
void printTreeIndented(TreeNode root, int currentDepth) {
System.out.println();
for (int i = 0; i < currentDepth - 1; i++) {
System.out.print("\t");
}
System.out.print(root.getValue());
currentDepth++;
for (TreeNode node : root.getChildren()) {
printTreeIndented(node, currentDepth);
}
}
void printTreeParenthesized(TreeNode root) {
System.out.print(root);
List children = root.getChildren();
if (children.size() == 0) {
return;
}
System.out.print('(');
printTreeParenthesized(children.get(0));
for (int i = 1; i < children.size(); i++) {
System.out.print(", ");
printTreeParenthesized(children.get(i));
}
System.out.print(')');
}
The first one will print:
a
b
d
e
c
c
b
e
d
f
while the second will print a(b(d, e(c)), c(b, e(d, f)))
.
Unfortunately after some runs I realised that the tree pruning algorithm's performance is dismal; it prunes by removing nodes bottom up, one by one. After all, it has no other way of knowing what to prune in one pass. Compare this with a fast mark pass and a second prune pass whereby whole sub-trees can be pruned. What's worse, as it's written, it runs out of heap space for large trees with more than a handful of levels. I guess the question is, does anybody know of any fast one-pass tree pruning algorithm?