Friday, August 10, 2007

cloning adventure

Once I wanted to create clones of tree nodes, naming the clones of a node, say A, as A-Copy-1, A-Copy-2, etc. My first thought was to use the parent's child-count property and start counting copies from that value onwards. However, if the nodes are A, A-Copy-1 and A-Copy-2, then once A-Copy-1 is deleted the child count becomes 2, and using it yields A-Copy-2, the name of an existing child.

Then I thought I would simply iterate through the names A-Copy-i, for i = 1 to child-count, and create a clone whenever that name was free to take. That would reuse slots freed by previous deletions, but would make it impossible to tell the new nodes from the old ones. This was easily fixed by finding the maximum copy count of A and creating clones after that: if sibling nodes A-Copy-4 and A-Copy-6 are left in the tree, any new copies are named A-Copy-7, A-Copy-8, etc. That way you always get a nice continuous set of newly cloned nodes in the tree. Even if you wanted to clone an A-Copy-i node, its copy would quite naturally become A-Copy-i-Copy-j.

I decided to use regular expressions to look for <node name>-Copy-<copy count> sibling nodes. That gave the extra benefit of having the copy count available through a capturing group. To force any metacharacters in the node name to be treated as ordinary characters, I preceded every single character in the name with a backslash:

StringBuffer escapedNodeName = 
  new StringBuffer(nodeName.length()*2);
for (int i = 0; i < nodeName.length(); i++) {
  escapedNodeName.append('\\').append(nodeName.charAt(i));
}
RE exp = new RE(escapedNodeName.toString() + "-Copy-([1-9]\\d*$)");

This failed miserably! Can you guess why? It is quite all right to escape a metacharacter, but it is another kettle of fish when "escaping" ordinary characters, because a backslash in front of an ordinary character can give it a special meaning. When this code was used on a tree which had digits in node names, the escaped digits became backreferences and the match failed. I hastily changed the escaping code to enclose the name in a quote (\Q and \E). No sooner had I done this than I realised that even that had its own flaw: what if the node name had a \E in it? Proper quoting required some more effort:

StringBuilder escapedNodeName = 
  new StringBuilder(nodeName.length() * 2);
escapedNodeName.append("\\Q");
int slashEIndex;
int current = 0;
while ((slashEIndex = nodeName.indexOf("\\E", current)) != -1) {
  escapedNodeName.append(nodeName.substring(current, 
                                            slashEIndex));
  current = slashEIndex + 2;
  escapedNodeName.append("\\E\\\\E\\Q");
}
escapedNodeName.append(nodeName.substring(current, 
                                          nodeName.length()));
escapedNodeName.append("\\E");

Thankfully, since Java 1.5 this is available as the java.util.regex.Pattern.quote(String) method.
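
To tie the pieces together, here is a minimal sketch of the naming scheme built on Pattern.quote. It uses java.util.regex rather than the RE class from the snippets above, and the class name CopyNamer, the method nextCopyName and the plain list of sibling names are hypothetical stand-ins for whatever the real tree API offers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of the original code, just an illustration.
class CopyNamer {

    // Returns the name for the next clone of nodeName: one past the highest
    // existing "<nodeName>-Copy-<n>" sibling, or "<nodeName>-Copy-1" if no
    // such sibling exists.
    static String nextCopyName(String nodeName, List<String> siblingNames) {
        // Pattern.quote makes every character of the node name literal,
        // including metacharacters, digits and an embedded \E.
        Pattern copyPattern =
            Pattern.compile(Pattern.quote(nodeName) + "-Copy-([1-9]\\d*)");
        int maxCopy = 0;
        for (String sibling : siblingNames) {
            Matcher m = copyPattern.matcher(sibling);
            // matches() requires the whole sibling name to fit the pattern,
            // so no trailing $ is needed.
            if (m.matches()) {
                // Assumes the copy count fits in an int.
                maxCopy = Math.max(maxCopy, Integer.parseInt(m.group(1)));
            }
        }
        return nodeName + "-Copy-" + (maxCopy + 1);
    }

    public static void main(String[] args) {
        List<String> siblings = Arrays.asList("A", "A-Copy-4", "A-Copy-6");
        System.out.println(nextCopyName("A", siblings));
    }
}
```

With siblings A, A-Copy-4 and A-Copy-6 this yields A-Copy-7, and cloning A-Copy-4 in the same tree yields A-Copy-4-Copy-1, since none of the siblings is a copy of A-Copy-4 itself.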
