Thursday, February 16, 2012

Data Structure : Hash Table and Hash Function


Hash Table is a data structure that offers very fast insertion and searching. If data is distributed evenly in Hash Table, its insertion and searching time is close to constant. Size of the data does not affect its performance. Hash Table is so fast that it is used by computer and many applications for searching and insertion in large volume data, like spell checker. It is faster than a Tree and relatively easier to implement.

Hash table does have some disadvantages. Hash table is implemented using arrays and arrays are difficult to expand once it is created. Its performance degrades badly if it is too full. Also there is no convenient way to visit data in hash table in any order.

As I have said, Hash table is easy to implement, it is similar to array, get the index(hash) of the data then access the data at the index in array. But generating index or hash, both is similar term in Hash Table, is critical task. So we will discuss that.

Hashing
Hashing is transformation of sequence of characters into shorter fixed size key (Hash), which will represent the data in data set. Hash is used to index and retrieve data in hash table and database. Hashing is one way operation i.e. original string can not be generated back from its hash. Hashing should be referentially transparent i.e. it should always generate same hash key for same input. An ideal hash function should not generate same hash key for different inputs. If it is generating same hash key for different inputs, it is called collision. A hash function with low risk of collision is most preferred.  The hash key is used as index number to insert data in array.

Risk of collision increases as a large set of data is squeezed into smaller hash table. It
may appear that possibility of collision renders hashing scheme impractical, but there is several workaround. We will discuss Open Addressing and Separate chaining.

Open addressing
If any data is hashed to an index which is already occupied, search for empty cell in the array in systematic way, so the data will be stored in the array only. The number of steps required to find next vacant length is called Probe Length. We will discuss three ways of open addressing, which is used to find the next vacant cell.

Linear Probing
Linear probing searches for vacant cell sequentially. If index 10 is occupied, it will go to 11 and then 12, and so on, incrementing the index until it finds a vacant cell. The "sequence of filled cells" grows larger as it keep adding new data next to it and this forms cluster. This clustering can result in long probe length hence degrading the Hash Table performance.

Quadratic Probing
We saw that clustering can occur in Linear probing approach to open addressing. Once a cluster forms it tends to grow larger. Items that hash to any value in the range of cluster will step along and insert themselves at the end of cluster, thus making it even bigger. Quadratic probing is an attempt to keep clusters from forming by probing more widely separated cells. If the hash value is X, probe start from X+1 then X+4 and then X+9 and so on, increasing the hash value by square of step number.
STEP 1 – X+1^2
STEP 2 – X+ 2^2
STEP 3 – X+3^2
:
STEP N - X+N^2

Quadratic probe does eliminate the clustering problem that we saw with Linear probing, but it suffers from different clustering problem. This clustering happens because all the items which hash to a particular cell follow the same sequence in trying to find the vacant cell.

Double hashing
To eliminate both type of clustering, there is an approach: Double Hashing. Probe sequence is generated based on item, rather than same for every item. So, items that hash to same index number have different probe sequence. Second hash function which generates probe sequence should not be same as hash function used to generate index. Second hash function should never output 0; otherwise there would be no step.

Separate chaining
In open chaining, collision is handled by looking for vacant cell in same array.  Another approach is to insert linked list at each index. Items are hashed in usual way to an index and the item is inserted in the linked list at that index. If another item is hashed to the same index, the new item will also be inserted at the end of the linked list present at that index.

Sunday, January 29, 2012

Data Structure : Tree traversal algorithm


We have discussed tree and binary tree in our last post part1 and part2. We discussed most common operations on Tree and Binary Tree. In this post we will discuss Tree traversal.

Traversing a tree mean visiting each node in specified order. Traversing is not as commonly used as find, insert and delete operations. It is not fast, but is very useful in some situation. There are three simple orders in which tree can be transferred: Inorder, Preorder and Postorder. We will discuss these traversing orders only.

Inorder Traversal
This is the most commonly used traversal order for Binary tree. An inorder traversal of binary tree will cause all the nodes to be visited in ascending order.  This is useful when you need to create a sorted list of data in a binary tree. Simplest way to execute traversal is through recursion. Below are the three steps to implement inorder traversal
  1. Visit node’s left sub tree
  2. Visit node
  3. Visit node’s right sub tree

Jave code for inorder traversal is

private void inOrder(Node node)
{
if(node != null)
{
inOrder(node.leftChild);
System.out.print(node.getValue() + “ “);
inOrder(node.rightChild);
}
}

Preorder and Postorder Traversal
These are almost same as inorder traversal; only the sequence of traversing node, left subtree and right subtree is different. These are used generally in parsing and analysing expression like algebra expression.
Below are the steps to execute preorder traversal
  1. Visit node
  2. Visit node’s left sub tree
  3. Visit node’s right sub tree

Java code for prorder traversal is
private void preOrder(Node node)
{
if(node != null)
{
System.out.print(node.getValue() + “ “);
preOrder(node.leftChild);
preOrder(node.rightChild);
}
}

Similarly, below are the steps to execute postorder traversal
  1. Visit node’s left sub tree
  2. Visit node’s right sub tree
  3. Visit node

Java code for postorder traversal is
private void postOrder(Node node)
{
if(node != null)
{
postOrder(node.leftChild);
postOrder(node.rightChild);
System.out.print(node.getValue() + “ “);

}
}

Sunday, January 22, 2012

Data Structure: Tree and Binary Tree: Part 2


In the part 1, we discussed Tree in brief. In this part, we will discuss Binary Tree. Binary Tree has two additional constraints in addition to what regular tree has. Any node in Binary Tree should not have more than 2 children. Value at the left child should always be lesser than parent’s value and value at the right child should not be lesser than parent’s value. Only these two additional characteristics make any tree a Binary Tree.  Here is the basic Java code of Tree Node

class Node
{
Int value;
node leftChild;
node rightChild;
}

We will discuss 3 common operations of Binary Tree.
  1. Find
  2. Insert
  3. Delete

Find
Finding a node with a specific key is the simplest all the 3 major operation. Start from root and traverse down the tree until key is found or leaf node is reached. Here are three steps to
  1. Check the value of current node, if value is same as search key, return location.
  2. If value of current node is greater than search key, go to the left sub-tree
  3. If value of current node in lesser than search key, go to right sub-tree.

Start with setting root as current, execute the above three steps unless leaf node is hit or value has been found. If program hits leaf node traversing down the tree searching for key, it means search key not found.

public Node findValnode(int value)
            {
                        Node current = root;
                       
                        while(current != null)
                        {
                                    if(current.getValue() == value)
                                    {
                                                return current;
                                    }
                                    else if(value < current.getValue())
                                                current = current.getLeftChild();
                                    else
                                                current = current.getRightChild();
                                               
                        }
                        return null;
            }

Insert
To insert a node, we must find place to insert the node. Insertion is almost same as finding a node. Take value to insert as search key and insert the value at found place. If value is not found in the tree, it will be added to any leaf node as either left child or right child depending upon its value. Even if value is found and duplicate is allowed in tree, it will be added in right sub-tree to the found node. Below are the steps to insert a node.
  1. Create a node with the given value
  2. Search the given value in the tree.
  3. Insert the node as left child/ right child, depending upon its value, to the leaf node where Tree Traversing ended.
Below is the java code for Insertion in tree.

public void insert(int value)
            {
                        Node node = new Node();
                        node.setValue(value);
                       
                        if(root == null)
                        {
                                    root = node;
                                   
                        }
                        else
                        {
                                    Node current = root;
                                    Node parent = root;
                                    while(true)
                                    {
                                                parent = current;
                                                if(value < current.getValue())
                                                            current = current.getLeftChild();
                                                else
                                                            current = current.getRightChild();
                                               
                                               
                                                if(current == null && value < parent.getValue())
                                                {
                                                            parent.setLeftChild(node);
                                                            break;
                                                }
                                                else if(current == null && value >= parent.getValue())
                                                {          
                                                            parent.setRightChild(node);
                                                            break;
                                                }
                                               
                                    }
                        }
            }

Delete
Delete is the most complicated common operation of Binary Tree. First find the node to delete, use same approach as for Find operation. Once the node is found, there are three cases to consider before deleting the node
Case 1.             The node to be deleted is leaf node
Case 2.             The node to be deleted has only one child
Case 3.             The node to be deleted has two children
Case 1: It is simplest of all above cases. To delete a leaf node, change the appropriate child field in node’s parent to null

Case 2:  This case is also not that complicated. Node to be deleted has only two connections: one to its parent and second to its child. To delete the node, just remove the node from the sequence. Connect the node’s parent directly to its child.

Case 3: This is little tricky and complicated. If the deleted node has 2 children, we cannot just replace it with one of these children. There are various cases to consider.
Good news is there is trick to handle these cases. Find the inorder successor of the node to be deleted.  Literally speaking, inorder successor is the next highest value in the tree. It means, this successor should be in the right sub tree of the node to be deleted. For sake of information, inorder, preorder and postorder are the three ways a binary tree can be traversed. We will discuss the Tree Traversal Algorithms in detail in some other post. For this post, let not digress from the main topic and let keep this post simple. Here is java code to find inorder successor of a node (This is just for reference)

private Node getSuccessor(Node delNode)
            {
                        Node successorParent = delNode;
                        Node successor = delNode;
                        Node current = delNode.getRightChild();
                        while(current != null)
                        { // left children,
                                    successorParent = successor;
                                    successor = current;
                                    current = current.getLeftChild();
                        }
                        // if successor not
                        if(successor != delNode.getRightChild()) // right child,
                        { // make connections
                        successorParent.setLeftChild(successor.getRightChild());
                        successor.setRightChild(delNode.getRightChild());
                        }
                        return successor;
            }
Once we have successor in hand, there are two cases to handle to delete the node.

Successor is the right child of the node to be deleted
If the successor is the right child of the node to be deleted, then this right child does not have left child. Let refer node to be deleted as current node. The deletion can be done in two steps

  1. Remove reference to current from its parents and set reference to successor in that place
  2. Set successor left child to point to current’s left child.

Successor Is Left Descendant of Right Child of the node to be deleted
If the successor is the left descendant of right child of the node to be deleted, deletion takes place in four steps. Let refer node to be deleted as current node.
  1. Plug the right child of the successor into the left child of its parent.
  2. Plug the right child of the current to the right child of its successor
  3. Unplug current node from the its parent and set successor in that place
  4. Unplug left child of the current and set it into successor left child

Tuesday, January 10, 2012

Deleting all JS comments from JSP page at build time: Part 2

Writing an ANT task is easy; all you need to extend is Task Class.  I have written one, it is simple java program.  This ant task is now being added in our application build tasks. This will help us in keeping the developers comments for readability and future reference, but cleaning all the JS comments from final build, hence mitigating the reputation risk. Here is the ant task declaration and configuration.

<taskdef name="jsscript" classname="com.aman.task.JSPDeleteScriptCommensRegExp"/>

<target name="allcomments">
<fileset id="jsp.fileset" dir="${src.dir}" includes="**/*.jsp"/>

<jsscript matchInline="(^|[^:\-'\&quot;])(//+.*(?!(\*/))$)" replaceInline="\1" matchMultiline="(^|[^/])(/\*+([\S\s](?!(/+\*+)))+?\*+/+)" replaceMultiline="\1">
            <fileset refid="jsp.fileset"/>
      </jsscript>
</target>

com.aman.task.JSPDeleteScriptCommensRegExp is the class that I have written to delete all the JS comment.

matchInlineis the property of the above class, this hold the inline comment pattern and this will be replaced with replaceInline  “\1”: this is the backreference of regular expression. We are replacing only the first group as second group is all comment.

matchMultilineis the again a similar property to delete multiline comments and this will be replaced by replaceMultiline – “\1”.

Matching and replacing the inline and multiline comments are handled differently.
Every line is searched for and replaced with pattern one at a time in inline matching, whereas in multiline all the code inside the script block is first buffred in and then matched and replaced.

I am not describing the regex used in detail, but if you have any concern please do contact me.

Here is the full code for reference

package com.aman.task;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.DirectoryScanner;
import org.apache.tools.ant.Project;
import org.apache.tools.ant.Task;
import org.apache.tools.ant.taskdefs.optional.ReplaceRegExp;
import org.apache.tools.ant.types.FileSet;
import org.apache.tools.ant.types.RegularExpression;
import org.apache.tools.ant.types.Substitution;
import org.apache.tools.ant.util.FileUtils;
import org.apache.tools.ant.util.regexp.Regexp;


public class JSPDeleteScriptCommensRegExp extends Task {

       private long totalline;
       private long comment;
       private long mcomment;
       private Vector filesets;
       private RegularExpression inlineRegex;
       private Substitution inlineSubs;
       private RegularExpression multilineRegex;
       private Substitution multiLineSubs;
       FileUtils fileUtils = FileUtils.getFileUtils();
     
       public JSPDeleteScriptCommensRegExp() {
                     super();
                     this.filesets = new Vector();
                    
                         this.inlineRegex = null;
                     this.inlineSubs = null;
       }
       
       public void setMatchInline(String match) {
                    if (inlineRegex != null) {
                         throw new BuildException("Only one inline regular expression is allowed");
                     }
                             inlineRegex = new RegularExpression();
                         inlineRegex.setPattern(match);
       }
       
       public void setReplaceInline(String replace) {
                if (inlineSubs != null) {
                         throw new BuildException("Only one inline substitution expression is "
                                                + "allowed");
                   }
     
              inlineSubs = new Substitution();
              inlineSubs.setExpression(replace);
      }
     
       public void setMatchMultiline(String match) {
              if (multilineRegex != null) {
                   throw new BuildException("Only one multiline regular expression is allowed");
               }
              multilineRegex = new RegularExpression();
              multilineRegex.setPattern(match);
      }
     
      public void setReplaceMultiline(String replace) {
             if (multiLineSubs != null) {
                        throw new BuildException("Only one multiline substitution expression is "
                                                + "allowed");
                   }
     
             multiLineSubs = new Substitution();
             multiLineSubs.setExpression(replace);
      }
       
       public void addFileset(FileSet set) {
             filesets.addElement(set);
     }
       
       protected String doReplace(RegularExpression r, Substitution s, String input, int options) {
                      String res = input;
                     Regexp regexp = r.getRegexp(getProject());
                 
                   if (regexp.matches(input, options)) {
                       log("Found match; substituting", Project.MSG_DEBUG);
                       res = regexp.substitute(input, s.getExpression(getProject()),
                                               options);
                   }
           
                    return res;
         }
       
       protected void doReplaceInline(File f, int options) throws IOException {
           
            File temp = fileUtils.createTempFile("replace", ".txt", null);
        temp.deleteOnExit();

        Reader r = null;
        Writer w = null;

        try {
          
            r = new FileReader(f);
            w = new FileWriter(temp);
          
            BufferedReader br = new BufferedReader(r);
            BufferedWriter bw = new BufferedWriter(w);
            PrintWriter pw = new PrintWriter(bw);

            boolean changes = false;

           
            StringBuffer linebuf = new StringBuffer();
            String line = null;
            String res = null;
            int c;
            boolean scriptFound = false;
            Pattern scriptStart = Pattern.compile("<script[\\S\\s]*?>");
            Pattern inlineScript = Pattern.compile("<script[\\S\\s]*?((/>)|(</script>))");
            Pattern scriptend = Pattern.compile("</script");
            while((line = br.readLine()) != null)
            {
                  totalline++;
                  Matcher scriptMacther = scriptStart.matcher(line);
                  Matcher inlineMatcher = inlineScript.matcher(line);
                  Matcher sciptendMatcher = scriptend.matcher(line);
                  if(inlineMatcher.find())
                  {
                        // dont do anything just print it in final file and escape
                  }
                  else if(scriptMacther.find())
                  {
                        scriptFound = true;
                       
                  }
                  else if(sciptendMatcher.find())
                  {
                        scriptFound = false;
                       
                  }
                 
                  if(scriptFound)
                  {
                        res  = doReplace(inlineRegex, inlineSubs, line, options);
                        if (!res.equals(line)) {
                        changes = true;
                        comment++;
                    }

                    pw.println(res);
                  }
                  else
                  {
                        pw.println(line);
                  }
            }
            pw.flush();
            r.close();
            r = null;
            w.close();
            w = null;
            if (changes) {
                log("File has changed; saving the updated file", Project.MSG_VERBOSE);
                try {
                    fileUtils.rename(temp, f);
                    temp = null;
                } catch (IOException e) {
                    throw new BuildException("Couldn't rename temporary file "
                                             + temp, getLocation());
                }
            } else {
                log("No change made", Project.MSG_DEBUG);
            }
        } finally {
            try {
                if (r != null) {
                    r.close();
                }
            } catch (Exception e) {
                // ignore any secondary exceptions
            }

            try {
                if (w != null) {
                    w.close();
                }
            } catch (Exception e) {
                // ignore any secondary exceptions
            }
            if (temp != null) {
                temp.delete();
            }
        }  
      }
       
       protected void doReplaceMuliline(File f, int options) throws IOException {
                 
               File temp = fileUtils.createTempFile("replace", ".txt", null);
             temp.deleteOnExit();

             Reader r = null;
             Writer w = null;

             try {
               
                 r = new FileReader(f);
                 w = new FileWriter(temp);
               
                 BufferedReader br = new BufferedReader(r);
                 BufferedWriter bw = new BufferedWriter(w);
                 PrintWriter pw = new PrintWriter(bw);

                 boolean changes = false;

                 StringBuffer linebuf = new StringBuffer();
                 String line = null;
                 String res = null;
                 int c;
                 boolean scriptFound = false;
                 boolean startMatch = false;
                 Pattern scriptStart = Pattern.compile("<script[\\S\\s]*?>");
                 Pattern inlineScript = Pattern.compile("<script[\\S\\s]*?((/>)|(</script>))");
                 Pattern scriptend = Pattern.compile("</script");
                 while((line = br.readLine()) != null)
                 {
                 Matcher scriptMacther = scriptStart.matcher(line);
                 Matcher inlineMatcher = inlineScript.matcher(line);
                 Matcher sciptendMatcher = scriptend.matcher(line);
                 if(inlineMatcher.find())
                 {
                 }
                 else if(scriptMacther.find())
                 {
                        scriptFound = true;
                 }
                 else if(sciptendMatcher.find() && scriptFound)
                 {
                        startMatch = true;
                        scriptFound = false;
                 }
                
                 if(scriptFound || (!scriptFound && startMatch))
                 {
                        linebuf.append(line).append(System.getProperty("line.separator"));
                 }
                 else
                 {
                        pw.println(line);
                 }
                
                 if(startMatch)
                 {
                        line = linebuf.toString();
                        res  = doReplace(multilineRegex, multiLineSubs, line, options);
                        if (!res.equals(line)) {
                         changes = true;
                         mcomment++;
                        }

                     pw.print(res);
                     linebuf = new StringBuffer();
                     startMatch = false;
                 }
                 }
                 pw.flush();
                 r.close();
                 r = null;
                 w.close();
                 w = null;
                 if (changes) {
                     log("File has changed; saving the updated file", Project.MSG_VERBOSE);
                     try {
                         fileUtils.rename(temp, f);
                         temp = null;
                     } catch (IOException e) {
                         throw new BuildException("Couldn't rename temporary file "
                                                  + temp, getLocation());
                     }
                 } else {
                     log("No change made", Project.MSG_DEBUG);
                 }
             } finally {
                 try {
                     if (r != null) {
                         r.close();
                     }
                 } catch (Exception e) {
                     // ignore any secondary exceptions
                 }

                 try {
                     if (w != null) {
                         w.close();
                     }
                 } catch (Exception e) {
                     // ignore any secondary exceptions
                 }
                 if (temp != null) {
                     temp.delete();
                 }
             }  
            }


      public void execute() throws BuildException {
            if (inlineRegex == null) {
                  throw new BuildException("No inline expression to match.");
            }
            if (inlineSubs == null) {
                  throw new BuildException("Nothing to replace inline expression with.");
            }
           
            if (multilineRegex == null) {
                  throw new BuildException("No multiline expression to match.");
            }
            if (multiLineSubs == null) {
                  throw new BuildException("Nothing to replace multiline expression with.");
            }

           
            int options = 0;

           
            options |= Regexp.REPLACE_ALL;
        options |= Regexp.MATCH_CASE_INSENSITIVE;
   
        int sz = filesets.size();

       for (int i = 0; i < sz; i++) {
        FileSet fs = (FileSet) (filesets.elementAt(i));
        DirectoryScanner ds = fs.getDirectoryScanner(getProject());

        String[] files = ds.getIncludedFiles();

        for (int j = 0; j < files.length; j++) {
            File f = new File(fs.getDir(getProject()), files[j]);

            if (f.exists()) {
                try {
                  long tempTotalLine =totalline;
                  long tempComment = comment;
                  long tempMComment = mcomment;
                  String fileNameWithPath = f.getAbsolutePath();
                    doReplaceInline(f, options);
                    doReplaceMuliline(f, options);
                    log("File - "+fileNameWithPath +" Line - "+(totalline-tempTotalLine)+"            inline comments - "+(comment-tempComment)+"           multiline comments - "+(mcomment-tempMComment), Project.MSG_INFO);
                } catch (Exception e) {
                    log("An error occurred processing file: '"
                        + f.getAbsolutePath() + "': " + e.toString(),
                        Project.MSG_ERR);
                }
            } else {
                log("The following file is missing: '"
                    + f.getAbsolutePath() + "'", Project.MSG_ERR);
                  }
            }
      }
       log("Total Line - "+totalline+"          total inline comments - "+comment+"       total multiline comments - "+mcomment);
      }
     
     
}