Tuesday, December 27, 2011

Data Structure: Advance Sorting Algorithm


In this post I am writing about two advance sorting algorithm: Shell Sort and Quick Sort. These sorts both operate faster than simple sorts. Shell sort take about O(N(logN)2) time and Quick sort takes O(N*logN) time.

Shell Sort
The Shell sort is good for medium sized array, perhaps up to few thousand items. Shell sort implementation is short and simpler than Quick sort. Shell sort is typically based on Insertion sort. To review Insertion sort, read my previous post. At any point in insertion sort, data item at the left of marker is internally sorted. Suppose there is an item, smallest of all at the right end of the array. To insert this item at proper position, all the N items must be shifted to make space at the front of the array. This is a drawback of Insertion sort. Shell sort addresses this problem. In Shell sort, widely spaced items are insertion sorted. This brings the items many spaces closer to their proper position. The spacing between the elements for insertion sort is called increment and traditionally represented as letter h.
Let’s sort an array of 10 items with increment 4. So, items at 0, 4 and 8 will be insertion sorted first, then 1, 5 and 9. This process will continue until all items have been 4-sorted, this means that all the items four cell apart are sorted among themselves. At the end of this first set of iteration, the array can be thought of four sorted sub arrays: (0, 4, 8), (1, 5, 9), (2, 6) and (3, 7). These sub arrays are interleaved but otherwise independent.
For an array of size 10, the initial gap is 4 but for larger array gap should start out much larger. This gap is then diminished repeatedly until it becomes 1. The sequence of intervals is called gap sequence or interval sequence. Donal Ervin Knuth recommended an interval sequence. The interval sequence can be generated by
An+1= 3*An + 1

Java Code for Shell Sort

public void shellSort()
{
int inner, outer;
long temp;
int h = 1;

while(h <= nElems/3)
h = h*3 + 1;

while(h>0)
{
for(outer=h; outer<nElems; outer++)
{
temp = theArray[outer];
inner = outer;


while(inner > h-1 && theArray[inner-h] >= temp)
{
theArray[inner] = theArray[inner-h];
inner -= h;
}
theArray[inner] = temp;
}
h = (h-1) / 3;
}
}
}

Quick Sort
Quick sort is most popular and fastest in majority of situation. Partitioning algorithm is required to understand Quick Sort. So I will discuss Partitioning algorithm first.

Partitioning data is dividing data items in two groups. All the items greater that key data are in one group and all the items less than key data are in another group. The key data item is called pivot. This pivot can be one among the data items to be divided or it can be any other data.

Java code for Partition

public int partition(int left, int right, long pivot)
{
int leftPtr = left - 1;
int rightPtr = right + 1;
while(true)
{
while(leftPtr < right && theArray[++leftPtr] < pivot);

while(rightPtr > left && theArray[--rightPtr] > pivot);

if(leftPtr >= rightPtr)
break;
else
swap(leftPtr, rightPtr);
}
return leftPtr; // return partition
}

Quick short uses partitioning algorithm to sort the data. Data is first partitioned using a pivot. Then left and right sub arrays are Quick sorted. Selection of pivot is very crucial in this sorting algorithm. We are taking the rightmost item in the array as pivot. So, after first round of quick sort pivot(right most element) is at proper position and left and right sub array will be quick sorted now.

Step to implement Quick Sort
  1. Partition the data using right most item as pivot and get the position of pivot
  2. QuickSort left to pivot position items.
  3. QuickSort pivot to right position items.

It is using recursive function to implement quick sort.

Java code for Quick Sort
public void recQuickSort(int left, int right)
{
if(right-left <= 0)
return;
else
{
long pivot = theArray[right];
int partition = partitionIt(left, right, pivot);
recQuickSort(left, partition-1);
recQuickSort(partition+1, right);
}
}

Since we have chosen to take right most element as a pivot, we have to modify the Partition code given above in the post.

Monday, December 19, 2011

Data Structure: Linked List


I have discussed data structure Stack, Queue and variations of Queue in last post. Today we will discuss commonly used data structure Linked list. As it is implied from name, it is a list of data linked with each other by some mean. Stack and Queues can be implemented by array or linked lists.

Array and Linked list have their own advantages and disadvantages

Disadvantages of Array which are addressed by Linked list
  1. Arrays are of fixed size whereas linked list has no size limit.
  2. Insertion and deletion is slow in Array, as many items are shifted to create the place for new item or to fill the due to deletion. These activities are fast in linked list.


What is linked list?
Linked list contains data and reference to next link. Linked list is set of data linked to one another.  Linked list provides set of functions to iterate over the linked list, to delete, insert data.

Java Code of Data item in linked list

public class Link {
            public int data; //this can be any data or reference to any data
            public Link next; // refer to next item in linked list
           
            public Link(int newData) {
                        this.data = newData;
            }
            public int getData() {
                        return data;
            }
            public void setData(int data) {
                        this.data = data;
            }
            public Link getNext() {
                        return next;
            }
            public void setNext(Link next) {
                        this.next = next;
            }
            public void displayData()
            {
                        System.out.println(this.data);
            }
}


 “next” value is null, when it is the last item in linked list.

I will describe simple linked list, this has very limited function to carry out on Linked list. Following are the action which can be carried out on simple linked list
  1. Insert data at the beginning of linked list
  2. Delete data at the beginning of linked list
  3. View all the data in the list.

Java code for Linked list application class

class LinkList
{
            private Link first; // refer to first link in the list
           
            public LinkList()
            {
                        first = null;
            }
            public boolean isEmpty()
            {
                        return (first==null);
            }
            public void insertFirst(int id, double dd)
            {
                        Link newLink = new Link(id);
                        newLink.next = first;
                        first = newLink;
            }
            public Link deleteFirst()
            {
                        Link temp = first;
                        first = first.next;
                        return temp;
            }
            public void displayList()
            {
                        Link current = first;
                        while(current != null)
                        {
                                    current.displayData();
                                    current = current.next;
                        }
            }
}

Other variations of linked list
Double ended list:  It is same as ordinary linked list with an additional feature: an additional reference to last link. This gives it a benefit of adding data item directly at the end of the list rather than iterating over whole list and then inserting the data.

Doubly linked list: This is also same as ordinary linked list, but it has an additional feature which let you move in both directions. This improves traversal activity required in some of the activity.

Tuesday, December 13, 2011

Data Structure: Stack and Queues


In this post I will discuss two basic data storage structure: the Stack and the Queue.

These data structure are used mainly to carry out some task rather than actually storing data in system.  These are programmer’s tool.
Stack and Queues are more abstract entities than array and many other data storage structures. They are defined primarily by their interface. Some set of operations are exposed via the interface, which can be carried out on them.

Stacks
A Stack allows access to last item only. Stack is analogous to disc cake box. You can only take disc which is on the top. In Stack, last item inserted is the first item to be removed (LIFO)

Following operations can be carried out on Stack :-
  1. Push – (This insert the data. Data is inserted at the top of the Stack)
  2. Pop – (This deletes the data from the Stack and return the value.)
  3. Peek – (This just sneak and return the value at the top of the Stack)
  4. Size – (Returns the current size of the Stack)


The underlying implementation of Stack is either array or link list. We will discuss linklist in separate post.

Java implementation of Stack and its interface operations.

class StackImpl
{
private int maxSize; // size of stack array
private long[] stackArray;
private int top; // pointer to the top item
public StackImpl(int s)
 {
maxSize = s; 
stackArray = new long[maxSize]; 
top = -1;
}
public void push(long item)
{
stackArray[++top] = item;
}
public long pop() 
{
return stackArray[top--];
}
public long peek()
{
            return stackArray[top];
}
}

Queue
In Queue, item first inserted is the first item to be removed (FIFO). This is analogous to line in front of movie ticket counter.  There are various queue doing their job in our system, like print request queue. In Queue, items are removed from front and inserted at rear.
Following functionalities are supported by Queue:-
  1. Insert - (To insert an item at the rear)
  2. Remove - (To get an item from front and then delete it from Queue)
  3. Peek - (To sneek and get the value at front)
  4. Size - (Returns the size of the Queue)


The underlying implementation of Queue is also either array or linked list.

Java implementation of Queue

class QueueImpl
{
private int maxSize;
private long[] queArray;
private int front;
private int rear;
private int nItems;
public QueueImpl(int s)
{
maxSize = s;
queArray = new long[maxSize];
front = 0;
rear = -1;
nItems = 0;
}
public void insert(long j)
{
if(nItems != maxSize)
{
queArray[++rear] = j;
nItems++;

}
}
public long remove()
           {
                         long temp = -99999;
                        if(nItems != 0)
                        {
long temp = queArray[front++];
nItems--;
                        }
                       return temp;
}
public long peek()
{
return queArray[front];
}
public int size()
{
return nItems;
}

Other variations of Queue are Circular Queue, Deques and Priority Queue.
Circular Queue
To avoid the problem of not being able to insert data in Queue even when it is not full, the front and rear wraps around to the beginning of array. This special implementation of Queue is Circular Queue.

Deque
A Deque is a Double ended queue. Items can be inserted and deleted at either end. A Deque provides a more versatile data structure than Stack and Queue. It can be used as Stack and Queue depending upon requirement.

Priority Queue
Priority Queue is more specialized data structure than Stack and Queue. Items are ordered by key in Priority Queue as opposed to Queue in which items are not ordered.  Items are inserted in proper position in Priority Queue. All other functionality remains same as it is in Queue.







Sunday, December 4, 2011

Data Structure : Sorting Algorithm

Sorting is indispensable part of data structure and algorithm .

After almost 3-4 years, I was studying sorting for last 1 week. In this blog, I will be covering basic sorting algorithms for people like me who have been out of touch.

There are 3 basic sorting algorithms
  1. Bubble Sort
  2. Selection Sort
  3. Insertion Sort

Bubble Sort
This is simplest and slowest of all the sorting algorithms. Its simplicity is the reason, I have thought of explaining this first.
This sorting algorithm can be explained in three simple steps
  • Compare two number
  • If X > Y and X is left Y in list, swap them
  • Move 1 step right

After first round of iteration through list of N values, highest value bubbles up in the N-th position.
Similarly, after second round of iteration through list of N-1 values, 2nd highest value bubbles up in the N-1 th position and so on. So, if size of the list is N, all the items are sorted after N-1 iteration.

Efficiency of Bubble sort
As you can see, if number of items is N, there are N-1 comparisons in first pass, N-2 comparisons in second pass and so on. Hence total number of comparisons to sort N number of items is

(N-1)+(N-2)+(N-3)+...+1 = N*(N-1)/2
If the data is random, swap is necessary half the time, so there will be about (N^2)/4 swaps.

Java Code block for Bubble Sort

public void bubbleSort() {
int out, in;
for(out=nElems-1; out>1; out--) 
    for(in=0; in<out; in++)
          if( a[in] > a[in+1] ) swap(in, in+1);
} // end bubbleSort() 


Selection Sort
Selection sort improves on bubble sort by reducing the number of swap. Number of comparison remains the same as the bubble sort. Steps to execute the selection sort are as follows
  • Select left most index of unsorted list and marked it as MIN
  • Pass through the unsorted list of items and store the poistion of any item in IN if the value of the item is less than value at position MIN
  • Swap values at IN and MIN

After first round of iteration, lowest value item is selected and placed in 1st position. Similarly after second pass, second lowest value is selected and placed in 2nd position and so on. So, if size of the list is N, all the items are sorted after N-1 iteration.

Efficiency of Selection Sort
The selection sort perform the same number of comparison as the bubble sort: N*(N-1)/2. But there is lesser number of swap than Bubble sort.

Java Code block for Selection Sort

public void selectionSort() {
   int out, in, min;
   for(out=0; out<nElems-1; out++) {
       min = out;
       for(in=out+1; in<nElems; in++)

             if(a[in] < a[min] ) min = in;
             swap(out, min);
   } // end for(out)
} // end selectionSort()
 

Insertion Sort
Insertion Sort is faster than Bubble sort and Selection sort in almost all cases, though the number of comparisons is same. This sorting has a concept of partial sorting. It assumes that all the items in the left of a marked position M is partially sorted, i.e. sorted among themselves. 
In Bubble sort and Selection sort, a group of data item is completely sorted at any given time; in insertion sort a group of data is partially sorted. Steps to implement insertion sort are as follows
  • Select item in Mth position, M-1 items in left of the M is partially sorted
  • Insert the selected item in (M-1) sorted items at correct position, shift the items in the left of M to right to make place for insertion
  • Now items in left of M+1 is partially sorted, increment the value of M and repeat unless all the items in the list is placed.s

As you can see, items at 1st and 2nd place are sorted amoung themselves after first round and item at poistions from 1-3 are partailly sorted after second round. So after N-1 round, all the N items are sorted.

Efficiency of Insertion sort
On the first pass it compares maximum 1 item, on second it comares maximum 2 and so on up to maximum comparison of N-1 on last pass. Hence total number of comparison is
1+2+3+4.... + (N+1) = N(N-1)/2

Number of copies to move and insert the data is almost same as numbers of comparison. However since, copying data takes less time than swap, so it is twice as fast as Bubble sort and faster than Selection sort for random data.
For the data that is already sorted or almost sorted this Sorting technique is much better. However for inversely sorted data, this is not better than Bubble Sort as every possible Shift and Comparison is carried out.

Java Code for Insertion Sort

public void insertionSort() {
    int in, out;
   for(out=1; out<nElems; out++) {
       long temp = a[out];
       in = out;
       while(in>0 && a[in-1] >= temp) // until one is smaller,
       {
                a[in] = a[in-1]; // shift item right,
                --in;
        }
    a[in] = temp; } // end for

} // end insertionSort()