Data Structures
Data Structures
Data structure refers to a collection of data and its arrangement within the computer system with well defined operations, behaviour and properties, so that it can be used efficiently.
Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address—a bit string that can be itself stored in memory and manipulated by the program. Data structure can be broadly classified into two types –
1. Linear Data Structure
2. Non Linear Data Structure
In Linear Data structure the data elements are arranged in such a way that a successor data element has only one predecessor. They can either be sequential like an ARRAY or non sequential, like a list more particularly called LINKED LIST.
In a non Linear Data Structure data elements are arranged in such a way that two or more successor has a same predecessor or in other words a predecessor has two or more successor, like for e.g. TREES
Each of this data elements are stored in the physical memory of the computer called cells, the memory space can be allocated to these data elements in two different ways, statically and dynamically –
1. Static Memory allocation is a technique which reserves a fixed amount of memory before the actual processing takes place, thus the number of data elements to be stored must be pre determined.
E.g. ARRAYS are allocated memory in this way
2. Dynamic Memory allocation technique allocates memory during the actual execution of the program itself. This technique is superior to static and is preferred as it facilitates release of allocated memory space when not required.
E.g. Data structures like LINKED LISTS and TREES allocate memory in this way.
STACKS and QUEUES
Stack is a linear data structure that can be implemented either using an ARRAY or LINKED LIST. A stack follows LIFO Architecture (which stands for Last In First Out) where insertions and deletions takes place at only one end i.e. the TOP of the stack, which means the last inserted will be the first one to be deleted.
The principal (or only) operations on the collection are –
1. PUSH, i.e. addition of an entity to the collection, at the top and
2. POP, i.e. removal of an entity, from the top
A stack may be implemented to have a bounded capacity (ARRAY). If the stack is full and does not contain enough space to accept an entity to be pushed, the stack is then considered to be in an overflow state. The pop operation removes an item from the top of the stack. A pop either reveals previously concealed items or results in an empty stack, but, if the stack is empty, it goes into underflow state, which means no items are present in stack to be removed.
When implemented in an unrestricted /unbound manner as a LINKED LIST it behaves dynamically, i.e. grow with increase in number of elements or shrink with decrease of elements.
Towers of Hanoi
One of the most interesting applications of stacks can be found in solving a puzzle called Tower of Hanoi. According to an old Brahmin story, the existence of the universe is calculated in terms of the time taken by a number of monks, who are working all the time, to move 64 disks from one pole to another. But there are some rules about how this should be done, which are:
move only one disk at a time, for temporary storage, a third pole may be used, a disk of larger diameter may not be placed on a disk of smaller diameter.
Queue is a linear data structure that can be implemented either in a restricted manner like ARRAY or unrestricted like LINKED LIST. A queue follows FIFO Architecture (which stands for First In First Out) where insertions takes place at one end i.e. REAR and deletions takes place at the other end i.e. the FRONT of the stack, which means the first inserted will be the first one to be deleted.
The principal (or only) operations on the collection are
1. Enqueue, addition of entities to the rear terminal position, and
2. Dequeue, removal of entities from the front terminal position
When implemented as a LINKED LIST it can behave dynamically, i.e. grow with increase in number of elements or shrink with decrease of elements. In array implementation if the array based queue goes out of storage space will result in Queue Overflow.
In both the above cases of Stack as well as queue, be it implemented using an ARRAY or a LINKED LIST will result in Stack/Queue Underflow if the any of the above data structures is empty and we carry out a deletion procedure.
Types of Queues
1. Priority Queue – A queue in which entry of the elements Is based according to the priorities of the element is called Priority Queue, in this the higher priority elements are nearer to the front whereas less priority elements are nearer to the rear.
2. Deque(or DeQueue) – A deque (not to be confused with Dequeue which means removal, ) is a double ended queue in which elements can be added or removed from the either ends.
3. Circular Queue – A queue in which the last node is connected to the first node so that insertion and deletion of elements are made in a circular mode.
LINKED LIST is a linear collection of data elements in a nonsequential manner where each data element is known as a NODE. These NODES are the smallest unit in a LINKED LIST which consists of two parts –
1. INFO (which contains data information)
2. POINTER (which contains the address of the next node)
A linked list is a collection of structures ordered not by their physical placement in memory but by logical links that are stored as part of the data in the structure itself. It is not necessary that it should be stored in the adjacent memory locations. Every structure has a data field (INFO) and an address field (POINTER). The Address field contains the address of its successor. Linked list can be singly, doubly or multiply linked and can either be linear or circular.
A linked list with three nodes contain two fields each: an integer value and a link to the next node
A linked list with a single node containing two fields each:
an integer value and a link to the next node
Sample Code Snippet
public class IntNode
{
public int value;
public IntNode link;
public IntNode(int v)
{
value = v;
}
}
Advantages of Linked List over Arrays
Compared to arrays, linked data structures allow more flexibility in organizing the data and in allocating space for it. In arrays, the size of the array must be specified precisely at the beginning, this can be a potential waste of memory. A linked data structure is built dynamically and never needs to be bigger than the programmer requires. It also requires no guessing in terms of how much space you must allocate when using a linked data structure. This is a feature that is key in saving wasted memory.
In array, the array elements have to be in contiguous (connected and sequential) portion of memory. But in linked data structure, the reference to each node gives us the information where to find out the next one. The nodes of a linked data structure can also be moved individually to different locations without affecting the logical connections between them, unlike arrays.
With due care, a process can add or delete nodes to one part of a data structure even while other processes are working on other parts. Memory can be utilized more efficiently by using this data structures. Memory is allocated as per the need and when memory is not further needed, deallocation is done.
General disadvantages of a Linked List
Linked data structures may also incur in substantial memory allocation overhead (if nodes are allocated individually) and frustrate memory paging and processor caching algorithms (since they generally have poor locality of reference).
In some cases, linked data structures may also use more memory (for the link fields) than competing array structures. This is because linked data structures are not contiguous. Instances of data can be found all over in memory, unlike arrays.
In arrays, nth element can be accessed immediately, while in a linked data structure we have to follow multiple pointers so element access time varies according to where in the structure the element is.
In some theoretical models of computation that enforce the constraints of linked structures, such as the pointer machine, many problems require more steps than in the unconstrained random access machine model.
Polish Notations
Polish notation, also known as Polish prefix notation or simply prefix notation is a form of notation for logic arithmetic and algebra. Its distinguishing feature is that it places operators to the left of their operands. If the parity of the operators is fixed, the result is a syntax lacking parentheses or other brackets that can still be parsed without ambiguity.
The Polish logician Jan Łukasiewicz invented this notation in 1924 in order to simplify sentential logic.
The term Reverse Polish notation also called Polish postfix notation is sometimes taken (as the opposite of infix notation), in which the operator is placed after the operands.
Expression evaluation and syntax parsing
Calculators employing Reverse Polish Notation use a stack structure to hold values. Expressions can be represented in prefix, postfix or infix notations and conversion from one form to another may be accomplished using a stack. Many compilers use a stack for parsing the syntax of expressions, program blocks etc. before translating into low level code. Most programming languages are contextfree languages, allowing them to be parsed with stack based machines.
Infix Notation 
Prefix Notation 
Postfix Notation 
A + B 
+ AB 
AB + 
(A – C) X B 
X – ACB 
AC – B X 
A + (B X C) 
+ A X BC 
ABC X + 
Infix Notation Prefix Notation Postfix Notation
A + B + AB AB +
(A – C) X B X – ACB AC – B X
A + (B X C) + A X BC ABC X +
Algorithm to Convert an Infix expression into Postfix (Reverse Polish Notation)
Step 1. Push “(“ at the beginning of the expression and a “)” at the end
Step 2. Scan the expression ‘X’ from left to right and repeat steps 3 to 6 until stack is empty.
Step 3. If an OPERAND is encountered, add it to ‘Y’ the postfix expression
Step 4. If a left parenthesis “(“ is encountered push it to STACK
Step 5. If an operator is encountered then repeatedly POP from stack top each operator which has
same or higher precedence than the operator encountered and add it to ‘Y’
Step 6. Add the encountered operator to STACK
Step 7. If a right parenthesis is encountered then repeatedly POP from the stack TOP all the
operators and add it to ‘Y’ until a right “)” parenthesis is encountered.
Step 8. Remove these parenthesis (Do not add to ‘Y’)
Step 9. Exit.
Evaluation of an infix expression that is fully parenthesized
(((2 * 5) – (1 * 2)) / (11 – 9))
Input Symbol 
Stack (from bottom to top) 
Operation 
( 

( 

( 

( 
10 – 

( 
8 / 

) 
10Feb 
1 * 2 = 2 & Push 
) 
8 
10 – 2 = 8 & Push 
) 
08Feb 
11 – 9 = 2 & Push 
) 
10 
2 * 5 = 10 and push 
) 
4 
8 / 2 = 4 & Push 
* 
10 – 1 * 

* 
2 * 

– 
10 – 

– 
8 / 11 – 

/ 
8 / 

1 
10Jan 

11 
08Nov 

2 
10 – 1 * 2 

2 
2 

5 
2 * 5 

9 
08/11/09 

New line 
Empty 
Pop & Print 
Non Linear Data Structure – TREES
In computer science, a tree is a widely used abstract data type (ADT) or data structure implementing this ADT that simulates a hierarchical tree structure, with a root value and subtrees of children, represented as a set of linked nodes.
A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the “children” or daughter), with the constraints that no reference is duplicated, and none points to the root.
Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with a value assigned to each node. Both these perspectives are useful: while a tree can be analyzed mathematically as a whole, when actually represented as a data structure it is usually represented and worked with separately by node (rather than as a list of nodes and an adjacency list of edges between nodes, as one may represent a digraph, for instance).
For example, looking at a tree as a whole, one can talk about “the parent node” of a given node, but in general as a data structure a given node only contains the list of its children, but does not contain a reference to its parent (if any).
TREE Terminology
1. A node is a structure which may contain a value or condition, or represent a separate data structure which could be a tree itself.
1. A node that has a child is called the child’s parent node or ancestor node, or superior node.
2. A node has at most one parent.
3. An internal node (also known as an inner node, inode for short, or branch node) is any node of a tree that has child nodes.
4. An external node (also known as an outer node, leaf node, or terminal node) is any node that does not have any child nodes.
5. The topmost node in a tree is called the root node, which always grows downwards.
6. A tree without any node is called Empty Tree.
7. Order of a tree is the maximum number of nodes connected to any given node of the tree. So a tree with max 2 nodes is called a Binary Tree.
8. Being the topmost node, the root node will not have a parent.
9. Nodes belonging to the same parent are called siblings.
10. All other nodes can be reached (or parsed) by following edges or links. (In the formal definition, each such path is also unique.)
11. All the edges or links is collectively called path.
12. The nodes of a tree can be reached in 3 different ways this is called Traversal.
13. There are 3 types of traversals – Inorder, Preorder and Postorder.
14. A tree is allotted a Level number at different levels of its hierarchy; hence the root node is always at level 0 whereas the level of nodes C, E & H above is 3.
15. Height or Depth of a tree is one more than its deepest level number.
16. Conventionally, an empty tree (tree with no nodes, if such are allowed) has depth and height −1.
17. A Full Binary Tree or a Strictly Binary Tree is a tree in which every node other than the leaves has two children.
18. A proper binary tree is an ordered tree in which each internal node has exactly two children.
19. A perfect binary tree is a full binary tree in which all leaves have the same depth or same level, and in which every parent has two children.
20. A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible