In the last post we saw a naive linear search algorithm for finding an item in a list. It’s worst-case complexity is proportional to the length of the list, i.e 𝒪(n).

We can search more efficiently in sorted arrays, in time proportional to the logarithm of the size of the array, i.e 𝒪(log(n)). But inserting an element into a sorted array is a costly operation. This operation will also mutate the state of the array, which will be frowned upon by puritans.

In this post and the one that follows, we see that tree data structures can buy us the best of both worlds. They preserve the nice functional properties of lists while giving logarithmic costs for search operations.

Binary Trees

The basic data structure we use is known as the binary tree. A binary tree can either be empty or can be formed from a node, known as the root. A node in a binary tree has some data and two child nodes - a right and left child - that are binary trees themselves.

A binary tree is represented by the following SML datatype:

datatype 'a btree =
	 Empty
	 | Node of 'a btree * 'a * 'a btree

We can define some functions to access the various parts of a binary tree. These functions return values of the option type so that invalid accesses can be reported using the value NONE.

fun root Empty = NONE
  | root (Node(_, a, _)) = SOME a
(* val root = fn : 'a btree -> 'a option *)

fun left Empty = NONE
  | left (Node(l, _, _)) = SOME l
(* val left = fn : 'a btree -> 'a btree option *)

fun right Empty = NONE
  | right (Node(_, _, r)) = SOME r
(* val right = fn : 'a btree -> 'a btree option *)

Here is how you would construct and use a binary tree:

- val t = Node(Node(Empty, 2, Empty), 1, Node(Empty, 3, Empty));
(* val t = Node (Node (Empty,2,Empty),1,Node (Empty,3,Empty)) : int btree *)

- root(t);
(* val it = SOME 1 : int option *)

- left(t);
(* val it = SOME (Node (Empty,2,Empty)) : int btree option *)

- right(t);
(* val it = SOME (Node (Empty,3,Empty)) : int btree option *)

Constructing a binary tree looks a bit verbose now, in a later post we will figure out how to add syntactic support for “tree literals” that will make their input and output more user-friendly.

Ordered Binary Trees

To search efficiently in a binary tree, the value in each node should satisfy the following properties:

  1. It must be greater than all values in the left child.
  2. It must be less than all values in the right child.

With these properties enforced, the search for value x can be accomplished with the following simple algorithm:

  1. a = value(currentNode)
  2. If x = a, then return true.
  3. If x < a, then search(x, left(currentNode)).
  4. If x > a, then search(x, right(currentNode)).

Adding elements in order

Before we implement the search algorithm, we need a way to create ordered binary trees. The algorithm for adding an element to an ordered binary tree is similar to the search algorithm we just described.

If the binary tree is empty, we add the new value to the root. If the new value is smaller than the value at root, we add it to the left child. Otherwise, the value is added to the right child. These steps are applied recursively until a place for the new value is found. If the value is already in the tree, we return the tree unmodified.

The following function implements this algorithm:

fun btree_add v Empty = Node(Empty, v, Empty)
  | btree_add v (t as Node(l, x, r)) =
    if v = x then t
    else if v < x then Node(btree_add v l, x, r)
    else Node(l, x, btree_add v r)

Note The as keyword is used to bind a pattern to a variable. The function can use this variable to refer to the argument represented by the pattern.

Here is how you would construct an ordered binary tree by repeatedly calling btree_add:

val t = btree_add 1 (btree_add 18 (btree_add 3 (btree_add 11 (btree_add 2 Empty))))

Again this is a rather verbose way to create a tree. I promise, we will find a better way. But right now let’s concentrate on solving the search problem.

We can see from the code of the btree_add function that our binary tree implementation preserves the nice properties of the list data structure - they are immutable, can be destructured via pattern matching and is amenable to programming with recursion. All this leads to compact, simple and clean functional expression of quite complex algorithms.

The search function

The binary tree created by the preceding program will be laid out in memory as:

             2
           /   \
	  1    11
              /  \
	     3   18

Let’s realize the search algorithm we described earlier as a function.

fun btree_search x Empty = false
  | btree_search x (Node(left, v, right)) =
    if x = v then true
    else if x < v then btree_search x left
    else btree_search x right

Imagine that we are searching for the value 1 in the sample tree. If the numbers were stored in a list, the search would have to traverse the entire length of the list to find this value, which was added last. But with the ordered binary tree, the algorithm deduces that 1 is less than the value at the root (i.e 2) and searches only the left child. This results in the item being found in just one more lookup.

More generally, if the tree is “sufficiently balanced”, the btree_search function ensures logarithmic search times even in the worst case. A tree is sufficiently balanced if the number of levels below the root is small even while managing large data-sets.

- btree_search 1 t;
(* val it = true : bool *)

- btree_search 2 t;
(* val it = false : bool *)

- btree_search 18 t;
(* val it = true : bool *)

Conclusion

We encountered the first version of our search tree in this post. We saw that, as long as the tree is kept balanced, the binary search function can yield good performance. But the responsibility of keeping the tree balanced rests with the caller of btree_add. For instance, consider creating a tree with entries 1, 2, 3, 4 and 5:

val t = btree_add 5 (btree_add 4 (btree_add 3 (btree_add 2 (btree_add 1 Empty))))

The resulting tree will look exactly like a linear list in memory:

   1
    \
     2
      \
       3
        \
	 4
	  \
	   5

The tree is not balanced and btree_search cannot do better than a linear search on this structure.

To make our lives easier, we need a tree structure that will automatically maintain its balance, so that searches will always complete in logarithmic time.

The next post will be about one of these dynamic and interesting tree structures.


Note that name and e-mail are required for posting comments