In the previous post we learned about binary trees and how they can be used to implement search algorithms with 𝒪(log(n)) worst-case time complexity. But this performance characteristic is lost for ordered data.

In this post we will develop a version of binary tree that knows how to keep its balance, in spite of the order in which elements are inserted. Search and insert operations on these trees will always maintain the logarithmic worst-case time complexity.

The type of binary trees that we will study here are known as red-black trees. In a red-black tree, every node with data is colored either red or black. So we have to augment the basic binary tree with this color bit:

datatype color = Red | Black

datatype 'a rbtree =
Empty
| Node of color * 'a rbtree * 'a * 'a rbtree


## Searching

The search function for red-black trees is similar to that of the basic binary tree. It’s just that the color field should be ignored:

fun rbtree_search x Empty = false
| rbtree_search x (Node(_, left, v, right)) =
if x = v then true
else if x < v then rbtree_search x left
else rbtree_search x right


## Inserting

The insert function is more complex because it needs to maintain the following two invariants:

1. No red node should have a red child.
2. Every path from the root to an empty node should have the same number of black nodes.

The following function can be called on a node to regain its balance:

fun balance ((Black, Node(Red, Node(Red, a, x, b), y, c), z, d)
| (Black, Node(Red, a, x, Node(Red, b, y, c)), z, d)
| (Black, a, x, Node(Red, Node(Red, b, y, c), z, d))
| (Black, a, x, Node(Red, b, y, Node(Red, c, z, d))))
= Node(Red, Node(Black, a, x, b), y, Node(Black, c, z, d))
| balance x = Node(x)


The balance function basically fixes a violation of invariant 1, i.e when the parent of a new node is red. This violation is fixed when balance deals with the black parent of a red node with a red child. This can come in four patterns, but the fix is same for all cases - rewrite the node red with two black children.

Now we can implement the function to insert a new node to a red-black tree:

fun rbtree_add x t =
let fun insert Empty = Node(Red, Empty, x, Empty)
| insert (t as Node(color, a, y, b)) =
if x < y then balance(color, insert a, y, b)
else if x > y then balance(color, a, y, insert b)
else t;
val Node(_, a, y, b) = insert t
in
Node(Black, a, y, b)
end


The rbtree_add function is similar to the btree_add function, with the following noticeable differences:

1. The balance function is called for the x < y and x > y conditions to enforce the invariants.
2. The root of the final tree is always forced to be black.

The balance function recursively balances to the top of the tree, but it might end up with a root red node with a red child. This case is handled by always forcing the root to be black.

Here is how you would construct a red-black tree and perform searchs on it:

val t = rbtree_add 5 (rbtree_add 4 (rbtree_add 3 (rbtree_add 2 (rbtree_add 1 Empty))));

rbtree_search 5 t
(* val it = true : bool *)

rbtree_search 3 t
(* val it = true : bool *)

rbtree_search 0 t
(* val it = false : bool *)


As an exercise, can you change the rbtree_search function to print the path it followed? This should show you that the tree is really balanced.

## An Optimization

A small optimization can be applied to balance - it could be split into two separate functions for fixing violations involving left and right children respectively.

fun lbalance ((Black, Node(Red, Node(Red, a, x, b), y, c), z, d)
| (Black, Node(Red, a, x, Node(Red, b, y, c)), z, d))
= Node(Red, Node(Black, a, x, b), y, Node(Black,c,z,d))
| lbalance x = Node(x)

fun rbalance ((Black, a, x, Node(Red, Node(Red, b, y, c), z, d))
| (Black, a, x, Node(Red, b, y, Node(Red, c, z, d))))
= Node(Red, Node(Black, a, x, b), y, Node(Black,c,z,d))
| rbalance x = Node(x)


The rbtree_add function can be re-defined as follows. This revision will get rid of several unnecessary tests for the balancing act:

fun rbtree_add x t =
let fun insert Empty = Node(Red, Empty, x, Empty)
| insert (t as Node(color, a, y, b)) =
if x < y then lbalance(color, insert a, y, b)
else if x > y then rbalance(color, a, y, insert b)
else t;
val Node(_, a, y, b) = insert t
in
Node(Black, a, y, b)
end


## References

This post is based on the red-black tree implementation from Purely Functional Data Structures by Chris Okasaki.

The Functional Approach to Programming shows a complete implementation of an AVL tree.

Both these books are great sources for SML programmers looking for practical implementations of functional algorithms.

Note that name and e-mail are required for posting comments