Algorithms are fundamental to problem solving. This post and the few that follow will introduce algorithm design and implementation in the context of functional programming.

Algorithms consume data. so we will also look at functional ways of structuring complex data. We will learn about efficient implementations of trees, queues and dictionaries.

As functional programming eschews mutable state, implementing even familiar algorithms and data structures may look challenging. We will study patterns and techniques unique to the functional world that will help us face this challenge.

Note: Before proceeding, please make sure you are familiar with the basics of functional programming using SML.

Searching

We often lookup lists for things, like searching through our contacts for a friends phone number. Computers can search through lists really fast and there are multiple ways to make them search even faster.

Let’s begin our exploration with the basic search strategy ‐ look at each element in the list until we find the desired one.

Programs written in Java or C will use an array to store the items. The search function will loop over this array starting at index 0. When the searched item is found, the current loop index is returned. If the array is exhausted, a non-indexable value like -1 is returned, indicting value not found.

This simple search algorithm is known as linear search.

Functional programmers prefer to use the list data structure over mutable arrays. Pattern matching and recursive function calls takes the place of imperative looping constructs like for and while.

Here is how you would implement the linear search algorithm in SML:

fun linearSearch _ []     = false
  | linearSearch x (h::t) = if x = h then true
			     else linearSearch x t

The linearSearch function will check if the element at the head of the list (h) is equal to the element being searched for (x). If they are equal the function terminates by returning true, otherwise it will recurse on the tail of the list. When the list is exhausted, it will return false, indicating that x is not a member of the list.

As no imperative looping constructs are used, there is no chance for overshooting the array bounds or missing an index at the beginning or end of the array. There is no need to explicitly maintain a loop invariant through increment and assignment operations. Less book-keeping and explicit state tracking means less chance for bugs.

Let’s try out the function on a few simple lists:

- linearSearch 1000 [1, 10, 100, 10];
(* val it = false : bool *)

- linearSearch 2 [1, 5, 3, 10, 0, 12, 2];
(* val it = true : bool *)

The linear search function may be implemented more declaratively using the orelse logical operator in place of the if construct:

fun linearSearch _ []     = false
  | linearSearch x (h::t) = (x = h) orelse (linearSearch x t)

Both implementations works correctly. They traverse the list in a linear fashion and check if the current head is the value being searched for. But along with correctness we should also care about performance.

You might have already noticed the problem with linearSearch ‐ if the value being searched for is at the end of the list, the function will walk down the entire length of the list. In other words, the worst-case running time of this algorithm is proportional to the number of elements in the list.

Are there ways to make the search algorithm better? Unfortunately, with the list data structure, linear search is pretty much the best we can do.

Don’t despair! We will figure out data structures better suited for fast search operations. But to build new data structures, we need new raw materials. Let’s first have a look at how lists themselves are built. We will then use the same building material to create new data structures that will enable us to write better search algorithms.

Datatypes

For constructing new structures, SML provides a powerful mechanism called datatypes. A datatype definition consists of the name of the new structure (known as a type constructor) and one or more data constructors. Data constructors are operators used for creating values belonging to the new datatype.

Here is a simple datatype to enumerate the days of week:

datatype day_of_week =
	 Sunday
       | Monday
       | Tuesday
       | Wednesday
       | Thursday
       | Friday
       | Saturday

Note: It’s a common SML convention to capitalize the names of data constructors, as in Sunday, Monday etc.

We can write functions that pattern match on this datatype:

fun isWorkday Sunday   = false
  | isWorkday Saturday = false
  | isWorkday _        = true

New values of the datatype are created by invoking the appropriate data constructor:

- isWorkday Sunday;
(* val it = false : bool *)

- isWorkday Monday;
(* val it = true : bool *)

Constructors with Values

A datatype can be more general than a simple enumeration. This is achieved by data constructor expressions which consists of the constructor name, the keyword of and a type expression. Values created by the constructor can “wrap” data belonging to this type.

The following example shows a datatype for geometric shapes. Each shape’s constructor can also accept values that define the dimensions of the shape.

datatype shape =
	 Circle of real
	 | Square of real
	 | Rectangle of real * real

Now we can define a Circle with a real valued radius, a Square with a real valued length and a Rectangle having a length and height represented as a tuple of two real values.

This leads to the following generic function that can calculate the area of any geometric shape:

fun area (Circle (r))       = Math.pi *  r * r
  | area (Square (l))       = l * l
  | area (Rectangle (l, w)) = l * w

This function can be applied to any shape instance:

- area(Circle(10.34));
(* val it = 335.885263514 : real *)

- area(Square(5.0));
(* val it = 25.0 : real *)

- area(Rectangle(5.0, 3.5));
(* val it = 17.5 : real *)

Polymorphic Constructors

The type expressions in data constructors can be polymorphic. The following program defines a datatype that can be used to create a sequence of values belonging to the same type:

datatype 'a seq =
	 Empty
	 | Pair of 'a * 'a seq

The definition says - “a sequence is either empty or is a pair of a value belonging to 'a and another 'a seq”. A pair is encoded as a two-element tuple. The first element is the head of the sequence and the second is its tail. This is essentially how lists are represented in SML.

Constructing sequences can be a bit verbose:

- val xs = Pair (1, Pair(2, Pair (3, Empty)));
(* val xs = Pair (1,Pair (2,Pair #)) : int seq *)

SML has made this task easier by defining the [] and :: operators for list construction.

Exercise 1 Write a function size to report the number of elements in a sequence.

Exercise 2 Define the linearSearch function for sequences.

The option type

The SML basis library defines a very useful polymorphic datatype called option. It has two constructors NONE and SOME of 'a used to denote partial or optional values. For instance, trying to access the first element of an empty list can return NONE indicating a missing value and for a pair can return SOME h where h is the first element (head) of the pair.

fun first Empty         = NONE
  | first (Pair (h, _)) = SOME h

Exercise 3 Define a function rest that will return the tail of a sequence. Make use of the option type to handle the empty sequence.

Conclusion

Algorithms that search for things are fundamental to computing. The list data structure is limited to linear search because it does not allow us to directly access elements by index. On the bright side, lists allow us to express algorithms more declaratively because they are well suited for pattern matching and recursion.

We need to find data structures that can support faster search algorithms without losing the nice properties of lists. In this post we took the first steps in this direction by looking at the building blocks SML provide for defining new structures from scratch.

In the next post we will actually build new data structures suitable for fast retrieval of items. We will start with some straightforward implementations, evaluate their performance characteristics and look at ways to improve them.


Note that name and e-mail are required for posting comments