In the last part of this series, we got our local SML system running and learned how to use it to evaluate simple arithmetic expressions.

This post is a tutorial introduction to the SML language. We will cover:

  1. basic data types
  2. comparison and logical operations
  3. conditional expressions
  4. variable and function declarations
  5. scripting and compiling

Basic Data Types

Numbers

The basic numeric types supported by SML are integers and real numbers.

Here are some calculations with integers:

- 1 + 2 * 3;
(* val it = 7 : int *)
- 10 div 5;
(* val it = 2 : int *)
- 10 mod 5;
(* val it = 0 : int *)

The Int module defines more functions on integers. For example, to convert an integer to a string, call the Int.toString function:

- Int.toString(12);
(* val it = "12" : string *)

Real (floating-point) numbers are written in the usual decimal notation, with at least one digit on each side of the decimal point.

- 3.14159 * 10.0 * 10.0;
(* val it = 314.159 : real *)

- (3.14159 * 10.0 * 10.0) / 2.0;
(* val it = 157.0795 : real *)

Note that the div and mod operators can be applied only to integers. Real division has to be done with the / operator. To find the remainder of two real numbers, use the rem function from the Real module:

- Real.rem(10.0, 4.2);
(* val it = 1.6 : real *)

The way arithmetic operators are defined in SML might seem a bit inflexible to you, especially if you have programmed in a dynamically typed language like Python. But keep in mind that in SML all expressions are statically checked by the compiler. This means an SML program will not fail at run-time due to type mismatches in function calls.

The strict type checking also enables the compiler to do many optimizations that are difficult or even impossible to achieve in dynamically typed languages.

So this seeming inflexibility works in your favor, all the more so if you have to maintain a large code base worked on by many programmers.
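
To make the point concrete, here is a small sketch of what the strictness looks like in practice: an integer and a real cannot be mixed in one expression, so the conversion has to be spelled out. The variable names are my own, and val declarations are explained later in this post.

```sml
(* 2 * 3.5 is rejected by the type checker: int and real do not mix. *)
(* Convert explicitly with functions from the Real module instead.   *)
val doubled = Real.fromInt 2 * 3.5;   (* 7.0 : real *)
val whole   = Real.floor 7.9;         (* 7 : int, rounds down *)
val nearest = Real.round 7.6;         (* 8 : int, rounds to nearest *)
val below   = ~5 + 2;                 (* ~3 : int, ~ is the unary minus *)
```

Note that negative numbers are written with ~ and not with -, which is reserved for subtraction.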

Truth values

The boolean literals are true and false. The not function negates a boolean:

- not(true);
(* val it = false : bool *)

- not(false);
(* val it = true : bool *)

Strings and Characters

String literals are sequences of characters enclosed in double quotes. Certain special characters, like the tab and the newline, are written as escape sequences, with the backslash (\) serving as the escape character. For example, \t stands for a tab and \n for a newline.

Strings can be concatenated using the ^ operator:

- "hello " ^ "world";
(* val it = "hello world" : string *)

Character values in SML are encoded by prefixing # to a string of length one. Thus #"x" represents the character x.

The String and Char modules contain many useful functions for working with textual data.

- String.size("abc");
(* val it = 3 : int *)

- String.sub("abc", 1);
(* val it = #"b" : char *)

- Char.toUpper(String.sub("abc", 0));
(* val it = #"A" : char *)
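
A few more functions from these modules are sketched below, written as a script rather than as a REPL session. All of them come from the standard String and Char modules; the variable names are only for illustration.

```sml
val line  = "name:\tJoe\n";                         (* \t is a tab, \n a newline *)
val n     = String.size line;                       (* 10, each escape is one character *)
val word  = String.substring ("functional", 0, 3);  (* "fun": 3 characters from index 0 *)
val shout = String.map Char.toUpper "hello";        (* "HELLO" *)
val isNum = Char.isDigit #"7";                      (* true *)
val xyz   = String.str #"x" ^ "yz";                 (* "xyz", str turns a char into a string *)
```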

Tuples

You can combine the basic types to build more complex ones. You might have seen various incarnations of this idea - structs in C, classes in Java and so on.

SML has a number of ways to define new data types. However, the simplest and one of the most important means is the tuple construct.

A tuple is formed by a comma-separated list of two or more values of any types, surrounded by parentheses. The following is a tuple of an integer, a real and a string:

- (12, 45.77, "hi");
(* val it = (12,45.77,"hi") : int * real * string *)

The response of SML indicates the type of the tuple - int * real * string. The operator * in a type specification denotes tuple formation, not multiplication.

Note: A tuple is based on the mathematical idea of a Cartesian product. Hence a tuple is also known as a product type. The Cartesian product of two sets S and T (expressed as S * T) is the set of all ordered pairs {(s, t) ...} where s ∈ S and t ∈ T. For a tuple, you can think of S, T etc. as types and s, t etc. as values of those types.

The ith component of a tuple can be accessed by applying the #i function:

- #1(12, 45.77, "hi");
(* val it = 12 : int *)

- #2(12, 45.77, "hi");
(* val it = 45.77 : real *)

- #3(12, 45.77, "hi");
(* val it = "hi" : string *)
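
Tuples can also be nested, and all the components of a tuple can be named in one go by putting a tuple-shaped pattern on the left of a val declaration (val is covered later in this post). The following is a minimal sketch; the names point, nested and inner are made up for the example.

```sml
val point  = (3, 4);
val (x, y) = point;                  (* binds x to 3 and y to 4 *)
val nested = ((1, 2.0), "pair");     (* (int * real) * string *)
val inner  = #2 (#1 nested);         (* 2.0 : real, second part of the first part *)
```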

Remember that you always need two or more elements to form a tuple. There are no single-element tuples. The following is just an integer in parentheses and not a tuple:

- (10);
(* val it = 10 : int *)

Unit

Unit is another basic type in SML. In a sense this is similar to void in C. But unlike void, unit has a value with a literal representation. Unit has exactly one value which is represented as ().

Unit is used as the return value of functions called only for their side-effects. For instance, the built-in print function writes its string argument to the standard output (a side-effect) and returns unit:

- print("hello, world\n");
(* hello, world *)
(* val it = () : unit *)

Unit is also used as the argument of side-effecting functions that require no input.
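
Here is a small sketch of such a function. The name banner is hypothetical, and function declarations are explained later in this post; the point is only that () appears both as the parameter and as the result.

```sml
(* banner takes unit and returns unit; it exists only for its side-effect *)
fun banner () = print "=== report ===\n";

val result = banner ();    (* prints the line; result is () : unit *)
```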

Comparison and Logical Operations

SML has six comparison operators - = (== in C), <, <=, >, >=, <> (!= in C).

- 1 < 2;
(* val it = true : bool *)

- "abc" = "abc";
(* val it = true : bool *)

- #"A" > #"Z";
(* val it = false : bool *)

- 3.14 > 3.00;
(* val it = true : bool *)

As the preceding examples show, these operations can be used to compare integers, reals, characters or strings.

There is one exception though - reals may not be compared using = or <>. This design decision was motivated by the fact that all machines perform real arithmetic only approximately. Thus in some circumstances, two real-valued expressions that are theoretically equal could turn out, because of rounding error, to be unequal in the machine.

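To test two real numbers for equality, use the Real.== and Real.!= functions. They compare the exact floating-point values, so two results that differ only by rounding error are still reported as unequal.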

- Real.==(3.14, 3.14);
(* val it = true : bool *)

- Real.==(3.14, 3.141);
(* val it = false : bool *)

- Real.!=(3.14, 3.141);
(* val it = true : bool *)
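
When rounding error is a concern, a common workaround is to check whether two reals are close enough rather than exactly equal. The sketch below assumes 64-bit reals, as in the usual SML implementations; approxEqual and the tolerance value are my own choices and not part of the standard library.

```sml
val sum   = 0.1 + 0.2;
val exact = Real.==(sum, 0.3);      (* false: sum is slightly more than 0.3 *)

(* approxEqual is a hypothetical helper: true when a and b differ by less than eps *)
fun approxEqual (a, b, eps) = Real.abs (a - b) < eps;

val close = approxEqual (sum, 0.3, 1.0E~9);   (* true *)
```

The right tolerance depends on the magnitude of the numbers involved, so treat 1.0E~9 as a placeholder.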

The logical operations are andalso and orelse (&& and || in C, respectively). These two operations are of lower precedence than the comparison or arithmetic operators.

- 1 < 2 orelse 3 > 4;
(* val it = true : bool *)

- 1 < 2 andalso 3 > 4;
(* val it = false : bool *)

- not(1 < 2 andalso 3 > 4);
(* val it = true : bool *)
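
Like their C counterparts, andalso and orelse evaluate their right-hand side only when the left-hand side has not already decided the answer. The following sketch uses this to guard a division; divides is a hypothetical helper, and function declarations are covered later in this post.

```sml
(* divides (d, n): does d divide n evenly? The d <> 0 test guards the mod. *)
fun divides (d, n) = d <> 0 andalso n mod d = 0;

val a = divides (5, 10);    (* true *)
val b = divides (0, 10);    (* false; n mod d is never evaluated, so no Div exception *)
val c = divides (3, 10);    (* false *)
```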

Conditional Expressions

The conditional expression takes the form if E then F else G. If the expression E evaluates to true, F is evaluated. If the value of E is false, then G is evaluated.

- if 1 < 2 then "hi" else "bye";
(* val it = "hi" : string *)

- if 1 > 2 then "hi" else "bye";
(* val it = "bye" : string *)

The SML compiler requires that both F and G have the same type. Otherwise a compile-time error is raised. This is because the value of F or G becomes the value of the whole if expression, and the type consistency rules of SML demand that an expression have exactly one type.

As if is an expression, it can appear at any position that is valid for a sub-expression:

- 1 + (if 1 < 2 then 10 else 20);
(* val it = 11 : int *)
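
For the same reason, the else branch can itself be another if expression, which gives the familiar "else if" chain. A minimal sketch, with made-up names and values (val declarations are introduced in the next section):

```sml
val temperature = 31;

val advice =
    if temperature < 10 then "coat"
    else if temperature < 25 then "jacket"
    else "t-shirt";          (* "t-shirt", since 31 fails both tests *)
```

Every branch yields a string, so the whole expression has the type string.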

Variable and Function Declarations

A variable declaration assigns a name to a value. The ability to refer to values by name is the simplest means of abstraction any serious programming language should provide.

In SML, a variable declaration has the syntax val N = E. This will bind the identifier N to the value of the expression E.

- val age = 3;
(* val age = 3 : int *)

- val kid = "Joe";
(* val kid = "Joe" : string *)

- "Next year " ^ kid ^ " will be " ^ Int.toString(age + 1) ^ " years old";
(* val it = "Next year Joe will be 4 years old" : string *)

Though functions are also values in SML, function declarations have special syntax. The keyword fun introduces a function definition.

The function defined below computes the area of a circle:

- fun area r = 3.14159 * r * r;
(* val area = fn : real -> real *)

The variable r is the formal parameter of the function. It stands for the radius of the circle for which the area will be calculated.

The compiler has inferred the type of the function as real -> real. (The fn in the response stands for the function value itself, which cannot be printed.) This means area is a function that takes a real number as argument and produces (->) a real number as result.

This is how you will use the area function:

- area(10.2);
(* val it = 326.8510236 : real *)

Just as no parentheses were required for the formal parameter, the parentheses around the argument are also optional:

- area 10.2;
(* val it = 326.8510236 : real *)

This is a sharp deviation from how functions are called in many popular languages. One reason the parentheses are optional is that every SML function takes exactly one parameter! How, then, do we define functions with multiple parameters? With the help of tuples. The following program demonstrates this by defining a function that finds the minimum of two integer values:

- fun min (x, y) =
    if x < y then x
    else y;

The type of this function as inferred by the compiler is:

val min = fn : int * int -> int

This means the function min accepts a tuple of two ints as its argument and produces an int result.

- min (10, 20);
(* val it = 10 : int *)

- min (100, 20);
(* val it = 20 : int *)

Note that the parentheses are part of the tuple syntax and are not required by the function call itself.
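
The same idea extends to any number of parameters. The sketch below repeats min so that it stands on its own, then builds two hypothetical variations: min3, which takes a three-element tuple, and minReal, where a type annotation on one parameter tells the compiler to work with reals instead of the default int.

```sml
fun min (x, y) = if x < y then x else y;               (* int * int -> int *)

(* min3 reuses min; its single argument is a tuple of three ints *)
fun min3 (x, y, z) = min (x, min (y, z));              (* int * int * int -> int *)

(* annotating x as real is enough; the type of y follows from x < y *)
fun minReal (x : real, y) = if x < y then x else y;    (* real * real -> real *)

val smallest = min3 (3, 1, 2);        (* 1 *)
val lower    = minReal (2.5, 1.5);    (* 1.5 *)
```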

Identifiers

Identifiers are the sequences of characters used as names, for variables among other things. As in most other languages, an alphanumeric identifier in SML is a letter followed by any number of letters and digits. You may also use the _ and ' characters after the first letter.

Identifiers beginning with an apostrophe (') are type variables that refer to types and not ordinary values. We will talk more about type variables in a later post.

SML also allows you to create symbolic identifiers made from the following characters:

+ - / * < > = ! @ # $ % ^ & ~ ` \ | ? :

Many of these symbols are used by SML for built-in functions. For example, SML binds the + identifier to the addition function.

Here are a couple of valid declarations with symbolic identifiers:

- val $$$ = 300.00;
(* val $$$ = 300.0 : real *)

- val %% = 0.2;
(* val %% = 0.2 : real *)

- $$$ * %%;
(* val it = 60.0 : real *)

A symbolic identifier may not contain alphanumeric characters. This means the identifier %a@ is invalid, because it mixes the letter a with the symbols % and @.
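
To round this off, here are a few more declarations that follow these rules. All the names are invented for the example. The last two lines bind a symbolic identifier to a function and call it like any other function.

```sml
val total_count = 3;
val x'    = total_count + 1;    (* 4; a prime may follow the first letter *)
val item2 = x' * 2;             (* 8; digits may follow the first letter *)

fun ++ (a, b) = a + b + 1;      (* a symbolic name bound to a function *)
val r = ++ (1, 2);              (* 4 *)
```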

Scripting and Compiling

The REPL is useful for experimenting with small snippets of code but unwieldy for dealing with large programs. Moreover, you may want to preserve the code you have written between REPL sessions.

The best way to solve both these problems is to type code in an editor and load the saved file into the REPL as required. My preferred environment for writing SML code is Emacs + the sml-mode.

You can call the use function from the REPL to load files that contain SML code. Such files are known as scripts and usually have the .sml extension.

Let us save the two functions we defined earlier into a script and load it into the REPL.

Type the following declarations into your favorite code editor and save the file as fp2.sml.

fun area r = 3.14159 * r * r

fun min (x, y) =
    if x < y then x
    else y

Start a new SML session, load the script and call the functions from the REPL:

- use "fp2.sml";
(* val area = fn : real -> real
   val min = fn : int * int -> int
   val it = () : unit *)

- area 12.3444;
(* val it = 478.728714566 : real *)

- min (1, 2);
(* val it = 1 : int *)

Compiling to machine code

Some SML implementations, like PolyML and MLton, allow you to compile your scripts into native machine code.

Whenever I want to compile my SML program into an efficient executable binary, I invoke polyc - the PolyML compiler.

Let’s try this with a simple program that prints the message "hello, world".

(* file: hello.sml *)

fun main () =
    print "hello, world\n"

Note that to generate a standalone executable, PolyML expects the entry point function main to exist in the program.

The following command invokes the PolyML compiler to generate the executable named hello:

$ polyc hello.sml -o hello

Now you can run hello as a standalone command and view its output:

$ ./hello
hello, world
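
A compiled program can of course do more than print a fixed message. As a sketch, the hypothetical file below combines the area function from earlier with print; Real.toString is the real-number counterpart of Int.toString. I am assuming it is compiled with polyc in the same way as hello.sml.

```sml
(* file: circle.sml (a made-up name for this example) *)

fun area r = 3.14159 * r * r

(* main builds the whole output line first and prints it in one call *)
fun main () =
    print ("area of radius 2.0: " ^ Real.toString (area 2.0) ^ "\n")
```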

Exercise 2.1 One easy way to provide input to a program is via command-line arguments. Figure out how command-line arguments could be provided to an SML program. (You may start from here).

Exercise 2.2 The Unix echo command prints its command-line arguments on a single line. Implement the echo program in SML.

Conclusion

This post has covered the nitty-gritty of SML. Keep in mind that the objective of this series is to help you learn functional programming. SML is just a tool that will help us realize this objective. So future posts will concentrate more on functional programming concepts than on SML.

SML and its standard library are thoroughly documented. Keep referring to the linked resources as and when you want to explore the language in depth.

All groundwork being done, let’s start on the real fun!

Stay tuned for part 3 of the series.

