unendlich - Home About Archive

Introducing Algebraic Data Types

Posted on 20.07.2017

In statically-typed programming languages all values are logically partitioned into sets which denote their types. For instance, numbers such as 1, 2, 99, and 456 are of the Integer type. Other numbers such as 3.14, 67.88, and 0.123 are of the Real type. When a value belongs to a type it is also said that such value inhabits that type. When types are viewed as theorems, we can also say that the value is a witness or a proof of the type it inhabits.

Algebraic data types, or ADTs, are a way of declaring new types often found in functional programming languages. The following simple ADT declares a new type for boolean values (the code snippets in this article will be given in Haskell):

This declaration introduces a new type, denoted by Bool, which has only two values: True and False (this is how the Bool type is actually defined in Haskell). When reading ADT declarations it is important to remember that the newly introduced name to the left of the equals sign is in the namespace of types, whereas the newly introduced names to the right of the equals sign are in the namespace of values. One could similarly create a new type for the days of the week:

Both newly introduced types are also called enumerations because they are sets of simple, fixed values. They are also called sum types, because their values are either one or other of the alternatives separated by the pipe symbol (|), i. e., they are the sum of the alternatives. But is also possible to define product types, such as the following:

They are called product types because their values are the Cartesian products of other values, such as Integer × Integer in this case. Also notice that, although we have introduced the new name Point on both sides of the equals sign, those names do not refer to the same thing. The Point to the left of the equals sign is naming a new type, whereas the Point to the right of the equals sign is naming a value constructor (also known as data constructor). We could as well have defined Point this way:

Value constructors are not values themselves, but are function-like objects that, given some input values, build new values of the newly-declared type. Given the above declaration, this is how we can now create new values that inhabit Point:

origin = MkPoint 0 0
p1 = MkPoint 0 100
p2 = MkPoint 33 33

They are “function-like” because, although they are invoked like regular functions, they are implemented by the compiler itself and just act as containers, holding the values passed to them for future use. For instance, let’s see which type is reported for the value constructor above by the Haskell interpreter:

$ ghci
Prelude> data Point = MkPoint Integer Integer
Prelude> :type MkPoint
MkPoint :: Integer -> Integer -> Point

It is very convenient that, in a language with first-class functions, value constructors are also functions. It allows one to create a list of points from two lists of coordinates with the simple snippet:

We have seen sum and product types but, more generally, ADTs can be sum types of product types:

This is actually why they are called algebraic in the first place. With regard to the type declared above, all the following values inhabit ProjectivePoint:

Just like we have value constructors, which transform values into values of other types, we can also have type constructors which transforms types into other types. One example is the very popular Maybe ADT:

In this declaration Maybe by itself does not denote a type, but a type constructor. The variable a in this declaration stands for any other type, such as Integer, Char or other ADTs such as Bool or Point. Maybe Integer, on the other hand, denotes a type. Values of this type can be created by one of the data constructors Nothing or Just x, where x is some integer value. We can write thus a function that returns the square root of a real number if it is not negative:

Other such type constructor is Either, which is a sum type whose values can hold values of either two different types, but not both at the same time:

The declaration Either Integer Double denotes a type whose values can hold either integer or real numbers:

It is also possible to create types that reference themselves. This is how an ADT for a list of values could be declared:

This declaration means that values of type List a are either an empty list, or a value of type a followed by another value of type List a. Such ADTs are called recursive. Examples of values of this type are:

So far we have seen different ways of creating ADTs, but how do we use them? We can look into the values inside an ADT by using pattern matching, which means pairing patterns together with code to be executed in case the respective pattern matches. For instance we can write a function that converts booleans to zeros and ones:

It is not necessary to list all the patterns if we want to treat just some of them differently. The following function will test if a weekday is in the weekend or not:

For matching against data constructors which hold values, we name the values which will become local variables in the function scope:

This was a short introduction to algebraic data types which should give a good overview of what they can do, but there is much more to be said about them. Some resources for further reading: