Introducing Algebraic Data Types

Posted on 20.07.2017

In statically-typed programming languages all values are logically partitioned into sets which denote their types. For instance, numbers such as 1, 2, 99, and 456 are of the Integer type. Other numbers such as 3.14, 67.88, and 0.123 are of the Real type. When a value belongs to a type it is also said that such value inhabits that type. When types are viewed as theorems, we can also say that the value is a witness or a proof of the type it inhabits.

Algebraic data types, or ADTs, are a way of declaring new types often found in functional programming languages. The following simple ADT declares a new type for boolean values (the code snippets in this article will be given in Haskell):

data Bool = True | False

This declaration introduces a new type, denoted by Bool, which has only two values: True and False (this is how the Bool type is actually defined in Haskell). When reading ADT declarations it is important to remember that the newly introduced name to the left of the equals sign is in the namespace of types, whereas the newly introduced names to the right of the equals sign are in the namespace of values. One could similarly create a new type for the days of the week:

data Weekdays = Sunday
              | Monday
              | Tuesday
              | Wednesday
              | Thursday
              | Friday
              | Saturday

Both newly introduced types are also called enumerations because they are sets of simple, fixed values. They are also called sum types, because their values are either one or other of the alternatives separated by the pipe symbol (|), i. e., they are the sum of the alternatives. But is also possible to define product types, such as the following:

data Point = Point Integer Integer

They are called product types because their values are the Cartesian products of other values, such as Integer × Integer in this case. Also notice that, although we have introduced the new name Point on both sides of the equals sign, those names do not refer to the same thing. The Point to the left of the equals sign is naming a new type, whereas the Point to the right of the equals sign is naming a value constructor (also known as data constructor). We could as well have defined Point this way:

data Point = MkPoint Integer Integer

Value constructors are not values themselves, but are function-like objects that, given some input values, build new values of the newly-declared type. Given the above declaration, this is how we can now create new values that inhabit Point:

origin = MkPoint 0 0
p1 = MkPoint 0 100
p2 = MkPoint 33 33

They are “function-like” because, although they are invoked like regular functions, they are implemented by the compiler itself and just act as containers, holding the values passed to them for future use. For instance, let’s see which type is reported for the value constructor above by the Haskell interpreter:

$ ghci
Prelude> data Point = MkPoint Integer Integer
Prelude> :type MkPoint
MkPoint :: Integer -> Integer -> Point

It is very convenient that, in a language with first-class functions, value constructors are also functions. It allows one to create a list of points from two lists of coordinates with the simple snippet:

-- zipWith takes a function and two lists, and apply the function
-- to the corresponding elements of the lists yielding a new list
-- as a result
points = zipWith MkPoint xCoords yCoords

-- this is the type of applying zipWith to MkPoint as reported
-- by Haskell, brackets denote a list
zipWith MkPoint :: [Integer] -> [Integer] -> [Point]

We have seen sum and product types but, more generally, ADTs can be sum types of product types:

data ProjectivePoint = PointInInfinity
                     | RealPoint Integer Integer

This is actually why they are called algebraic in the first place. With regard to the type declared above, all the following values inhabit ProjectivePoint:

infinity = PointInInfinity
origin = RealPoint 0 0
pp1 = RealPoint 100 (-100)

Just like we have value constructors, which transform values into values of other types, we can also have type constructors which transforms types into other types. One example is the very popular Maybe ADT:

data Maybe a = Nothing | Just a

In this declaration Maybe by itself does not denote a type, but a type constructor. The variable a in this declaration stands for any other type, such as Integer, Char or other ADTs such as Bool or Point. Maybe Integer, on the other hand, denotes a type. Values of this type can be created by one of the data constructors Nothing or Just x, where x is some integer value. We can write thus a function that returns the square root of a real number if it is not negative:

-- function type
maybeSqrt :: Double -> Maybe Double

-- function implementation
maybeSqrt x = if x < 0 then Nothing
                       else Just (sqrt x)

Other such type constructor is Either, which is a sum type whose values can hold values of either two different types, but not both at the same time:

data Either a b = Left a | Right b

The declaration Either Integer Double denotes a type whose values can hold either integer or real numbers:

someInteger :: Either Integer Double
someInteger = Left 42

someDouble :: Either Integer Double
someDouble = Right 3.1415

It is also possible to create types that reference themselves. This is how an ADT for a list of values could be declared:

data List a = Empty
            | Element a (List a)

This declaration means that values of type List a are either an empty list, or a value of type a followed by another value of type List a. Such ADTs are called recursive. Examples of values of this type are:

noInts :: List Integer
noInts = Empty

aDouble :: List Double
aDouble = Element 1.0 Empty

aMaybe :: List (Maybe Integer)
aMaybe = Element (Just 9) Empty

aCoupleMaybes :: List (Maybe Integer)
aCoupleMaybes = Element Nothing aMaybe

So far we have seen different ways of creating ADTs, but how do we use them? We can look into the values inside an ADT by using pattern matching, which means pairing patterns together with code to be executed in case the respective pattern matches. For instance we can write a function that converts booleans to zeros and ones:

binary :: Bool -> Integer
binary False = 0
binary True = 1

It is not necessary to list all the patterns if we want to treat just some of them differently. The following function will test if a weekday is in the weekend or not:

isWeekend :: Weekday -> Bool
isWeekend Sunday = True
isWeekend Saturday = True
isWeekend _ = False

For matching against data constructors which hold values, we name the values which will become local variables in the function scope:

hasBiggerThan10 :: Maybe Integer -> Bool
hasBiggerThan10 Nothing = False
hasBiggerThan10 (Just x) = x > 10

squaredDistance :: Point -> Point -> Integer
squaredDistance (MkPoint x1 y1) (MkPoint x2 y2) =
    (x2 - x1) ^ 2 + (y2 - y1) ^ 2

getXCoord :: ProjectivePoint -> Maybe Integer
getXCoord PointInInfinity = Nothing
getXCoord (RealPoint x _) = Just x -- no need to name an unused value

-- pattern matching can unwrap data constructors at arbitrary levels
getSecondIntegerOrZero :: List (Maybe Integer) -> Integer
getSecondIntegerOrZero (Element _ (Element (Just x) _)) = x
getSecondIntegerOrZero _ = 0

This was a short introduction to algebraic data types which should give a good overview of what they can do, but there is much more to be said about them. Some resources for further reading: