One of the things that really stuck me when I first learned Haskell was the concept of ML-style algebraic data types. I hadn't seriously played with any similar languages before, and the directness of the mapping between my intentions and the code that I was writing was striking. No superfluous inheritance hierarchies, methods, accessors, or (Jove preserve us) hacks with C unions and structs. Just show what the data looks like, and match values against patterns to make use of it.
If you haven't had the privilege of playing with these, here is a definition of a list-of-integer type in Haskell, with a length function:
data List = Nil | Cons Int List
len Nil = 0
len (Cons _ tail) = 1 + len tail
two = len (Cons 1 (Cons 2 Nil))
The data
line says that values of type List
have either the value Nil
, or a value Cons
holding an integer and another List
value. The integer
part could be made polymorphic, but since we're going to talk about
Common Lisp, the static typing aspects don't interest us.
The len
function on this type is then defined by handling
exactly these two cases. The matching against a certain constructor
(as the 'options' of an algebraic data type are usually called) also
implicitly does the binding of the variables (tail
) that
we need.
Finally, the bottom line shows that values of this type are created
simply by invoking the constructor like a function. Nil
doesn't take arguments, and Cons
takes two of them. For a
lot of simple data types, creating and unpacking are the main
use-cases, and it is great to be able to use such a tiny,
straightforward data
declaration instead of writing out a
string of boilerplate classes, accessors, or initargs.
If we consider Common Lisp, algebraic types won't give us the invulnerable sense of compiler-checked correctness that we get in a statically typed language. Still, their succinctness and clarity could really brighten up some programs — types for ASTs are a typical example. In any case, I constantly find myself missing them when writing CL code, so let's see what they might look like.
The first question is how to represent instances of our algebraic data types. There are three sensible choices: classes, structs, or vectors. Vectors would be the most conservative and efficient choice, but have the major drawback that you can not associate type information with them, so that it becomes impossible to specialise methods on these types. 'Playing nice' with the type system would be a plus, so let's try without vectors.
CLOS classes are awesomely flexible, whereas structs are kind of limited and stupid. But in many cases you'll want to use these types for tiny little objects, the kind that you will be creating by the millions. Because of their simplicity, struct instances are significantly faster to create and access on every implementation I tested. They have one really big drawback though — redefining them causes undefined behaviour. This usually means you have to throw away your existing instances and recompile the code that depends on the type whenever you redefine a type. I decided this was acceptable, but you could easily modify the code to use classes instead.
What kind of interface should an implementation of algebraic data types provide? Two constructs seem to go a long way: defining new types, and matching against types. For the first, one can do...
(variant _list _nil (_cons car cdr))
(With variant
being pleasantly shorter than
algebraic-data-type
and more readable than
adt
). This creates three struct types,
_list
, _nil
, and _cons
(underscores are there to prevent name clashes with standard symbols),
with _list
being an empty struct that just serves as a
supertype for the other two and, more importantly, as a way to
indicate what this data type is intended for.
Matching happens with what is basically a case statement:
(defun len (lst)
(vcase lst
(_nil 0)
((_cons cdr) (1+ (len cdr)))))
(len (_cons 1 (_cons 2 (_nil))))
A vcase
form works much like a typecase
form (and in fact expands to one), except that it allows extracting of
member fields. When the first element in a case is a symbol, that
clause is treated exactly like a typecase
clause, but
when it is a list, the first element of this list is a struct type,
and further elements are slot bindings. Slots can be either a single
symbol, in which case the bound variable has the same name as the
slot, or a list of two symbols, the first being the variable name, the
second the slot name. You don't have to extract slots you don't
use.
(If variant values were classes, this could simply expand to
slot-value
forms, but the standard doesn't require
slot-value
to work on structs, so my implementation
expands to regular struct accessors, taken from the package that
defines the struct type, so that you don't have to export the
accessors. There is a problematic corner case with structs that have a
name that you imported from another package.)
vcase
does not do nested patterns. For serious pattern matching,
struct-based variant values are trivial to use with
cl-match.
An implementation of the above, containing a few small extensions that
turned out to be useful in practice (evcase
for ensuring at least
one case matches, vbind
for when you already know the type you are
matching against) can be found here.