Part 13 of the 'implementation of Hob' saga.
« Part 12 | Index | Part 14 »

A Convention For Controlled Side-Effects

(This is a theory-only entry. No code was written for it yet—in fact, I'm having trouble finding the time for any kind of serious Hob coding sessions lately.)

What I want is a system that allows us to separate pure and non-pure code in a disciplined way, without resorting to something as restrictive as monads, or something as burdensomly complicated as uniqueness typing.

I've been going over this in the back of my mind for about three years now (the first draft was done on a lonely, feverish train ride from Skopje to Thessaloniki in October 2007), and I've convinced myself that I have something workable. Of course, this has been the topic of much research, which I'm not on top of, and it might be a bit arrogant of me to believe that I've found a system that solves all the issues. I invite (as usual) any reader to point out flaws, and suggest relevant literature.

My system, in short, is this: function types (->) get specialised into two different types, -> and => (I'm still trying to come up with good English terms for them). The => type is reserved for functions that close over mutable data, for example this one (where ref, =, and ! are placeholder syntax for mutable references):

 let x = ref 1
   (fn i => x = !x + i)

=> functions themselves count as mutable data for this purpose—a function that closes over such a function gets the => type itself. Things that mutate 'external' state, such as a print function, also have the => type (conceptually, they close over 'the world', which is mutable).

All other function get the -> type. This means that -> does not denote purity—such functions may modify their arguments at will, and return mutable data. A pure function is one that has the -> type and whose argument and return types are all immutable.

By doing the analysis in terms of what closes over what, we allow functions to internally create mutable values, even define internal functions that mutate them, without requiring the whole function to be impure. You could have a pure sort function that takes an immutable array, and returns a sorted version, which internally implements a shamelessly imperative sort algorithm.

Imperative algorithms are great. They often are easier to write and perform better than they pure cousins. On top of that, they often only do their mutations in a limited context, and can, given a system like what I'm describing here, be encapsulated in a pure function.

What is awkward with this system, is that if you have, for example a map function of type fn (fn x -> y) [x] -> [y], you can now no longer map your => functions over arrays. Any hard-core functional programmer will say "of course not! duh!". But, coming from pleasantly undisciplined languages like Lisp and JavaScript, I'm actually not satisfied with that answer. There are many cases where you'd like to do what map does, but also, on the side, perform some side effect from your mapping function.

Now, if we specify that a -> type in a function type means that the function takes 'any kind of function', and treat the type of function passed as a kind of type variable, we can use them in the same way that other polymorphic functions are used. If you give map a -> function, it is pure. If you give it a =>, it isn't. This is analogous to the fact that if you give id an immutable argument, it is pure, whereas if you give it something mutable, it isn't.

Doesn't this make everything a confusing jumble again? Let us say that the benefits of being able to distinguish pure functions are twofold: They allow us to make certain assumptions about our code, and they allow the compiler to do some transformations that wouldn't be safe with impure functions. Both of these are useful at the call site—meaning that it is not all that interesting whether a given function is pure, but rather, we want to know whether a given call is pure. Thus, to depend on the actual resolved types to determine purity seems acceptable.

Maybe it would be better to have yet another specifyer (+> maybe?) to explicitly denote polymorphic function arguments. You could then also specify functions that require their argument to be ->. I'll have to see in practice whether this would be worth the additional mental overhead and symbolic ugliness.

Another possible direction is to make this whole system implicit. The compiler can determine at compile time into which category each function falls, so we don't have to burden the programmer with choosing the right arrow type. This has the advantage that you can drop in, for example, a log statement somewhere without worrying about it screwing up your types—the compiler will adjust. However, I expect this will make it hard for people to keep track of which functions are pure. If you're calling something from another module, how are you to know whether it will make your function impure? The logging issue can be worked around with some hack like Haskell's unsafePerformIO.

(I must say I'm pleasantly surprised by how easy this was to write down. I half expected to run into an endless list of complications when trying to actually express these ideas. So again, if you spot any complications, write me. My email address is in the page footer.)

Part 13 of the 'implementation of Hob' saga.
« Part 12 | Index | Part 14 »

© Marijn Haverbeke, created November 6, 2010