Representing closures

Most compilers that I've read about represent closures with pointers to data structures that hold the function pointer as well as the data that was closed over. To enable sharing of data vectors between closures, there might be more indirection, with the closure containing a function pointer as well as a pointer to the actual data vector.

The way Hob is (currently) representing closures, and in fact all function values, is by a pair of 64-bit values (passed around directly, as two values, not as a pointer to an allocated pair). The first value is the function pointer, and the second one is the closure context. For calls to known, non-closed functions, the compiler can see that there is no context, and avoid passing it—the context will be put in one of the callee-saves registers, so it can just not put anything there, and the function, since it doesn't expect a context value, won't use the value it gets in that register.

This does mean that all dynamic function values, those passed as arguments, for example, will take up 128-bits, two registers. There are two practical downsides to this—big data structures with lots of function pointers (vtables, for example) will be bigger, and functions manipulating a bunch of function pointers will fill up their registers sooner, requiring more memory access.

The advantages are interesting as well, though. For one thing, the compiler always knows where to find the function pointer and can call it directly, whereas with allocated closures there has to be some indirection because the pointer might need to be fetched from some structure. Furthermore, functions closing over only a single value don't need to allocate anything, they just store their context directly in their companion value. Context-sharing also becomes much easier. The compiler creates a single value vector, and makes it the context of all the functions that need it.

This last point is actually very important for the way I intend to implement modules. Since the language is militantly anti-state-sharing, modules that contain mutable state or immutable state that's initialized using module parameters will have this state duplicated for every place where they are imported. (There will be a way to say "give me the module X as imported into module Y", in order to share state when needed.) Thus, a module is pretty much a regular closure. Being able to simply allocate a single context vector for the module, and then being able to call functions on it without further complication (the compiler knows the static address of the functions, and just passes them the right context) makes this very straightforward and efficient.

I'm even debating whether "first class contexts" would be a good idea. Being able to define a set of functions with shared context, where the context gets a unique type and the functions can only be applied to contexts of this type, would be a nicely minimalistic way to support "state with behaviour". In fact, if I integrate this with the (planned) type-class system, it could go a long way towards object-oriented programming. Possibly such a system would be too gimmicky, but I'll do some experiments in this direction.