Register allocation
Register allocation is the process of finding a suitable processor register (or stack location) for every value in a program. If you're completely unfamiliar with machine-level programming (and want to do something about that), I very much recommend this book.
(In this series — at least for now — I'll only be compiling to x86-64 assembly. That's a quite common processor type these days, and a little less painful to program than its 32-bit father.)
One can put as much effort as one wants into register allocation. Finding a very bad solution (just put everything on the stack right away) is very easy and results in rather ugly, big, and slow assembly code. On the other hand, finding a completely optimal solution is an NP-hard problem, and thus not something one solves before breakfast on a Sunday morning. In fact, the large gap between the scoping entry and the next one is there because I was stumped coming up with an elegant register allocation algorithm. (When faced with an intimidating problem in a hobby project, I tend to avoid the whole project.)
Closely related to register allocation are calling conventions. How do
arguments get passed to functions, and what is the effect of a
function call on the registers? For example, on x86 processors, C
functions typically take their arguments on the stack, clobber the
registers, and put their result in the %eax
register. This means
that the register allocator has to save any values that it will still
need after the call on the stack.
The x86-64 architecture, having much more registers than plain x86, opens up a whole world of possible calling conventions. Because of the functional nature of Hob, I set out to make calls as cheap as possible. This means we'd like to avoid putting arguments on the stack, but also don't want to lose the contents of all our registers on every call.
(For those completely new to the x86-64 architecture, see this summary of the differences from x86.)
What I came up with (and, as mentioned before, I'm too new to this to
really know what I'm doing) is this: The first eight arguments to
functions are passed in the r8
to r15
registers, in reverse order,
but always using the lower numbers (for example, three arguments are
passed in r10
, r9
, r8
). The reverse order is there because I
want to specify function arguments to be evaluated left-to-right in
this language. Functions return their return value in r8
, and since
the last argument is likely to be the result of a function call,
that'll at least be in the right place already. These registers (even
the ones not used for arguments) can be clobbered by the callee, and
the caller must assume they are lost after a call. The other
general-purpose registers are supposed to be saved on the stack and
restored by every function that uses them, so that callers can assume
them to still hold their value after a call.
Having callees save registers seems like a good strategy. A function that calls a lot of other functions will profit because it doesn't have to save registers multiple times, only once. A function that doesn't do much calling will not be penalised much because it's probably a small function that doesn't need that many registers.
The allocation algorithm I ended up using is absolutely not elegant, in fact it's kind of a mess, but it seems to do a good job and is not messy to the point where it really confuses me. I'll probably have to significantly revise it once I introduce garbage collection, but for what it's worth, it produces compilable and working programs, and I'm proud of that. Because the language is still missing a lot of constructs the test programs aren't very deep yet, but here's a cute one:
def true a b -> a
def false a b -> b
def if c then else -> c then else
def check x -> true
if (check 10) (if false 1 2) 3
It uses Church booleans
to evaluate a simple conditional. It compiles, and properly returns 2
.
The code generator, of which the register allocator is a part (see
codegen.lisp
), takes a block of instructions (see the
previous entry) as input, and returns a list
of x86-64 instructions that can be written to a assembly file (I'm
using as
for my assembler for now).
To do this, it makes two passes over the block. In the first pass, it
records the lifetime (start and end instruction) of every virtual
register in the block, and the location of function calls. It also
looks for opportunities to 'chain' virtual registers. That is, if an
instruction producing virtual register B
takes as input the last
occurrence of virtual register A
, it records this information
because it will probably be practical to re-use whatever real register
A
occupies to store B
. There are two reasons this chaining is done
in this extra pass, rather than during the real code generation.
Firstly, registers are often involved in more than one chain, and it
pays to pick the longest one. Secondly, in picking the register to use
for a chain, it can be useful to know that the last virtual register
of the chain has to end up in some specific location, because it is
the argument to a function or is returned from the block.
The actual allocation of registers is done using two different strategies. Virtual registers that do not live beyond the next function call are, as much as possible, put into argument registers (which the callee clobbers anyway), using chaining to try and minimise the amount of useless move instructions. When no argument registers are available, or when a virtual register has to live beyond the next function call, the normal algorithm, which uses the callee-saved registers and the stack, comes into effect. It always prefers registers, since those are faster and result in smaller code. But when no registers are available, stack space is allocated. (Basically the 'linear scan' algorithm.) The stack pointer is not adjusted every time stack space is required, but the minimum amount of space needed is recorded, and when greater than zero, the code generator adds instructions to adjust the stack pointer before and after the block.
The compiler calls the code generator for every block it produces, and
puts these blocks into a file, together with a simple entry _start
point. All code at the top-level is put into a function called main
,
and _start
simply calls this function and then makes an 'exit'
system call using the result. This file is then run through as
and
ld
to create an executable. All rather primitive, but it works. When
it doesn't, I run gdb
on the executable and try to figure out what
the problem is.
Many functions compile to pleasantly simple assembly code. For
example, the identity function (fn x -> x
) becomes:
id_1: ret
... since both the first argument and the return value are in register
r8
. And this (pointless) function ...
def call_quadruple f x ->
let x2 = x + x
f (x2 + x2)
... becomes ...
call_quadruple_1:
addl %r8d, %r8d
addl %r8d, %r8d
jmp *%r9
Note that it uses jmp
, rather than call
, because this is a tail
call.