Basic compilation
The next step would be compiling from abstract syntax structures to machine code. My total exposure to assembly to date has been one short university course on system programming and some debugging of hairy C bugs. But we're here to learn.
The language as it stands now does not have conditionals, or any data-types beyond integers and functions. No garbage collection either. This means only deplorably trivial programs can be written. But also that the compiler doesn't have to be too complicated yet. That's good — extending a working system later on, even if you have to overhaul it completely, is less intimidating than building a big complicated thing from scratch.
We'll probably need to use one or more intermediary formats for the code, since going straight from syntax to assembly is a bit too much of a jump. In this first version, I ended up using two (or maybe I should say one-and-a-half) intermediary formats, both linear sequences of instructions and already quite assembly-like.
Getting from the tree-shaped parse-tree to the first linear format looks like the biggest step, but was surprisingly easy (less than 100 lines of code). I'm creating one block of 'high level instructions' per function. High-level instructions are things like adding things, calling functions, cleaning up after a function. As operands, they use literals, addresses, and 'virtual registers'.
Virtual registers are not in such short demand as real machine registers. Every block can use an infinite amount of them, if it so desires. They are single-assignment, meaning they keep the same value throughout their entire lifetime. They are created by being specified as the output operand of a high-level instruction, and after that any number of instructions can use them as input operands.
The instruction stream is held in a dynamic variable, so that the
recursive function running over the syntax tree (and the things it
calls) can easily append to it. At any time, this function keeps an
environment associating visible variables with information about the
operand (virtual register, literal, or memory location) that it refers
to. These environments also store the arity of function values, which
is necessary to expand the one-argument-at-a-time function
applications from the parse tree to regular multiple-argument calls
(partial application isn't allowed yet in this version). Each function
value also has a 'call generator' function associated with it. For
regular functions, this just writes a call
instruction to the
instruction stream, but for built-in primitives (only '+
' and '*
'
for now) or in-line functions, this can generate other instructions.
Here's what let x = 5. x * x + 10
compiles to:
block main
enter nil
imulq vreg0 5 5
addq vreg1 vreg0 10
leave nil vreg1
Each instruction is followed by first its output operand and then its
inputs. (The imulq
and addq
use their ugly raw assembly names
because I'm currently only supporting 64-bit values, and thus there's
no need for more abstract add or multiply instructions yet.)
As you can see, the piece of code has been turned into a function.
Everything but the top-level code in a program will be a function
anyway, and there's no good reason to provide a special non-function
case in compilation. The nil
after enter
indicates that this
function does not take any argument. If it did, they are treated as
outputs of the enter
instruction. Leave always has nil
in first
operand position because it never outputs anything. The vreg1
in its
input position means that the result of this function can be found in
virtual register 1.
A slightly more involved piece of code, def succ x -> x + 1. succ
(succ 2) + 20
, compiles to this:
block main
enter nil
call vreg0 succ (2)
call vreg1 succ (vreg0)
addq vreg2 vreg1 20
leave nil vreg2
block succ
enter (vreg0)
addq vreg1 vreg0 1
leave nil vreg1
Here, the enter
in the succ
block does take an input (the argument
to the function) and assigns virtual register 0 to it.
The next step is to make sure all these values and virtual registers actually find their way through real hardware. See the next entry for that.