Part 5 of the 'implementation of Hob' saga.
« Part 4 | Index | Part 6 »

Basic compilation

The next step would be compiling from abstract syntax structures to machine code. My total exposure to assembly to date has been one short university course on system programming and some debugging of hairy C bugs. But we're here to learn.

The language as it stands now does not have conditionals, or any data-types beyond integers and functions. No garbage collection either. This means only deplorably trivial programs can be written. But also that the compiler doesn't have to be too complicated yet. That's good — extending a working system later on, even if you have to overhaul it completely, is less intimidating than building a big complicated thing from scratch.

We'll probably need to use one or more intermediary formats for the code, since going straight from syntax to assembly is a bit too much of a jump. In this first version, I ended up using two (or maybe I should say one-and-a-half) intermediary formats, both linear sequences of instructions and already quite assembly-like.

Getting from the tree-shaped parse-tree to the first linear format looks like the biggest step, but was surprisingly easy (less than 100 lines of code). I'm creating one block of 'high level instructions' per function. High-level instructions are things like adding things, calling functions, cleaning up after a function. As operands, they use literals, addresses, and 'virtual registers'.

Virtual registers are not in such short demand as real machine registers. Every block can use an infinite amount of them, if it so desires. They are single-assignment, meaning they keep the same value throughout their entire lifetime. They are created by being specified as the output operand of a high-level instruction, and after that any number of instructions can use them as input operands.

The instruction stream is held in a dynamic variable, so that the recursive function running over the syntax tree (and the things it calls) can easily append to it. At any time, this function keeps an environment associating visible variables with information about the operand (virtual register, literal, or memory location) that it refers to. These environments also store the arity of function values, which is necessary to expand the one-argument-at-a-time function applications from the parse tree to regular multiple-argument calls (partial application isn't allowed yet in this version). Each function value also has a 'call generator' function associated with it. For regular functions, this just writes a call instruction to the instruction stream, but for built-in primitives (only '+' and '*' for now) or in-line functions, this can generate other instructions.

Here's what let x = 5. x * x + 10 compiles to:

block main
  enter nil
  imulq vreg0 5 5
  addq  vreg1 vreg0 10
  leave nil vreg1

Each instruction is followed by first its output operand and then its inputs. (The imulq and addq use their ugly raw assembly names because I'm currently only supporting 64-bit values, and thus there's no need for more abstract add or multiply instructions yet.)

As you can see, the piece of code has been turned into a function. Everything but the top-level code in a program will be a function anyway, and there's no good reason to provide a special non-function case in compilation. The nil after enter indicates that this function does not take any argument. If it did, they are treated as outputs of the enter instruction. Leave always has nil in first operand position because it never outputs anything. The vreg1 in its input position means that the result of this function can be found in virtual register 1.

A slightly more involved piece of code, def succ x -> x + 1. succ (succ 2) + 20, compiles to this:

block main
  enter nil
  call  vreg0 succ (2)
  call  vreg1 succ (vreg0)
  addq  vreg2 vreg1 20
  leave nil vreg2
block succ
  enter (vreg0)
  addq  vreg1 vreg0 1
  leave nil vreg1

Here, the enter in the succ block does take an input (the argument to the function) and assigns virtual register 0 to it.

The next step is to make sure all these values and virtual registers actually find their way through real hardware. See the next entry for that.

Part 5 of the 'implementation of Hob' saga.
« Part 4 | Index | Part 6 »

© Marijn Haverbeke, created March 25, 2009, last modified on November 9, 2010