Part 4 of the 'implementation of Hob' saga.
« Part 3 | Index | Part 5 »

Getting scope right

As described in the syntax entry, the basic variable binding mechanism (besides function abstraction) is a straight-forward let form. In a statement like this...

let x = x + 1,
    y = 4
  x * y

The x + 1 and the 4 are evaluated in the outer scope, creating a new scope with new bindings for x and y, in which x * y is evaluated. As a catch, the block mechanism allows the let's body to be aligned with its definition, creating an effect that resembles the free-form introduction (and redefinition) of variables anywhere in a block:

let x = 4
print x
let x = x * 2
print x

This in fact resembles the way O'Caml works, and is more structured than, for example, Python. The let forms, though they don't have to add indentation, do open new blocks, and the bindings of the variables are immutable throughout their blocks.

But the fact that let definitions are evaluated in the 'old' environment means that they can not be used to define recursive functions. At least not without something like a Y-combinator. So we want another binding construct. Common Lisp's labels and O'Caml's let rec constructs are a bit clumsy — you have to explicitly decide whether your function is going to be recursive, and group mutually recursive functions into one chunk of definitions. Scheme's use of define forms at the top of a block, is already more convenient. But I am especially charmed by JavaScript's simplistic rule that all function definitions within a scope are bound before any of the code in the block executes. This allows function definitions to live near their use.

What I am doing is introduce a second binding construct, def, which can only define functions, no other types of values. def forms can appear anywhere in a block of expressions, with the caveat that there has to be at least one normal expression per block. During parsing, they are all moved to the 'top' of the block, or rather the whole block is wrapped in a construct that indicates it should be evaluated in an environment which has these functions bound (let rec style). For example, this tail-recursive reverse:

def reverse list ->
  def helper
    [] acc -> acc
    [x:xs] acc -> helper xs x:acc
  helper list []

reverse [1; 2; 3]

def is followed by the name of the function, and then by a set of pattern -> body cases, as in fn.

(Why not allow def to define non-function variables? The problem with that is that it would, in every sane approach I can think of, allow uninitialised variables to be accessed. The current rules make that strictly impossible, which is a very nice property for a language.)

One worrying aspect of this convention is that one of the golden rules of functional languages, 'everything is an expression', no longer holds. def forms are more like statements in C-like languages. Fortunately, they can only occur in one, well-defined place: an expression-block, which is itself an expression.

So what would a module look like in this language? Here's a possibility.

module RandomFunctionality (pi, e, frob)
let pi = 3.141592653589793
def frob x -> x - pi
let e = 2.7182817
.

A module body is a block of expressions. Note that the dot on the last line ends this block (though the EOF would also have ended it). I'm not sure yet whether it'll make sense to have multiple modules in a file, but it's nice to have the possibility. You can notice two problems with the let forms. Firstly, they nest — frob and e are defined inside the block that defines pi, and as such are not top-level, but have to be exported from the module anyway. Secondly, the last let does not have a body, which is currently a syntax error.

One solution that comes to mind is to have a 'magic' endmodule expression, that determines from which scope the variables exported by the module should be taken. Such an expression would not be allowed to appear inside a function body. Another option is to implicitly, invisibly insert such a thing at the end of the body of the innermost let expression at the bottom of the module body. The first is rather ugly, the second a bit too magical for my tastes — not to mention that it'd mess up the elegance of the parser.

There's a certain discrepancy between a module top-level being a set of definitions, and a regular block of expressions being a computation. Some languages, like Haskell or C, treat the two completely differently. I'm not sure I want to do that — since they also have some things in common, and regularity is nice. Besides, it is nice to be able to close over variables in top-level definitions (and possibly even being able to use regular flow control for conditional compilation). For now, lacking a perfect solution, a module is a regular block, culminating in one more more export expressions, which have type () (unit), and determine the symbols that are exported from the module.

module RandomFunctionality
let pi = 3.141592653589793
def frob x -> x - pi
let e = 2.7182817
export pi, frob, e

This nicely resolves the question of where the exported definitions are to be found. It does mean that the list of exported symbols isn't centralised at the top of the module anymore. This isn't always ideal, since you can't determine the whole interface by glancing at the top of a source file. But for such things one should use automatically generated documentation (more about that later) or development tools anyway. An advantage is that you don't require monstrous, messy lists of symbols for packages with a big interface, but you can just scatter export expressions throughout the package definition, in places where they make sense (below a data-type definition, right after a block of related functions, etcetera).

Part 4 of the 'implementation of Hob' saga.
« Part 3 | Index | Part 5 »

© Marijn Haverbeke, created February 18, 2009, last modified on March 25, 2009