Part 2 of the 'implementation of Hob' saga.
« Part 1 | Index | Part 3 »

Syntax

Syntax is hard to get right. It has to be regular, so that people can read code without first learning a large number of corner cases, and tools such as syntax-highlighting editors can be written in a straightforward way. But it also has to be pleasant to read and write — not too verbose, or too mechanical-looking.

My first impulse was to go for a Lisp-like syntax. S-expressions are about as regular as you can get. But, even after programming Common Lisp for years, I still find them ugly. Too regular, and too heavy on the punctuation. So instead I'm taking my inspiration from Haskell, that wonder of succinctness and elegance. Function application is done by simply putting expressions next to each other, as in fac 4, with implicit currying, i.e. map f produces a function of one argument that maps f over a sequence. Operators have programmable precedence and associativity. Unlike Haskell, I'll allow unary operators.
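To make the currying remark concrete, here is a small sketch in Python (not Hob) of what map f evaluates to. The names curry, map_over and add are made up for this illustration. Python has no implicit currying, so a helper has to emulate it.

```python
import inspect

def curry(fn):
    """Hypothetical helper: calling the result with too few arguments
    returns a function that waits for the remaining ones."""
    arity = len(inspect.signature(fn).parameters)

    def curried(*args):
        if len(args) >= arity:
            return fn(*args)
        # Not enough arguments yet: remember them and wait for more.
        return lambda *more: curried(*args, *more)

    return curried

@curry
def map_over(f, seq):
    # Stand-in for Hob's map: apply f to every element of seq.
    return [f(x) for x in seq]

@curry
def add(a, b):
    return a + b

# 'map_over(add(1))' plays the role of 'map f': a one-argument function
# that maps f over whatever sequence it is given.
increment_all = map_over(add(1))
result = increment_all([1, 2, 3])
```

In Hob the same thing would need no helper, since every function is curried by default.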

The 'block structure' of a program is determined by its indentation. The exact rules are (for now): Whenever a 'block' of something is expected, the column in which the first token of the first item (typically expressions, but can be something else) in the block appears determines the indentation of the block. Any new lines will, if they are indented beyond that, be a continuation of the current item. If they are aligned with the block's base indentation, they start a new item in the block. And if they are indented less, they close the block. For example:

let x = 10
    y = 50
  print x
  print ("hello" ++ f x y)

The let is followed by a block of bindings, and then a block of statements. This means the first print expression has to start in a different column than the last definition, or the parser will try to parse it as another binding. But this shouldn't be too hard to get used to. Note that this is more free-form than Python's approach — blocks can start at any indentation they like, which allows things like...

let x = 5
let y = x + 10
print y

Here the blocks start on the same column as their parent block. This is nice for adding let bindings without indenting yourself off the screen, but it should probably be avoided for things like if expressions!
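The three-way indentation rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the actual parser. It assumes each line has already been reduced to a (column, text) pair giving the column of its first relevant token, and the names classify and read_block are invented for the sketch.

```python
def classify(column, block_column):
    """Apply the indentation rule to one line."""
    if column > block_column:
        return "continuation"   # still part of the current item
    if column == block_column:
        return "new item"       # starts the next item in the block
    return "close block"        # indented less: the block is over

def read_block(lines, start=0):
    """Read one block starting at lines[start].

    The column of the first item fixes the block's indentation.
    Returns the items found and the index of the line that closed
    the block (or len(lines) if the input ran out).
    """
    block_column = lines[start][0]
    items = []
    pos = start
    while pos < len(lines):
        column, text = lines[pos]
        kind = classify(column, block_column)
        if kind == "close block":
            break
        if kind == "new item":
            items.append(text)
        else:
            items[-1] += " " + text
        pos += 1
    return items, pos

# The first example: bindings start at column 4 (right after 'let '),
# the statements at column 2.
example = [
    (4, "x = 10"),
    (4, "y = 50"),
    (2, "print x"),
    (2, 'print ("hello" ++ f x y)'),
]
bindings, next_pos = read_block(example)
statements, end_pos = read_block(example, next_pos)
```

Here bindings ends up holding the two definitions and statements the two print expressions, because the first print sits in a column to the left of the bindings and therefore closes that block.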

Since you don't always want to add a new line, the punctuation marks , and . are used to close, respectively, the current item and the current block.

let x = 10, y = 50. print (x + y)

This is convenient, but somewhat confusing. Consider for example the match form, which matches an expression against a number of patterns, and executes the body of the pattern that matches. These bodies can themselves be blocks. A one-line match might look something like this:

match list. [] -> 0, [x:_] -> x

(Where [] is list syntax, and -> separates the cases from their bodies.) Note that the 0 makes up a block. Should the comma close the pattern case, or just the inner block? In the above case, it is clearly intended to close the whole case. But if the block rules are applied, it only closes the case's body, and a ., would be required to get the above effect. That's awful, so I introduced the notion of 'expression blocks' behaving differently from general blocks. In an expression block, ; is used to separate expressions. That means you can do this:

match list. [] -> print "yee-haw"; 0, [x:_] -> x

When expression blocks appear at the end of another block, a , or . token causes them to end without consuming that token. However, in other cases, such as the first expression in a match form, a variant of expression blocks is used that does consume dots, so that the above example is valid. In an actual program, though, it looks better like this:

match list
  [] -> print "yee-haw"; 0
  [x:_] -> x

The punctuation is just a fall-back for one-liners. In general, indentation will be used, and programs will be relatively readable.

The dependence on indentation magic does mean that the language's grammar is not easily expressed as a context-free grammar. My recursive-descent parser in Common Lisp makes use of a dynamic variable *block-indentation* to communicate the current block's starting indentation to the functions doing the grunt work, but is quite simple. No look-ahead or back-tracking is needed — though maybe I'll need that when I decide to use Haskell's list-comprehension syntax. Operator-precedence parsing is used to handle the arbitrary precedence of operators. The code can be seen in my gitweb.
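For readers unfamiliar with operator-precedence parsing, the following Python sketch shows one common variant (precedence climbing) driven by a table that can be extended at run time, which is what programmable precedence and associativity require. It is not the Common Lisp code mentioned above. The table contents, the token format and the name parse_expression are assumptions made for the example, and it handles binary operators only.

```python
def parse_expression(tokens, operators, min_precedence=0):
    """Parse a flat token list into nested (operator, left, right) tuples.

    tokens    -- list of strings, consumed from the front
    operators -- dict mapping an operator to (precedence, associativity),
                 where associativity is "left" or "right"
    """
    left = tokens.pop(0)  # an operand: a number or a name
    while tokens and tokens[0] in operators:
        precedence, associativity = operators[tokens[0]]
        if precedence < min_precedence:
            break  # this operator belongs to an enclosing call
        operator = tokens.pop(0)
        # A left-associative operator must not absorb another operator
        # of the same precedence on its right-hand side.
        if associativity == "left":
            next_min = precedence + 1
        else:
            next_min = precedence
        right = parse_expression(tokens, operators, next_min)
        left = (operator, left, right)
    return left

# A user-extensible table; the numbers are arbitrary illustration values.
operators = {
    "+": (6, "left"),
    "-": (6, "left"),
    "*": (7, "left"),
    "^": (8, "right"),
}

mixed = parse_expression("1 + 2 * 3".split(), operators)
left_assoc = parse_expression("1 - 2 - 3".split(), operators)
right_assoc = parse_expression("2 ^ 3 ^ 2".split(), operators)
```

Declaring a new operator then amounts to adding an entry to the table before parsing the code that uses it.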

Function literals are written as fn followed by a block of pattern -> body cases. So a simple function looks like this:

let zero = fn 0 -> True
              n -> False
  zero 5

Usually, though, you'll use let for non-function variables, and def for functions. def forms may appear anywhere in a block of expressions, and define a function in the scope of the block. Unlike let bindings, these bindings see themselves and each other, and thus can be used for recursive or mutually recursive functions.

def fac
  0 -> 1
  n -> n * fac (n-1)
fac 5
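One way to read such a block of cases is as an ordered list of (pattern, body) pairs. The Python sketch below models that reading. It assumes that cases are tried from top to bottom and that a bare name such as n matches anything, which the examples suggest but the text does not spell out. WILDCARD and make_function are names invented for the sketch.

```python
WILDCARD = object()  # stands in for a variable pattern such as n

def make_function(cases):
    """Build a one-argument function from (pattern, body) pairs.

    A pattern is either a literal value or WILDCARD; a body is a
    callable that receives the argument.
    """
    def function(argument):
        for pattern, body in cases:
            # Assumption: the first matching case wins.
            if pattern is WILDCARD or pattern == argument:
                return body(argument)
        raise ValueError("no case matches %r" % (argument,))
    return function

# fn 0 -> True
#    n -> False
zero = make_function([(0, lambda _: True), (WILDCARD, lambda n: False)])

# def fac
#   0 -> 1
#   n -> n * fac (n-1)
# The body can refer to fac itself, mirroring how def bindings see
# themselves.
fac = make_function([(0, lambda _: 1), (WILDCARD, lambda n: n * fac(n - 1))])

zero_of_five = zero(5)
fac_of_five = fac(5)
```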

Finally, strings use C conventions ("foo\n\"bar\""), characters are preceded by a single quote ('a), comments are started with a backslash or contained between {\ and \}, and a double colon is used to add type annotations.

© Marijn Haverbeke, created February 10, 2009, last modified on November 27, 2009