Syntax
Syntax is hard to get right. It has to be regular, so that people can read code without first learning a large number of corner cases, and tools such as syntax-highlighting editors can be written in a straightforward way. But it also has to be pleasant to read and write — not too verbose, or too mechanical-looking.
My first impulse was to go for a Lisp-like syntax. S-expressions are
about as regular as you can get. But, even after programming Common
Lisp for years, they still look ugly. Too regular, and too heavy on
the punctuation. So instead I'm taking my inspiration from Haskell,
that wonder of succinctness and elegance. Function application is done
by simply putting expressions next to each other, as in fac 4, with
implicit currying, i.e. map f produces a function of one argument
that maps f over a sequence. Operators have programmable precedence
and associativity. Unlike Haskell, I'll allow unary operators.
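To make the currying behaviour concrete, here is a rough sketch in Python, used purely as a stand-in since the language itself is not runnable here. The names curry, curried_map and double_all are made up for this illustration and are not part of the language.

```python
import inspect

def curry(fn, arity=None):
    """Return a curried version of fn: arguments may be supplied a few at a time."""
    if arity is None:
        # Assumption: fn takes a fixed number of positional parameters.
        arity = len(inspect.signature(fn).parameters)

    def collect(args):
        def step(*more):
            got = args + more
            if len(got) >= arity:
                return fn(*got)       # enough arguments: actually call fn
            return collect(got)       # otherwise keep waiting for the rest
        return step

    return collect(())

# Hypothetical stand-in for the language's map.
curried_map = curry(lambda f, seq: [f(x) for x in seq])

# Like writing "map f": the result is a function of one argument.
double_all = curried_map(lambda x: x * 2)
print(double_all([1, 2, 3]))  # prints [2, 4, 6]
```

In the language described here no helper would be needed, since every function application would behave this way implicitly.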
The 'block structure' of a program is determined by its indentation. The exact rules are (for now): Whenever a 'block' of something is expected, the column in which the first token of the first item (typically expressions, but can be something else) in the block appears determines the indentation of the block. Any new lines will, if they are indented beyond that, be a continuation of the current item. If they are aligned with the block's base indentation, they start a new item in the block. And if they are indented less, they close the block. For example:
let x = 10
    y = 50
  print x
  print ("hello" ++ f x y)
The let is followed by a block of bindings, and then a block of
statements. This means the first print expression has to start in a
different column than the last definition, or the parser will try to
parse it as another binding. But this shouldn't be too hard to get
used to. Note that this is more free-form than Python's approach:
blocks can start at any indentation they like, which allows things
like...
let x = 5
let y = x + 10
print y
Here the blocks start in the same column as their parent block. This
is nice for adding let bindings without indenting yourself off the
screen, but should probably be avoided for things like if
expressions!
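The indentation rules can be sketched in a few lines of Python. This is a line-based simplification, not the actual parser: it assumes a block starts on its own line (the real rule uses the column of the first token, which may sit in the middle of a line, as after let), it skips blank lines, and it ignores tabs. The names first_token_column and split_block, and the sample input, are invented for the illustration.

```python
def first_token_column(line):
    """Column of the first non-space character (tabs are not handled here)."""
    return len(line) - len(line.lstrip(" "))

def split_block(lines, start=0):
    """Split lines[start:] into the items of one block.

    Returns (items, next_index): each item is a list of lines, and
    next_index is the first line that lies outside the block.
    """
    items = []
    base = None
    i = start
    while i < len(lines):
        line = lines[i]
        if not line.strip():          # assumption: blank lines are ignored
            i += 1
            continue
        col = first_token_column(line)
        if base is None:
            base = col                # first token fixes the block's indentation
            items.append([line])
        elif col > base:
            items[-1].append(line)    # indented further: continues the current item
        elif col == base:
            items.append([line])      # aligned: starts a new item
        else:
            break                     # indented less: closes the block
        i += 1
    return items, i

# Made-up sample: two items, the second spread over two lines, then a
# line that closes the block.
source = [
    "  print x",
    '  print ("hello" ++',
    "         f x y)",
    "done",
]
items, rest = split_block(source)
print(len(items), rest)  # prints 2 3
```

A full implementation would apply the same comparison per token rather than per line, and would keep one base column for each enclosing block.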
Since you don't always want to add a new line, the punctuation
characters , and . are used to close, respectively, the current item
and the current block.
let x = 10, y = 50. print (x + y)
This is convenient, but somewhat confusing. Consider for example the
match form, which matches an expression against a number of patterns,
and executes the body of the pattern that matches. These bodies can
themselves be blocks. A one-line match might look something like this:
match list. [] -> 0, [x:_] -> x
(Where [] is list syntax, and -> separates the cases from their
bodies.) Note that the 0 makes up a block. Should the comma close the
pattern case, or just the inner block? In the above case, it is
clearly intended to close the whole case. But if the block rules are
applied, it only closes the case's body, and a ., would be required to
get the above effect. That's awful, so I introduced the notion of
'expression blocks' behaving differently from general blocks. In an
expression block, ; is used to separate expressions. That means you
can do this:
match list. [] -> print "yee-haw"; 0, [x:_] -> x
When expression blocks appear at the end of another block, a , or .
token causes them to end without consuming that token. However, in
other cases, such as the first expression in a match form, a variant
of expression blocks is used that does consume dots, so that the
above example is valid. In an actual program, though, it looks better
like this:
match list
  [] -> print "yee-haw"; 0
  [x:_] -> x
The punctuation is just a fall-back for one-liners; in general, indentation will be used, and programs will be relatively readable.
The dependence on indentation magic does mean that the language's
grammar is not easily expressed as a context-free grammar. My
recursive-descent parser in Common Lisp makes use of a dynamic
variable *block-indentation* to communicate the current block's
starting indentation to the functions doing the grunt work, but is
quite simple. No look-ahead or back-tracking is needed, though maybe
I'll need that when I decide to use Haskell's list-comprehension
syntax. Operator-precedence parsing is used to handle the arbitrary
precedence of operators. The code can be seen in my gitweb.
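As an illustration of how a programmable operator table can drive parsing, here is a minimal Python sketch of precedence climbing, one common way to do operator-precedence parsing. It is not taken from the Common Lisp parser and may differ from it. The OPERATORS table, its precedence numbers and the parse_expression function are assumptions made for this example, and only binary operators and single-token operands are handled.

```python
# Hypothetical operator table: name -> (precedence, associativity).
# New operators can be registered by adding entries at run time.
OPERATORS = {
    "+": (6, "left"),
    "-": (6, "left"),
    "*": (7, "left"),
    "^": (8, "right"),
}

def parse_expression(tokens, pos=0, min_prec=0):
    """Parse tokens[pos:] by precedence climbing.

    Returns (tree, next_pos); a tree is either an atom (a token) or a
    tuple (op, left, right).
    """
    left = tokens[pos]            # every non-operator token is treated as an atom
    pos += 1
    while pos < len(tokens) and tokens[pos] in OPERATORS:
        op = tokens[pos]
        prec, assoc = OPERATORS[op]
        if prec < min_prec:
            break                 # operator binds too loosely for this level
        # A left-associative operator needs strictly tighter operators on
        # its right; a right-associative one accepts its own precedence.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos

tree, _ = parse_expression("1 + 2 * 3".split())
print(tree)  # prints ('+', '1', ('*', '2', '3'))
```

Because the table is ordinary data, declaring a new operator with its own precedence and associativity needs no change to the parsing function itself.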
Function literals are written as fn followed by a block of
pattern -> body cases. So a simple function looks like this:
let zero = fn 0 -> True
              n -> False
  zero 5
Usually, though, you'll use let for non-function variables, and def
for functions. def forms may appear anywhere in a block of
expressions, and define a function in the scope of the block. Unlike
let bindings, these bindings see themselves and each other, and thus
can be used for recursive or mutually recursive functions.
def fac
  0 -> 1
  n -> n * fac (n-1)
fac 5
Finally, strings use C conventions ("foo\n\"bar\""), characters are
preceded by a single quote ('a), comments are started with a
backslash or contained between {\ and \}, and a double colon
is used to add type annotations.