Marijn Haverbeke's blog (license)

Acorn: yet another JavaScript parser

Tuesday, October 2, 2012 javascript parsing performance

Acorn is a JavaScript parser written in JavaScript.

Another one.

Just like:

Acorn produces a well-documented, widely used AST format. The same as the last two parsers in that list.

Acorn is really fast. Just like the last one in the list: Esprima.

Acorn is tiny. About half as big as Esprima, in lines of code.

Still, there's no good reason for Acorn to exist. Esprima is an excellent project, well-documented, and small enough for any practical use. It exposes an interface very similar to Acorn.

The only reason I wrote Acorn is that small, well-defined systems are so much fun to work with, and that Esprima's web page very triumphantly declared it was faster than parse_js, the implementation in UglifyJS version 1, which is a port of my own parse-js Common Lisp library.

I just had to see if I could do better.

Turns out I can. Acorn beats Esprima, at least on Chrome, Firefox, and Opera (I didn't test other browsers) by a narrow margin—in the 5-20% range—when not storing source location data, and by a wide one—about five times faster—when storing source location data. See here for a reference. That second number is mostly due to the very unoptimized way in which Esprima manages the flow of this data—it could probably easily do better.

To even get to the point where my parser had this small speed advantage over Esprima, I had to steal some of its tricks. Most notably, in order to test whether a string is part of a set (of keyword, reserved words, etcetera), Esprima uses hand-rolled predicates that use switch statements over the string values, with an outer switch over the length of the string. Something like this:

function isKeyword(word) {
  switch (word.length) {
    case 2:
      switch (word):
        case "if": case "in": case "do": return true;
      }
      return false;
    case 3:
      switch (word):
        case "var": case "for": case "new": case "try": return true;
      }
      return false;
    case 4:
      /* et cetera */
  }
}

I initially expected regular expressions to be faster (using the test method), but it turns out that they only are faster on Chrome, and there only by a tiny margin.

But I wasn't about to write out all these boring predicates myself, so I defined a function that, given a list of words, builds up the text for such a predicate automatically, and then evals it to produce a function.

Another big size (and probably speed) saving comes from the fact that Acorn uses an operator precedence parser, whereas Esprima writes out all the intermediate forms of binary operators in a long list of functions named parseMultiplicativeExpression, parseAdditiveExpression, parseShiftExpression, and so on for all of the ten precedence levels. Each of these has to be passed through for each expression parsed.

I've set up a project page for Acorn, which is simply the output of docco run on the source code, and a github page for browsing the code and reporting bugs.