parse-js

parse-js is a Common Lisp package for parsing JavaScript — ECMAScript 3, to be more precise. It is released under a zlib-style licence. For any feedback, contact me: Marijn Haverbeke.

The library can be downloaded, checked out from the git repository, or installed with asdf-install.

News

07-02-2013: New release. More corner-case bugs fixed. Representation of for-in nodes changed to accommodate things like for (x.y in z). Array element expressions may now be nil when parsing a literal like [1,,3].

03-01-2011: New release. Lots of conformance fixes, driven by CL-JavaScript and UglifyJS work. parse-js-string is deprecated now (parse-js accepts strings), and basic support for ECMAScript 5 has been added.

11-06-2010: Move from darcs to git for version control, update release tarball.

Reference

function parse-js (input &key ecma-version strict-semicolons reserved-words)
→ syntax-tree

Reads a program from a string or a stream, and produces an abstract syntax tree, which is a nested structure consisting of lists starting with keywords. The exact format of this structure is not very well documented, but the file as.txt gives a basic description.

The keyword arguments influence the parsing mode. ecma-version can be 3 or 5 and selects the standard that is followed; the default is 3, and support for version 5 is incomplete at this time. When strict-semicolons is true, the parser will complain about missing semicolons, even when they would have been inserted by the 'automatic semicolon insertion' rules. Finally, if reserved-words is true, the parser will complain about 'future reserved words', such as class, being used.
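As a rough illustration of the calls described above, the following sketch parses one program from a string and another from a stream. It assumes the library is already loaded and that its symbols are exported from a package named parse-js; the package name and the sample JavaScript sources are assumptions made for the example.

```lisp
;; Assumption: the library is loaded and exports its symbols
;; from a package named PARSE-JS.

;; Parse from a string with the default settings (ECMAScript 3).
(defparameter *tree* (parse-js:parse-js "var x = 1 + 2;"))

;; The result is a nested list whose sublists start with keywords.
(print *tree*)

;; Parse from a stream instead, following ECMAScript 5 and
;; complaining about any missing semicolons.
(with-input-from-string (in "var y = [1, 2, 3];")
  (print (parse-js:parse-js in :ecma-version 5 :strict-semicolons t)))
```

Both keyword arguments are optional; leaving them out gives the default ECMAScript 3 mode with automatic semicolon insertion allowed.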

class js-parse-error

The type of errors raised when invalid input is encountered. Inherits from simple-error, and has js-parse-error-line and js-parse-error-char accessors that can be used to read the location at which the error occurred.
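One possible way to use the accessors is to wrap the parser in handler-case and report the location. In this sketch, try-parse is a hypothetical helper (not part of the library), and the parse-js package prefix is assumed.

```lisp
;; Assumption: the library's symbols are exported from a package named PARSE-JS.
;; TRY-PARSE is a hypothetical helper written for this example.
(defun try-parse (source)
  "Return the syntax tree for SOURCE, or NIL after reporting a parse error."
  (handler-case (parse-js:parse-js source)
    (parse-js:js-parse-error (e)
      ;; Report where the parser gave up, followed by the error message.
      (format *error-output* "Parse error at line ~a, character ~a: ~a~%"
              (parse-js:js-parse-error-line e)
              (parse-js:js-parse-error-char e)
              e)
      nil)))

;; The initializer is missing, so this should report an error and return NIL.
(try-parse "var x = ;")
```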

function lex-js (stream)
→ function

A JavaScript tokeniser. The function returned can be called repeatedly to read the next token object. See below for a description of these objects. When the end of the stream is reached, tokens with type :eof are returned.
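A minimal sketch of driving the tokeniser: the returned function is called until a token of type :eof shows up. The helper name tokenize and the parse-js package prefix are assumptions; token-type is the reader documented below.

```lisp
;; Assumption: the library's symbols are exported from a package named PARSE-JS.
;; TOKENIZE is a hypothetical helper written for this example.
(defun tokenize (source)
  "Collect the tokens of the string SOURCE, stopping at the first :eof token."
  (with-input-from-string (in source)
    (let ((next-token (parse-js:lex-js in)))
      ;; Each call to NEXT-TOKEN reads one more token from the stream.
      (loop for token = (funcall next-token)
            until (eq (parse-js:token-type token) :eof)
            collect token))))

(print (length (tokenize "var x = 1;")))
```

Because the stream is closed when with-input-from-string exits, the tokens are collected inside its body rather than returning the lexer function itself.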

function token-type (token)
→ keyword

Reader for the type of token objects. Types are keywords (one of :num :punc :string :operator :name :atom :keyword :eof).

function token-value (token)
→ value

Reader for the content of token objects. The type of this value depends on the type of the token: it holds strings for names, for example, and numbers for number tokens.

function token-line (token)
→ number

The line on which a token was read.

function token-char (token)
→ number

The character at which a token starts.
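To show the four readers together, the sketch below lists the position, type, and value of each token in a short two-line program. token-summaries is a hypothetical helper, and the parse-js package prefix is assumed.

```lisp
;; Assumption: the library's symbols are exported from a package named PARSE-JS.
;; TOKEN-SUMMARIES is a hypothetical helper written for this example.
(defun token-summaries (source)
  "Return a list of (line char type value) lists, one per token in SOURCE."
  (with-input-from-string (in source)
    (let ((next-token (parse-js:lex-js in)))
      (loop for token = (funcall next-token)
            until (eq (parse-js:token-type token) :eof)
            collect (list (parse-js:token-line token)
                          (parse-js:token-char token)
                          (parse-js:token-type token)
                          (parse-js:token-value token))))))

;; Print one summary per token of a two-line program.
(dolist (summary (token-summaries (format nil "var x = 1;~%x += 2;")))
  (print summary))
```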