Top Down Operator Precedence - in Haskell - Infrared Clearance Sector

Introduction

While reading Top Down Operator Precedence by Douglas Crockford I was struck by the notion that here was a chance for me to convert my Haskell knowledge from read-only to something more read-write. The particular point was ... his technique is most effective when used in a dynamic, functional programming language. Well Haskell is certainly functional but what about dynamic? Here was a small working system, claimed to be beautiful, which I could translate without having to consider design nor correctness issues. Of course for the comparison I'd have to keep the Haskell structure close to the JavaScript, which I think I've achieved in the majority.

I made several abortive attempts at building the full parser:

written bottom-up and stateful
written bottom-up and stateful plus ST references
written bottom-up and stateful plus IO references
written top-down and purely functional

In the end the parser that worked was done in a top-down stateful way without references.

The main reason the bottom up versions failed was that testing them with all that state and references was tricky. On the other hand the top down versions could have simple tests because the top-level interface is simple and I could avoid writing quite some code and still compile by using 'error' and type signatures for dependent functions I hadn't written yet. Any errors raised that way simply indicated the next function to write to get the tests running.

The switch to pure functional was prompted by the uglyness of the stateful code, especially a mix of state and references. The resulting environment passing tangle soon made the pure version less beautiful but it highlighted exactly where I could use state to clean the code and a crucial diversion from the JavaScript design which removed all of the reference-based horror. The last half of the parser coding work because almost mecahnical transliteration.

The current version of the Haskell code is available

From now on there's a side by side comparison of the two versions although I've elided some of Douglas Crockford's exposition where it didn't seem relevant for comparison.

JavaScript

... JavaScript is a dynamic, functional language, but syntactically it is obviously a member of the C family. ... We will build a parser that can process Simplified JavaScript. We will write the parser in Simplified JavaScript. Simplified JavaScript is just the good stuff, including:

Functions as first class objects. ...
...
Object literals and array literals. ...

... Our implementation depends on an Object.beget method (which makes a new object that inherits members from an existing object) and a tokenizer ...

Haskell

Haskell is a large functional language and syntactically not at all a member of the C family. So the beautiful vision of a Simplified Haskell parser written in Simplified Haskell will have to wait. Simplified Haskell would have to include:

Functions as first class objects as in JavaScript.
Record literals and list literals rather than JavaScript's object and array literals.

Our implementation depends on Haskell's record syntax (which creates a record based on another record with modifications) and a tokenizer. Unlike JavaScript neither state nor data maps are supported directly so we also depend on the Data.Map and Control.Monad.State libraries.

Symbol Table

Whereas the JavaScript code makes use of some variables scoped across the entire parser, we place symbol_table and some others into an environment record which will be passed around the parser thanks to the State library.

var symbol_table = {};

symbol_table :: SymbolTable
type SymbolTable = Map Value Symbol

original_symbol has quite a few more fields set to avoid some warnings and provide the correct defaults.

Haskellers will note that the standard function id has been excluded for use in the Symbol record and the standard error has been replaced by a custom version. Also the use of skope is to avoid name collision with another record - Haskell records are somewhat strange.

To ensure that the nud, etc., 'methods' are called with the correct object scope we use some helper functions

In the JavaScript code first, second and third can be null, a symbol or a list of symbols. To avoid that dynamism we encode all of those as a list of symbols and the pretty printer transforms them uniformly back to JavaScript-alike form. Because this is done everywhere there are cases where our output is slightly different from the original.

var original_symbol = {
    nud: function () {
        this.error("Undefined.");
    },
    led: function (left) {
        this.error("Missing operator.");
    }
};

original_symbol = Symbol {
  nudf = \this   -> error this "Undefined.",
  ledf = \this _ -> error this "Missing operator.",
  std  = Nothing,
  first = [], second = [], third = [],
  isAssignment = False, reserved = False, key = Nothing,
  id = undefined, arity = undefined, value = undefined,
  lbp = undefined, skope = undefined
}

data Symbol = Symbol { id    :: Id,
                       arity :: Arity,
                       value :: Value,
                       lbp   :: BindingPower,
                       reserved, isAssignment :: Bool,
                       nudf  :: NudFun,
                       ledf  :: LedFun,
                       std   :: Maybe StdFun,
                       skope :: Scope,
                       key   :: Maybe Value,
                       first, second, third :: [Symbol] }

type NudFun = This ->           Parser Symbol
type LedFun = This -> Symbol -> Parser Symbol
type StdFun = This ->           Parser [Symbol]

nud s = nudf s s
led s = ledf s s

Here is where we diverge from the JavaScript design. Whereas the original relied on the mutability of the symbol being returned to set the nud, etc., fields I noticed that they were the only ones set like that. So instead this code takes in the nud, led or std function as well and binds it during symbol creation. In addition, Haskell doesn't support variable numbers of parameters without some type-level fancyness which I'll avoid so two wrappers are provided.

var symbol = function (id, bp) {
    var s = symbol_table[id];
    bp = bp || 0;
    if (s) {
        if (bp >= s.lbp) {
            s.lbp = bp;
        }
    } else {
        s = Object.beget(original_symbol);
        s.id = s.value = id;
        s.lbp = bp;
        symbol_table[id] = s;
    }
    return s;
};

symbol id bp typ = do
  symbol_table <- get
  let s' = bind typ $
           case lookup id symbol_table of
             Just s -> if bp >= lbp s
                       then s { lbp = bp }
                       else s
             _ -> original_symbol { id    = id,
                                    value = id,
                                    lbp   = bp }
  put $ insert id s' symbol_table
  return s'

    where
      bind (Nud f) s = s { nudf = f }
      bind (Led f) s = s { ledf = f }
      bind (Std f) s = s { std  = Just f }
      bind _       s = s

symbol0 id = symbol1 id NilT
symbol1    = flip symbol 0

data SymbolType = NilT 
                | Nud NudFun
                | Led LedFun
                | Std StdFun

Some symbols are just terminators or seperators without any logic of their own. Haskell needs some additional context which will be referenced later in the main parse function.

symbol(":");
symbol(";");
symbol(",");
symbol(")");
symbol("]");
symbol("}");
symbol("else");

symbol("(end)");
symbol("(name)");

initial_symbol_table = execState ist Map.empty
    where 
      ist = do

      symbol0 ":"
      symbol0 ";"
      symbol0 ")"
      symbol0 "]"
      symbol0 "}"
      symbol0 ","
      symbol0 "else"
      
      symbol0 "(end)"
      symbol0 "(name)"

Tokens

Again, there's a wrapper for the optional parameter when advancing with a required next token. Note that Haskell can not receive an unexpected token so there is no corresponding case.

var advance = function (id) {
    var a, o, t, v;
    if (id && token.id !== id) {
        token.error("Expected '" + id + "'.");
    }
    if (token_nr >= tokens.length) {
        token = symbol_table["(end)"];
        return;
    }
    t = tokens[token_nr];
    token_nr += 1;
    v = t.value;
    a = t.type;
    if (a === "name") {
        o = scope.find(v);
    } else if (a === "operator") {
        o = symbol_table[v];
        if (!o) {
            t.error("Unknown operator.");
        }
    } else if (a === "string" || a ===  "number") {
        a = "literal";
        o = symbol_table["(literal)"];
    } else {
        t.error("Unexpected token.");
    }
    token = Object.beget(o);
    token.value = v;
    token.arity = a;
    return token;
};

advanceIf requiredId = do
  token <- gets token
  when (id token /= requiredId) $
    error token $ "Expected '" ++ requiredId ++ "'."
  advance

advance = do
  this <- get
  let (t, ts) = case tokens this of
        []
          -> (symbol_table this ! "(end)", [])
        t@(Token a v):tokens'
          -> let (o, a') = case a of
                   NameType
                     -> (find this $ v, Name)
                   OperatorType
                     -> case lookup v $ symbol_table this of
                          Just t' -> (t', Operator)
                          _       -> error t "Unknown operator."
                   NumberType
                     -> (symbol_table this ! "(literal)", Literal)
                   StringType
                     -> (symbol_table this ! "(literal)", Literal)
                   -- the next case can't happen and ghc throws a warning
                   -- _         -> error t "Unexpected token."
             in (o { value = v, arity = a' }, tokens')

  put this { token = t, tokens = ts }
  return t

Scope

The current scope is also part of the environment. The scope could be modelled by a list but this more closely matches the OO structure of the JavaScript.

var scope;

scope :: Scope
data Scope = Scope { def    :: SymbolTable,
                     parent :: Scope }
           | TopScope

The original_scope object is reworked as as set of top level functions as they actually need references back to the symbol_table which resides in the environment.

I'm not sure why led is set to null in the JavaScript as that seems to remove the default error handling so we replace by something more specific.

var itself = function () {
    return this;
};

var original_scope = {
    define: function (n) {
        var t = this.def[n.value];
        if (typeof t === "object") {
            n.error(t.reserved ?
                "Already reserved." :
                "Already defined.");
        }
        this.def[n.value] = n;
        n.reserved = false;
        n.nud      = itself;
        n.led      = null;
        n.std      = null;
        n.lbp      = 0;
        n.scope    = scope;
        return n;
    },

itself = return

define n@Symbol {value = value} = do
  this <- gets scope
  let t = def this ! value
  when (member value $ def this) $
    error n $ if reserved t
              then "Already reserved."
              else "Already defined."
  let n' = n { reserved = False,
               nudf     = itself,
               ledf     = \this _ -> error this "Undefined operator.",
               std      = Nothing,
               lbp      = 0,
               skope    = this }
  env <- get
  put env { scope = this {
    def = insert value n' $ def this
  } }
  return n'

find splits the behaviour based on the environment. It also uses simple recursion rather than some higher-order recursion function. That is also to match the JavaScript style of simple while(true) loops and is used throughout the code where ever the JavaScript uses loops.

    find: function (n) {
        var e = this;
        while (true) {
            var o = e.def[n];
            if (o) {
                return o;
            }
            e = e.parent;
            if (!e) {
                return symbol_table[
                    symbol_table.hasOwnProperty(n) ?
                    n : "(name)"];
            }
        }
    },

find env@Env {scope = Scope {def = def, parent = e}} n =
  case lookup n def of
    Just t -> t
    _      -> find env { scope = e} n

find Env { symbol_table = symbol_table } n =
  case lookup n symbol_table of
    Just t -> t
    _      -> symbol_table ! "(name)"

    pop: function () {
        scope = this.parent;
    },

pop = do
  env <- get
  put env { scope = parent $ scope env }

    reserve: function (n) {
        if (n.arity !== "name" || n.reserved) {
            return;
        }
        var t = this.def[n.value];
        if (t) {
            if (t.reserved) {
                return;
            }
            if (t.arity === "name") {
                n.error("Already defined.");
            }
        }
        this.def[n.value] = n;
        n.reserved = true;
    }
};

reserve n@Symbol {arity = Name, reserved = False, value = value} = do
  this <- gets scope
  let t = def this ! value
  when (member value $ def this) $ do
    when (reserved t) $
      return ()
    when (arity t == Name) $
      error n "Unreserved is already defined."
  env <- get
  put env { scope = this {
    def = insert value n { reserved = True } $ def this
  } }

reserve _ = return ()

var new_scope = function () {
    var s = scope;
    scope = Object.beget(original_scope);
    scope.def = {};
    scope.parent = s;
    return scope;
};

new_scope = do
  s <- gets scope
  let s' = Scope { def    = Map.empty,
                   parent = s }
  env <- get
  put env { scope = s' }
  return s'

Expressions

var expression = function (rbp) {
    var left;
    var t = token;
    advance();
    left = t.nud();
    while (rbp < token.lbp) {
        t = token;
        advance();
        left = t.led(left);
    }
    return left;
}

expression rbp = do
  t <- gets token
  advance
  left <- nud t
  let walkRight left = do
        t <- gets token
        if rbp < lbp t then do
            advance
            left <- led t left
            walkRight left
          else return left
  walkRight left

Infix Operators

infix is a Haskell keyword. The default behaviour is provided by the inphix0 version. Note the encoding of the simple symbol left as a single element list.

var infix = function (id, bp, led) {
    var s = symbol(id, bp);
    s.led = led || function (left) {
        this.first = left;
        this.second = expression(bp);
        this.arity = "binary";
        return this;
    };
    return s;
}

inphix0 s bp = inphix s bp $ \this left -> do
  right <- expression bp
  return this { first  = [left],
                second = [right],
                arity  = Binary }

inphix s bp f = symbol s bp $ Led f

These and other dimilar definitions are actually part of the initial_symbol_table started above.

infix("+", 50);
infix("-", 50);
infix("*", 60);
infix("/", 60);

      inphix0 "+" 50
      inphix0 "-" 50
      inphix0 "*" 60
      inphix0 "/" 60

infix("?", 20, function (left) {
    this.first = left;
    this.second = expression(0);
    advance(":");
    this.third = expression(0);
    this.arity = "ternary";
    return this;
});

      inphix "?" 20 $ \this left -> do
        whenTrue <- expression 0
        advanceIf ":"
        whenFalse <- expression 0
        return this { first  = [left],
                      second = [whenTrue],
                      third  = [whenFalse],
                      arity  = Ternary }

Even though the Javascript updates the token it is then immediately discarded by advance so we won't bother

infix(".", 80, function (left) {
    this.first = left;
    if (token.arity !== "name") {
        token.error("Expected a property name.");
    }
    token.arity = "literal";
    this.second = token;
    this.arity = "binary";
    advance();
    return this;
});

      inphix "." 80 $ \this left -> do
        token <- gets token
        when (arity token /= Name) $
          error token "Expected a property name."
        advance
        return this { first  = [left],
                      second = [token { arity = Literal }],
                      arity  = Binary }

infix("[", 80, function (left) {
    this.first = left;
    this.second = expression(0);
    this.arity = "binary";
    advance("]");
    return this;
});

      inphix "[" 80 $ \this left -> do
        right <- expression 0
        advanceIf "]"
        return this { first  = [left],
                      second = [right],
                      arity  = Binary }

In the Haskell version it becomes obvious that the base inphixr is identical to inphix above and only the default behaviour is different.

var infixr = function (id, bp, led) {
    var s = symbol(id, bp);
    s.led = led || function (left) {
        this.first = left;
        this.second = expression(bp - 1);
        this.arity = "binary";
        return this;
    };
    return s;
}

inphixr0 s bp = inphixr s bp $ \this left -> do
  right <- expression $ bp-1
  return this { first  = [left],
                second = [right],
                arity  = Binary }

inphixr = inphix

infixr("===", 40);
infixr("!==", 40);
infixr("<", 40);
infixr("<=", 40);
infixr(">", 40);
infixr(">=", 40);
infixr("&&", 30);
infixr("||", 30);

      inphixr0 "===" 40
      inphixr0 "!==" 40
      inphixr0 "<" 40
      inphixr0 "<=" 40
      inphixr0 ">" 40
      inphixr0 ">=" 40
      inphixr0 "&&" 30
      inphixr0 "||" 30

Prefix Operators

var prefix = function (id, nud) {
    var s = symbol(id);
    s.nud = nud || function () {
        scope.reserve(this);
        this.first = expression(70);
        this.arity = "unary";
        return this;
    };
    return s;
}

prephix0 s = prephix s $ \this -> do
  reserve this
  expr <- expression 70
  return this { first = [expr],
                arity = Unary }

prephix s f = symbol1 s $ Nud f

prefix("-");
prefix("!");
prefix("typeof");

      prephix0 "!"
      prephix0 "-"
      prephix0 "typeof"

prefix("(", function () {
    var e = expression(0);
    advance(")");
    return e;
});

      prephix "(" $ \this -> do
        e <- expression 0
        advanceIf ")"
        return e

Assignment Operators

var assignment = function (id) {
    return infixr(id, 10, function (left) {
        if (left.id !== "." && left.id !== "[" &&
                left.arity !== "name") {
            left.error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    });
};

assignment s = inphixr s 10 $ \this left -> do
  when (id left /= "." && id left /= "[" &&
        arity left /= Name) $
    error left "Bad lvalue."
  right <- expression 9
  return this { first  = [left],
                second = [right],
                arity  = Binary,
                isAssignment = True }

assignment("=");
assignment("+=");
assignment("-=");

      assignment "="
      assignment "+="
      assignment "-="

Constants

Rather than try and emulate all the kinds of JavaScript value the constant could be set to without much benefit it is simply ignored.

var constant = function (s, v) {
    var x = symbol(s);
    x.nud = function () {
        scope.reserve(this);
        this.value = symbol_table[this.id].value;
        this.arity = "literal";
        return this;
    };
    x.value = v;
    return x;
};

constant0 s v = constant s v $ \this -> do
  reserve this
  symbol_table <- gets symbol_table
  return this { value = value $ symbol_table ! (id this),
                arity = Literal }

constant s _ f = symbol1 s $ Nud f

constant("true", true);
constant("false", false);
constant("null", null);
constant("pi", 3.141592653589793);
constant("Object", {});
constant("Array", []);

      constant0 "true" True
      constant0 "false" False
      constant0 "null" undefined
      constant0 "pi" 3.141592653589793
      constant0 "Object" Map.empty
      constant0 "Array" []

symbol("(literal)").nud = itself;

      symbol1 "(literal)" $ Nud itself

Statements

var statement = function () {
    var n = token, v;
    if (n.std) {
        advance();
        scope.reserve(n);
        return n.std();
    }
    v = expression(0);
    if (!v.assignment && v.id !== "(") {
        v.error("Bad expression statement.");
    }
    advance(";");
    return v;
};

statement = do
  n <- gets token
  case n of
    Symbol { std = Just std } -> do
      advance
      reserve n
      std n
    otherwise -> do
      v <- expression 0
      when (not (isAssignment v) && id v /= "(") $
        error v "Bad expression statement."
      advanceIf ";"
      return [v]

Rather than change the return type, which has been fixed to symbol lists in any case, the pretty printer does the right thing. Also note no attempt to make this tail recursive - how huge are the functions being parsed going to be? :-)

var statements = function () {
    var a = [], s;
    while (true) {
        if (token.id === "}" || token.id === "(end)") {
            break;
        }
        s = statement();
        if (s) {
            a.push(s);
        }
    }
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
};

statements = do
  token <- gets token
  if id token == "}" || id token == "(end)"
    then return []
    else do
      s <- statement
      ss <- statements
      return $ s ++ ss

var stmt = function (s, f) {
    var x = symbol(s);
    x.std = f;
    return x;
};

stmt s f = symbol1 s $ Std f

stmt("{", function () {
    new_scope();
    var a = statements();
    advance("}");
    scope.pop();
    return a;
});

      stmt "{" $ \this -> do
        new_scope
        a <- statements
        advanceIf "}"
        pop
        return a

var block = function () {
    var t = token;
    advance("{");
    return t.std();
};

block = do
  t <- gets token
  advanceIf "{"
  case std t of
    Just s -> s t

We have to jump through a little hoop to only return variable definitions which have a default value. Here also was an opertunity to create a higher order recursion which recognised the seperator and which could have been used in similar later code.

Note that the Haskell makes use of advance's return value to save on retrieving the state again which the JavaScript never does thanks to it's natural statefullness. On the other hand, Haskell is more controlled in it's use of the 'global' variables. Whether that makes the logic more obvious...

stmt("var", function () {
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            n.error("Expected a new variable name.");
        }
        scope.define(n);
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            t.first = n;
            t.second = expression(0);
            t.arity = "binary";
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});

      stmt "var" $ \this -> do
        let vars = do
              n <- gets token
              when (arity n /= Name) $
                error n "Expected a new variable name."
              define n
              t <- advance
              a <- if id t == "=" then do
                     advanceIf "="
                     s <- expression 0
                     let t' = t { first  = [n],
                                  second = [s],
                                  arity  = Binary,
                                  isAssignment = True }
                     return [t']
                   else return []
              t <- gets token
              if id t /= ","
                then return a
                else do
                  advanceIf ","
                  v <- vars
                  return $ a++v
        a <- vars
        advanceIf ";"
        return a

stmt("while", function () {
    advance("(");
    this.first = expression(0);
    advance(")");
    this.second = block();
    this.arity = "statement";
    return this;
});

      stmt "while" $ \this -> do
        advanceIf "("
        f <- expression 0
        advanceIf ")"
        s <- block
        return [this { first  = [f],
                       second = s,
                       arity  = Statement }]

stmt("if", function () {
    advance("(");
    this.first = expression(0);
    advance(")");
    this.second = block();
    if (token.id === "else") {
        scope.reserve(token);
        advance("else");
        this.third = token.id === "if" ? statement() : block();
    } else {
        this.third = null;
    }
    this.arity = "statement";
    return this;
});

      stmt "if" $ \this -> do
        advanceIf "("
        test <- expression 0
        advanceIf ")"
        body <- block
        token <- gets token
        els <- if id token == "else" then do
                 reserve token
                 token <- advanceIf "else"
                 if id token == "if" then statement else block
               else return []
        return [this { first  = [test],
                       second = body,
                       third  = els,
                       arity  = Statement }]

stmt("break", function () {
    advance(";");
    if (token.id !== "}") {
        token.error("Unreachable statement.");
    }
    this.arity = "statement";
    return this;
});

      stmt "break" $ \this -> do
        t <- advanceIf ";"
        when (id t /= "}") $
          error t "Unreachable statement."
        return [this { arity = Statement }]

stmt("return", function () {
    if (token.id !== ";") {
        this.first = expression(0);
    }
    advance(";");
    if (token.id !== "}") {
        token.error("Unreachable statement.");
    }
    this.arity = "statement";
    return this;
});

      stmt "return" $ \this -> do
        t <- gets token
        first <- if id t /= ";" then do
                   e <- expression 0
                   return [e]
                 else return []
        t <- advanceIf ";"
        when (id t /= "}") $
          error t "Unreachable statement."
        return [this { first = first,
                       arity = Statement }]

Functions

prefix("function", function () {
    var a = [];
    new_scope();
    if (token.arity === "name") {
        scope.define(token);
        this.name = token.value;
        advance();
    }
    advance("(");
    if (token.id !== ")") {
        while (true) {
            if (token.arity !== "name") {
                token.error("Expected a parameter name.");
            }
            scope.define(token);
            a.push(token);
            advance();
            if (token.id !== ",") {
                break;
            }
            advance(",");
        }
    }
    this.first = a;
    advance(")");
    advance("{");
    this.second = statements();
    advance("}");
    this.arity = "function";
    scope.pop();
    return this;
});

      prephix "function" $ \this -> do
        new_scope
        t <- gets token
        n <- if arity t == Name then do
               define t
               advance
               return $ Just $ value t
             else return Nothing
        t <- advanceIf "("
        a <- if id t /= ")" then
               let params = do
                     t <- gets token
                     when (arity t /= Name) $
                       error t "Expected a parameter name."
                     define t
                     token <- advance
                     if id token /= ","
                       then return [t]
                       else do
                         advanceIf ","
                         p <- params
                         return $ t:p
               in params
             else return []
        advanceIf ")"
        advanceIf "{"
        s <- statements
        advanceIf "}"
        pop
        return this { first  = a,
                      second = s,
                      arity  = Function { name = n } }

In Haskell the non-variable a can't be used until after it's been populated.

infix("(", 80, function (left) {
    var a = [];
    if (left.id === "." || left.id === "[") {
        this.arity = "ternary";
        this.first = left.first;
        this.second = left.second;
        this.third = a;
    } else {
        this.arity = "binary";
        this.first = left;
        this.second = a;
        if ((left.arity !== "unary" || left.id !== "function") &&
                 left.arity !== "name" &&
                 left.id !== "(" &&
                 left.id !== "&&" &&
                 left.id !== "||" &&
                 left.id !== "?") {
            left.error("Expected a variable name.");
        }
    }
    if (token.id !== ")") {
        while (true)  {
            a.push(expression(0));
            if (token.id !== ",") {
                break;
            }
            advance(",");
        }
    }
    advance(")");
    return this;
});

      inphix "(" 80 $ \this left -> do
        t <- gets token
        a <- if id t /= ")" then
               let vars = do
                     e <- expression 0
                     token <- gets token
                     if id token /= ","
                       then return [e]
                       else do
                         advanceIf ","
                         v <- vars
                         return $ e:v
               in vars
             else return []
        let this' = if id left == "." || id left == "["
                    then this { first  = first left,
                                second = second left,
                                third  = a,
                                arity  = Ternary }
                    else if (arity left /= Unary || id left /= "function") &&
                            arity left /= Name && id left /= "(" &&
                            id left /= "&&" && id left /= "||" && id left /= "?"
                         then error left "Expected a variable name."
                         else this { first  = [left],
                                     second = a,
                                     arity  = Binary }
        advanceIf ")"
        return this'

symbol("this").nud = function () {
    scope.reserve(this);
    this.arity = "this";
    return this;
};

      symbol1 "this" $ Nud $ \this -> do
        reserve this
        return this { arity = This }

Object Literals

prefix("[", function () {
    var a = [];
    if (token.id !== "]") {
        while (true) {
            a.push(expression(0));
            if (token.id !== ",") {
                break;
            }
            advance(",");
        }
    }
    advance("]");
    this.first = a;
    this.arity = "unary";
    return this;
});

      prephix "[" $ \this -> do
        t <- gets token
        a <- if id t /= "]" then
               let entries = do
                     v <- expression 0
                     token <- gets token
                     if id token /= ","
                       then return [v]
                       else do
                         advanceIf ","
                         e <- entries
                         return $ v:e
               in entries
             else return []
        advanceIf "]"
        return this { first = a,
                      arity = Unary }

prefix("{", function () {
    var a = [];
    if (token.id !== "}") {
        while (true) {
            var n = token;
            if (n.arity !== "name" && n.arity !== "literal") {
                token.error("Bad key.");
            }
            advance();
            advance(":");
            var v = expression(0);
            v.key = n.value;
            a.push(v);
            if (token.id !== ",") {
                break;
            }
            advance(",");
        }
    }
    advance("}");
    this.first = a;
    this.arity = "unary";
    return this;
});

      prephix "{" $ \this -> do
        t <- gets token
        a <- if id t /= "}" then
               let entries = do
                     n <- gets token
                     when (arity n /= Name && arity n /= Literal) $
                       error n "Bad property name."
                     advance
                     advanceIf ":"
                     v <- expression 0
                     let v' = v { key = Just $ value n }
                     token <- gets token
                     if id token /= ","
                       then return [v']
                       else do
                         advanceIf ","
                         e <- entries
                         return $ v':e
               in entries
             else return []
        advanceIf "}"
        return this { first = a, 
                      arity = Unary }

The Parser

Finally the initial_symbol_table is used along with the tokeniser.

return function (source) {
    tokens = source.tokens('=<>!+-*&|/%^', '=<>&|');
    token_nr = 0;
    new_scope();
    advance();
    var s = statements();
    advance("(end)");
    scope.pop();
    return s;
};

parse source = 
  evalState ( do
    new_scope
    advance
    s <- statements
    advanceIf "(end)"
    return s
  ) Env { tokens       = tokenise source,
          scope        = TopScope,
          token        = original_symbol { id = "(start)" },
          symbol_table = initial_symbol_table }

Conclusion

The Simplified JavaScript (SJ) and Statefull Simplified Haskell (SSH) versions are very similar in complexity and structure. Because full mutable state is not 'native' to Haskell it's occasionally a little more verbose. On the other hand, pattern matching allows better separation of code in some parts and easy access to the internals of (non-environmental) parameters.

Whether the beauty of the SJ parser in SJ rubs off on the SJ parser in SSH is debatable. Part is due to keeping to a SJ coding style so that common Haskell idioms, especially the use of higher-order functions over simple recursion, are not used.

The Tokeniser

This code makes as much use of GHC's standard libraries, particularly the (Haskell) lexer. Here is the entire thing.

module Tokeniser where

import Text.Read (lex)
import Data.Char (isAlpha, isNumber)

data Token     = Token TokenType String
                 deriving Show
data TokenType = NameType | StringType | NumberType | OperatorType
                 deriving Show

tokenise = tokens . head . lex
    where
      tokens (t, "") = [token t]
      tokens (t, s)  =  token t : tokenise s

      token t@(c:_) = Token tokenType text
          where
            tokenType | isAlpha  c = NameType     
                      | isNumber c = NumberType   
                      | '"' ==   c = StringType
                      | otherwise  = OperatorType 

            text | c == '"'  = drop 1 $ take (length t-1) t
                 | otherwise = t

      token _ = Token OperatorType "(end)"

The Pretty Printer

This also makes very good use of the standard pretty printing library. Yes, this does produce output very similar to that shown in the JavaScript demo.

module PrettyPrint where

import Text.PrettyPrint

import TopDownParserState

pp = ppList ""

ppList l []     = empty
ppList l (s:[]) = ppSymbol l s
ppList l s      = bracket l $
                    vcat $ map (ppSymbol "") s

ppSymbol l Symbol {key = k, value = v, arity = a,
                   first = f, second = s, third = t } =
    brace l $
      ppMaybe "key: " k $$
      ppValue v $$
      ppArity a $$
      ppList "first: " f $$
      ppList "second: " s $$
      ppList "third: " t

ppMaybe l (Just k) = text l <> (text $ show k)
ppMaybe l Nothing  = empty

ppValue v = text "value: " <> (text $ show v)

ppArity a@Function {name = n} =
    text "arity: Function" $$
    ppMaybe "name: " n
ppArity a = text "arity: " <> (text $ show a)

bracket l s = (text l <> lbrack) $+$ indent s $$ rbrack
brace   l s = (text l <> lbrace) $+$ indent s $$ rbrace

indent = nest 4