While reading Top
Down Operator Precedence by Douglas Crockford I was struck by the notion
that here was a chance for me to convert my Haskell knowledge from read-only
to something more read-write. The particular point was ... his
technique is most effective when used in a dynamic, functional programming
language.
Well Haskell is certainly functional but what about
dynamic? Here was a small working system, claimed to be beautiful, which I
could translate without having to consider design nor correctness issues. Of
course for the comparison I'd have to keep the Haskell structure close to the
JavaScript, which I think I've achieved in the majority.
I made several abortive attempts at building the full parser:
The main reason the bottom up versions failed was that testing them with all that state and references was tricky. On the other hand the top down versions could have simple tests because the top-level interface is simple and I could avoid writing quite some code and still compile by using 'error' and type signatures for dependent functions I hadn't written yet. Any errors raised that way simply indicated the next function to write to get the tests running.
The switch to pure functional was prompted by the uglyness of the stateful code, especially a mix of state and references. The resulting environment passing tangle soon made the pure version less beautiful but it highlighted exactly where I could use state to clean the code and a crucial diversion from the JavaScript design which removed all of the reference-based horror. The last half of the parser coding work because almost mecahnical transliteration.
The current version of the Haskell code is available
From now on there's a side by side comparison of the two versions although I've elided some of Douglas Crockford's exposition where it didn't seem relevant for comparison.
... JavaScript is a dynamic, functional language, but syntactically it is obviously a member of the C family. ... We will build a parser that can process Simplified JavaScript. We will write the parser in Simplified JavaScript. Simplified JavaScript is just the good stuff, including:
... Our implementation depends on an Object.beget
method (which makes a new object that inherits members from an
existing object) and a tokenizer ...
Haskell is a large functional language and syntactically not at all a member of the C family. So the beautiful vision of a Simplified Haskell parser written in Simplified Haskell will have to wait. Simplified Haskell would have to include:
Our implementation depends on Haskell's record syntax (which
creates a record based on another record with modifications) and a
tokenizer. Unlike JavaScript neither state nor data maps are
supported directly so we also depend on the Data.Map and
Control.Monad.State libraries.
Whereas the JavaScript code makes use of some variables scoped
across the entire parser, we place symbol_table and some others into
an environment record which will be passed around the parser thanks
to the State library.
var symbol_table = {};
symbol_table :: SymbolTable type SymbolTable = Map Value Symbol
original_symbol has quite a few more fields set to
avoid some warnings and provide the correct defaults.
Haskellers will note that the standard function id has
been excluded for use in the Symbol record and
the standard error has been replaced by a custom
version. Also the use of skope is to avoid name
collision with another record - Haskell records are somewhat
strange.
To ensure that the nud, etc., 'methods' are called
with the correct object scope we use some helper functions
In the JavaScript code first, second
and third can be null, a symbol or a list of symbols. To
avoid that dynamism we encode all of those as a list of symbols and
the pretty printer transforms them uniformly back to JavaScript-alike
form. Because this is done everywhere there are cases where our
output is slightly different from the original.
var original_symbol = {
nud: function () {
this.error("Undefined.");
},
led: function (left) {
this.error("Missing operator.");
}
};
original_symbol = Symbol {
nudf = \this -> error this "Undefined.",
ledf = \this _ -> error this "Missing operator.",
std = Nothing,
first = [], second = [], third = [],
isAssignment = False, reserved = False, key = Nothing,
id = undefined, arity = undefined, value = undefined,
lbp = undefined, skope = undefined
}
data Symbol = Symbol { id :: Id,
arity :: Arity,
value :: Value,
lbp :: BindingPower,
reserved, isAssignment :: Bool,
nudf :: NudFun,
ledf :: LedFun,
std :: Maybe StdFun,
skope :: Scope,
key :: Maybe Value,
first, second, third :: [Symbol] }
type NudFun = This -> Parser Symbol
type LedFun = This -> Symbol -> Parser Symbol
type StdFun = This -> Parser [Symbol]
nud s = nudf s s
led s = ledf s s
Here is where we diverge from the JavaScript design. Whereas the
original relied on the mutability of the symbol being returned
to set the nud, etc., fields I noticed that they
were the only ones set like that. So instead this code takes in
the nud, led or std function as well and binds it during symbol
creation. In addition, Haskell doesn't support variable numbers
of parameters without some type-level fancyness which I'll avoid
so two wrappers are provided.
var symbol = function (id, bp) {
var s = symbol_table[id];
bp = bp || 0;
if (s) {
if (bp >= s.lbp) {
s.lbp = bp;
}
} else {
s = Object.beget(original_symbol);
s.id = s.value = id;
s.lbp = bp;
symbol_table[id] = s;
}
return s;
};
symbol id bp typ = do
symbol_table <- get
let s' = bind typ $
case lookup id symbol_table of
Just s -> if bp >= lbp s
then s { lbp = bp }
else s
_ -> original_symbol { id = id,
value = id,
lbp = bp }
put $ insert id s' symbol_table
return s'
where
bind (Nud f) s = s { nudf = f }
bind (Led f) s = s { ledf = f }
bind (Std f) s = s { std = Just f }
bind _ s = s
symbol0 id = symbol1 id NilT
symbol1 = flip symbol 0
data SymbolType = NilT
| Nud NudFun
| Led LedFun
| Std StdFun
Some symbols are just terminators or seperators without any logic
of their own. Haskell needs some additional context which will be
referenced later in the main parse function.
symbol(":");
symbol(";");
symbol(",");
symbol(")");
symbol("]");
symbol("}");
symbol("else");
symbol("(end)");
symbol("(name)");
initial_symbol_table = execState ist Map.empty
where
ist = do
symbol0 ":"
symbol0 ";"
symbol0 ")"
symbol0 "]"
symbol0 "}"
symbol0 ","
symbol0 "else"
symbol0 "(end)"
symbol0 "(name)"
Again, there's a wrapper for the optional parameter when advancing with a required next token. Note that Haskell can not receive an unexpected token so there is no corresponding case.
var advance = function (id) {
var a, o, t, v;
if (id && token.id !== id) {
token.error("Expected '" + id + "'.");
}
if (token_nr >= tokens.length) {
token = symbol_table["(end)"];
return;
}
t = tokens[token_nr];
token_nr += 1;
v = t.value;
a = t.type;
if (a === "name") {
o = scope.find(v);
} else if (a === "operator") {
o = symbol_table[v];
if (!o) {
t.error("Unknown operator.");
}
} else if (a === "string" || a === "number") {
a = "literal";
o = symbol_table["(literal)"];
} else {
t.error("Unexpected token.");
}
token = Object.beget(o);
token.value = v;
token.arity = a;
return token;
};
advanceIf requiredId = do
token <- gets token
when (id token /= requiredId) $
error token $ "Expected '" ++ requiredId ++ "'."
advance
advance = do
this <- get
let (t, ts) = case tokens this of
[]
-> (symbol_table this ! "(end)", [])
t@(Token a v):tokens'
-> let (o, a') = case a of
NameType
-> (find this $ v, Name)
OperatorType
-> case lookup v $ symbol_table this of
Just t' -> (t', Operator)
_ -> error t "Unknown operator."
NumberType
-> (symbol_table this ! "(literal)", Literal)
StringType
-> (symbol_table this ! "(literal)", Literal)
-- the next case can't happen and ghc throws a warning
-- _ -> error t "Unexpected token."
in (o { value = v, arity = a' }, tokens')
put this { token = t, tokens = ts }
return t
The current scope is also part of the environment. The scope could be modelled by a list but this more closely matches the OO structure of the JavaScript.
var scope;
scope :: Scope
data Scope = Scope { def :: SymbolTable,
parent :: Scope }
| TopScope
The original_scope object is reworked as as set of top
level functions as they actually need references back to the
symbol_table which resides in the environment.
I'm not sure why led is set to null in the JavaScript
as that seems to remove the default error handling so we replace by
something more specific.
var itself = function () {
return this;
};
var original_scope = {
define: function (n) {
var t = this.def[n.value];
if (typeof t === "object") {
n.error(t.reserved ?
"Already reserved." :
"Already defined.");
}
this.def[n.value] = n;
n.reserved = false;
n.nud = itself;
n.led = null;
n.std = null;
n.lbp = 0;
n.scope = scope;
return n;
},
itself = return
define n@Symbol {value = value} = do
this <- gets scope
let t = def this ! value
when (member value $ def this) $
error n $ if reserved t
then "Already reserved."
else "Already defined."
let n' = n { reserved = False,
nudf = itself,
ledf = \this _ -> error this "Undefined operator.",
std = Nothing,
lbp = 0,
skope = this }
env <- get
put env { scope = this {
def = insert value n' $ def this
} }
return n'
find splits the behaviour based on the environment.
It also uses simple recursion rather than some higher-order recursion
function. That is also to match the JavaScript style of
simple while(true) loops and is used throughout the code
where ever the JavaScript uses loops.
find: function (n) {
var e = this;
while (true) {
var o = e.def[n];
if (o) {
return o;
}
e = e.parent;
if (!e) {
return symbol_table[
symbol_table.hasOwnProperty(n) ?
n : "(name)"];
}
}
},
find env@Env {scope = Scope {def = def, parent = e}} n =
case lookup n def of
Just t -> t
_ -> find env { scope = e} n
find Env { symbol_table = symbol_table } n =
case lookup n symbol_table of
Just t -> t
_ -> symbol_table ! "(name)"
pop: function () {
scope = this.parent;
},
pop = do
env <- get
put env { scope = parent $ scope env }
reserve: function (n) {
if (n.arity !== "name" || n.reserved) {
return;
}
var t = this.def[n.value];
if (t) {
if (t.reserved) {
return;
}
if (t.arity === "name") {
n.error("Already defined.");
}
}
this.def[n.value] = n;
n.reserved = true;
}
};
reserve n@Symbol {arity = Name, reserved = False, value = value} = do
this <- gets scope
let t = def this ! value
when (member value $ def this) $ do
when (reserved t) $
return ()
when (arity t == Name) $
error n "Unreserved is already defined."
env <- get
put env { scope = this {
def = insert value n { reserved = True } $ def this
} }
reserve _ = return ()
var new_scope = function () {
var s = scope;
scope = Object.beget(original_scope);
scope.def = {};
scope.parent = s;
return scope;
};
new_scope = do
s <- gets scope
let s' = Scope { def = Map.empty,
parent = s }
env <- get
put env { scope = s' }
return s'
var expression = function (rbp) {
var left;
var t = token;
advance();
left = t.nud();
while (rbp < token.lbp) {
t = token;
advance();
left = t.led(left);
}
return left;
}
expression rbp = do
t <- gets token
advance
left <- nud t
let walkRight left = do
t <- gets token
if rbp < lbp t then do
advance
left <- led t left
walkRight left
else return left
walkRight left
infix is a Haskell keyword. The default behaviour is
provided by the inphix0 version. Note the encoding of
the simple symbol left as a single element list.
var infix = function (id, bp, led) {
var s = symbol(id, bp);
s.led = led || function (left) {
this.first = left;
this.second = expression(bp);
this.arity = "binary";
return this;
};
return s;
}
inphix0 s bp = inphix s bp $ \this left -> do
right <- expression bp
return this { first = [left],
second = [right],
arity = Binary }
inphix s bp f = symbol s bp $ Led f
These and other dimilar definitions are actually part of
the initial_symbol_table started above.
infix("+", 50);
infix("-", 50);
infix("*", 60);
infix("/", 60);
inphix0 "+" 50
inphix0 "-" 50
inphix0 "*" 60
inphix0 "/" 60
infix("?", 20, function (left) {
this.first = left;
this.second = expression(0);
advance(":");
this.third = expression(0);
this.arity = "ternary";
return this;
});
inphix "?" 20 $ \this left -> do
whenTrue <- expression 0
advanceIf ":"
whenFalse <- expression 0
return this { first = [left],
second = [whenTrue],
third = [whenFalse],
arity = Ternary }
Even though the Javascript updates the token it is then
immediately discaded by advance so we won't bother
infix(".", 80, function (left) {
this.first = left;
if (token.arity !== "name") {
token.error("Expected a property name.");
}
token.arity = "literal";
this.second = token;
this.arity = "binary";
advance();
return this;
});
inphix "." 80 $ \this left -> do
token <- gets token
when (arity token /= Name) $
error token "Expected a property name."
advance
return this { first = [left],
second = [token { arity = Literal }],
arity = Binary }
infix("[", 80, function (left) {
this.first = left;
this.second = expression(0);
this.arity = "binary";
advance("]");
return this;
});
inphix "[" 80 $ \this left -> do
right <- expression 0
advanceIf "]"
return this { first = [left],
second = [right],
arity = Binary }
In the Haskell version it becomes obvious that the
base inphixr is identical to inphix above
and only the default behaviour is different.
var infixr = function (id, bp, led) {
var s = symbol(id, bp);
s.led = led || function (left) {
this.first = left;
this.second = expression(bp - 1);
this.arity = "binary";
return this;
};
return s;
}
inphixr0 s bp = inphixr s bp $ \this left -> do
right <- expression $ bp-1
return this { first = [left],
second = [right],
arity = Binary }
inphixr = inphix
infixr("===", 40);
infixr("!==", 40);
infixr("<", 40);
infixr("<=", 40);
infixr(">", 40);
infixr(">=", 40);
infixr("&&", 30);
infixr("||", 30);
inphixr0 "===" 40
inphixr0 "!==" 40
inphixr0 "<" 40
inphixr0 "<=" 40
inphixr0 ">" 40
inphixr0 ">=" 40
inphixr0 "&&" 30
inphixr0 "||" 30
var prefix = function (id, nud) {
var s = symbol(id);
s.nud = nud || function () {
scope.reserve(this);
this.first = expression(70);
this.arity = "unary";
return this;
};
return s;
}
prephix0 s = prephix s $ \this -> do
reserve this
expr <- expression 70
return this { first = [expr],
arity = Unary }
prephix s f = symbol1 s $ Nud f
prefix("-");
prefix("!");
prefix("typeof");
prephix0 "!"
prephix0 "-"
prephix0 "typeof"
prefix("(", function () {
var e = expression(0);
advance(")");
return e;
});
prephix "(" $ \this -> do
e <- expression 0
advanceIf ")"
return e
var assignment = function (id) {
return infixr(id, 10, function (left) {
if (left.id !== "." && left.id !== "[" &&
left.arity !== "name") {
left.error("Bad lvalue.");
}
this.first = left;
this.second = expression(9);
this.assignment = true;
this.arity = "binary";
return this;
});
};
assignment s = inphixr s 10 $ \this left -> do
when (id left /= "." && id left /= "[" &&
arity left /= Name) $
error left "Bad lvalue."
right <- expression 9
return this { first = [left],
second = [right],
arity = Binary,
isAssignment = True }
assignment("=");
assignment("+=");
assignment("-=");
assignment "="
assignment "+="
assignment "-="
Rather than try and emulate all the kinds of JavaScript value the constant could be set to without much benefit it is simply ignored.
var constant = function (s, v) {
var x = symbol(s);
x.nud = function () {
scope.reserve(this);
this.value = symbol_table[this.id].value;
this.arity = "literal";
return this;
};
x.value = v;
return x;
};
constant0 s v = constant s v $ \this -> do
reserve this
symbol_table <- gets symbol_table
return this { value = value $ symbol_table ! (id this),
arity = Literal }
constant s _ f = symbol1 s $ Nud f
constant("true", true);
constant("false", false);
constant("null", null);
constant("pi", 3.141592653589793);
constant("Object", {});
constant("Array", []);
constant0 "true" True
constant0 "false" False
constant0 "null" undefined
constant0 "pi" 3.141592653589793
constant0 "Object" Map.empty
constant0 "Array" []
symbol("(literal)").nud = itself;
symbol1 "(literal)" $ Nud itself
var statement = function () {
var n = token, v;
if (n.std) {
advance();
scope.reserve(n);
return n.std();
}
v = expression(0);
if (!v.assignment && v.id !== "(") {
v.error("Bad expression statement.");
}
advance(";");
return v;
};
statement = do
n <- gets token
case n of
Symbol { std = Just std } -> do
advance
reserve n
std n
otherwise -> do
v <- expression 0
when (not (isAssignment v) && id v /= "(") $
error v "Bad expression statement."
advanceIf ";"
return [v]
Rather than change the return type, which has been fixed to symbol lists in any case, the pretty printer does the right thing. Also note no attempt to make this tail recursive - how huge are the functions being parsed going to be? :-)
var statements = function () {
var a = [], s;
while (true) {
if (token.id === "}" || token.id === "(end)") {
break;
}
s = statement();
if (s) {
a.push(s);
}
}
return a.length === 0 ? null : a.length === 1 ? a[0] : a;
};
statements = do
token <- gets token
if id token == "}" || id token == "(end)"
then return []
else do
s <- statement
ss <- statements
return $ s ++ ss
var stmt = function (s, f) {
var x = symbol(s);
x.std = f;
return x;
};
stmt s f = symbol1 s $ Std f
stmt("{", function () {
new_scope();
var a = statements();
advance("}");
scope.pop();
return a;
});
stmt "{" $ \this -> do
new_scope
a <- statements
advanceIf "}"
pop
return a
var block = function () {
var t = token;
advance("{");
return t.std();
};
block = do
t <- gets token
advanceIf "{"
case std t of
Just s -> s t
We have to jump through a little hoop to only return variable definitions which have a default value. Here also was an opertunity to create a higher order recursion which recognised the seperator and which could have been used in similar later code.
Note that the Haskell makes use of advance's return
value to save on retrieving the state again which the JavaScript never
does thanks to it's natural statefullness. On the other hand, Haskell
is more controlled in it's use of the 'global' variables. Whether
that makes the logic more obvious...
stmt("var", function () {
var a = [], n, t;
while (true) {
n = token;
if (n.arity !== "name") {
n.error("Expected a new variable name.");
}
scope.define(n);
advance();
if (token.id === "=") {
t = token;
advance("=");
t.first = n;
t.second = expression(0);
t.arity = "binary";
a.push(t);
}
if (token.id !== ",") {
break;
}
advance(",");
}
advance(";");
return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});
stmt "var" $ \this -> do
let vars = do
n <- gets token
when (arity n /= Name) $
error n "Expected a new variable name."
define n
t <- advance
a <- if id t == "=" then do
advanceIf "="
s <- expression 0
let t' = t { first = [n],
second = [s],
arity = Binary,
isAssignment = True }
return [t']
else return []
t <- gets token
if id t /= ","
then return a
else do
advanceIf ","
v <- vars
return $ a++v
a <- vars
advanceIf ";"
return a
stmt("while", function () {
advance("(");
this.first = expression(0);
advance(")");
this.second = block();
this.arity = "statement";
return this;
});
stmt "while" $ \this -> do
advanceIf "("
f <- expression 0
advanceIf ")"
s <- block
return [this { first = [f],
second = s,
arity = Statement }]
stmt("if", function () {
advance("(");
this.first = expression(0);
advance(")");
this.second = block();
if (token.id === "else") {
scope.reserve(token);
advance("else");
this.third = token.id === "if" ? statement() : block();
} else {
this.third = null;
}
this.arity = "statement";
return this;
});
stmt "if" $ \this -> do
advanceIf "("
test <- expression 0
advanceIf ")"
body <- block
token <- gets token
els <- if id token == "else" then do
reserve token
token <- advanceIf "else"
if id token == "if" then statement else block
else return []
return [this { first = [test],
second = body,
third = els,
arity = Statement }]
stmt("break", function () {
advance(";");
if (token.id !== "}") {
token.error("Unreachable statement.");
}
this.arity = "statement";
return this;
});
stmt "break" $ \this -> do
t <- advanceIf ";"
when (id t /= "}") $
error t "Unreachable statement."
return [this { arity = Statement }]
stmt("return", function () {
if (token.id !== ";") {
this.first = expression(0);
}
advance(";");
if (token.id !== "}") {
token.error("Unreachable statement.");
}
this.arity = "statement";
return this;
});
stmt "return" $ \this -> do
t <- gets token
first <- if id t /= ";" then do
e <- expression 0
return [e]
else return []
t <- advanceIf ";"
when (id t /= "}") $
error t "Unreachable statement."
return [this { first = first,
arity = Statement }]
prefix("function", function () {
var a = [];
new_scope();
if (token.arity === "name") {
scope.define(token);
this.name = token.value;
advance();
}
advance("(");
if (token.id !== ")") {
while (true) {
if (token.arity !== "name") {
token.error("Expected a parameter name.");
}
scope.define(token);
a.push(token);
advance();
if (token.id !== ",") {
break;
}
advance(",");
}
}
this.first = a;
advance(")");
advance("{");
this.second = statements();
advance("}");
this.arity = "function";
scope.pop();
return this;
});
prephix "function" $ \this -> do
new_scope
t <- gets token
n <- if arity t == Name then do
define t
advance
return $ Just $ value t
else return Nothing
t <- advanceIf "("
a <- if id t /= ")" then
let params = do
t <- gets token
when (arity t /= Name) $
error t "Expected a parameter name."
define t
token <- advance
if id token /= ","
then return [t]
else do
advanceIf ","
p <- params
return $ t:p
in params
else return []
advanceIf ")"
advanceIf "{"
s <- statements
advanceIf "}"
pop
return this { first = a,
second = s,
arity = Function { name = n } }
In Haskell the non-variable a can't be used until
after it's been populated.
infix("(", 80, function (left) {
var a = [];
if (left.id === "." || left.id === "[") {
this.arity = "ternary";
this.first = left.first;
this.second = left.second;
this.third = a;
} else {
this.arity = "binary";
this.first = left;
this.second = a;
if ((left.arity !== "unary" || left.id !== "function") &&
left.arity !== "name" &&
left.id !== "(" &&
left.id !== "&&" &&
left.id !== "||" &&
left.id !== "?") {
left.error("Expected a variable name.");
}
}
if (token.id !== ")") {
while (true) {
a.push(expression(0));
if (token.id !== ",") {
break;
}
advance(",");
}
}
advance(")");
return this;
});
inphix "(" 80 $ \this left -> do
t <- gets token
a <- if id t /= ")" then
let vars = do
e <- expression 0
token <- gets token
if id token /= ","
then return [e]
else do
advanceIf ","
v <- vars
return $ e:v
in vars
else return []
let this' = if id left == "." || id left == "["
then this { first = first left,
second = second left,
third = a,
arity = Ternary }
else if (arity left /= Unary || id left /= "function") &&
arity left /= Name && id left /= "(" &&
id left /= "&&" && id left /= "||" && id left /= "?"
then error left "Expected a variable name."
else this { first = [left],
second = a,
arity = Binary }
advanceIf ")"
return this'
symbol("this").nud = function () {
scope.reserve(this);
this.arity = "this";
return this;
};
symbol1 "this" $ Nud $ \this -> do
reserve this
return this { arity = This }
prefix("[", function () {
var a = [];
if (token.id !== "]") {
while (true) {
a.push(expression(0));
if (token.id !== ",") {
break;
}
advance(",");
}
}
advance("]");
this.first = a;
this.arity = "unary";
return this;
});
prephix "[" $ \this -> do
t <- gets token
a <- if id t /= "]" then
let entries = do
v <- expression 0
token <- gets token
if id token /= ","
then return [v]
else do
advanceIf ","
e <- entries
return $ v:e
in entries
else return []
advanceIf "]"
return this { first = a,
arity = Unary }
prefix("{", function () {
var a = [];
if (token.id !== "}") {
while (true) {
var n = token;
if (n.arity !== "name" && n.arity !== "literal") {
token.error("Bad key.");
}
advance();
advance(":");
var v = expression(0);
v.key = n.value;
a.push(v);
if (token.id !== ",") {
break;
}
advance(",");
}
}
advance("}");
this.first = a;
this.arity = "unary";
return this;
});
prephix "{" $ \this -> do
t <- gets token
a <- if id t /= "}" then
let entries = do
n <- gets token
when (arity n /= Name && arity n /= Literal) $
error n "Bad property name."
advance
advanceIf ":"
v <- expression 0
let v' = v { key = Just $ value n }
token <- gets token
if id token /= ","
then return [v']
else do
advanceIf ","
e <- entries
return $ v':e
in entries
else return []
advanceIf "}"
return this { first = a,
arity = Unary }
Finally the initial_symbol_table is used along with
the tokeniser.
return function (source) {
tokens = source.tokens('=<>!+-*&|/%^', '=<>&|');
token_nr = 0;
new_scope();
advance();
var s = statements();
advance("(end)");
scope.pop();
return s;
};
parse source =
evalState ( do
new_scope
advance
s <- statements
advanceIf "(end)"
return s
) Env { tokens = tokenise source,
scope = TopScope,
token = original_symbol { id = "(start)" },
symbol_table = initial_symbol_table }
The Simplified JavaScript (SJ) and Statefull Simplified Haskell (SSH) versions are very similar in complexity and structure. Because full mutable state is not 'native' to Haskell it's occasionally a little more verbose. On the other hand, pattern matching allows better separation of code in some parts and easy access to the internals of (non-environmental) parameters.
Whether the beauty of the SJ parser in SJ rubs off on the SJ parser in SSH is debatable. Part is due to keeping to a SJ coding style so that common Haskell idioms, especially the use of higher-order functions over simple recursion, are not used.
This code makes as much use of GHC's standard libraries, particularly the (Haskell) lexer. Here is the entire thing.
module Tokeniser where
import Text.Read (lex)
import Data.Char (isAlpha, isNumber)
data Token = Token TokenType String
deriving Show
data TokenType = NameType | StringType | NumberType | OperatorType
deriving Show
tokenise = tokens . head . lex
where
tokens (t, "") = [token t]
tokens (t, s) = token t : tokenise s
token t@(c:_) = Token tokenType text
where
tokenType | isAlpha c = NameType
| isNumber c = NumberType
| '"' == c = StringType
| otherwise = OperatorType
text | c == '"' = drop 1 $ take (length t-1) t
| otherwise = t
token _ = Token OperatorType "(end)"
This also makes very good use of the standard pretty printing library. Yes, this does produce output very similar to that shown in the JavaScript demo.
module PrettyPrint where
import Text.PrettyPrint
import TopDownParserState
pp = ppList ""
ppList l [] = empty
ppList l (s:[]) = ppSymbol l s
ppList l s = bracket l $
vcat $ map (ppSymbol "") s
ppSymbol l Symbol {key = k, value = v, arity = a,
first = f, second = s, third = t } =
brace l $
ppMaybe "key: " k $$
ppValue v $$
ppArity a $$
ppList "first: " f $$
ppList "second: " s $$
ppList "third: " t
ppMaybe l (Just k) = text l <> (text $ show k)
ppMaybe l Nothing = empty
ppValue v = text "value: " <> (text $ show v)
ppArity a@Function {name = n} =
text "arity: Function" $$
ppMaybe "name: " n
ppArity a = text "arity: " <> (text $ show a)
bracket l s = (text l <> lbrack) $+$ indent s $$ rbrack
brace l s = (text l <> lbrace) $+$ indent s $$ rbrace
indent = nest 4
Create a free website with Weebly