Formal grammars are often presented as Context-Free Grammars. To describe the grammar of SimpleTalk we use EBNF. Here is a discussion of why this is suitable.
The main thesis here is that
We present the conclusion first, and then develop the argument for it.
A simple example where associativity is important is subtraction. The language and grammar are:
It is crucial that this grammar is not recursive — that is, the left-hand side Expression does not appear on the right. This is what makes the language conveniently parsable with iteration.
This form is easily mapped to the parser, whose structure for building an Expression node is iterative rather than recursive, viz.:
Note carefully line 6: we are replacing the expression node e1 with a new value that contains its old value as the left-hand branch of a binary expression. This left-substitution leads to the desired left-associativity.
To see this, consider the sequence of values for e1 and e2 as the function parses the expression “1 − 2 − 3”, numbered by the line numbers in the above function:
You might hope that you could reverse the associativity by reversing the arguments to the BinaryNode() function in line 6, but this would also reverse the order of the operations, yielding a tree for 3 − (2 − 1).
It is harder to get right-associativity from a left-to-right iteration, but here is one solution:
Where we do want right-associativity for SimpleTalk, we do in fact write the grammar recursively. For example, the list item selection expression was originally in the BNF form
In this form E10 appeared only on the left-hand side. This production was originally parsed using the above stack-style iteration. But it proved easier to write as the right-recursive CFG
Note the difference: E10 appears on both sides of the rule. The SimpleTalk parser now uses this structure to parse the item construct:
CFGs are commonly used to describe formal languages, particularly when the generation of parsing is to be automated. This helps ensure that the language recognized by the parser is the one intended by the designer. The SimpleTalk compiler is (currently) written from scratch, and comprises about 2000 lines of JavaScript. (And about 1200 lines of Objective-C in the OSX/iOS version.) We have used EBNF, which can map directly to a less-recursive parser and provides left-associativity in the bargain.
To explore this, we start from the most basic example and work forward. Here are the languages, and their grammars that we will discuss. They are presented up-front, and in parallel for easier direct comparison. Following each grammar is a discussion.
The direct application of top-down, recursive descent to parsing the left-recursive production would be:
which is head-recursive and will never finish for any non-empty string of as. In other words, left-recursive grammars are better suited to bottom-up parsing.
Transforming this grammar into right-recursion is simply a matter of reversing the first production, as shown above. That results in a grammar that translates into this parser:
Conclusion: right-recursive grammars are better suited for top-down parsing. (If a grammar has no recursion, its productions are too limited for a general programming language.) However, the right-recursive grammar leads to right-associativity, as the next example will show.
Now consider the expression aaa. Its left-recursive derivation and parse tree are shown above: the tree implements left-associativity.
By contrast, using the right-recursive grammar for the same production yields right-associative tree. This grammar leads directly to a recursive top-down parse algorithm:
More to the point, we would construct a parse tree like so:
As the tree shows, right-recursion leads to right-associativity. You might think that you could reverse the associativity by reversing the arguments to the BinaryNode() function, but this would in fact reverse the order of the operations.
To sum up: a CFG comes not only with a language it generates, but a derivation for each production that is unique (for unambiguous languages) and is isomorphic to a parse tree that is biased to the left or right side. And although we may be indifferent to what the derivation is, we care about the shape of the parse tree, because it determines associativity.


Here we see the advantages and shortcomings of each approach. The left-recursive grammar leads to a left-associative tree. But it cannot be implemented directly with top-down recursion. Moreover, examine its derivation: it proceeds backwards from the last number to the first. This is one reason that LR grammars and LR parsers can be hard to understand. (See shift-reduce conflicts.)
By contrast, the right-recursive grammar leads directly to a top-down recursive implementation...that has the wrong associativity for subtraction.
In view of the foregoing, using EBNF grammar and its corresponding iterative parsing implementation provide the best of both worlds.