We proceed to compare how an interpreter for arithmetic expressions can be implemented using conventional tools and how it can be implemented using the model-driven approach using ModelCC. Albeit the following calculator example is necessarily simplistic, it already provides some hints on the potential benefits model-driven language specification can bring to more challenging endeavors.
First, we will outline the features we wish to include in our simple arithmetic expression language. Later, we will describe how an interpreter for this language is built using two of the most established tools in use by language designers: lex & yacc on the one hand, ANTLR on the other. Finally, we will implement the same language processor using ModelCC by defining an abstract syntax model. This ASM will be annotated to specify the required ASM-CSM mapping and it will also include the necessary logic for evaluating arithmetic expressions. This example will let us compare ModelCC against conventional parser generators and it will be used for discussing the potential advantages provided by our model-driven language specification approach.
Our calculator will employ classical arithmetic expressions in infix notation. The language will feature the following capabilities:
Unary operators: +, and -.
Binary operators: +, -, *, and /, being - and / left-associative.
Operator priorities: * and / precede + and -.
Expression groups (i.e. parenthesized expressions).
Integer and floating-point literals.
Example input and output:
Using conventional tools, the language designer would start by specifying the grammar defining the simple arithmetic expression language in a BNF-like notation. The following BNF grammar meets the requirements of our simple calculator, albeit it is not yet suitable for being used with existing parser generators, since they impose specific constraints on the format of the grammar depending on the parsing algorithms they employ.
lex & yacc Implementation
When using lex & yacc, the language designer converts the BNF grammar into a grammar suitable for LR parsing. A suitable lex & yacc implementation defining the arithmetic expression grammar is the following.
// lex part of the grammar specification
extern YYSTYPE yylval;
[0-9]+\.[0.9]* return RealLiteral;
[0-9]+ return IntegerLiteral;
\+|\- return UnaryOrPriority2BinaryOperator;
\/|\* return Priority1BinaryOperator;
\( return LeftParenthesis;
\) return RightParenthesis;
Since lex does not support lexical ambiguities, the UnaryOperator and BinaryOperator nonterminals from the BNF grammar have to be refactored in order to avoid the ambiguities introduced by the use of + and - both as unary and binary operators. A typical solution consists of creating the UnaryOrPriority2BinaryOperator token type for representing them and then adjusting the grammar accordingly. This token will act as an UnaryOperator in UnaryExpressions, and as a BinaryOperator in BinaryExpressions.
A similar solution is necessary for distinguishing different operator priorities, hence different token types are defined for each precedence level in the language, even though they perform the same role from a conceptual point of view. The order in which they are declared in the yacc specification determines their relative priorities (please, note that these declarations are also employed to define operator associativity).
Unfortunately, the requirement to resolve ambiguities by refactoring the grammar defining the language involves the introduction of a certain degree of duplication in the language specification: separate token types in the lexer and multiple parallel production rules in the parser.
Once all ambiguities have been resolved, the language designer completes the lex & yacc introducing semantic actions to perform the necessary operations. In this case, albeit somewhat verbose in C syntax, the implementation of an arithmetic expression evaluator is relatively straightforward using the yacc $ notation, as shown in the code below. In our particular implementation of the simple arithmetic expression language interpreter, carriage returns are employed to output results, hence our use of the ancillary Line token type and LineReturn nonterminal symbol.
// lex part of the implementation
// yacc part of the implementation
When using ANTLR, the language designer converts the BNF grammar into a grammar suitable for LL parsing. An ANTLR specification of our arithmetic expression language is the following.
Since ANTLR provides no mechanism for the declarative specification of token precedences, such precedences have to be incorporated into the grammar. The usual solution involves the creation of different nonterminal symbols in the grammar, so that productions corresponding to the same precedence levels are grouped together. The productions with expression1 and expression2 in their left-hand side were introduced with this purpose in our simple arithmetic expression grammar.
Likewise, since ANTLR generates a LL(*) parser, which does not support left-recursion, left-recursive grammar productions in the BNF grammar have to be refactored. In our example, a simple solution involves the introduction of the expression3 nonterminal, which in conjunction with the aforementioned expression1 and expression2 nonterminals, eliminates left-recursion from our grammar.
Once the grammar is adjusted to satisfy the constraints imposed by the ANTLR parser generator, the language designer can define the semantic actions needed to implement our arithmetic expression interpreter. The resulting ANTLR implementation is shown in the code below. The streamlined syntax of the scannerless ANTLR parser generator makes this implementation significantly more concise than the equivalent lex & yacc implementation. However, the constraints imposed by the underlying parsing algorithm forces explicit changes on the language grammar.
When following a model-based language specification approach, the language designer starts by elaborating an abstract syntax model, which will later be mapped to a concrete syntax model by imposing constraints on the abstract syntax model. These constraints can also be specified as metadata annotations on the abstract syntax model and the resulting annotated model can be processed by automated tools, such as ModelCC, to generate the corresponding lexers and parsers.
Annotated models can be represented graphically, as the following UML diagram.
Annotated models can also be implemented using conventional programming languages, as the following Java implementation.
The actual code needed to generate and invoke the parser in ModelCC is the following.
Since the abstract syntax model in ModelCC is not constrained by the vagaries of particular parsing algorithms, the language design process can be focused on its conceptual design, without having to introduce artifacts in the design just to satisfy the demands of particular tools.