Arithmetic calculator
We proceed to compare how an interpreter for arithmetic expressions can be implemented using conventional tools and how it can be implemented using the model-driven approach using ModelCC. Albeit the following calculator example is necessarily simplistic, it already provides some hints on the potential benefits model-driven language specification can bring to more challenging endeavors.
First, we will outline the features we wish to include in our simple arithmetic expression language. Later, we will describe how an interpreter for this language is built using two of the most established tools in use by language designers: lex & yacc on the one hand, ANTLR on the other. Finally, we will implement the same language processor using ModelCC by defining an abstract syntax model. This ASM will be annotated to specify the required ASM-CSM mapping and it will also include the necessary logic for evaluating arithmetic expressions. This example will let us compare ModelCC against conventional parser generators and it will be used for discussing the potential advantages provided by our model-driven language specification approach.
Language Specification
Our calculator will employ classical arithmetic expressions in infix notation. The language will feature the following capabilities:
Unary operators: +, and -.
Binary operators: +, -, *, and /, being - and / left-associative.
Operator priorities: * and / precede + and -.
Expression groups (i.e. parenthesized expressions).
Integer and floating-point literals.
Example input and output:
Input: (3+2)*2
Output: 10.0
Input: 3+2*2
Output: 7.0
Input: 2*3+2
Output: 8.0
Input: (2*2)+2
Output: 6.0
Input: 10*5/5
Output: 10.0
Input: 6*2-3*3
Output: 3.0
Input: 1+2+3+4+5+6+7+8+9*10
Output: 126.0
Input: 10*4+5-22+(0.6*52-22)
Output: 32.2
Conventional Implementation
Using conventional tools, the language designer would start by specifying the grammar defining the simple arithmetic expression language in a BNF-like notation. The following BNF grammar meets the requirements of our simple calculator, albeit it is not yet suitable for being used with existing parser generators, since they impose specific constraints on the format of the grammar depending on the parsing algorithms they employ.
<Expression> ::= <ExpressionGroup> | <BinaryExpression> | <UnaryExpression> | <LiteralExpression>
<ExpressionGroup> ::= '(' <Expression> ')'
<BinaryExpression> ::= <Expression> <BinaryOperator> <Expression>
<UnaryExpression> ::= <UnaryOperator> <Expression>
<LiteralExpression> ::= <RealLiteral> | <IntegerLiteral>
<BinaryOperator> ::= '+' | '-' | '/' | '*'
<UnaryOperator> ::= '+' | '-'
<RealLiteral> ::= <IntegerLiteral> '.' | <IntegerLiteral> '.' <IntegerLiteral>
<IntegerLiteral> ::= <Digit> <IntegerLiteral> | <Digit>
<Digit> ::= '0' | '1' |'2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
lex & yacc Implementation
When using lex & yacc, the language designer converts the BNF grammar into a grammar suitable for LR parsing. A suitable lex & yacc implementation defining the arithmetic expression grammar is the following.
// lex part of the grammar specification
%{
#include "y.tab.h"
extern YYSTYPE yylval;
%}
%%
[0-9]+\.[0.9]* return RealLiteral;
[0-9]+ return IntegerLiteral;
\+|\- return UnaryOrPriority2BinaryOperator;
\/|\* return Priority1BinaryOperator;
\( return LeftParenthesis;
\) return RightParenthesis;
. ;
%%
// yacc part of the grammar specification
%left UnaryOrPriority2BinaryOperator
%left Priority1BinaryOperator
%token IntegerLiteral RealLiteral LeftParenthesis RightParenthesis
%start Expression
%%
Expression : RealLiteral
| IntegerLiteral
| LeftParenthesis Expression RightParenthesis
| UnaryOrPriority2BinaryOperator Expression
| Expression UnaryOrPriority2BinaryOperator Expression
| Expression Priority1BinaryOperator Expression
;
%%
#include "lex.yy.c"
int main(int argc,char *argv[]) { yyparse(); }
int yyerror(char *s) { printf("%s",s); }
Since lex does not support lexical ambiguities, the UnaryOperator and BinaryOperator nonterminals from the BNF grammar have to be refactored in order to avoid the ambiguities introduced by the use of + and - both as unary and binary operators. A typical solution consists of creating the UnaryOrPriority2BinaryOperator token type for representing them and then adjusting the grammar accordingly. This token will act as an UnaryOperator in UnaryExpressions, and as a BinaryOperator in BinaryExpressions.
A similar solution is necessary for distinguishing different operator priorities, hence different token types are defined for each precedence level in the language, even though they perform the same role from a conceptual point of view. The order in which they are declared in the yacc specification determines their relative priorities (please, note that these declarations are also employed to define operator associativity).
Unfortunately, the requirement to resolve ambiguities by refactoring the grammar defining the language involves the introduction of a certain degree of duplication in the language specification: separate token types in the lexer and multiple parallel production rules in the parser.
Once all ambiguities have been resolved, the language designer completes the lex & yacc introducing semantic actions to perform the necessary operations. In this case, albeit somewhat verbose in C syntax, the implementation of an arithmetic expression evaluator is relatively straightforward using the yacc $ notation, as shown in the code below. In our particular implementation of the simple arithmetic expression language interpreter, carriage returns are employed to output results, hence our use of the ancillary Line token type and LineReturn nonterminal symbol.
// lex part of the implementation
%{
#include "string.h"
#include "y.tab.h"
extern YYSTYPE yylval;
%}
%%
[0-9]+\.[0.9]* { yylval.value = atof(yytext); return RealLiteral; }
[0-9]+ { yylval.value = (double)atoi(yytext); return IntegerLiteral; }
\+|\- { if (yytext[0] == '+') yylval.operator = PLUSORADDITION;
else /*yytext[0] == '-'*/ yylval.operator = MINUSORSUBTRACTION;
return UnaryOrPriority2BinaryOperator; }
\/|\* { if (yytext[0] == '*') yylval.operator = MULTIPLICATION;
else /*yytext[0] == '/'*/ yylval.operator = DIVISION;
return Priority1BinaryOperator; }
\( { return LeftParenthesis; }
\) { return RightParenthesis; }
\n { return LineReturn; }
. ;
%%
// yacc part of the implementation
%left UnaryOrPriority2BinaryOperator
%left Priority1BinaryOperator
%token IntegerLiteral RealLiteral LeftParenthesis RightParenthesis
%start Line
%{
#include <stdio.h>
#define YYSTYPE attributes
typedef enum { PLUSORADDITION, MINUSORSUBTRACTION, MULTIPLICATION, DIVISION } optype;
typedef struct {
optype operator;
double value;
} attributes;
%}
%%
Expression : RealLiteral { $$.value = $1.value; }
| IntegerLiteral { $$.value = $1.value; }
| LeftParenthesis Expression RightParenthesis { $$.value = $2.value; }
| UnaryOrPriority2BinaryOperator Expression
{ if ($1.operator == PLUSORADDITION) $$.value = $2.value;
else /*$1.operator == MINUSORSUBTRACTION*/ $$.value = -$2.value; }
| Expression UnaryOrPriority2BinaryOperator Expression
{ if ($2.operator == PLUSORADDITION) $$.value = $1.value+$3.value;
else /*$2.operator == MINUSORSUBTRACTION*/ $$.value = $1.value-$3.value; }
| Expression Priority1BinaryOperator Expression
{ if ($2.operator == MULTIPLICATION) $$.value = $1.value*$3.value;
else /*$2.operator == DIVISION*/ $$.value = $1.value/$3.value; } ;
Line : Expression LineReturn { printf("%f\n",$1.value); } ;
%%
#include "lex.yy.c"
int main(int argc,char *argv[]) { yyparse(); }
int yyerror(char *s) { printf("%s",s); }
ANTLR Implementation
When using ANTLR, the language designer converts the BNF grammar into a grammar suitable for LL parsing. An ANTLR specification of our arithmetic expression language is the following.
grammar ExpressionEvaluator;
expression1 : expression2 ( '+' expression1 | '-' expression1 )* ;
expression2 : expression3 ( '*' expression2 | '/' expression2 )* ;
expression3 : '(' expression1 ')'
| '+' expression1
| '-' expression1
| INTEGERLITERAL
| FLOATLITERAL
;
INTEGERLITERAL : '0'..'9'+ ;
FLOATLITERAL : ('0'..'9')+ '.' ('0'..'9')* ;
NEWLINE : '\r'? '\n' ;
Since ANTLR provides no mechanism for the declarative specification of token precedences, such precedences have to be incorporated into the grammar. The usual solution involves the creation of different nonterminal symbols in the grammar, so that productions corresponding to the same precedence levels are grouped together. The productions with expression1 and expression2 in their left-hand side were introduced with this purpose in our simple arithmetic expression grammar.
Likewise, since ANTLR generates a LL(*) parser, which does not support left-recursion, left-recursive grammar productions in the BNF grammar have to be refactored. In our example, a simple solution involves the introduction of the expression3 nonterminal, which in conjunction with the aforementioned expression1 and expression2 nonterminals, eliminates left-recursion from our grammar.
Once the grammar is adjusted to satisfy the constraints imposed by the ANTLR parser generator, the language designer can define the semantic actions needed to implement our arithmetic expression interpreter. The resulting ANTLR implementation is shown in the code below. The streamlined syntax of the scannerless ANTLR parser generator makes this implementation significantly more concise than the equivalent lex & yacc implementation. However, the constraints imposed by the underlying parsing algorithm forces explicit changes on the language grammar.
grammar ExpressionEvaluator;
expression1 returns [double value]
: e=expression2 {$value = $e.value;}
( '+' e2=expression1 {$value += $e2.value;}
| '-' e2=expression1 {$value -= $e2.value;}
)* ;
expression2 returns [double value]
: e=expression3 {$value = $e.value;}
( '*' e2=expression2 {$value *= $e2.value;}
| '/' e2=expression2 {$value -= $e2.value;}
)* ;
expression3 returns [double value]
: '(' e=expression1 ')' {$value = $e.value;}
| '+' e=expression1 {$value = $e.value;}
| '-' e=expression1 {$value = -$e.value;}
| i=INTEGERLITERAL
{$value = (double)Integer.parseInt($i.text);}
| f=FLOATLITERAL
{$value = Double.parseDouble($f.text);} ;
INTEGERLITERAL : '0'..'9'+ ;
FLOATLITERAL : ('0'..'9')+ '.' ('0'..'9')* ;
NEWLINE : '\r'? '\n' ;
ModelCC Implementation
When following a model-based language specification approach, the language designer starts by elaborating an abstract syntax model, which will later be mapped to a concrete syntax model by imposing constraints on the abstract syntax model. These constraints can also be specified as metadata annotations on the abstract syntax model and the resulting annotated model can be processed by automated tools, such as ModelCC, to generate the corresponding lexers and parsers.
Annotated models can be represented graphically, as the following UML diagram.
Annotated models can also be implemented using conventional programming languages, as the following Java implementation.
public abstract class Expression implements IModel {
public abstract double eval();
}
@Prefix("\\(")
@Suffix("\\)")
public class ExpressionGroup extends Expression implements IModel {
Expression e;
@Override public double eval() { return e.eval(); }
}
public abstract class LiteralExpression extends Expression implements IModel {
}
public class UnaryExpression extends Expression implements IModel {
UnaryOperator op;
Expression e;
@Override public double eval() { return op.eval(e); }
}
public class BinaryExpression extends Expression implements IModel {
Expression e1;
BinaryOperator op;
Expression e2;
@Override public double eval() { return op.eval(e1,e2); }
}
public class IntegerLiteral extends LiteralExpression implements IModel {
@Value int value;
@Override public double eval() { return (double)value; }
}
public class RealLiteral extends LiteralExpression implements IModel {
@Value double value;
@Override public double eval() { return value; }
}
public abstract class UnaryOperator implements IModel {
public abstract double eval(Expression e);
}
@Pattern(regExp="\\+")
public class PlusOperator extends UnaryOperator implements IModel {
@Override public double eval(Expression e) { return e.eval(); }
}
@Pattern(regExp="-")
public class MinusOperator extends UnaryOperator implements IModel {
@Override public double eval(Expression e) { return -e.eval(); }
}
@Associativity(AssociativityType.LEFT_TO_RIGHT)
public abstract class BinaryOperator implements IModel {
public abstract double eval(Expression e1,Expression e2);
}
@Priority(value=2)
@Pattern(regExp="\\+")
public class AdditionOperator extends BinaryOperator implements IModel {
@Override public double eval(Expression e1,Expression e2) { return e1.eval()+e2.eval(); }
}
@Priority(value=2)
@Pattern(regExp="-")
public class SubtractionOperator extends BinaryOperator implements IModel {
@Override public double eval(Expression e1,Expression e2) { return e1.eval()-e2.eval(); }
}
@Priority(value=1)
@Pattern(regExp="\\*")
public class MultiplicationOperator extends BinaryOperator implements IModel {
@Override public double eval(Expression e1,Expression e2) { return e1.eval()*e2.eval(); }
}
@Priority(value=1)
@Pattern(regExp="\\/")
public class DivisionOperator extends BinaryOperator implements IModel {
@Override public double eval(Expression e1,Expression e2) { return e1.eval()/e2.eval(); }
}
The actual code needed to generate and invoke the parser in ModelCC is the following.
ModelReader reader = new JavaModelReader(Expression.class); // Prepare the Java model reader
Model model = reader.read(); // Read the model
Parser<Expression> parser = ParserGenerator.create(model); // Generate the parser
Expression expression = parser.parse("10/(2+3)*0.5+1"); // Parse the string and
// instantiate the corresponding expression
double value = expression.eval(); // Evaluate the expression
Since the abstract syntax model in ModelCC is not constrained by the vagaries of particular parsing algorithms, the language design process can be focused on its conceptual design, without having to introduce artifacts in the design just to satisfy the demands of particular tools.