29th March 2019
ANTLR4 and expression parsing (Part 2)
Introduction
If you have not read part 1 of this blog post, you can do so here:
Using ANTLR to parse and calculate expressions (Part 1)
In the previous post, we talked about creating a custom ANTLR4 grammar. Our grammar can parse formulas with basic mathematical operators and it defines some custom expressions that allow us to implement domain logic which, combined with the formula parsing, allow us to calculate domain-specific concepts like prices, amounts and conversion factors between units of measure.
In this post, we’ll get a decimal result out of some simple expressions and write some tests to make sure our initial calculator is working.
In our grammar, we have two operators that have widespread use in probably every trading and banking system in the world. They are:
- UomConvert(FromUnitOfMeasure, ToUnitOfMeasure)
- FXRate(‘CurrencyPair’)
The first one converts between units of measure. The second one is a simplified FX rate. We can interpret it as an FX Spot Rate between two currencies. While FX Spot Rates are meaningless without a date component, for the purposes of this post, we will keep it simple and always assume ‘latest known FX Rate’.
Our first step is to get a basic calculator working. To do that we’ll need to be able to handle the different inputs in the following table.
Input |
Result |
Comments |
2 | 2 | Number identity should work. |
2 * 2 | 4 | Standard math operators (addition, multiplication, subtraction and division) |
(2 + 2) * 4 | 16 | Parenthesis priority should be respected. |
Extending the base visitor
In our previous post we did all the plumbing necessary to generate some code that we can work with. This generated code is added automatically into our project thanks to the new C# project file (*.csproj) format.
As a result of the plumbing work, we’re telling the ANTLR runtime to generate a visitor. This visitor gives us some classes we can inherit from, and add some logic to, to get the desired result. The generated code will give us an expression tree, as well as the necessary tools to perform custom actions each time we visit a node in the tree. Nodes can be a recognized character, an operator, or an expression.
Our main goal is to calculate the result of our expression. To do this, we first need to inherit from the MyGrammarBaseVisitor class, which has a generic type parameter that defines our expression’s result type.
To start, the visitor that will inherit from our base class will—for now—override the basic mathematical operations that we defined in our grammar file. These are ‘Number’, ‘Parens’ (parentheses), and the operators ‘ADD’, ‘SUB’, ‘MUL’, and ‘DIV’, using decimal as a return type.
public class MyGrammarVisitor : MyGrammarBaseVisitor<decimal> { public override decimal VisitParens(MyGrammarParser.ParensContext context) { return base.VisitParens(context); } public override decimal VisitNum(MyGrammarParser.NumContext context) { return base.VisitNum(context); } public override decimal VisitAddSub(MyGrammarParser.AddSubContext context) { return base.VisitAddSub(context); } public override decimal VisitMulDiv(MyGrammarParser.MulDivContext context) { return base.VisitMulDiv(context); } }
We override the operations defined in our grammar labelled with the # symbol (these are for convenience as they let us know which method names we need to override). The overridden methods cover all the operations we need to implement the calculator.
Context
Each overridden method is handed a context variable whose type is specific to the method we’re overriding. This is extremely helpful because within the method we define what steps need taking via code, and which data to perform the steps on is provided by the parameter.
It’s time to implement the logic for each of these methods. But how? Let’s go back to our grammar definition to remind ourselves:
expr: expr op=(MUL | DIV) expr #mulDiv | expr op=(ADD | SUB) expr #addSub | number #num | '(' expr ')' #parens | fxRateFunc #fxRate //let’s ignore these last two for now. | uomConvertFunc #uomFactor ;
Let’s start simple, we’ll first implement VisitNum. Because #num is just ‘number’ all we need to do is get the text content of our context parameter and parse it to decimal.
public override decimal VisitNum(MyGrammarParser.NumContext context) { return decimal.Parse(context.GetText()); }
Easy peasy…let’s move on to VisitParens.
In the grammar, parens is defined as such:
| '(' expr ')' #parens
We have to visit the expression contained within the parenthesis, so we simply call Visit on it:
public override decimal VisitParens(MyGrammarParser.ParensContext context) { return Visit(context.expr()); }
Note above how expr() is a method. This is because we’ve recursively defined it: we need to treat it as a new expression, which is a tree itself, so we have to make sure we’re traversing the entire tree. The method will then be dispatched into whatever Visit override is appropriate.
The last two operators are similar, so we’ll do them at the same time.
expr: expr op=(MUL | DIV) expr #mulDiv | expr op=(ADD | SUB) expr #addSub
As we can see from the grammar, a math operator has two expressions, one on each side. We need to first visit both the left and right sides to get our operands. Once we have these two values, we check what operator we’re using and apply the proper calculation.
public override decimal VisitAddSub(MyGrammarParser.AddSubContext context) { // The result of this should always be the operands var left = Visit(context.expr(0)); var right = Visit(context.expr(1)); return context.op.Type == MyGrammarParser.ADD ? left + right : left - right; } public override decimal VisitMulDiv(MyGrammarParser.MulDivContext context) { // The result of this should always be the operands var left = Visit(context.expr(0)); var right = Visit(context.expr(1)); return context.op.Type == MyGrammarParser.MUL ? left * right : left / right; }
Note that access to the left– and right-hand sides are indexed-based, where 0 is left, and 1 is right. It’s also worth noting how we need to check the operation’s Type in order to know what operation we want to perform. This is all part of the generated code, and you can just browse it in your IDE.
Let’s parse a formula!
Now we need something that goes from an input string to a parsed visitor and ultimately a decimal. For that, we need to construct an AntlrInputStream object, a lexer, and a TokenStream. This is all part of the plumbing that needs to happen before we can create an actual MyGrammarParser and send that expression tree through to our visitor. I’ve created a class called MyGrammarExpressionEvaluator to do this part. The code is as follows:
public static class MyGrammarExpressionEvaluator { public static decimal Eval(string inputExpression) { using (var stringReader = new StringReader(inputExpression)) { var inputStream = new AntlrInputStream(stringReader); var lexer = new MyGrammarLexer(inputStream); var tokens = new CommonTokenStream(lexer); var parser = new MyGrammarParser(tokens); var expressionTree = parser.expr();// lets parse the whole thing var visitor = new MyGrammarVisitor(); // now that we have a "parsed expression" var result = visitor.Visit(expressionTree); // let's visit all the nodes. return result; } } }
Now we can write some unit tests that will ensure we’re on track. I’ve used NUnit, with the TestCase attribute to reuse code and make sure that the basic math operators are doing what we expect them to do.
Also note that we’re testing expressions that contain parentheses:
[TestCase("2+2", 4)] [TestCase("2*2", 4)] [TestCase("2-2", 0)] [TestCase("2/2", 1)] [TestCase("(2+2)", 4)] [TestCase("(2+2)*2", 8)] [TestCase("(2+2)/2", 2)] public void BasicMathsWork(string numberExpression, decimal expectedResult) { var result = MyGrammarExpressionEvaluator.Eval(numberExpression); Assert.That(result, Is.EqualTo(expectedResult)); }
So far, so good; we’ve got basic calculation working! Good times! No, really, now the plot thickens. I’ve purposefully only implemented these operations, as the other two supported expressions—fxRateFunc and uomConvertFunc—require some more work before we can use them as input. See Part 3 (coming soon!) for the details on UomConvert!
Written by an Adaptive Consultant.