ANTLR4 and expression parsing (Part 2)

Introduction

If you have not read part 1 of this blog post, you can do so here:

Using ANTLR to parse and calculate expressions (Part 1)

In the previous post, we talked about creating a custom ANTLR4 grammar. Our grammar can parse formulas with basic mathematical operators, and it defines some custom expressions that let us implement domain logic. Combined with the formula parsing, that logic allows us to calculate domain-specific concepts like prices, amounts and conversion factors between units of measure.

In this post, we’ll get a decimal result out of some simple expressions and write some tests to make sure our initial calculator is working.

In our grammar, we have two operators that have widespread use in probably every trading and banking system in the world. They are:

  • UomConvert(FromUnitOfMeasure, ToUnitOfMeasure)
  • FXRate(‘CurrencyPair’)

The first one converts between units of measure. The second one is a simplified FX rate. We can interpret it as an FX Spot Rate between two currencies. While FX Spot Rates are meaningless without a date component, for the purposes of this post we will keep it simple and always assume the ‘latest known’ FX rate.

Our first step is to get a basic calculator working. To do that we’ll need to be able to handle the different inputs in the following table.

Input          Result   Comments
2              2        Number identity should work.
2 * 2          4        Standard math operators (addition, multiplication, subtraction and division) should work.
(2 + 2) * 4    16       Parenthesis priority should be respected.

Extending the base visitor

In our previous post we did all the plumbing necessary to generate some code that we can work with. This generated code is added automatically into our project thanks to the new C# project file (*.csproj) format.

As a result of the plumbing work, we’re telling the ANTLR tool to generate a visitor. This visitor gives us a base class we can inherit from and add logic to in order to get the desired result. The generated code gives us a parse tree, as well as the necessary tools to perform custom actions each time we visit a node in that tree. Nodes can be a recognized token, an operator, or an expression.

Our main goal is to calculate the result of our expression. To do this, we first need to inherit from the MyGrammarBaseVisitor class, which has a generic type parameter that defines our expression’s result type.
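As a minimal sketch, the declaration could look like the following. CalculatorVisitor is a hypothetical name chosen for illustration, and the class is marked partial only so that the overrides can be added one fragment at a time. MyGrammarBaseVisitor is the generated base class mentioned above.

```csharp
// CalculatorVisitor is a hypothetical name. MyGrammarBaseVisitor<T> is the
// class ANTLR generates from the grammar. Choosing decimal as the type
// argument makes every Visit* method return a decimal.
public partial class CalculatorVisitor : MyGrammarBaseVisitor<decimal>
{
    // Overrides for each labeled alternative in the grammar go here.
}
```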

To start, the visitor that inherits from our base class will, for now, override only the basic mathematical operations that we defined in our grammar file. These are ‘Number’, ‘Parens’ (parentheses), and the operators ‘ADD’, ‘SUB’, ‘MUL’, and ‘DIV’, using decimal as the return type.

We override the alternatives labeled with the # symbol in our grammar. The labels are a convenience: ANTLR generates one Visit method per label, so they tell us which method names we need to override. The overridden methods cover all the operations we need to implement the calculator.

Context

Each overridden method is handed a context parameter whose type is specific to the method we’re overriding. This is extremely helpful: the method body defines which steps to take, and the context supplies the data to perform them on.

It’s time to implement the logic for each of these methods. But how? Let’s go back to the grammar definition from Part 1 and work through its labeled alternatives one at a time.

Let’s start simple: we’ll first implement VisitNum. Because #num is just ‘number’, all we need to do is get the text content of our context parameter and parse it into a decimal.
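A minimal sketch of that override is below. It assumes the generated context type is MyGrammarParser.NumContext (which follows from a #num label) and reuses the hypothetical CalculatorVisitor class name. Parsing with the invariant culture is a choice made here so that "3.5" reads the same on every machine; it is not something the grammar requires.

```csharp
using System.Globalization;

// CalculatorVisitor is a hypothetical name; partial lets this fragment
// sit alongside other fragments of the same class.
public partial class CalculatorVisitor : MyGrammarBaseVisitor<decimal>
{
    // Assumes a grammar alternative labeled "# num", for which ANTLR
    // generates VisitNum and MyGrammarParser.NumContext.
    public override decimal VisitNum(MyGrammarParser.NumContext context)
    {
        // GetText() returns the matched source text, e.g. "2" or "3.5".
        // InvariantCulture keeps "." as the decimal separator everywhere.
        return decimal.Parse(context.GetText(), CultureInfo.InvariantCulture);
    }
}
```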

Easy peasy…let’s move on to VisitParens.

In the grammar, parens is defined as an expression wrapped in parentheses.

We have to visit the expression contained within the parentheses, so we simply call Visit on it.
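As a sketch, assuming the alternative is labeled #parens (so that ANTLR generates VisitParens and MyGrammarParser.ParensContext) and using the hypothetical class name CalculatorVisitor:

```csharp
// CalculatorVisitor is a hypothetical name; partial lets this fragment
// sit alongside other fragments of the same class.
public partial class CalculatorVisitor : MyGrammarBaseVisitor<decimal>
{
    // Assumes a grammar alternative of the form:  '(' expr ')'  # parens
    public override decimal VisitParens(MyGrammarParser.ParensContext context)
    {
        // expr() returns the single nested expression. Visit dispatches it
        // to whichever override matches that sub-tree.
        return Visit(context.expr());
    }
}
```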

Note how expr() is a method on the context. This is because we’ve defined expr recursively: we need to treat it as a new expression, which is a tree itself, so we have to make sure we’re traversing the entire tree. The call is then dispatched to whichever Visit override is appropriate.

The last two operators are similar, so we’ll do them at the same time.

As we can see from the grammar, a math operator has two expressions, one on each side. We need to first visit both the left and right sides to get our operands. Once we have these two values, we check what operator we’re using and apply the proper calculation.

Note that access to the left- and right-hand sides is index-based, where 0 is left and 1 is right. It’s also worth noting how we need to check the operator token’s Type in order to know which operation we want to perform. This is all part of the generated code, and you can just browse to it in your IDE.
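Put together, the two overrides might look like the sketch below. The labels #mulDiv and #addSub, and the op token label on the alternatives, are assumptions about the grammar; the token names MUL, DIV, ADD and SUB are the ones listed earlier. CalculatorVisitor is again a hypothetical class name.

```csharp
using System;

// CalculatorVisitor is a hypothetical name; partial lets this fragment
// sit alongside other fragments of the same class.
public partial class CalculatorVisitor : MyGrammarBaseVisitor<decimal>
{
    // Assumes:  expr op=(MUL | DIV) expr  # mulDiv
    public override decimal VisitMulDiv(MyGrammarParser.MulDivContext context)
    {
        decimal left = Visit(context.expr(0));   // index 0: left operand
        decimal right = Visit(context.expr(1));  // index 1: right operand

        switch (context.op.Type)
        {
            case MyGrammarParser.MUL: return left * right;
            // Decimal division throws DivideByZeroException when right is 0.
            case MyGrammarParser.DIV: return left / right;
            default:
                throw new InvalidOperationException(
                    $"Unexpected operator '{context.op.Text}'.");
        }
    }

    // Assumes:  expr op=(ADD | SUB) expr  # addSub
    public override decimal VisitAddSub(MyGrammarParser.AddSubContext context)
    {
        decimal left = Visit(context.expr(0));
        decimal right = Visit(context.expr(1));

        switch (context.op.Type)
        {
            case MyGrammarParser.ADD: return left + right;
            case MyGrammarParser.SUB: return left - right;
            default:
                throw new InvalidOperationException(
                    $"Unexpected operator '{context.op.Text}'.");
        }
    }
}
```

Throwing on an unexpected token type is a defensive choice made for this sketch; with only two tokens allowed per alternative, the default branch should never be reached.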

Let’s parse a formula!

Now we need something that takes us from an input string to a parse tree, through our visitor, and ultimately to a decimal. For that we need to construct an AntlrInputStream object, a lexer, and a token stream. This is all part of the plumbing that needs to happen before we can create an actual MyGrammarParser and send the resulting parse tree through our visitor. I’ve created a class called MyGrammarExpressionEvaluator to do this part.
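A minimal sketch of such a class follows. The Evaluate method name, the static shape of the class, the expr entry rule and the CalculatorVisitor class name are assumptions made for illustration. AntlrInputStream and CommonTokenStream come from the ANTLR runtime, and MyGrammarLexer and MyGrammarParser are the generated classes.

```csharp
using Antlr4.Runtime;

public static class MyGrammarExpressionEvaluator
{
    // Evaluate is a hypothetical method name.
    public static decimal Evaluate(string expression)
    {
        // 1. Wrap the raw text in a character stream.
        var inputStream = new AntlrInputStream(expression);

        // 2. The lexer turns characters into tokens.
        var lexer = new MyGrammarLexer(inputStream);
        var tokens = new CommonTokenStream(lexer);

        // 3. The parser turns tokens into a parse tree.
        //    Assumes "expr" is the rule we want to start from.
        var parser = new MyGrammarParser(tokens);
        var tree = parser.expr();

        // 4. The visitor (hypothetical name) walks the tree and
        //    returns the calculated decimal.
        return new CalculatorVisitor().Visit(tree);
    }
}
```

With this in place, a call such as MyGrammarExpressionEvaluator.Evaluate("(2 + 2) * 4") should produce the 16 from the table at the start of this post.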

Now we can write some unit tests that will ensure we’re on track. I’ve used NUnit, with the TestCase attribute to reuse code and make sure that the basic math operators are doing what we expect them to do.

Also note that we’re testing expressions that contain parentheses.
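A sketch of those tests is below, starting with the three inputs from the table at the start of this post. The fixture and test method names are hypothetical, and Evaluate is an assumed static method on MyGrammarExpressionEvaluator.

```csharp
using NUnit.Framework;

[TestFixture]
public class MyGrammarExpressionEvaluatorTests
{
    // Attribute arguments cannot be decimal literals, so the expected
    // values are written as ints and NUnit converts them to decimal.
    [TestCase("2", 2)]            // number identity
    [TestCase("2 * 2", 4)]        // multiplication
    [TestCase("2 + 2", 4)]        // addition
    [TestCase("4 - 2", 2)]        // subtraction
    [TestCase("8 / 2", 4)]        // division
    [TestCase("(2 + 2) * 4", 16)] // parentheses take priority
    public void Evaluate_ReturnsExpectedResult(string expression, decimal expected)
    {
        // Evaluate is an assumed method name on the evaluator class.
        decimal actual = MyGrammarExpressionEvaluator.Evaluate(expression);

        Assert.AreEqual(expected, actual);
    }
}
```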

So far, so good; we’ve got basic calculation working! Good times! No, really, now the plot thickens. I’ve purposefully only implemented these operations, as the other two supported expressions—fxRateFunc and uomConvertFunc—require some more work before we can use them as input. See Part 3 (coming soon!) for the details on UomConvert!

Link to the GitHub repo for this entry:

https://github.com/AdaptiveConsulting/Blog/tree/master/AntlrExpressions/Part%202/ExpressionParser

Author:

Carlos Fernandez

Senior Software Engineer, Adaptive Barcelona