ANTLR4 and expression parsing (Part 3)

For Part 2 of this exercise, click here

Introduction

We ended part 2 with a working calculator, although it was basic. Given a simple mathematical expression we can get a result back. The goal of this post is to get the UomConvert expression workingWe will also cover FxSpot calculations in a future blog post.

Implementing UomConvert

Let’s remind ourselves what UomConvert looks like (refer back to the grammar file). We want to parse something like the following example:

2 * UomConvert(MT,Kg)

The result of the above expression should be 2 multiplied by the conversion factor of kilograms from metric tons. This one is simple enough: 1MT = 1000 Kg. So the conversion factor from kilograms to metric tons is 1000 and our final result should be 2000.  

We can use this for our tests after the implementation is complete.

Let’s begin.

Returning a decimal is not enough

The first step is simple and it is the same as in part 2: we have to override the VisitUomConvertFunc method from our base class. We can extract the from and to units of measurement from the context, but then what? How do we get a decimal value?

Returning a decimal is no longer an option with the current implementation because we don’t have that information:

Another way to think about this is: we have a dependency on some external information that needs to be retrieved. All we have is a term with the symbols of the units we want to convert from and to: “UomConvert(MT,Kg)”. The actual value of this conversion rate comes from somewhere else.

Let’s get Func-y?

First, let’s change the result type from decimal to something that will give us a bit more flexibility. This is where we can work with C#’s Func type and we can have someone else giving us the missing data once we have it.

In the end, what we have is a calculation. The calculation is composed of terms and, to each term, we need to assign a decimal value. This rule will apply to all our expressions since we are, after all, a calculator. This pattern will repeat again when we implement FXRate: we will need to go and fetch the rates from somewhere.

Let’s have our visitor keep track of the unique set of terms that it has visited (those it can’t immediately assign a decimal value to!) and expose them so that we can give them a value later.

This is how we’ll represent a term:

The TermId in this case can be a string that is either a currency pair or a FromTo unit of measure conversion. Note that this initial representation is not ideal. We will come back to it soon, as it would be better to know exactly what we’re dealing with in each specific term.

Our visitor will now have the following fields/properties:

Our new return type is then a Func that, given a list of terms with decimal values assigned to them, will result in a decimal (i.e. the result of our calculation). After applying the changes to all the overloads it looks like:

Let’s go through the VisitUomConvertFunc:

  1. We extract the fromUom and toUom codes.
  2. We construct a new ExpressionTerm and add it to our internal collection. We return a function that takes a “hydrated” list of terms. By “hydrated” we mean that the caller has  already gotten all the necessary information and used the SetValue method on our terms. This is probably a good time to mention that there is currently no error handling and we just assume it’s there, but let’s keep things simple for now.

Coming back to our ExpressionTerm implementation, remember when we said “we will come back to it”? We were referring to our ExpressionTerm. Let’s improve it a bit.

The problem is that ExpressionTerm is too generic. Right now we are only implementing VisitUomConvertFunc, but we will also need to implement FxRateFunc (in a future blog post). How do we get around having different types of terms and different ways of assigning values to those terms before getting a result from our formula? Let’s crack the visitor pattern up to eleven to solve this one.

The visitor strikes again

That’s right, we can just use another visitor! This second visitor will take different implementations of a term, and will know how to get the respective information through its dependencies.

Let’s extract an interface and we’ll turn our ExpressionTerm into an abstract class to handle the common functionality of our terms. That is: they should all have a TermId and a decimal value, and a method to set that value at a later point in time. This is how it looks:

We will also add a new interface that will know how to accept a “visitor” and will be implemented by our specific terms:

ITermVisitor is a third interface that represents our second visitor:

ITermVisitor’s implementations will know what to do with each specific term.

Then onto UomConvertTerm:

This is a bit nicer, UomConvertTerm now has parameters from and to that better represent what we’re dealing with when we come across the UomConvert function.

Let’s implement our term visitor:

IUnitConverter knows how to get the conversion factor from our two units of measurement. Since we really don’t want to spend too much time getting into implementation details of this interface, it’s available on the Github page for this blog entry.

All we need to understand is that this is who has our “missing” information: we will call the GetConversionFactor method it exposes to get our decimal back. The visitor will then set the value of our term, basically mutating its state and storing the value.

Last, but not least, we need to update our grammar visitor:

By this point, you’re probably wondering, “what is going on?”

Let’s see these interactions in a diagram:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Let’s recap what’s happening here:

  1. We’re parsing an input string  that represents a formula.
  2. MyGrammarExpressionVisitor  exposes two things: a list of terms and a function.
  3. MyTermVisitor takes the collection of terms and has the necessary information to set the values of each term by calling SetValue.
  4. We invoke the function from point 2) with the newly hydrated list of terms and get a result.

As far as the grammar visitor goes, we’ve achieved our goal. Given an expression tree with terms, we return a single decimal result. There is still some work left for client code to be able to use our solution, though.

For starters, let’s go back to MyGrammarExpressionEvaluator, the class from post 2 that’s doing all the parsing. This class needs to expose our terms to the outside world. Currently the grammar visitor keeps that internally.

For the sake of simplicity, we’ll expose the function and list of terms from point 2) using a simple tuple (yay C# 7!)

This is what MyGrammarExpressionEvaluator looks like after the changes:

Note that we’re returning both the function and the raw terms that we need to set values to.

An important consequence of this is that it is up to the caller to hydrate the terms and pass the new list of terms to our function. To illustrate, let’s write some tests.

Let’s break down what happens in this test:

  1. Construct a term visitor (MyTermVisitor). Remember that, internally, this has a dependency on an IUnitConverter implementation that knows how to convert between units of measure.
  2. Evaluate the expression.
  3. For each resulting term, run it through our term visitor, hydrating it. This effectively replaces the string “UomConvert(MT,Kg)” of the formula with the decimal value “1000” (remember that 1 MT = 1000kg).
  4. Invoke the function returned by the grammar expression evaluator with the hydrated list of terms, thus calculating 2×1000.
  5. Compare results to expected.

Phew! That worked! We’ve successfully parsed the custom input string containing a mathematical expression and the custom term UomConvert within a formula. We first grab the rate of UomConvert and finally produce a result.

This is it for part 3. Next time, we will implement the FXRate converter function. The interesting part will be using an Observable stream of prices to recalculate our prices all the time. At each tick of our ever-changing FX rates, we will get the latest, real-time, results of our custom formula.

Check out the code on  GitHub

Author:

Carlos Fernandez

Senior Software Engineer, Adaptive Barcelona