ANTLR4 and expression parsing (Part 3)

Using ANTLR to parse and calculate expressions (Part 2)

Introduction

We ended part 2 with a working calculator, although it was basic. Given a simple mathematical expression we can get a result back. The goal of this post is to get the UomConvert expression working.

Implementing UomConvert

Let’s remind ourselves what UomConvert looks like (refer back to the grammar file). We want to parse something like the following example:

2 * UomConvert(MT,Kg)

The result of the above expression should be 2 multiplied by the conversion factor of kilograms from metric tons. This one is simple enough: 1MT = 1000 Kg. So the conversion factor from kilograms to metric tons is 1000 and our final result should be 2000.

We can use this for our tests after the implementation is complete.

Let's begin.

Returning a decimal is not enough

The first step is simple and it is the same as in part 2: we have to override the VisitUomConvertFunc method from our base class. We can extract the from and to units of measurement from the context, but then what? How do we get a decimal value?

Returning a decimal is no longer an option with the current implementation because we don’t have that information:

public override decimal VisitUomConvertFunc(MyGrammarParser.UomConvertFuncContext context)
{
    var fromUom = context.fromUomCode().GetText();
    var toUomCode = context.toUomCode().GetText();

    return 0;// ???
}

Another way to think about this is: we have a dependency on some external information that needs to be retrieved. All we have is a term with the symbols of the units we want to convert from and to: “UomConvert(MT,Kg)”. The actual value of this conversion rate comes from somewhere else.

Let’s get Func-y?

First, let’s change the result type from decimal to something that will give us a bit more flexibility. This is where we can work with C#’s Func type and we can have someone else giving us the missing data once we have it.

In the end, what we have is a calculation. The calculation is composed of terms and, to each term, we need to assign a decimal value. This rule will apply to all our expressions since we are, after all, a calculator. This pattern will repeat again when we implement FXRate: we will need to go and fetch the rates from somewhere.

Let’s have our visitor keep track of the unique set of terms that it has visited (those it can’t immediately assign a decimal value to!) and expose them so that we can give them a value later.

This is how we’ll represent a term:

public class ExpressionTerm
{
    public ExpressionTerm(string termId)
    {
        TermId = termId;
    }

    public string TermId { get; }
    public decimal Value { get; private set; }
    
    public void SetValue(decimal value) => Value = value;
}

The TermId in this case can be a string that is either a currency pair or a FromTo unit of measure conversion. Note that this initial representation is not ideal. We will come back to it soon, as it would be better to know exactly what we’re dealing with in each specific term.

Our visitor will now have the following fields/properties:

private readonly List<ExpressionTerm> _terms = new List<ExpressionTerm>();
public IReadOnlyList<ExpressionTerm> Terms => _terms.AsReadOnly();

Our new return type is then a Func that, given a list of terms with decimal values assigned to them, will result in a decimal (i.e. the result of our calculation). After applying the changes to all the overloads it looks like:

public class MyGrammarVisitor : MyGrammarBaseVisitor<Func<IReadOnlyList<ExpressionTerm>, decimal>>
{
    private readonly List<ExpressionTerm> _terms = new List<ExpressionTerm>();
    public IReadOnlyList<ExpressionTerm> Terms => _terms.AsReadOnly();
    
    // Math operators.
    // Ignore any input for these functions. They always return a number, we can just work with them.
    public override Func<IReadOnlyList<ExpressionTerm>, decimal> VisitNum(MyGrammarParser.NumContext context)
    {
        // Ignore any input here. As this is a number, we can just return it.
        return _ => decimal.Parse(context.GetText());
    }

    public override Func<IReadOnlyList<ExpressionTerm>, decimal> VisitMulDiv(MyGrammarParser.MulDivContext context)
    {
        var left = Visit(context.expr(0));
        var right = Visit(context.expr(1));
        // The _ variables means we don’t care about their value. 
        if (context.op.Type == MyGrammarParser.MUL) return _ => left(_) * right(_);
        return _ => left(_) / right(_);
    }

    public override Func<IReadOnlyList<ExpressionTerm>, decimal> VisitAddSub(MyGrammarParser.AddSubContext context)
    {
        var left = Visit(context.expr(0));
        var right = Visit(context.expr(1));

        if (context.op.Type == MyGrammarParser.ADD) return x => left(_) + right(_);
        return _ => left(_) - right(_);
    }

    public override Func<IReadOnlyList<ExpressionTerm>, decimal> VisitUomConvertFunc(MyGrammarParser.UomConvertFuncContext context)
    {
        var fromUom = context.fromUomCode().GetText();
        var toUom = context.toUomCode().GetText();
        var fromToUomConversionCode = $"{fromUom}{toUom}";
        // Todo: Come back to this ID.
        _terms.Add(new ExpressionTerm(fromToUomConversionCode)); 
        return x => x.First(t => t.TermId == fromToUomConversionCode).Value;
    }

    public override Func<IReadOnlyList<ExpressionTerm>, decimal> VisitParens(MyGrammarParser.ParensContext context)
    {
        // If we have a parenthesis, there's an expression inside
        // (that can be any of the allowed operators), so we visit that. 
        return Visit(context.expr());
    }
}

Let’s go through the VisitUomConvertFunc:

  1. We extract the fromUom and toUom codes.
  2. We construct a new ExpressionTerm and add it to our internal collection. We return a function that takes a “hydrated” list of terms. By “hydrated” we mean that the caller has already gotten all the necessary information and used the SetValue method on our terms. This is probably a good time to mention that there is currently no error handling and we just assume it’s there, but let’s keep things simple for now.

Coming back to our ExpressionTerm implementation, remember when we said “we will come back to it”? We were referring to our ExpressionTerm. Let’s improve it a bit.

The problem is that ExpressionTerm is too generic. Right now we are only implementing VisitUomConvertFunc, but we will also need to implement FxRateFunc (in a future blog post). How do we get around having different types of terms and different ways of assigning values to those terms before getting a result from our formula? Let’s crack the visitor pattern up to eleven to solve this one.

The visitor strikes again

That’s right, we can just use another visitor! This second visitor will take different implementations of a term, and will know how to get the respective information through its dependencies.

Let’s extract an interface and we’ll turn our ExpressionTerm into an abstract class to handle the common functionality of our terms. That is: they should all have a TermId and a decimal value, and a method to set that value at a later point in time. This is how it looks:

public interface ITerm
{
    string TermId { get; }
    decimal Value { get; }
    void SetValue(decimal value);
}
public abstract class ExpressionTermBase : ITerm
{
    protected ExpressionTermBase(string termId)
    {
        TermId = termId;
    }

    public string TermId { get; }
    public decimal Value { get; private set; }
    
    public void SetValue(decimal value) => Value = value;
}

We will also add a new interface that will know how to accept a “visitor” and will be implemented by our specific terms:

public interface IGrammarTerm : ITerm
{
    void Accept(ITermVisitor visitor);
}

ITermVisitor is a third interface that represents our second visitor:

public interface ITermVisitor
{
    IReadOnlyList<IGrammarTerm> GetAllTerms();
    void Visit(UomConvertTerm uomConversionTerm);
}

ITermVisitor’s implementations will know what to do with each specific term.

Then onto UomConvertTerm:

public class UomConvertTerm : ExpressionTermBase, IGrammarTerm
{
    public UomConvertTerm(string termId, string fromUom, string toUom) : base(termId)
    {
        FromUom = fromUom;
        ToUom = toUom;
    }

    public string FromUom { get; }

    public string ToUom { get; }

    public void Accept(ITermVisitor visitor)
    {
        visitor.Visit(this);
    }
}

This is a bit nicer, UomConvertTerm now has parameters from and to that better represent what we’re dealing with when we come across the UomConvert function.

Let’s implement our term visitor:

public class MyTermVisitor : ITermVisitor
{
    private readonly IUnitConverter _unitConverter = new UnitConverter();
    private List<IGrammarTerm> _terms = new List<IGrammarTerm>();
    public IReadOnlyList<IGrammarTerm> GetAllTerms() => _terms.AsReadOnly();
    
    public void Visit(UomConvertTerm uomConversionTerm)
    {
        var conversionFactor =
            _unitConverter.GetConversionFactor(uomConversionTerm.FromUom, uomConversionTerm.ToUom);
        uomConversionTerm.SetValue(conversionFactor);
        _terms.Add(uomConversionTerm);
    }
}

IUnitConverter knows how to get the conversion factor from our two units of measurement. Since we really don’t want to spend too much time getting into implementation details of this interface, it’s available on the Github page for this blog entry.

All we need to understand is that this is who has our “missing” information: we will call the GetConversionFactor method it exposes to get our decimal back. The visitor will then set the value of our term, basically mutating its state and storing the value.

Last, but not least, we need to update our grammar visitor:

public class MyGrammarExpressionVisitor : MyGrammarBaseVisitor<Func<IReadOnlyList<IGrammarTerm>, decimal>>
{
	private readonly List<IGrammarTerm> _terms = new List<IGrammarTerm>();
	public IReadOnlyList<IGrammarTerm> Terms => _terms.AsReadOnly();
	
	// Math operators.
	// Ignore any input for these. As they always return a number, we can just work with them.
	public override Func<IReadOnlyList<IGrammarTerm>, decimal> VisitNum(MyGrammarParser.NumContext context)
	{
   	 // Ignore any input here. As this is a number, we can just return it.
    	return _ => decimal.Parse(context.GetText());
	}

	public override Func<IReadOnlyList<IGrammarTerm>, decimal> VisitMulDiv(MyGrammarParser.MulDivContext context)
	{
	    var left = Visit(context.expr(0));
    	var right = Visit(context.expr(1));

    	if (context.op.Type == MyGrammarParser.MUL) return x => left(_) * right(_);
    	return _ => left(_) / right(_);
	}

	public override Func<IReadOnlyList<IGrammarTerm>, decimal> VisitAddSub(MyGrammarParser.AddSubContext context)
	{
    	var left = Visit(context.expr(0));
    	var right = Visit(context.expr(1));

    	if (context.op.Type == MyGrammarParser.ADD) return x => left(_) + right(_);
    	return _ => left(_) - right(_);
	}

	public override Func<IReadOnlyList<IGrammarTerm>, decimal> VisitUomConvertFunc(MyGrammarParser.UomConvertFuncContext context)
	{
    	var fromUom = context.fromUomCode().GetText();
    	var toUom = context.toUomCode().GetText();
    	var fromToUomConversionCode = $"{fromUom}{toUom}";
    	// Todo: Come back to this ID.
    	_terms.Add(new UomConvertTerm(fromToUomConversionCode, fromUom, toUom));
    	return x => x.First(t => t.TermId == fromToUomConversionCode).Value;
	}

	public override Func<IReadOnlyList<IGrammarTerm>, decimal> VisitParens(MyGrammarParser.ParensContext context)
	{
    	// If we have a parenthesis, there's an expression inside
    	// (that can be any of the allowed operators), so we visit that.
    	return Visit(context.expr());
	}
}

By this point, you’re probably wondering, “what is going on?”

Let’s see these interactions in a diagram:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Let’s recap what’s happening here:

  1. We’re parsing an input string that represents a formula.
  2. MyGrammarExpressionVisitor exposes two things: a list of terms and a function.
  3. MyTermVisitor takes the collection of terms and has the necessary information to set the values of each term by calling SetValue.
  4. We invoke the function from point 2) with the newly hydrated list of terms and get a result.

As far as the grammar visitor goes, we’ve achieved our goal. Given an expression tree with terms, we return a single decimal result. There is still some work left for client code to be able to use our solution, though.

For starters, let’s go back to MyGrammarExpressionEvaluator, the class from post 2 that’s doing all the parsing. This class needs to expose our terms to the outside world. Currently the grammar visitor keeps that internally.

For the sake of simplicity, we’ll expose the function and list of terms from point 2) using a simple tuple (yay C# 7!)

This is what MyGrammarExpressionEvaluator looks like after the changes:

public static class MyGrammarExpressionEvaluator
{
    public static (Func<IReadOnlyList<IGrammarTerm>,decimal> function, IReadOnlyList<IGrammarTerm> rawTerms) EvaluateExpression(string expression)
    {
        using (var stringReader = new StringReader(expression))
        {
            var inputStream = new AntlrInputStream(stringReader);
            var lexer = new MyGrammarLexer(inputStream);
            var tokens = new CommonTokenStream(lexer);
            var parser = new MyGrammarParser(tokens);
            // Parse the whole thing.
            var expressionTree = parser.expr();
            var visitor = new MyGrammarExpressionVisitor();
            // Now that we have a "parsed expression", let's visit all the nodes.
            var func = visitor.Visit(expressionTree);
            return (func, visitor.Terms);
        }
    }
}

Note that we’re returning both the function and the raw terms that we need to set values to.

An important consequence of this is that it is up to the caller to hydrate the terms and pass the new list of terms to our function. To illustrate, let’s write some tests.

[Test]
public void UomFactorWorks()
{
    ITermVisitor myTermVisitor = new MyTermVisitor();
    const string inputFunction = "2*UomConvert(MT,Kg)";
    
    var (function, rawTerms) = 
        MyGrammarExpressionEvaluator.EvaluateExpression(inputFunction);
    
     // Hydrate each term.
    rawTerms.ToList().ForEach(t => t.Accept(myTermVisitor));
    var hydratedTerms = myTermVisitor.GetAllTerms();
    
    var result = function.Invoke(hydratedTerms);
    
    Assert.That(result, Is.EqualTo(2000m));
}

Let’s break down what happens in this test:

  1. Construct a term visitor (MyTermVisitor). Remember that, internally, this has a dependency on an IUnitConverter implementation that knows how to convert between units of measure.
  2. Evaluate the expression.
  3. For each resulting term, run it through our term visitor, hydrating it. This effectively replaces the string “UomConvert(MT,Kg)” of the formula with the decimal value “1000” (remember that 1 MT = 1000kg).
  4. Invoke the function returned by the grammar expression evaluator with the hydrated list of terms, thus calculating 2×1000.
  5. Compare results to expected.

Phew! That worked! We’ve successfully parsed the custom input string containing a mathematical expression and the custom term UomConvert within a formula. We first grab the rate of UomConvert and finally produce a result.
Github Repo for this entry

 

Carlos Rodriguez Picture

Carlos Fernandez

Senior Software Engineer,
Adaptive Financial Consulting



×

Contact us

    By pressing "Send" I agree that I am happy to be contacted by Adaptive Financial Consulting. I can unsubscribe at any time. You can read more about our privacy policy here.