10th December 2013

Simple Binary Encoding, a new ultra-fast marshalling API in C++, Java and .NET

In this post we will be talking about Simple Binary Encoding, aka SBE. Martin Thompson (ex-CTO of LMAX, now at Real Logic) and Todd Montgomery (ex-CTO of 29WEST, now at Informatica) have been working on a reference implementation for Simple Binary Encoding, a new marshalling standard for low/ultra-low latency FIX. My colleagues and I at Adaptive have been porting their Java and C++ APIs to .NET.

Why?

You are probably wondering “Why are these guys reinventing the wheel? Serialization is a solved problem…”

In many cases, I would say you are right, but when it comes to low-latency and/or high-throughput systems, serialization can become a limiting factor. Some time ago I built an FX price distribution engine for a major financial institution and managed to get reasonable latencies at 5x the throughput of normal market conditions: sub-millisecond between reading a price tick from the wire, processing it, and then publishing, at the 99th percentile. We had to design such a system very carefully. One of the major limiting factors was memory pressure causing stop-the-world GCs and the resulting latency spikes. To limit GCs you have to limit allocation rates. We reached a point where the main source of allocation was our serialization API and there was not much we could do about it without major rework. We were using Google Protocol Buffers on this particular project.

SBE would have pushed the limit of this system further and offered more predictable latencies.

How it all started

I worked with Martin some time ago and ported the Disruptor to .NET. It was a good project and I had a lot of fun, so when he told me a few weeks ago that he was working on some cool stuff with Todd, I immediately offered to join efforts and build a C# version of SBE.

What is SBE?

SBE is the name of both the standard and the API, just to make sure you get confused 😉

The standard is defined by the FIX community, or more accurately by the High Performance Working Group within that community, and it specifies:

  • what the schema describing your messages should look like (a kind of schema for schemas);
  • how messages should be encoded on the wire;
  • how the API implementing the spec should behave.

As I said, SBE is going to be used in FIX for low-latency use cases in finance, but it is absolutely NOT limited to FIX.

You can define schemas for completely different problem spaces and use SBE as a general-purpose marshalling mechanism.

The toolchain

Martin and Todd were asked to create a reference implementation for this spec in Java and C++.

If you are familiar with Google Protocol Buffers, the overall process to define your messages is very similar:

  1. define your schema (XML-based, as specified by the standard; a minimal sketch is shown below),
  2. use SbeTool (a Java-based utility) to generate encoders and decoders for your messages in Java, C++, and/or C#,
  3. use the generated stubs, together with the SBE API, in your app.
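To make step 1 a bit more concrete, here is a rough, illustrative sketch of a schema describing a single market data message. Treat it purely as an indication of the shape of an SBE schema: the element names follow the spec, but the exact namespace and required attributes are still evolving, and a real schema also declares the standard message header composite, which I have omitted for brevity (see the wiki for complete examples).

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Illustrative sketch only; see the SBE wiki for complete, up-to-date schemas -->
    <sbe:messageSchema xmlns:sbe="http://www.fixprotocol.org/ns/simple/1.0"
                       package="marketdata" id="1" version="0" byteOrder="littleEndian">
        <types>
            <type name="Price" primitiveType="double"/>
        </types>
        <sbe:message name="PriceTick" id="1" description="A hypothetical FX price tick">
            <field name="bid" id="1" type="Price"/>
            <field name="ask" id="2" type="Price"/>
        </sbe:message>
    </sbe:messageSchema>

Step 2 is then a matter of pointing SbeTool at this file, something along the lines of java -jar SbeTool.jar price-schema.xml (the exact command line and options are documented on the wiki), which produces the encoder and decoder classes you use in step 3.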

Messages

The message format supports everything you would expect for modelling your data:

  • all sizes of signed and unsigned integers as primitive types (+ float and double),
  • composite structures, reusable across messages,
  • enums and bitsets,
  • repeating blocks, which can be nested,
  • variable-length fields for strings and byte[].

The type system is portable. You can, for instance, use the Java API to decode messages created by the .NET API (or any combination).

There is also support for optional fields and versioning for backward compatibility, but note that this is still a work in progress, at the specification and API levels.
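To give an idea of what using the generated stubs feels like, here is a rough .NET sketch of encoding and then decoding a hypothetical PriceTick message. PriceTick, DirectBuffer and the Wrap* methods are illustrative names standing in for whatever SbeTool generates from your schema; the real generated API may differ in detail.

    // Illustrative sketch only: type and member names are hypothetical.
    var buffer = new byte[1024];                 // underlying byte[] is reused for every message
    var directBuffer = new DirectBuffer(buffer); // wrapper giving direct access to the buffer
    var tick = new PriceTick();                  // flyweight over the buffer, created once and reused

    // Encode: position the flyweight over the buffer and write the fields in order
    tick.WrapForEncode(directBuffer, 0);
    tick.Bid = 1.3080;
    tick.Ask = 1.3082;

    // Decode: position the flyweight over the same buffer for reading
    tick.WrapForDecode(directBuffer, 0, PriceTick.BlockLength, PriceTick.SchemaVersion);
    double bid = tick.Bid;
    double ask = tick.Ask;

Because the stub is a flyweight over a caller-supplied buffer, the same objects can be reused for every message, which is where the zero-allocation behaviour described below comes from.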

You said it’s fast?

It’s pretty damn fast… We have written several benchmarks with different message shapes, ranging from a car model to a market data tick, and used Google Protocol Buffers as the baseline. Why Protocol Buffers? Because it is available in all the languages we were targeting, and it is widely used and adopted. We will likely add benchmarks against other APIs later.

You can have a look and run those benchmarks on your own hardware. This is what I am seeing on my laptop:

  • 20-40 million messages encoded or decoded per second, on a single thread,
  • that’s 30-40ns average latency per message. Let me say it again… 30 to 40 NANOseconds.
  • depending on the benchmark, we see a 20 to 50 times throughput increase compared to Google Protocol Buffers.

Also, it is worth mentioning that the API does not allocate during encoding and decoding, which means no GC pressure and no resultant GC pauses on this thread and others.
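The real benchmarks live in the repository, but the shape of the measurement is essentially a tight loop like the sketch below, again using the hypothetical PriceTick stub from earlier, so treat the names as illustrative rather than as the actual benchmark code.

    // Simplified, illustrative benchmark loop: everything is allocated once,
    // outside the measured loop, so the loop itself performs no allocation.
    using System;
    using System.Diagnostics;

    static class EncodeBenchmark
    {
        const int Iterations = 10 * 1000 * 1000;

        static void Main()
        {
            var buffer = new byte[1024];
            var directBuffer = new DirectBuffer(buffer); // hypothetical buffer wrapper
            var tick = new PriceTick();                  // hypothetical generated stub

            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
            {
                tick.WrapForEncode(directBuffer, 0);
                tick.Bid = 1.3080;
                tick.Ask = 1.3082;
            }
            stopwatch.Stop();

            Console.WriteLine("{0:N0} messages encoded per second",
                              Iterations / stopwatch.Elapsed.TotalSeconds);
        }
    }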

This is a typical run of the .NET implementation, on my 3-year-old laptop.

Note that the Java and C++ implementations run faster; we still have work to do!

[Screenshot: benchmark output from the .NET implementation]

Note: we try to make our benchmarks as fair as possible. If you think you could get better performance out of Google Protocol Buffers using a different version of the API or a different implementation of the benchmark, please let us know!

What’s coming next?

SBE is still in beta and there is a lot left to do, but we think it is in a state where you can start giving it a spin: we would love to get your views and feedback, good or bad.

Feel free to post requests or questions in the issue tracker on GitHub.

Fork it! Clone it!

You will find the main repository here.

Visit the wiki for the documentation.

If you are after the .NET implementation specifically, you can look at our fork.

NuGet package for .NET users

We published a NuGet package for .NET users; it contains:

  • SBE.dll, which the generated code depends on.
  • SbeTool.jar, to generate your encoders and decoders.
  • a small sample, to quickly get you up and running.

From Visual Studio, search for “SBE” and make sure pre-release packages are included (we are still in beta!).
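Alternatively, from the Package Manager Console, something along these lines should do the trick. I am assuming here that the package id matches the search term; use whatever id shows up in your search:

    PM> Install-Package SBE -Pre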

Enjoy!


Olivier Deheurles

Co-founder and Chief Technology Officer, Adaptive Financial Consulting